Standard deviation explained without the math notation
What it measures, why the formula has two versions, and how to read the result without a stats degree.
Standard deviation is one number that summarises how spread out a set of values is. A small standard deviation means the values are tightly clustered around the mean. A large one means they’re scattered. That’s the whole concept. Everything else (the formula, the n-1 vs n debate, the bell curve) is just machinery to make the spread comparable across different datasets and sample sizes.
What it actually measures
Imagine two classes that each have a mean test score of 75.
- Class A: scores are 73, 74, 75, 76, 77. Standard deviation: 1.6.
- Class B: scores are 55, 65, 75, 85, 95. Standard deviation: 15.8.
Same mean, very different distributions. Class A’s scores are tightly clustered; Class B’s span a wide range. The standard deviation captures that difference in a single number.
The unit of standard deviation is the same unit as the data. Test scores in points → standard deviation in points. Heights in inches → standard deviation in inches. This makes the number directly interpretable.
How to compute it (in three steps)
- Find the mean. Sum the values, divide by the count.
- Compute each value’s squared distance from the mean. For Class A above, the mean is 75. The squared distances are: (73-75)² = 4, (74-75)² = 1, (75-75)² = 0, (76-75)² = 1, (77-75)² = 4. Sum: 10.
- Divide by n-1 (sample) or n (population), then take the square root. Class A: 10/(5-1) = 2.5; sqrt(2.5) = 1.58.
The squaring is what makes large deviations dominate small ones — a deviation of 4 contributes 16 to the sum; a deviation of 1 contributes 1. The square root at the end puts the result back into the original unit.
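The three steps above can be sketched in a few lines of Python (the `stddev` function name is mine, not from any library):

```python
import math

def stddev(values, sample=True):
    """Standard deviation in three steps: mean, squared distances, divide and root."""
    n = len(values)
    mean = sum(values) / n                           # step 1: the mean
    squared = sum((x - mean) ** 2 for x in values)   # step 2: sum of squared distances
    divisor = n - 1 if sample else n                 # step 3: n-1 (sample) or n (population)
    return math.sqrt(squared / divisor)

class_a = [73, 74, 75, 76, 77]
print(round(stddev(class_a), 2))  # 1.58, matching the Class A figure above
```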
n vs n-1: why two formulas exist
If you have all the data (every value in the population), divide by n. If you have a sample drawn from a larger population and you want to estimate that population’s standard deviation, divide by n-1. The n-1 version is called the sample standard deviation; the n version is the population standard deviation.
Why does the sample version use n-1? Because the sample mean is computed from the sample itself, the data points are, on average, closer to their own mean than they are to the true population mean, so the raw average of squared deviations underestimates the population spread. Dividing by n-1 instead of n inflates the estimate just enough to correct the bias on average. This is called Bessel’s correction.
Practically: if you’re computing standard deviation from a sample (which is what most real-world calculations are doing), use n-1. Excel’s STDEV.S and Python’s statistics.stdev divide by n-1; Excel’s STDEV.P and numpy.std (with its default ddof=0) divide by n. Picking the wrong function silently changes results by a few percent on small samples.
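You can see both divisors at work with Python’s standard library, which provides one function for each convention:

```python
import statistics

data = [73, 74, 75, 76, 77]  # Class A from above

print(statistics.stdev(data))   # sample version, divides by n-1: ~1.581
print(statistics.pstdev(data))  # population version, divides by n: ~1.414
```

The gap between the two shrinks as n grows; for a handful of values it is clearly visible, for thousands it is negligible.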
How to read the number
Once you have the standard deviation, here’s the usable intuition:
- ~68% of values lie within ±1 standard deviation of the mean (for roughly normal distributions).
- ~95% lie within ±2 standard deviations.
- ~99.7% lie within ±3 standard deviations.
This “68-95-99.7 rule” (also called the empirical rule) holds for any approximately bell-shaped distribution. For Class A above: mean 75, SD 1.58. The interval [73.4, 76.6] should contain about 68% of values — and looking at the actual numbers, three of the five (60%) fall in that range. Close enough for a sample of five.
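A quick simulation shows the rule emerging on a larger sample. This sketch draws from a normal distribution with an arbitrary mean of 100 and SD of 15 (my choice, not from the article) and counts how many values land within one and two standard deviations:

```python
import random
import statistics

random.seed(0)
draws = [random.gauss(100, 15) for _ in range(10_000)]  # roughly normal sample
mean = statistics.mean(draws)
sd = statistics.stdev(draws)

within_1sd = sum(mean - sd <= x <= mean + sd for x in draws) / len(draws)
within_2sd = sum(mean - 2 * sd <= x <= mean + 2 * sd for x in draws) / len(draws)
print(round(within_1sd, 2), round(within_2sd, 2))  # close to 0.68 and 0.95
```

With ten thousand draws the fractions land near the textbook 68% and 95%; with five values, as in Class A, sampling noise dominates.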
For non-normal distributions (heavily skewed data, bimodal data, outlier-laden data) the empirical rule doesn’t apply cleanly. In those cases, percentiles or the interquartile range describe spread better than standard deviation does.
When standard deviation is the wrong tool
Three cases:
- Outliers. One extreme value dramatically inflates the standard deviation. Income data is a classic example — a single billionaire in a sample of a thousand people pulls the standard deviation far higher than any intuitive notion of typical spread. Use the interquartile range or median absolute deviation instead.
- Skewed distributions. When most values are small and a few are very large (or vice versa), the mean and standard deviation together don’t describe the shape. Report percentiles or quartiles.
- Categorical data. Standard deviation requires a numeric scale where distance has meaning. You can’t compute a meaningful standard deviation of the values [“red”, “blue”, “green”].
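The outlier case is easy to demonstrate. This sketch uses made-up income figures (in thousands) to show one extreme value blowing up the standard deviation while the interquartile range barely moves:

```python
import statistics

incomes = [40, 45, 50, 55, 60] * 200          # 1,000 ordinary incomes (thousands)
with_outlier = incomes + [1_000_000]          # add one billionaire-scale value

print(round(statistics.stdev(incomes), 1))        # modest spread
print(round(statistics.stdev(with_outlier), 1))   # explodes by orders of magnitude
q1, _, q3 = statistics.quantiles(with_outlier, n=4)
print(q3 - q1)                                    # IQR is essentially unchanged
```

One value out of a thousand and one multiplies the standard deviation several-thousand-fold, while the IQR still describes the typical spread.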
Variance: standard deviation’s cousin
Variance is the same calculation without the final square root. It’s in squared units (points², inches²), which is harder to interpret directly but easier to work with mathematically: variances can be added across independent sources, whereas standard deviations cannot. In practice, you compute variance and report standard deviation.
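That additivity claim can be checked by simulation. This sketch (the distributions and seeds are mine) sums two independent noise sources with variances 9 and 16:

```python
import random
import statistics

random.seed(1)
a = [random.gauss(0, 3) for _ in range(50_000)]  # variance ≈ 9
b = [random.gauss(0, 4) for _ in range(50_000)]  # variance ≈ 16
sums = [x + y for x, y in zip(a, b)]             # combine the independent sources

print(round(statistics.variance(sums)))  # ≈ 25 = 9 + 16: variances add
print(round(statistics.stdev(sums), 1))  # ≈ 5, not 3 + 4 = 7: SDs do not
```

The variance of the sum matches the sum of the variances; the standard deviation of the sum is the square root of that, not 3 + 4.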
Quick worked example
Dataset: [4, 8, 6, 5, 3, 7]
- Mean: (4+8+6+5+3+7) / 6 = 5.5
- Squared deviations: (4-5.5)² = 2.25, (8-5.5)² = 6.25, (6-5.5)² = 0.25, (5-5.5)² = 0.25, (3-5.5)² = 6.25, (7-5.5)² = 2.25
- Sum of squared deviations: 17.5
- Sample variance (÷ n-1): 17.5 / 5 = 3.5
- Sample standard deviation: √3.5 = 1.87
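The same worked example can be checked against Python’s standard library in three lines:

```python
import statistics

data = [4, 8, 6, 5, 3, 7]
print(statistics.mean(data))             # 5.5
print(statistics.variance(data))         # 3.5  (sample variance, n-1 divisor)
print(round(statistics.stdev(data), 2))  # 1.87
```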
Sanity check with our statistics calculator — it computes mean, median, both versions of standard deviation, and percentiles in one pass.
The pragmatic bottom line
Standard deviation answers “how spread out is this data?” in the data’s own units. For roughly-normal data, the 68/95/99.7 rule lets you translate the number into a quick mental picture. For skewed or outlier-heavy data, fall back to percentiles. And always check whether the tool you’re using is applying the n-1 (sample) or n (population) divisor — the difference is small but real.
Sources: NIST/SEMATECH e-Handbook of Statistical Methods, §1.3.5.6 (Standard Deviation).
Published May 16, 2026