Glossary
Variance
Standard deviation squared
By Buğra SözeriPublished Updated
Variance is the mean of squared deviations from the mean. For dataset [4, 8, 6, 5, 3, 7] with mean 5.5: squared deviations are 2.25, 6.25, 0.25, 0.25, 6.25, 2.25; sum 17.5; sample variance (÷ n-1) = 3.5.
Variance is in squared units (kilograms², dollars², seconds²) which makes it awkward to interpret directly. Take the square root and you get the standard deviation, which is in the original units. The two carry the same information; variance is what you compute, standard deviation is what you report.
Why bother with variance at all? Because variances are additive across independent sources of variation. If X and Y are independent, Var(X + Y) = Var(X) + Var(Y) — a property that standard deviations don’t have. This is what makes variance the natural unit for analysis-of-variance (ANOVA), error propagation, and most theoretical statistics.
For the working-statistician’s intuition behind variance and standard deviation, see the standard deviation explained guide.
Population vs sample variance — the N vs N−1 choice: when computing variance over an entire population, divide the sum of squared deviations by N. When estimating population variance from a sample, divide by N−1 instead — known as Bessel’s correction. The correction compensates for the fact that the sample mean is closer to the data than the (unknown) population mean would be, which biases the raw sum of squared deviations downward. At small sample sizes the difference is material; at large N it’s negligible. R, pandas, and Excel’s VAR() default to N−1; NumPy’s np.var() defaults to N (overridable with ddof=1). Read the docs before quoting a variance — the silent factor-of-N/(N−1) discrepancy regularly causes “but the numbers don’t match” bug reports.
Numerical pitfalls in computing variance: the textbook formula Var = E[X²] − (E[X])² is mathematically correct but numerically unstable — for tightly-clustered data with a large mean (e.g., temperatures in Kelvin, financial prices), it computes the difference of two nearly-equal large numbers and loses precision catastrophically. Welford’s online algorithm (1962) and the more recent Chan-Golub-LeVeque parallel variant compute variance in a single pass without subtraction of large near-equal terms and are the modern standard. NumPy and pandas implement these under the hood; rolling-your-own with the textbook formula on production data is a known footgun. Related: sample standard deviation, mean, statistics calculator.
Worked example
Suppose you own a stock portfolio with two independent positions: stock A has annual variance of returns Var(A) = 0.04 (so SD = 20%), stock B has Var(B) = 0.09 (SD = 30%). Held individually, A is less risky. Combine them 50/50: Var(0.5·A + 0.5·B) = 0.25·Var(A) + 0.25·Var(B) = 0.01 + 0.0225 = 0.0325, so portfolio SD ≈ √0.0325 ≈ 18%. The diversified portfolio has lower variance than either component alone — Markowitz’s 1952 insight in one line of arithmetic. Note: this works only because the variances added (assumes independence). If A and B were perfectly correlated, Var(0.5A + 0.5B) = 0.25·Var(A) + 0.25·Var(B) + 2·0.25·Cov(A,B) = 0.0325 + 2·0.25·0.20·0.30 = 0.0625, SD = 25% — the average of the two SDs, no diversification benefit.
When and why it matters
Variance is the operational unit of risk in finance (portfolio theory), error in experimental physics (combining measurement uncertainties via root-sum-of-squares), quality control (Six Sigma targets variance reduction rather than mean reduction because mean shifts are easy to dial in, variance shifts require process redesign), and machine learning (the bias-variance tradeoff: models with high variance overfit, models with high bias underfit). When you read “process capability index” Cpk in manufacturing or “tracking error” in fund performance reports, you’re reading a variance-derived statistic. Reporting standard deviation gives intuition; reporting variance gives a quantity that adds across sources — both are needed for fluent statistical communication. Reference: NIST/SEMATECH e-Handbook — Measures of Scale.
Frequently asked questions
- What is variance in statistics?
- Variance is the mean of squared deviations from the mean: population variance = sum of (xi minus mu)^2 divided by N, or sample variance = sum of (xi minus x-bar)^2 divided by (n minus 1). It quantifies how spread out values are, in squared units of the original data.
- Why is variance expressed in squared units?
- Squaring the deviations makes them all positive (so negatives and positives do not cancel) and weights large deviations heavily. The downside is that variance is in squared units (e.g. kg squared), which is why standard deviation -- the square root of variance -- is more commonly reported in interpretable units.
- What is the difference between variance and standard deviation?
- Variance is the mean squared deviation; standard deviation is its square root, restoring the original units. Variance has the useful property of being additive across independent variables; standard deviation does not add linearly, making variance preferred in statistical derivations.
Related
Published May 16, 2026 · Last reviewed May 31, 2026