Glossary

Mean

The arithmetic average

By Buğra Sözeri · Published May 16, 2026 · Updated May 31, 2026

Mean (specifically the arithmetic mean) is the sum of a set of values divided by their count. For the dataset [4, 8, 6, 5, 3, 7]: sum 33, count 6, mean 5.5. It’s the most common form of “average” in everyday speech and the default returned by AVG(), numpy.mean, statistics.mean, and every other library function called “mean.”
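As a minimal sketch of that definition, the snippet below computes the mean of the same dataset twice: once directly as sum over count, and once with Python's standard-library `statistics.mean`. The variable names are illustrative only.

```python
import statistics

values = [4, 8, 6, 5, 3, 7]

# Arithmetic mean from the definition: sum divided by count.
manual_mean = sum(values) / len(values)

# The standard-library helper gives the same result.
library_mean = statistics.mean(values)

print(manual_mean)   # 5.5
print(library_mean)  # 5.5
```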

Important properties: it’s sensitive to outliers (one extreme value pulls it dramatically), it has the same units as the underlying data, and it’s a property of a set, not of any individual element. The mean of [1, 1, 1, 100] is 25.75; none of the four values is anywhere near it.

Other “means” exist for specific contexts: the geometric mean (nth root of the product, used for compounding rates), the harmonic mean (reciprocal of the mean of reciprocals, used for averaging rates), the weighted mean (some values count more than others). When someone says “mean” without qualification, they mean the arithmetic mean.
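The three variants can be sketched in a few lines of Python. The input numbers here (growth factors, speeds, scores, weights) are made up for illustration and do not come from the article; `statistics.geometric_mean` needs Python 3.8 or later.

```python
import statistics

# Geometric mean: nth root of the product, suited to growth factors.
growth_factors = [1.10, 1.20, 0.90]  # hypothetical yearly multipliers
geo = statistics.geometric_mean(growth_factors)

# Harmonic mean: reciprocal of the mean of reciprocals, suited to rates.
speeds_kmh = [40, 60]  # hypothetical: the same distance driven at each speed
harm = statistics.harmonic_mean(speeds_kmh)

# Weighted mean: each value contributes in proportion to its weight.
scores = [80, 90]   # hypothetical scores
weights = [3, 1]    # hypothetical weights
weighted = sum(s * w for s, w in zip(scores, weights)) / sum(weights)

print(round(geo, 4))
print(round(harm, 4))  # 48.0, not the arithmetic 50
print(weighted)        # 82.5
```

The harmonic result is the reason "average speed" over equal distances is lower than the plain average of the two speeds: more time is spent at the slower one.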

Use the statistics calculator for any of these or for the median, mode, variance, and standard deviation in a single pass.

When the mean is the wrong summary: for income, wealth, response times, and any heavily right-skewed distribution, the mean sits well above the median and misrepresents the “typical” observation. The standard newsroom example is national income — average US household income is dragged upward by the top 1%, so the mean is a poor proxy for what most households actually earn. The median is the honest one-number summary for skewed data; the mean is honest for symmetric data. Reporting both, or reporting the full quartile picture, is usually the right move. The median, IQR, and a histogram together give a faithful read in almost every case.
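A rough illustration of the gap, using a made-up list of household incomes (in thousands) with one very large value. The figures are hypothetical; only the pattern matters. `statistics.quantiles` is available from Python 3.8 and uses its default "exclusive" method here.

```python
import statistics

# Hypothetical household incomes in thousands; the values are made up.
incomes = [28, 32, 35, 38, 41, 45, 52, 60, 75, 900]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Quartile cut points (Q1, Q2, Q3); Q2 coincides with the median.
q1, q2, q3 = statistics.quantiles(incomes, n=4)
iqr = q3 - q1

print(f"mean   = {mean_income}")
print(f"median = {median_income}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

In this sample the mean lands above the third quartile, so it describes fewer than a quarter of the households, while the median and IQR describe the bulk.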

The geometric vs arithmetic mean trap in finance: averaging annual returns by adding and dividing (arithmetic mean) overstates compound growth — a portfolio that returns +50% then −50% has an arithmetic mean of 0% but ends 25% poorer than it started. The geometric mean (multiplicative average) returns −13.4% per year, which is the figure that actually compounds to the observed outcome. Quoted “average annual return” in fund prospectuses is almost always the geometric (CAGR) mean for this reason; quoted “expected return” in academic finance is usually the arithmetic mean. They are not the same number and the difference matters for any horizon longer than a year. Related: harmonic mean, weighted average.
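The +50% / −50% example can be checked directly. This is a small sketch using only the standard library (`math.prod` needs Python 3.8 or later); the variable names are illustrative.

```python
import math

yearly_returns = [0.50, -0.50]  # +50% then -50%

# Arithmetic mean of the returns: add and divide.
arithmetic = sum(yearly_returns) / len(yearly_returns)

# Compound the growth factors to get the final value per unit invested.
final_value = math.prod(1 + r for r in yearly_returns)

# Geometric mean return: the constant yearly rate that compounds
# to the same final value.
geometric = final_value ** (1 / len(yearly_returns)) - 1

print(f"arithmetic mean: {arithmetic:.1%}")   # 0.0%
print(f"final value:     {final_value:.2f}")  # 0.75
print(f"geometric mean:  {geometric:.1%}")    # -13.4%
```

Compounding the geometric figure for two years reproduces the 0.75 final value; compounding the arithmetic 0% does not.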

Worked example

You measure API response times in milliseconds across ten requests: [42, 48, 51, 39, 55, 47, 44, 50, 46, 980]. Sum = 1402, count = 10, mean = 140.2 ms. The median is 47.5 ms. The mean is arithmetically correct but misleading as a description of typical performance: nine of ten requests were under 56 ms, and a single 980 ms outlier (a slow database query) has roughly tripled the mean, from about 47 ms without it to 140 ms with it. Quoting “average response time 140 ms” on a status page would correctly summarise the total work performed but misrepresent user experience. A better report is something like “p50 ≈ 47 ms, p95 = 980 ms”, which preserves both the typical case and the tail. That two-number summary is why observability products such as Datadog, Honeycomb, and Grafana lean on percentile views rather than means for latency.
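The numbers in this example can be reproduced with a short script. Percentiles have several competing definitions; the sketch below assumes the simple nearest-rank method, which is an assumption on our part and happens to yield the 47 ms and 980 ms figures. The helper `nearest_rank_percentile` is a hypothetical name, not a library function, and interpolating methods will give different values on a sample this small.

```python
import math
import statistics

latencies_ms = [42, 48, 51, 39, 55, 47, 44, 50, 46, 980]

def nearest_rank_percentile(data, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method."""
    ordered = sorted(data)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

mean_ms = statistics.mean(latencies_ms)
median_ms = statistics.median(latencies_ms)
p50 = nearest_rank_percentile(latencies_ms, 50)
p95 = nearest_rank_percentile(latencies_ms, 95)

print(f"mean   = {mean_ms} ms")    # 140.2 ms
print(f"median = {median_ms} ms")  # 47.5 ms
print(f"p50    = {p50} ms")        # 47 ms
print(f"p95    = {p95} ms")        # 980 ms
```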

When and why it matters

Choosing the right summary statistic prevents bad decisions. A/B test analyses on conversion rate use the mean correctly: a conversion rate is itself the mean of 0/1 outcomes, so no single observation can drag it far. Engineering SLOs on latency should not rely on the mean, because latency distributions are heavy-tailed. Salary surveys, house prices, and customer-lifetime-value distributions are heavily right-skewed, so the median is the honest centre. Test scores and physical measurements (height, blood pressure) are roughly symmetric, so the mean is fine. The diagnostic question to ask: if you doubled your largest observation, would your summary number meaningfully change? If yes, the summary is being driven by the tail, the distribution is probably skewed, and the mean is misleading you. Reference: NIST/SEMATECH e-Handbook of Statistical Methods.
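The doubling question can be turned into a quick check. The sketch below is one possible reading of it, not an established test: `doubling_shift` is a hypothetical helper that reports the relative change in a summary statistic after the largest observation is doubled, and what counts as a "meaningful" change is left to the reader. It reuses the latency data from the worked example.

```python
import statistics

def doubling_shift(data, summary):
    """Relative change in `summary` after doubling the largest observation."""
    before = summary(data)
    doubled = sorted(data)   # sorted copy, so the caller's list is untouched
    doubled[-1] *= 2         # double the maximum
    after = summary(doubled)
    return (after - before) / before

latencies_ms = [42, 48, 51, 39, 55, 47, 44, 50, 46, 980]

mean_shift = doubling_shift(latencies_ms, statistics.mean)
median_shift = doubling_shift(latencies_ms, statistics.median)

print(f"mean shifts by   {mean_shift:.0%}")    # 70%
print(f"median shifts by {median_shift:.0%}")  # 0%
```

A large shift in the mean alongside no shift in the median is the signature of a tail-driven summary.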

Frequently asked questions

What is the mean?

The (arithmetic) mean is the sum of all values in a dataset divided by the count of values. For the set [2, 4, 6, 8], the mean is (2+4+6+8)/4 = 5.

When is the mean not a good measure of centre?

The mean is pulled toward outliers. A dataset of salaries like [$30k, $35k, $40k, $2M] has a mean near $526k, which misrepresents the typical salary. The median (midpoint) is more informative for skewed distributions.

What is the difference between mean, median, and mode?

Mean is the average; median is the middle value when sorted; mode is the most frequent value. For symmetric, bell-shaped distributions they are nearly equal; for skewed data they diverge significantly.

What is the difference between population mean and sample mean?

The population mean (μ) is calculated over every member of the group of interest; the sample mean (x̄) is calculated over a subset. Sample mean is used as an estimate of the population mean when the full population is unavailable.

Published May 16, 2026 · Last reviewed May 31, 2026