Glossary

Median

The middle of a sorted dataset

Median is the middle value of a sorted dataset. For [3, 4, 5, 6, 7, 8] (six values), the median is 5.5 (the mean of the two middle values). For [3, 4, 5, 6, 7] (five values), the median is 5. Half the data is below, half above.
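The odd and even cases above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation; the function name `median` is chosen here for the example.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # sorted() returns a new list; the input is not modified
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of the two middle values


print(median([3, 4, 5, 6, 7, 8]))  # 5.5
print(median([3, 4, 5, 6, 7]))     # 5
```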

The median is famously robust: a single extreme outlier can’t pull it. The median of [1, 2, 3, 4, 5] is 3. The median of [1, 2, 3, 4, 1000] is still 3. This makes the median the right summary for skewed distributions — income, house prices, response times, file sizes — where one or two extreme values would dominate the arithmetic mean.
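A quick check of the two lists above, using Python's standard `statistics` module, shows how differently the mean and the median react to the outlier. The variable names are illustrative only.

```python
from statistics import mean, median

clean = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 1000]

# Without the outlier the two summaries agree: both are 3.
print(mean(clean), median(clean))

# With the outlier the mean jumps to 202 while the median stays at 3.
print(mean(with_outlier), median(with_outlier))
```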

It’s the same as the 50th percentile. Computing it directly (sort, take the middle) is O(n log n); for very large datasets a quickselect algorithm finds the median in expected O(n).
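As a rough sketch of the quickselect idea, the version below picks a random pivot and keeps only the partition that contains the target rank. It is a simple list-based variant written for readability, not an in-place production implementation, and the function names are assumptions made for this example.

```python
import random


def quickselect(values, k):
    """Return the k-th smallest element (0-based) in expected O(n) time."""
    items = list(values)  # work on a copy so the caller's data is untouched
    if not 0 <= k < len(items):
        raise IndexError("k is out of range")
    while True:
        pivot = random.choice(items)  # random pivot gives the expected linear bound
        lower = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        if k < len(lower):
            items = lower  # target rank lies among the smaller elements
        elif k < len(lower) + len(equal):
            return pivot  # target rank falls on the pivot value
        else:
            k -= len(lower) + len(equal)  # skip past everything <= pivot
            items = [x for x in items if x > pivot]


def median_quickselect(values):
    """Median via quickselect; averages the two middle ranks for even counts."""
    n = len(values)
    if n % 2 == 1:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2


print(median_quickselect([1, 2, 3, 4, 1000]))   # 3
print(median_quickselect([3, 4, 5, 6, 7, 8]))   # 5.5
```

The worst case is still quadratic if the pivots are consistently unlucky, which is why the linear bound is stated as an expectation.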

Use the statistics calculator for the median, mean, mode, and quartiles in a single pass.

Why “median household income” is the figure economists actually quote: the US Census Bureau reports median household income because it is the honest one-number summary of a heavily right-skewed distribution. As of the 2023 ACS, US median household income was around $80,000 while the mean was around $112,000; the gap reflects the pull of the top of the distribution on the arithmetic mean. The same gap shows up in house prices (NAR reports medians, not means), salary surveys, and response-time monitoring in software (observability dashboards typically show p50, the median, rather than average latency). When a number is called “average” without qualification on skewed data, ask which average: the difference between the two is often the story.

The median is a robust estimator of location, and that robustness has a cost. On the robustness side, the median has a 50% breakdown point (you would need to corrupt about half the data to move the median arbitrarily far), whereas the mean has a 0% breakdown point (a single infinite value moves the mean to infinity). The cost is statistical efficiency: under a clean normal distribution the median’s confidence interval is about 25% wider than the mean’s for the same sample size, a factor of √(π/2) ≈ 1.25 in large samples. So for clean, symmetric data the mean is more informative; for messy real-world data the median is safer. Trimmed means (for example, dropping the top and bottom 5%) and the Hodges-Lehmann estimator sit on the spectrum between the two. Related: mean, percentile, IQR.
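The sketch below illustrates the estimators mentioned above on a small contaminated sample. It assumes the usual textbook definitions: a symmetric trimmed mean that drops a fixed fraction from each end, and the Hodges-Lehmann estimator as the median of all pairwise averages. The function names and the 20% trimming fraction in the usage lines are choices made for this example.

```python
import math
from itertools import combinations_with_replacement
from statistics import mean, median


def trimmed_mean(values, fraction=0.05):
    """Mean after dropping `fraction` of the points from each end of the sorted data."""
    ordered = sorted(values)
    cut = int(len(ordered) * fraction)  # number of points dropped per side
    return mean(ordered[cut:len(ordered) - cut])


def hodges_lehmann(values):
    """Median of all pairwise averages (x_i + x_j) / 2 with i <= j."""
    return median((a + b) / 2 for a, b in combinations_with_replacement(values, 2))


data = [1, 2, 3, 4, 1000]
print(mean(data))                # 202: dragged upward by the single outlier
print(median(data))              # 3
print(trimmed_mean(data, 0.2))   # 3: one point dropped from each end
print(hodges_lehmann(data))      # 3.0

# Large-sample ratio of the median's standard error to the mean's under normality.
print(round(math.sqrt(math.pi / 2), 3))  # 1.253, i.e. about 25% wider
```

The pairwise-average approach is quadratic in the sample size, so it suits small samples; it is shown here only to make the definition concrete.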

Worked example

Eleven home sale prices in a suburb (in thousands of dollars): [280, 295, 310, 320, 340, 355, 360, 380, 410, 450, 2400]. The list is already sorted. With n = 11, the median is the 6th value: $355,000. The mean is sum/11 = 5900/11 ≈ $536,000. The single $2.4M waterfront mansion has pushed the mean about $180,000 above the median, and about $86,000 above even the most expensive of the ten realistic comparables; a buyer using “average price” to gauge the neighbourhood would overestimate by about 50%. The median ignores the size of that outlier entirely. Add a 12th sale at $370,000: now n = 12, and the median is the mean of the 6th and 7th values, (355 + 360)/2 = $357,500. The pattern generalises: adding any single observation, however extreme, moves the median by at most one position in the sorted list.
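The arithmetic in this example can be reproduced with Python's standard `statistics` module. The variable names are illustrative; the prices are the ones listed above, in thousands of dollars.

```python
from statistics import mean, median

# Sale prices in thousands of dollars, from the worked example.
prices = [280, 295, 310, 320, 340, 355, 360, 380, 410, 450, 2400]

print(sum(prices))                             # 5900
print(median(prices))                          # 355
print(round(mean(prices), 1))                  # 536.4
print(round(mean(prices) - median(prices), 1)) # 181.4: gap between mean and median

# Adding a 12th sale at $370,000 shifts the median only slightly.
print(median(prices + [370]))                  # 357.5
```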

When and why it matters

Median is the right metric whenever the distribution has a long tail and you want a number that represents “the typical case.” That covers most operational metrics that humans care about — household income, house prices, time-to-resolution on support tickets, page-load times, file sizes in a repo, words per sentence in a document corpus. When monitoring software latency, the standard practice is to dashboard p50 (median) and p99 together: the median tells you whether the typical user is happy; p99 tells you whether the tail is acceptable. Reporting just an average can mask a bimodal distribution (two clusters of users with very different experiences) that the median plus a few percentiles reveals immediately. Reference: US Census Bureau — Income in the United States: 2023.
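To make the p50 and p99 pairing concrete, here is a small sketch using the nearest-rank percentile definition on made-up latency data. The numbers are hypothetical, and monitoring tools generally use their own interpolation or sketching methods, so treat this as an illustration of the idea only. Note that nearest-rank p50 returns the lower middle value for an even count instead of averaging the two middle values.

```python
import math
from statistics import mean


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 90 fast requests and 10 slow ones.
latencies = [20] * 90 + [900] * 10

print(mean(latencies))            # 108: matches neither group of requests
print(percentile(latencies, 50))  # 20: the typical request is fast
print(percentile(latencies, 99))  # 900: the tail is slow
```

Here the average of 108 ms describes no actual request, while p50 and p99 together expose the two clusters.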

Frequently asked questions

What is the median?
The median is the middle value of a dataset when sorted in order. For an odd count of values it is the centre value; for an even count it is the average of the two middle values.

When should I use median instead of mean?
Use median for skewed distributions or data with outliers, such as income, house prices, and response times. A single extreme value cannot shift the median by more than one rank, whereas it can move the mean dramatically.

What is the difference between median and percentile?
The median is the 50th percentile: the point at which half the data falls below and half above. Any percentile divides the data similarly; the 90th percentile is the value below which 90% of observations fall.

How do I find the median of an even-sized dataset?
Sort the values, take the two middle elements, and average them. For [3, 7, 10, 14], the two middle values are 7 and 10, so the median is (7+10)/2 = 8.5.

Published May 16, 2026 · Last reviewed May 31, 2026