Glossary

Latency

Time between request and response

Latency is the time between a request being sent and the response arriving. In networked systems it is usually measured in milliseconds, and in some distributed systems in microseconds. For user-perceived latency, the range that matters is the hundreds of milliseconds, where humans start to notice delays.
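A minimal sketch of measuring latency on the client side, assuming the request can be wrapped in a Python callable. The handler `fake_request` is hypothetical and only sleeps for 50 ms; for a real service you would time the actual network call the same way.

```python
import time

def measure_latency_ms(fn, *args, **kwargs):
    """Return (result, elapsed milliseconds) for one call to fn.

    time.perf_counter is monotonic, so wall-clock adjustments
    (e.g. NTP corrections) cannot distort the measurement.
    """
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Hypothetical stand-in for a real request: sleeps for about 50 ms.
def fake_request():
    time.sleep(0.05)
    return "ok"

result, ms = measure_latency_ms(fake_request)
print(result, f"{ms:.1f} ms")
```

Timing a single call gives one sample; latency is only meaningful as a distribution over many such samples.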

Three measurements every engineer should know about a service’s latency:

Mean (average) latency. Usually misleading: a single slow outlier can drag it well above what most requests experience.
Median (p50) latency. The typical request's experience: half of requests are faster, half are slower. More honest than the mean.
Tail latencies (p95, p99, p99.9). The 95th, 99th, and 99.9th percentiles of response times. p99 means "1% of requests are slower than this." For user-facing systems, p99 captures the experience of unlucky users.
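To make the three measurements concrete, here is a small sketch that computes them over a hypothetical set of samples (98 requests at 100 ms, 2 at 5000 ms). It uses the nearest-rank percentile definition; monitoring systems often use interpolated or histogram-based estimates instead, so their numbers can differ slightly.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def summarize(samples):
    return {
        "mean": statistics.fmean(samples),
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }

# Hypothetical data: 98 fast requests and 2 slow outliers, in milliseconds.
latencies_ms = [100.0] * 98 + [5000.0] * 2
print(summarize(latencies_ms))
# {'mean': 198.0, 'p50': 100.0, 'p95': 100.0, 'p99': 5000.0}
```

Two outliers nearly double the mean while the median stays at 100 ms; only p99 reveals the 5-second requests.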

Why tails matter: at scale, every user hits the tail eventually. A service with 100 ms p50 and 5000 ms p99 has fast typical performance and occasional 5-second freezes. A user making 100 requests in a session has about a 63% chance of hitting at least one request slower than p99, assuming the requests are independent.
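The chance of hitting the tail at least once is 1 - (1 - q)^n for n requests and a tail fraction q (0.01 for p99). A sketch of that arithmetic, assuming requests are independent; in practice slow requests often cluster (for example during a garbage-collection pause), so this is an idealization.

```python
def prob_hit_tail(tail_fraction, n_requests):
    """Probability that at least one of n independent requests lands in the slowest tail_fraction."""
    return 1 - (1 - tail_fraction) ** n_requests

# p99 tail: 1% of requests are slower than the p99 value.
for n in (1, 10, 100):
    print(f"{n:>3} requests: {prob_hit_tail(0.01, n):.1%}")
# 1 requests: 1.0%
# 10 requests: 9.6%
# 100 requests: 63.4%
```

The same formula explains why pages that fan out to many backends feel slow: each extra call is another draw from the tail.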

Sources of latency in a typical HTTP request:

DNS: 1-50 ms for the first lookup, ~0 ms when cached.
TCP handshake: 1 round-trip time (RTT).
TLS handshake: 1-2 additional RTTs (1 for TLS 1.3, 2 for a full TLS 1.2 handshake).
Server processing: highly variable, from microseconds to seconds.
Network propagation: ~5 ms NY-to-Chicago, ~70 ms NY-to-London, ~150 ms NY-to-Sydney. The lower bound is the speed of light in fiber, about two-thirds of its speed in vacuum.
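A rough latency budget for a cold HTTPS request can be sketched by adding the components above. All inputs here are hypothetical: a 70 ms RTT, a 30 ms DNS lookup, 20 ms of server time, and light in fiber at about 200 km per millisecond. The sketch also assumes one extra RTT for sending the request and receiving the response, and ignores transfer time for large payloads.

```python
from dataclasses import dataclass

FIBER_KM_PER_MS = 200.0  # assumption: light in fiber covers roughly 200 km per millisecond

def min_rtt_ms(distance_km):
    """Physical lower bound on round-trip time over a straight fiber path."""
    return 2 * distance_km / FIBER_KM_PER_MS

@dataclass
class ColdRequest:
    rtt_ms: float
    dns_ms: float
    server_ms: float
    tls_rtts: int = 1  # 1 for TLS 1.3, 2 for a full TLS 1.2 handshake

    def breakdown(self):
        return {
            "dns": self.dns_ms,
            "tcp_handshake": self.rtt_ms,
            "tls_handshake": self.tls_rtts * self.rtt_ms,
            "request_response": self.rtt_ms,  # send request, receive response
            "server": self.server_ms,
        }

    def total_ms(self):
        return sum(self.breakdown().values())

req = ColdRequest(rtt_ms=70, dns_ms=30, server_ms=20)
for name, ms in req.breakdown().items():
    print(f"{name:>16}: {ms:6.1f} ms")
print(f"total: {req.total_ms():.0f} ms")  # total: 260 ms

# Hypothetical 5,600 km path: no protocol can beat this RTT.
print(f"min RTT: {min_rtt_ms(5600):.1f} ms")  # min RTT: 56.0 ms
```

With these inputs, round trips account for 210 of the 260 ms, which is why connection reuse and TLS 1.3 matter so much on long paths.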

For real-world API performance, the percentile distribution matters far more than the mean. Reporting only mean latency is one of the classic ways monitoring dashboards mislead.

Published May 16, 2026
