Glossary
Latency
Time between request and response
Latency is the time between a request being sent and the response arriving. Networked systems typically measure it in milliseconds, some distributed systems in microseconds; user-perceived latency lives in the hundreds of milliseconds, the point where humans start to notice delay.
Three measurements every engineer should know about a service’s latency (a small sketch computing them follows this list):
- Mean (average) latency. Usually misleading. A single slow outlier drags it up.
- Median (p50) latency. The typical request’s experience. More honest than the mean.
- Tail latencies (p95, p99, p99.9). The 95th, 99th, and 99.9th percentiles of response times. p99 means “1% of requests are slower than this.” For user-facing systems, p99 captures the experience of unlucky users.
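As a rough illustration, here is a minimal Python sketch (standard library only, synthetic data) that computes these statistics with the nearest-rank percentile method. The sample distribution, seed, and outlier count are invented for illustration, not real measurements:

```python
import math
import random

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the value below which ~p% of samples fall."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[max(0, rank - 1)]

# Hypothetical latency samples in milliseconds: a log-normal body around ~100 ms
# plus a handful of 5-second outliers, mimicking a service with a bad tail.
random.seed(1)
latencies_ms = [random.lognormvariate(math.log(100), 0.3) for _ in range(10_000)]
latencies_ms += [5000.0] * 50

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean : {mean:7.1f} ms")                            # dragged up by outliers
print(f"p50  : {percentile(latencies_ms, 50):7.1f} ms")    # typical request
print(f"p95  : {percentile(latencies_ms, 95):7.1f} ms")
print(f"p99  : {percentile(latencies_ms, 99):7.1f} ms")
print(f"p99.9: {percentile(latencies_ms, 99.9):7.1f} ms")  # the worst tail
```

Running it shows the pattern the list describes: the mean and p50 stay close to the typical value, while the far tail jumps to the outliers.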
Why tails matter: at scale, every user hits the tail eventually. A service with a 100 ms p50 and a 5000 ms p99 has fast typical performance and occasional 5-second freezes. A user making 100 requests in a session is likely to hit the tail at least once: if each request independently has a 1% chance of landing past p99, the chance of at least one slow request is 1 - 0.99^100 ≈ 63%.
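That calculation is just the complement rule, and it generalizes. A tiny sketch, assuming independent requests and a 1% tail:

```python
# Probability of at least one tail-latency request in a session,
# assuming each request independently has a 1% chance of exceeding p99.
def p_tail_hit(requests: int, tail_fraction: float = 0.01) -> float:
    return 1 - (1 - tail_fraction) ** requests

for n in (10, 100, 1000):
    print(f"{n:5d} requests -> {p_tail_hit(n):.0%} chance of hitting the tail")
```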
Sources of latency in a typical HTTP request (a back-of-the-envelope budget follows this list):
- DNS: 1-50 ms for the first lookup, ~0 when cached.
- TCP handshake: 1 round-trip time (RTT).
- TLS handshake: 1-2 additional RTTs.
- Server processing: highly variable, from microseconds to seconds.
- Network propagation: ~5 ms NY-to-Chicago, ~70 ms NY-to-London, ~150 ms NY-to-Sydney. The lower bound is set by the speed of light (in fiber, roughly two-thirds of c).
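To see how these pieces add up for a cold connection, here is a back-of-the-envelope budget. The specific numbers are placeholders taken from the ranges above (a 70 ms RTT, TLS taking 2 RTTs, a fast server), not measurements of any real service:

```python
# Back-of-the-envelope latency budget for a first (cold) HTTPS request.
# All values in milliseconds; placeholder figures, not measurements.
rtt_ms = 70.0  # e.g. roughly NY-to-London round trip

budget = {
    "dns_lookup":        30.0,        # uncached; ~0 on repeat requests
    "tcp_handshake":     rtt_ms,      # 1 RTT
    "tls_handshake":     2 * rtt_ms,  # 1-2 RTTs; assume 2 here
    "server_processing": 20.0,        # highly variable in practice
    "request_response":  rtt_ms,      # request out + first byte back
}

for step, ms in budget.items():
    print(f"{step:18s} {ms:6.1f} ms")
print(f"{'total':18s} {sum(budget.values()):6.1f} ms")
```

The point of the exercise: on a cold connection, handshakes alone can cost several round trips before the server does any work, which is why connection reuse and keep-alive matter so much for perceived latency.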
For real-world API performance, the percentile distribution matters far more than the mean. Reporting only mean latency is one of the classic ways monitoring dashboards mislead.
Published May 16, 2026