

Latency

Time between request and response

Latency is the time between a request being sent and the response arriving. In networked systems it’s measured in milliseconds; in distributed systems, sometimes in microseconds; user-perceived latency matters in the hundreds of milliseconds, where humans start to notice the delay.

Three measurements every engineer should know about a service’s latency (see the sketch after this list):

  • Mean (average) latency. Usually misleading. A single slow outlier drags it up.
  • Median (p50) latency. The typical request’s experience. More honest than mean.
  • Tail latencies (p95, p99, p99.9). The 95th, 99th, 99.9th percentile of response times. p99 means “1% of requests are slower than this.” For user-facing systems, p99 captures the experience of unlucky users.
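
A minimal sketch of computing these from raw samples with Python’s standard library; the sample values are invented for illustration, with 2% of requests placed in a slow tail:

```python
import random
import statistics

# Synthetic latencies (ms): 980 "typical" requests around 100 ms
# plus 20 slow outliers between 2 and 5 seconds (numbers are made up).
random.seed(1)
samples = ([random.gauss(100, 10) for _ in range(980)]
           + [random.uniform(2000, 5000) for _ in range(20)])

cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points

print(f"mean: {statistics.mean(samples):7.1f} ms")  # dragged up by the 2% tail
print(f"p50 : {cuts[49]:7.1f} ms")                  # the typical request
print(f"p95 : {cuts[94]:7.1f} ms")
print(f"p99 : {cuts[98]:7.1f} ms")                  # 1% of requests are slower
```

The mean lands well above what the typical request experiences, while p50 stays near 100 ms and p99 exposes the slow tail.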

Why tails matter: at scale, every user hits the tail eventually. A service with 100ms p50 and 5000ms p99 has fast typical performance and occasional 5-second freezes. A user making 100 requests in a session will likely hit the tail at least once.
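
The “likely” claim is simple arithmetic: if requests are independent and 1% of them fall beyond the p99 threshold, the chance that at least one of n requests hits the tail is 1 − 0.99^n, which is roughly 63% at n = 100. A quick check:

```python
# Probability of at least one request slower than the p99 threshold,
# assuming each request independently has a 1% chance of being in the tail.
for n in (10, 100, 1000):
    print(n, f"{1 - 0.99 ** n:.1%}")  # ~9.6%, ~63.4%, ~100.0%
```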

Sources of latency in a typical HTTP request (a rough timing sketch follows the list):

  • DNS: 1-50 ms first lookup, ~0 cached.
  • TCP handshake: 1 round-trip time (RTT).
  • TLS handshake: 1-2 additional RTTs.
  • Server processing: highly variable, from microseconds to seconds.
  • Network propagation: ~5 ms NY to Chicago, ~70 ms NY to London, ~150 ms NY to Sydney. The lower bound is set by the speed of light in fiber.
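
To see roughly where those components land, here is a timing sketch using only the Python standard library; example.com is a placeholder host, and the last number lumps server processing and network propagation together (time to first response byte):

```python
import socket
import ssl
import time

host, port = "example.com", 443  # placeholder target

t0 = time.perf_counter()
addr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0][4]
t_dns = time.perf_counter()      # DNS resolution done

sock = socket.create_connection(addr[:2])
t_tcp = time.perf_counter()      # TCP handshake done

tls = ssl.create_default_context().wrap_socket(sock, server_hostname=host)
t_tls = time.perf_counter()      # TLS handshake done

tls.sendall(b"GET / HTTP/1.1\r\nHost: " + host.encode() + b"\r\nConnection: close\r\n\r\n")
tls.recv(1)                      # wait for the first response byte
t_first = time.perf_counter()
tls.close()

print(f"DNS lookup      : {(t_dns - t0) * 1000:6.1f} ms")
print(f"TCP handshake   : {(t_tcp - t_dns) * 1000:6.1f} ms")
print(f"TLS handshake   : {(t_tls - t_tcp) * 1000:6.1f} ms")
print(f"Server + network: {(t_first - t_tls) * 1000:6.1f} ms")
```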

For real-world API performance, the percentile distribution matters far more than the mean. Reporting only mean latency is one of the classic ways monitoring dashboards mislead.

Published May 16, 2026