Percent encoding

The %XX escape mechanism in URLs

Percent encoding (also called URL encoding) is the mechanism URLs use to represent characters that aren’t legal in the URL grammar, or that have reserved meanings. It is defined in RFC 3986 §2, with the encoding itself specified in §2.1.

The scheme: each byte to escape is written as % followed by two hexadecimal digits giving the byte value (uppercase hex is the recommended form). A space becomes %20, a question mark (?) becomes %3F, a forward slash (/) becomes %2F, and a number sign (#) becomes %23.
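The mapping from a byte value to its %XX form can be sketched in a few lines of Python. The helper name `percent_encode_byte` is an assumption made for illustration, not part of any library.

```python
def percent_encode_byte(byte: int) -> str:
    """Return the %XX form of one byte value (0-255), using uppercase hex."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value in 0..255")
    return f"%{byte:02X}"


# The four examples from the text: space, question mark, slash, number sign.
for ch in (" ", "?", "/", "#"):
    print(repr(ch), "->", percent_encode_byte(ord(ch)))
# ' ' -> %20
# '?' -> %3F
# '/' -> %2F
# '#' -> %23
```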

For non-ASCII characters (umlauts, accents, CJK, emoji), the character is first UTF-8-encoded to a byte sequence, then each byte is percent-encoded. The single character “é” (U+00E9) becomes the bytes 0xC3 0xA9 in UTF-8, percent-encoded as %C3%A9.
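The two-step process (UTF-8 first, then one escape per byte) can be checked with a short Python sketch. `percent_encode_text` is a hypothetical helper that escapes every byte, including ones that would not normally need it; `quote` and `unquote` come from the standard library's `urllib.parse`.

```python
from urllib.parse import quote, unquote


def percent_encode_text(text: str) -> str:
    """Percent-encode every byte of the UTF-8 form of text (nothing is spared)."""
    return "".join(f"%{b:02X}" for b in text.encode("utf-8"))


print("é".encode("utf-8"))        # b'\xc3\xa9'  (two bytes for one character)
print(percent_encode_text("é"))   # %C3%A9
print(quote("é"))                 # %C3%A9  (quote() uses UTF-8 by default)
print(unquote("%C3%A9"))          # é
```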

Three character classes are worth knowing:

  • Unreserved: A-Z, a-z, 0-9, and the four symbols - _ . ~ (hyphen, underscore, period, tilde). These never need escaping.
  • Reserved: characters with syntactic meaning (:/?#[]@!$&'()*+,;=). They are escaped when they appear in data that shouldn’t be parsed as URL syntax.
  • Other: everything else (spaces, non-ASCII, control characters, and the percent sign itself, which becomes %25). Always escaped.
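As a rough illustration of the three classes, Python's `urllib.parse.quote` leaves unreserved characters alone and escapes the rest when called with `safe=""`. This assumes Python 3.7 or later, where `~` is treated as unreserved. The `example.com` URL and the `q` parameter are placeholders.

```python
from urllib.parse import quote

UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

# safe="" asks quote() to escape everything except unreserved characters.
print(quote(UNRESERVED, safe="") == UNRESERVED)  # True
print(quote("a/b?c=d&e", safe=""))               # a%2Fb%3Fc%3Dd%26e
print(quote("50% off", safe=""))                 # 50%25%20off

# Reserved characters inside a data value must be escaped so they are not
# mistaken for URL syntax (here, the & and = of a query string).
value = "rock&roll=fun"
url = "https://example.com/search?q=" + quote(value, safe="")
print(url)  # https://example.com/search?q=rock%26roll%3Dfun
```

Without the escaping, a parser would read `roll=fun` as a second query parameter instead of as part of the value.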

Encode or decode any string in our URL encoder, which handles UTF-8 correctly (some legacy implementations treat strings as ISO-8859-1 and produce different output for the same input).
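The difference between UTF-8 and ISO-8859-1 handling is easy to reproduce with Python's standard library, whose `quote` and `unquote` accept an `encoding` argument. This is only a sketch of the mismatch, not a description of how any particular tool works.

```python
from urllib.parse import quote, unquote

# Same input character, two different byte encodings, two different outputs.
print(quote("é"))                           # %C3%A9  (UTF-8, the default)
print(quote("é", encoding="latin-1"))       # %E9     (ISO-8859-1)

# Decoding must use the same encoding that was used to encode.
print(unquote("%E9", encoding="latin-1"))   # é
# 0xE9 alone is not valid UTF-8, so the default decode yields U+FFFD,
# the replacement character.
print(unquote("%E9") == "\ufffd")           # True
```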

Published May 16, 2026