
Percent encoding

The %XX escape mechanism in URLs

Percent encoding (also called URL encoding) is the mechanism URLs use to represent characters that aren’t legal in the URL grammar, or that have reserved meanings. Defined in RFC 3986 §2.

The scheme: each byte that needs escaping is written as % followed by two hexadecimal digits giving the byte value (RFC 3986 recommends uppercase digits, so %3F rather than %3f). A space becomes %20, a question mark %3F, a forward slash %2F, and a hash or number sign (#) %23.

For non-ASCII characters (umlauts, accents, CJK, emoji), the character is first UTF-8-encoded to a byte sequence, then each byte is percent-encoded. The single character “é” (U+00E9) becomes the bytes 0xC3 0xA9 in UTF-8, percent-encoded as %C3%A9.
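The two steps can be sketched from scratch in a few lines. This is a minimal illustration, not a production encoder: the function name `percent_encode` is hypothetical, and it assumes the text is destined to be a single URL component, so every byte except the unreserved ASCII letters, digits, and `- . _ ~` is escaped.

```python
# Hypothetical helper: percent-encode text for use as a single URL component.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(text: str) -> str:
    out = []
    # Step 1: turn the characters into UTF-8 bytes ("é" -> 0xC3 0xA9).
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            # Unreserved ASCII bytes pass through unchanged.
            out.append(chr(byte))
        else:
            # Step 2: any other byte becomes "%" plus two uppercase hex digits.
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("é"))       # %C3%A9
print(percent_encode("a b?/#"))  # a%20b%3F%2F%23
```

Because the loop works on bytes rather than characters, a four-byte character such as an emoji simply produces four %XX escapes.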

Three character classes worth knowing:

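  • Unreserved: A-Z, a-z, 0-9, and - _ . ~. Never need escaping; producers should leave them as-is.
  • Reserved: characters with syntactic meaning (: / ? # [ ] @ ! $ & ' ( ) * + , ; =). Escaped when they appear in data that shouldn’t be parsed as URL syntax, such as an & inside a query value.
  • Other: everything else (spaces, the % sign itself, non-ASCII, control characters). Always escaped; a literal % is written %25.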
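A small classifier makes the three classes concrete. The sets below are transcribed from RFC 3986 §2.2 (reserved) and §2.3 (unreserved); the function name `char_class` is hypothetical. Python's `urllib.parse.quote` with `safe=""` treats its input as data, so it is used here to show which classes actually get escaped.

```python
from urllib.parse import quote

# RFC 3986 §2.3 unreserved and §2.2 reserved (gen-delims + sub-delims).
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")
RESERVED = set(":/?#[]@" "!$&'()*+,;=")


def char_class(ch: str) -> str:
    """Classify one character as unreserved, reserved, or other."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    return "other"


# quote(..., safe="") escapes reserved and "other" characters alike
# and leaves only unreserved ones untouched.
for ch in ["a", "~", "/", "&", " ", "%", "é"]:
    print(repr(ch), char_class(ch), quote(ch, safe=""))
```

Passing a non-empty `safe` (the default is `"/"`) is how a caller says "these reserved characters are syntax here, not data".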

A correct encoder works on the string’s UTF-8 bytes. Some legacy implementations treat strings as ISO-8859-1 (Latin-1) instead and produce different output for the same input: “é” becomes %E9 rather than %C3%A9.
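The UTF-8 versus Latin-1 difference is easy to reproduce, assuming Python's `urllib.parse`, whose `quote` and `unquote` accept an `encoding` argument. Forcing `encoding="latin-1"` here only simulates what a legacy encoder would emit.

```python
from urllib.parse import quote, unquote

word = "café"
utf8_form = quote(word)                        # UTF-8 is the default encoding
latin1_form = quote(word, encoding="latin-1")  # simulates a Latin-1-based encoder
print(utf8_form)    # caf%C3%A9
print(latin1_form)  # caf%E9

# A UTF-8 decoder cannot interpret the lone 0xE9 byte; with the default
# errors="replace" it substitutes U+FFFD, the replacement character.
broken = unquote(latin1_form)
print(broken == "caf\ufffd")  # True
```

Both sides have to agree on the byte encoding; decoding `caf%E9` with `encoding="latin-1"` does recover the original word.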

The space-as-plus footnote: in the path and fragment of a URL, a space encodes as %20. In application/x-www-form-urlencoded data, the format HTML forms use whether the fields travel in the query string (GET) or in the request body (POST), a space encodes as + instead, a form-specific convention that predates the modern URL spec. JavaScript’s encodeURIComponent() always emits %20, and so does the deprecated escape(); URLSearchParams, which follows the form-urlencoded rules, emits +. A form decoder turns both + and %20 into a space, but a generic percent-decoder such as decodeURIComponent() leaves + as a literal plus sign, and mixing the two forms in a manually built URL breaks string-equality checks. The modern advice: use the standard URL APIs (WHATWG URLSearchParams in browsers, url.URL in Node) and let the implementation pick the right encoding for the context.
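Python's `urllib.parse` exposes both conventions side by side, so it works as a stand-in for the JavaScript functions named above (it is not the same API, and its `quote` differs from `encodeURIComponent` on a few punctuation characters). `quote` produces %20, `quote_plus` and `urlencode` produce +, and the two decoders differ in the same way.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

value = "a b+c"
print(quote(value, safe=""))    # a%20b%2Bc  (generic percent-encoding)
print(quote_plus(value))        # a+b%2Bc    (form style: space -> "+", literal "+" -> %2B)
print(urlencode({"q": value}))  # q=a+b%2Bc  (urlencode uses quote_plus by default)

# Decoding: a form decoder maps "+" to a space; a generic decoder keeps it.
print(unquote_plus("a+b"))  # a b
print(unquote("a+b"))       # a+b
```

Note that the form encoder escapes a literal + as %2B; otherwise the receiver could not tell it apart from an encoded space.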

Double-encoding, the most common production bug: if a value passes through two encoders without an intervening decoder, every % produced by the first pass is itself escaped to %25. A space ends up as %2520 in the URL, and after the single decode the receiver performs, the user sees a literal %20 where a space should be. The root cause is almost always one layer of the system assuming its input is plain text when it’s already URL-encoded. The fix is to draw a clear boundary: data stays plain text up to the URL builder, exists in percent-encoded form only inside the URL, and is decoded back to plain text exactly once, the moment it leaves the URL context. RFC 3986 §2.4 states the same rule: implementations must not percent-encode or decode the same string more than once.

Related: UTF-8, ASCII. Reference: RFC 3986 §2.1, Percent-Encoding.
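The failure mode takes only two calls to reproduce, again using Python's `urllib.parse` for illustration. The `search_url` function is a hypothetical example of the "clear boundary" idea: it accepts plain text only and is the one place where encoding happens.

```python
from urllib.parse import quote, unquote, urlencode, urlunsplit

original = "a b"
once = quote(original)  # a%20b
twice = quote(once)     # a%2520b: the "%" from the first pass became %25
print(once, twice)

# The receiver decodes once, as it should, and shows the user "a%20b".
print(unquote(twice))           # a%20b
print(unquote(unquote(twice)))  # a b, but only by decoding twice, which hides the bug


def search_url(query_text: str) -> str:
    """Hypothetical boundary: takes plain text only and encodes it exactly once."""
    return urlunsplit(("https", "example.com", "/search", urlencode({"q": query_text}), ""))


print(search_url("a b"))  # https://example.com/search?q=a+b
```

Patching the receiver to decode twice, as the second `unquote` line does, only moves the bug: a value that legitimately contains %20 as text would then be corrupted.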

Worked example

Encode the search query café & thé into a URL query string. Step one: UTF-8-encode each character, so c a f é <space> & <space> t h é becomes the bytes 63 61 66 C3 A9 20 26 20 74 68 C3 A9. Step two: apply the percent-encoding rules. Unreserved characters (c a f t h) stay as they are; the multibyte UTF-8 sequences, the spaces, and the reserved & are escaped. Result: caf%C3%A9%20%26%20th%C3%A9. Full URL: https://example.com/search?q=caf%C3%A9%20%26%20th%C3%A9. On the receiving end, the server reverses both steps: it replaces each %XX with its byte value to get the UTF-8 bytes, then decodes UTF-8 to recover café & thé. If the server treats the bytes as Latin-1 instead of UTF-8, each é (bytes C3 A9) turns into the two characters Ã©, so “café” comes through as “cafÃ©”: the classic mojibake.
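The worked example can be replayed step by step with Python's `urllib.parse`, used here as a stand-in for whatever URL library a real client or server would use. `quote(..., safe="")` escapes everything except unreserved characters, and `quote_via=quote` is passed to `urlencode` so the query keeps %20 for spaces instead of its default +.

```python
from urllib.parse import quote, unquote, urlencode, urlunsplit

query = "café & thé"

# Step one: the UTF-8 bytes, shown as uppercase hex.
hex_bytes = " ".join(f"{b:02X}" for b in query.encode("utf-8"))
print(hex_bytes)  # 63 61 66 C3 A9 20 26 20 74 68 C3 A9

# Step two: escape everything except unreserved characters.
encoded = quote(query, safe="")
print(encoded)  # caf%C3%A9%20%26%20th%C3%A9

# Build the full URL from parts instead of concatenating strings.
url = urlunsplit(("https", "example.com", "/search", urlencode({"q": query}, quote_via=quote), ""))
print(url)  # https://example.com/search?q=caf%C3%A9%20%26%20th%C3%A9

# Receiving end: %XX -> bytes -> text. The byte encoding must match.
decoded = unquote(encoded)                         # café & thé
mojibake = unquote(encoded, encoding="latin-1")    # cafÃ© & thÃ©
print(decoded)
print(mojibake)
```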

When and why it matters

Every URL constructed by string concatenation is a potential injection or routing bug. A user-supplied filename like ../../../etc/passwd embedded raw into a URL becomes a path-traversal attempt; percent-encoded as ..%2F..%2F..%2Fetc%2Fpasswd it is safe as a single path segment, but if some layer decodes it before routing, the traversal comes back. Search queries containing & or # break silently when not encoded: an unescaped & starts a new parameter, and an unescaped # ends the query, turning everything after it into the fragment, which the browser never sends to the server. Modern HTTP frameworks (Express, FastAPI, ASP.NET) generally handle this when you use their query-parameter or URL-building helpers; the bugs cluster in hand-built redirect URLs, log-correlation IDs, and signed-URL generators, where developers concatenate strings. The defensive habit: never use + to concatenate raw values into a URL; always go through URLSearchParams or your framework’s equivalent. Reference: WHATWG URL Standard, Percent-encoded bytes.
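Both failure modes can be shown with Python's `urllib.parse`. Everything here is a sketch under assumptions: the `/files/<name>?q=<search>` layout and the helpers `build_file_url` and `is_safe_segment` are hypothetical. The builder only assembles pieces that are already encoded, and the server-side check decodes the segment once before deciding whether it is safe to use as a filename.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode, urlsplit, urlunsplit


def build_file_url(filename: str, search: str) -> str:
    """Hypothetical endpoint: https://example.com/files/<filename>?q=<search>."""
    segment = quote(filename, safe="")  # "/" inside the name becomes %2F
    query = urlencode({"q": search})    # "&" -> %26, "#" -> %23, space -> "+"
    return urlunsplit(("https", "example.com", f"/files/{segment}", query, ""))


def is_safe_segment(encoded_segment: str) -> bool:
    """Hypothetical server-side check: decode once, then reject traversal."""
    name = unquote(encoded_segment)
    return name not in ("", ".", "..") and "/" not in name and "\\" not in name


url = build_file_url("../../../etc/passwd", "rock & roll #1")
print(url)
# https://example.com/files/..%2F..%2F..%2Fetc%2Fpasswd?q=rock+%26+roll+%231

parts = urlsplit(url)
print(parse_qs(parts.query))  # {'q': ['rock & roll #1']}
print(is_safe_segment(parts.path.rsplit("/", 1)[1]))  # False: decodes to a traversal

# The concatenated version loses data: everything after "#" is parsed as a fragment.
naive = urlsplit("https://example.com/search?q=" + "rock & roll #1")
print(naive.fragment)  # 1
```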

Frequently asked questions

What is percent encoding?
Percent encoding (URL encoding) is the scheme defined in RFC 3986 for representing reserved, unsafe, or non-ASCII characters in a URL as a percent sign followed by two hexadecimal digits per byte; for example, a space becomes %20.
When is percent encoding applied in practice?
Browsers percent-encode spaces and non-ASCII characters when they build a request URL from what the user typed or clicked. When submitting an HTML form, they encode each field name and value as application/x-www-form-urlencoded data, so a literal & or = inside a value becomes %26 or %3D and the server can split the key-value pairs unambiguously.
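A quick round trip shows why the escaping matters, using Python's `urlencode` and `parse_qs` as stand-ins for a browser's form encoder and a server's form parser. The field names and values are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

fields = {"title": "Q&A: a=b?", "lang": "fr"}
body = urlencode(fields)
print(body)  # title=Q%26A%3A+a%3Db%3F&lang=fr

# Only the unescaped "&" and "=" act as delimiters, so each value survives intact.
print(parse_qs(body))  # {'title': ['Q&A: a=b?'], 'lang': ['fr']}
```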
What is the difference between percent encoding and Base64 encoding?
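Percent encoding escapes only the individual characters that are illegal or reserved in a URL and leaves the rest intact, so it is compact for mostly-ASCII input, though each escaped byte costs three characters. Base64 encodes arbitrary binary data into a 64-character alphabet and grows it by about 33% (4 output characters per 3 input bytes). Its standard alphabet includes +, / and =, which are themselves reserved in URLs, so Base64 output placed in a URL must be percent-encoded again or produced with the URL-safe base64url alphabet (RFC 4648 §5), which uses - and _ instead of + and /.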
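The size and alphabet differences can be checked with Python's standard `base64` and `urllib.parse` modules. The three input bytes were picked deliberately so that standard Base64 emits only + and /; the 300 zero bytes stand in for arbitrary binary data.

```python
import base64
from urllib.parse import quote

data = bytes([0xFB, 0xFF, 0xFE])
std = base64.b64encode(data).decode("ascii")              # "+//+": standard alphabet
urlsafe = base64.urlsafe_b64encode(data).decode("ascii")  # "-__-": base64url alphabet
print(std, quote(std, safe=""))  # +//+ %2B%2F%2F%2B
print(urlsafe)                   # -__-

payload = bytes(300)
print(len(base64.b64encode(payload)))  # 400: 4 characters per 3 bytes
print(len(quote(payload, safe="")))    # 900: every zero byte becomes %00
```

For mostly-ASCII text the comparison flips: percent encoding leaves letters and digits untouched, while Base64 still adds a third.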

Published May 16, 2026 · Last reviewed May 31, 2026