You've seen it in your browser's address bar a hundred times: %20 where a space should be, or %3A standing in for a colon. If you've ever wondered what those percent signs and two-digit hex codes actually mean — and why URLs can't just contain normal characters — you're asking exactly the right question. URL encoding, formally called percent encoding, is one of those invisible standards that makes the web function reliably. Understanding how it works demystifies a pattern you encounter daily and becomes genuinely useful the moment you start building web applications, working with APIs, or debugging broken links.

Key Takeaways
  • Percent encoding replaces any character that isn't allowed raw in a URL with a % sign followed by two hexadecimal digits for each byte of that character.
  • Space encodes as %20 in URL paths and API query strings, but as + in HTML form submissions using application/x-www-form-urlencoded.
  • Unreserved characters (A–Z, a–z, 0–9, -, _, ., ~) never need encoding — they pass through URLs unchanged.
  • Unicode characters are first converted to UTF-8 bytes, then each byte is percent-encoded separately — so "é" becomes %C3%A9, not a single code.
  • Don't encode structural characters like ://, /, ?, &, and # when they're serving their structural role — only encode them when they appear inside data values.

Why URL Encoding Exists

The generic syntax for URLs (more precisely, URIs) is defined by RFC 3986, which restricts the characters that may appear raw in a URL to a limited subset of ASCII. The reason is practical: URLs travel across many different systems, including web servers, proxies, browsers, and the applications behind them, and not all of them handle every character the same way. Spaces, for example, separate the method, the request target, and the protocol version in an HTTP request line. Non-ASCII characters like accented letters, Chinese characters, or emoji fall outside ASCII entirely and have no single-byte representation. Special symbols like &, =, and ? already carry structural meaning inside a URL's query string. If any of these characters appeared raw inside a URL, parsers could misinterpret them, truncate the address, or reject it entirely.

The solution is percent encoding: any character that isn't part of the allowed set gets replaced with a safe placeholder — a percent sign followed by the character's numeric value in hexadecimal. This way the URL remains a plain ASCII string that every system can safely pass along, and only the final recipient needs to decode it back into the original data. It's a classic approach to escaping: represent unsafe characters using only the safe characters available.

How Percent Encoding Works

The mechanics are straightforward. Every character is stored as one or more bytes: a single byte for an ASCII character, and two to four bytes in UTF-8 for everything else. To percent-encode a byte, express its value as two hexadecimal digits and prepend a %. That's the entire rule.

Consider the space character. Its ASCII decimal value is 32. In hexadecimal, 32 is 20. So a space becomes %20. The colon character has ASCII value 58, which is 3A in hex, so it becomes %3A. A forward slash is ASCII 47 = hex 2F, giving %2F. The percent sign itself is ASCII 37 = hex 25, so a literal percent in data must be written as %25 — otherwise the decoder would interpret what follows it as an encoding sequence.
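The per-character rule can be sketched in a few lines of Python. This is a minimal illustration that assumes a single ASCII input character (one byte). `percent_encode_ascii` is a hypothetical helper, not a standard-library function.

```python
def percent_encode_ascii(ch: str) -> str:
    """Return the %XX form of a single ASCII character (hypothetical helper)."""
    code = ord(ch)
    if code > 127:
        # Non-ASCII characters must be converted to UTF-8 bytes first.
        raise ValueError(f"{ch!r} is not ASCII")
    # "02X" renders the byte value as two uppercase hexadecimal digits.
    return "%" + format(code, "02X")


for ch in [" ", ":", "/", "%"]:
    print(repr(ch), ord(ch), percent_encode_ascii(ch))
```

Each printed line pairs a character with its decimal value and its encoded form, matching the space, colon, slash, and percent conversions described above.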

RFC 3986 treats the hex digits in percent-encoded sequences as case-insensitive, so %2F and %2f are equivalent. It also recommends that encoders produce uppercase, which is what most encoding functions emit.
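A quick check with Python's standard urllib.parse module illustrates both points: decoding accepts either case, and quote() emits uppercase. The sample strings are arbitrary.

```python
from urllib.parse import quote, unquote

# Decoding treats %2F and %2f identically.
upper = unquote("a%2Fb")
lower = unquote("a%2fb")
print(upper, lower, upper == lower)

# Encoding produces uppercase hex digits.
print(quote("a/b", safe=""))
```

Because "a%2Fb" and "a%2fb" still differ as raw strings, compare URLs after decoding or normalizing them, not character by character.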

Unreserved vs Reserved Characters

RFC 3986 divides URL characters into two categories that determine encoding behavior. Unreserved characters — the letters A through Z and a through z, the digits 0 through 9, and the four symbols hyphen (-), underscore (_), period (.), and tilde (~) — are always safe in any part of a URL and never need encoding. If you see them in a URL, they mean exactly what they look like.

Reserved characters are a different story. The characters :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, the comma (,), ;, and = have defined structural roles within URLs. They act as delimiters between a URL's components: the colon ends the scheme (and separates host from port), slashes separate path segments, the question mark introduces the query string, and so on. When you need to include one of these characters as a literal data value inside a URL (for example, an email address in a query parameter, or a URL within a URL), percent-encode it so parsers don't mistake it for structure.
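Here is a rough sketch of the "encode everything except unreserved characters" rule, written out by hand and checked against Python's urllib.parse.quote with safe='', which (in Python 3.7 and later) leaves exactly the unreserved set alone. The `encode_value` helper and the example.com URLs are hypothetical.

```python
import string
from urllib.parse import quote

# RFC 3986 unreserved set: letters, digits, "-", ".", "_", "~".
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def encode_value(value: str) -> str:
    """Percent-encode every UTF-8 byte that is not an unreserved ASCII character."""
    out = []
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in UNRESERVED else "%" + format(byte, "02X"))
    return "".join(out)


email = "jane@example.com"
nested = "https://other.example/path?x=1&y=2"

# Reserved characters inside the values are encoded; the skeleton is not.
url = ("https://example.com/contact?email=" + encode_value(email)
       + "&next=" + encode_value(nested))
print(url)

# The hand-written version agrees with the standard library.
assert encode_value(nested) == quote(nested, safe="")
```

The @ in the email and the :, /, ?, =, and & in the nested URL all come out percent-encoded, so the outer URL's own ? and & stay unambiguous.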

Form Encoding vs URL Encoding

There's a closely related but distinct format called application/x-www-form-urlencoded, which HTML forms use by default. With method="GET" the encoded fields are appended to the URL as a query string, and with method="POST" they are sent as the request body. It's almost identical to percent encoding, and its most visible difference is that spaces are represented as + rather than %20. This is why a Google search for "hello world" produces a URL like ?q=hello+world rather than ?q=hello%20world.

In query strings, most server-side frameworks decode both + and %20 as a space, but the two are technically different formats, and outside the query string a + is just a literal plus sign. When building API requests programmatically rather than submitting an HTML form, you should generally use %20 for spaces, since many APIs expect strict RFC 3986 percent encoding rather than the form-encoding variant. The reverse mistake matters too: a literal + inside a value must be sent as %2B, or a form decoder will turn it into a space. Mixing the two conventions up is a common source of bugs when constructing query strings by hand.
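Python's urllib.parse makes the difference visible. urlencode() uses form encoding by default, and its quote_via parameter switches it to %20-style encoding. The parameter names and values below are made up for illustration.

```python
from urllib.parse import quote, unquote_plus, urlencode

params = {"q": "hello world", "expr": "1+1"}

# Default: form encoding, so spaces become "+" and a literal "+" becomes %2B.
form_style = urlencode(params)

# quote_via=quote switches to strict percent encoding (%20 for spaces).
strict_style = urlencode(params, quote_via=quote)

print(form_style)
print(strict_style)

# Why a raw "+" in data is dangerous: form decoders read it as a space.
print(unquote_plus("expr=1+1"))
```

Both styles encode the literal plus as %2B. Only the treatment of the space differs.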

Unicode and Multi-Byte Encoding

ASCII covers 128 characters, which handles English text fine but leaves out the vast majority of the world's writing systems. URL encoding handles this through an extra step: non-ASCII characters are first converted to their UTF-8 byte sequence, and then each individual byte in that sequence is percent-encoded separately.

Take the letter "é" (e with an acute accent, Unicode code point U+00E9). In UTF-8, this character is encoded as two bytes: 0xC3 and 0xA9. Percent-encoding each byte gives %C3%A9, so the word "café" becomes caf%C3%A9 in a URL. Characters from U+0800 to U+FFFF, which include most Chinese, Japanese, and Korean characters, take three bytes in UTF-8, and characters outside the Basic Multilingual Plane, including most emoji, take four. The thumbs-up emoji (U+1F44D), for example, encodes as %F0%9F%91%8D in a URL.

This multi-step process (Unicode → UTF-8 bytes → percent-encode each byte) is handled automatically by standard functions such as JavaScript's encodeURIComponent() and Python's urllib.parse.quote(), which both encode text as UTF-8 (quote() does so by default). You rarely need to perform the conversion manually, but understanding the underlying mechanics explains why non-ASCII characters produce such long encoded strings.
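The two steps can be made explicit in Python. `utf8_percent_bytes` is a hypothetical helper that encodes every byte, including ASCII ones, so it only illustrates the mechanics and is not a replacement for quote(). The three sample characters are é, the CJK character 中 (U+4E2D), and the thumbs-up emoji.

```python
from urllib.parse import quote


def utf8_percent_bytes(ch: str) -> str:
    """Step 1: Unicode -> UTF-8 bytes. Step 2: %XX for every byte."""
    return "".join("%" + format(b, "02X") for b in ch.encode("utf-8"))


# é (2 bytes), 中 (3 bytes), thumbs-up emoji (4 bytes)
for ch in ["\u00e9", "\u4e2d", "\U0001F44D"]:
    raw = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", len(raw), "bytes", utf8_percent_bytes(ch))

# The standard library performs the same conversion, leaving ASCII letters alone.
print(quote("caf\u00e9"))
```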

Real-World Examples

You encounter URL encoding constantly without necessarily noticing it. A Google search for "New York weather" produces a URL containing ?q=New+York+weather (form encoding with + for spaces). An API call like https://api.example.com/users?name=John%20Doe&city=New%20York uses %20 for spaces in a strict query string. When a file server serves an image named "my photo 2026.jpg", the browser requests it as my%20photo%202026.jpg. A URL containing a hash symbol as data — like a color code #FF5733 passed as a parameter — must encode the # as %23, otherwise the browser interprets everything after the literal # as a page fragment identifier rather than part of the query.

These aren't edge cases; they're the everyday reality of web URLs. Every space in a page title that ends up in a URL, every special character in a search query, every non-ASCII letter in a URL path or query string: all of it goes through percent encoding before it travels across the network. Non-ASCII domain names are the exception, because the host part is converted with Punycode (into an xn-- form) rather than percent-encoded.
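The examples above can be reproduced with urllib.parse. api.example.com comes from the text. The files.example.com and example.com/render URLs are hypothetical stand-ins. The last lines show what happens when a raw # is left in the query.

```python
from urllib.parse import quote, urlencode, urlsplit

# Strict query string: %20 for spaces.
api_url = "https://api.example.com/users?" + urlencode(
    {"name": "John Doe", "city": "New York"}, quote_via=quote)

# A filename used as a path segment (hypothetical host).
photo_url = "https://files.example.com/images/" + quote("my photo 2026.jpg")

# A colour code passed as data: "#" must become %23.
color_url = "https://example.com/render?" + urlencode({"color": "#FF5733"})

print(api_url)
print(photo_url)
print(color_url)

# Left unencoded, the "#" starts a fragment and the value leaves the query.
broken = urlsplit("https://example.com/render?color=#FF5733")
print(repr(broken.query), repr(broken.fragment))
```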

Encoding in JavaScript and Python

JavaScript provides two main encoding functions with an important distinction. encodeURIComponent() encodes everything except letters, digits, and the characters - _ . ! ~ * ' ( ), which makes it the right choice for encoding individual parameter values. (Because ! ' ( ) * are reserved in RFC 3986, it is slightly more permissive than a strict encoder.) encodeURI() additionally leaves ; , / ? : @ & = + $ and # unencoded, so it can encode a complete URL while preserving its structure. So encodeURIComponent('hello world') returns 'hello%20world', and encodeURIComponent('path/to/file') returns 'path%2Fto%2Ffile' with the slashes encoded. Use encodeURIComponent() for values and encodeURI() for complete URLs.
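To compare the two JavaScript functions from Python, you can approximate them with quote() and its safe parameter. This is a sketch under the assumption that inputs are ordinary Unicode strings. The function names `encode_uri_component` and `encode_uri` are hypothetical, and edge cases such as invalid surrogates are not matched exactly.

```python
from urllib.parse import quote

# quote() always leaves letters, digits and "_.-~" alone; add what JavaScript also keeps.
_COMPONENT_SAFE = "!*'()"
_URI_SAFE = _COMPONENT_SAFE + ";,/?:@&=+$#"


def encode_uri_component(s: str) -> str:
    """Rough Python counterpart of JavaScript's encodeURIComponent()."""
    return quote(s, safe=_COMPONENT_SAFE)


def encode_uri(s: str) -> str:
    """Rough Python counterpart of JavaScript's encodeURI()."""
    return quote(s, safe=_URI_SAFE)


print(encode_uri_component("hello world"))
print(encode_uri_component("path/to/file"))
print(encode_uri("https://example.com/a b?q=1#top"))
```

The component version encodes the slashes, while the whole-URL version keeps ://, ?, = and # intact and only encodes the space.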

In Python, urllib.parse.quote('hello world') returns 'hello%20world'. By default it leaves slashes unencoded; pass safe='' to encode them too. urllib.parse.urlencode({'name': 'John Doe', 'city': 'New York'}) produces 'name=John+Doe&city=New+York' using the form-encoding format. For strict percent encoding of query values, use quote directly, or pass quote_via=urllib.parse.quote to urlencode. Decoding is the mirror: JavaScript's decodeURIComponent() and Python's urllib.parse.unquote() convert encoded strings back to their original form. Neither turns + into a space, so for form-encoded data use Python's urllib.parse.unquote_plus() or parse_qs(), or JavaScript's URLSearchParams.
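The decoding side in Python, using a made-up query string that mixes both space conventions and a UTF-8 sequence:

```python
from urllib.parse import parse_qs, unquote, unquote_plus

encoded = "name=John+Doe&city=New%20York&dish=caf%C3%A9"

# unquote() decodes %XX sequences but leaves "+" untouched.
print(unquote("John+Doe"))

# unquote_plus() also turns "+" into a space (form decoding).
print(unquote_plus("John+Doe"))

# parse_qs() splits the query and form-decodes every value into a list.
print(parse_qs(encoded))
```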

When NOT to Encode

The most common mistake with URL encoding is over-encoding — encoding characters that are serving their structural role. The :// in https:// should never be encoded. The / separating path segments, the ? introducing the query string, the & separating query parameters, the # introducing a fragment — all of these must remain as literal characters for the URL to parse correctly. Encode them and you break the URL's structure entirely.

The rule is clean and easy to remember: encode data values, not structural delimiters. When you're building a URL programmatically, construct the skeleton first (scheme, host, path separators, query delimiters) and then encode each individual data value before inserting it. Never run a component-level encoder such as encodeURIComponent() or quote(..., safe='') over the whole assembled URL, or you'll encode the structural characters right along with everything else. Build the URL piece by piece, encoding each piece, and assemble the structure around it.
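A sketch of the piece-by-piece approach in Python (3.9 or later for the type hints). `build_url` is a hypothetical helper and the example.com URL is a placeholder. The last line shows the over-encoding mistake for contrast.

```python
from urllib.parse import quote, urlencode


def build_url(base: str, segments: list[str], params: dict[str, str]) -> str:
    """Hypothetical helper: fixed skeleton, each data value encoded on its own."""
    # Encode each path segment completely, then join with literal slashes.
    path = "/".join(quote(seg, safe="") for seg in segments)
    # Encode each query value; "=" and "&" are added by urlencode itself.
    query = urlencode(params, quote_via=quote)
    return f"{base}/{path}?{query}" if query else f"{base}/{path}"


good = build_url("https://example.com", ["reports", "Q1/Q2 summary"],
                 {"owner": "a&b", "tag": "#1"})
print(good)

# Over-encoding: a component encoder run on a whole URL destroys its structure.
bad = quote("https://example.com/a?b=1", safe="")
print(bad)
```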

Character       Decimal   Hex   Encoded as          Typical use
Space           32        20    %20 (+ in forms)    Query values, filenames
# (hash)        35        23    %23                 A # that is part of a value
% (percent)     37        25    %25                 A literal % in data
& (ampersand)   38        26    %26                 Query values containing &
/ (slash)       47        2F    %2F                 A / inside a single path segment or value
: (colon)       58        3A    %3A                 Timestamps passed as data
= (equals)      61        3D    %3D                 An = inside a value
? (question)    63        3F    %3F                 A ? inside a value
@ (at)          64        40    %40                 Email addresses in URLs
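The decimal, hex, and encoded columns of the table can be regenerated with a few lines of Python, which is a handy way to check any character not listed. This assumes quote() with safe='' so that no reserved character is left alone.

```python
from urllib.parse import quote

rows = [(ch, ord(ch), format(ord(ch), "02X"), quote(ch, safe=""))
        for ch in " #%&/:=?@"]

for ch, dec, hex_digits, encoded in rows:
    print(repr(ch), dec, hex_digits, encoded)
```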