You notice it the first time a link “works on my machine” but breaks for a customer: a space turns into a truncated query, an ampersand spawns a phantom parameter, or an emoji becomes a replacement character on the server. I’ve shipped enough web products to treat URL encoding as part of correctness, not polish. If you pass user input through links, forms, redirects, API calls, or SPA routing, you’re doing URL encoding whether you meant to or not, and small mistakes tend to surface as data loss, broken navigation, or security bugs.

I’m going to show you how URL encoding really works (byte-level, not hand-wavy), where the rules change depending on which URL component you’re touching, and how to do it safely in HTML and JavaScript using modern primitives. I’ll also cover the form-specific “+ vs %20” behavior, common double-encoding traps, and a few server-side decoding patterns I rely on in production.

## What URL encoding really is (and what it isn’t)

A URL is an address that tells a browser (or any client) how to locate a resource. The key point: URLs have a defined grammar, and only a subset of characters can appear literally in each position.
URL encoding—more precisely percent-encoding—is how we represent bytes that can’t appear directly.

Percent-encoding replaces a byte with a percent sign followed by two hexadecimal digits:

- (space) is often encoded as %20.
- $ is encoded as %24.

But here’s the thing I see confused all the time: URL encoding is not HTML escaping.

- URL encoding is about making data safe inside a URL component.
- HTML escaping is about making text safe inside HTML markup.

Example: if you render a query string in an HTML attribute, you may need both:

- URL encoding for the query value
- HTML escaping for the attribute context (notably `&` becomes `&amp;` in HTML)

If you mix those up, you’ll either break navigation or create injection risk.

## URL syntax and why components matter

A typical URL looks like this:

scheme://prefix.domain:port/path/filename?query#fragment

Example:

https://www.example.com:443/products/coffee-mug?color=navy&size=12oz#reviews

The URL’s structure is not just trivia. Different components allow different characters, and encoding rules depend on the component.

- Scheme: usually fixed (https, http). You rarely encode here.
- Host (domain): special rules (IDNs, punycode), not typical percent-encoding.
- Path: / is a delimiter. Encoding a / changes meaning.
- Query: & and = are delimiters. Raw & inside a value will split parameters.
- Fragment: client-side only (not sent in HTTP requests), but still parsed by the browser.

If you encode the wrong thing at the wrong layer—like percent-encoding an entire URL string as if it were a query value—you get broken links that are hard to debug.

## Reserved vs unreserved characters (practical view)

In day-to-day web work, I think about characters in three buckets:

1) Unreserved characters: safe to appear literally in most URL components.

- Letters: A-Z, a-z
- Digits: 0-9
- Marks: -, ., _, ~

2) Reserved characters: meaningful delimiters depending on where they appear.

- General delimiters: :, /, ?, #, [, ], @
- Sub-delimiters: !, $, &, ', (, ), *, +, ,, ;, =

3) Everything else: spaces, quotes, control characters, and most non-ASCII bytes must be encoded.

Here’s a quick table of commonly encountered reserved characters and their percent-encoded forms (uppercase hex is conventional):

| Character | Encoded |
| --- | --- |
| ! | %21 |
| * | %2A |
| ' | %27 |
| ( | %28 |
| ) | %29 |
| ; | %3B |
| : | %3A |
| @ | %40 |
| & | %26 |
| = | %3D |
| + | %2B |
| $ | %24 |
| , | %2C |
| / | %2F |
| ? | %3F |
| # | %23 |
| [ | %5B |
| ] | %5D |

And a few “this will bite you in HTML and logs” characters:

| Character | Encoded |
| --- | --- |
| (space) | %20 |
| " | %22 |
| < | %3C |
| > | %3E |
| % | %25 |
| { | %7B |
| } | %7D |
| \| | %7C |
| \ | %5C |

One nuance: you’ll sometimes see older guidance that treats ~ as “unsafe.” Modern URL parsing generally accepts it unescaped, but encoding it won’t break correctness. I only keep it literal for readability.

## Percent-encoding is byte encoding (UTF-8 in practice)

Percent-encoding happens at the byte level. That’s why you’ll see multiple %.. sequences for a single visible character outside basic ASCII.

Example: the string café in UTF-8 is bytes:

- c = 0x63
- a = 0x61
- f = 0x66
- é = 0xC3 0xA9

So percent-encoded, it becomes:

- caf%C3%A9

Same idea for emoji: one glyph may become 4 bytes (or more), which becomes four %.. sequences.

This is why “encode each character’s ASCII code” is an incomplete mental model. For modern web apps, assume:

- Your strings are Unicode.
- Your URLs are interpreted as UTF-8 for percent-encoded sequences.

If you ever see mojibake (garbled characters) in query strings, it’s often because something decoded bytes with the wrong charset, or because data was encoded twice.

## HTML context: links, attributes, and form submissions

Most URL encoding mistakes I debug happen at the HTML boundary.

### 1) Links (`<a href>`)

If you’re building a link with query parameters, you need to handle two parsers:

- The URL parser
- The HTML parser

In HTML, & is special because it starts an entity. In attributes, you should escape it as `&amp;`.

Here’s a concrete example. Suppose the user searches for rock & roll.

- Query value (URL encoding): rock%20%26%20roll
- Query string in HTML attribute (HTML escaping): q=rock%20%26%20roll is fine, but the separator & between params must be `&amp;`.

Example HTML:

```html
<a href="/search?q=rock%20%26%20roll&amp;page=1">Search</a>
```

If you accidentally output & literally and your templating engine doesn’t escape correctly, browsers may still “work” in many cases, but you’re relying on forgiving parsing.
I don’t.

### 2) Forms (application/x-www-form-urlencoded)

HTML forms have their own encoding mode.

When a form uses the default encoding (application/x-www-form-urlencoded):

- Spaces are encoded as + (not %20).
- Many characters are percent-encoded.

That means you’ll see:

- search=coffee+beans

This “+ means space” rule is specific to form encoding and query strings processed as form data. In other URL contexts, + is just a plus sign unless you interpret it as form encoding.

Practical guidance I follow:

- When constructing URLs programmatically, use %20 semantics via standard URL tools.
- When parsing incoming form-encoded payloads or query strings, rely on your framework’s parser (it will map + to space).

If you need file uploads or binary content, use:

- multipart/form-data

In that case, URL encoding rules are no longer the main event; boundary encoding and MIME parts are.

## JavaScript in 2026: the right primitives for the job

In modern JavaScript runtimes (browsers, Node.js, Deno, Bun), you should default to URL and URLSearchParams rather than manual string concatenation.

### encodeURI vs encodeURIComponent

I still see these misused.

- encodeURI(...) is intended for an entire URL and does not encode reserved delimiters like ?, &, =, /, #.
- encodeURIComponent(...) is intended for a single component value and encodes reserved characters.

If you run encodeURI('https://example.com/?q=rock & roll'), the & will remain, which is a bug if & is part of the value.

### My default: URL + URLSearchParams

This is readable, correct, and handles encoding for you.

```js
// Run in a browser console or Node.js
const base = 'https://shop.example.com/products';
const url = new URL(base);

url.searchParams.set('category', 'coffee & tea');
url.searchParams.set('size', '12oz');
url.searchParams.set('notes', 'gift for Zoë');

console.log(url.toString());
// https://shop.example.com/products?category=coffee+%26+tea&size=12oz&notes=gift+for+Zo%C3%AB
```

Notice the output uses + for spaces in the query string representation. That’s normal for URLSearchParams serialization, which uses form-style rules for the query string.

If you need %20 specifically (rare), you can post-process, but I only do that for compatibility with a known broken downstream system.

### Encoding path segments safely

Path encoding is trickier because / has meaning. If you have user-generated path segments (slugs, IDs, filenames), encode each segment, not the whole path.

```js
const productName = 'Acrylic Mug / Navy';
const safeSegment = encodeURIComponent(productName);

const url = new URL('https://shop.example.com/');
url.pathname = `/products/${safeSegment}`;

console.log(url.toString());
// https://shop.example.com/products/Acrylic%20Mug%20%2F%20Navy
```

If you had used encodeURI on the whole pathname, the / inside the name would have remained and turned into an unintended extra path level.

### Traditional vs modern construction

| Traditional approach | Modern approach |
| --- | --- |
| String concat + manual encoding | URLSearchParams |
| Manual slash handling | new URL(relative, base) |
| Hand-rolled replace rules | encodeURIComponent per segment |

## Server-side decoding and validation (patterns I trust)

Encoding is only half the story. On the server you decode, validate, and then treat the decoded value as data (not as a URL fragment again).

### Node.js: parse with URL

```js
// Minimal Node.js HTTP example
import http from 'node:http';

http
  .createServer((req, res) => {
    const url = new URL(req.url, 'http://localhost');
    const q = url.searchParams.get('q') ?? '';

    // Basic validation example
    if (q.length > 200) {
      res.writeHead(413, { 'content-type': 'text/plain; charset=utf-8' });
      res.end('Query too long');
      return;
    }

    res.writeHead(200, { 'content-type': 'application/json; charset=utf-8' });
    res.end(JSON.stringify({ q }));
  })
  .listen(3000, () => {
    console.log('Listening on http://localhost:3000');
  });
```

If you’re using a framework (Express, Fastify, Hono, etc.), it will parse query strings for you. My rule is: decode exactly once, at the boundary.

### Python: urllib.parse

```python
# Run: python3 server.py
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import json

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        params = parse_qs(parsed.query, keep_blank_values=True)
        q = params.get('q', [''])[0]

        if len(q) > 200:
            self.send_response(413)
            self.send_header('Content-Type', 'text/plain; charset=utf-8')
            self.end_headers()
            self.wfile.write(b'Query too long')
            return

        body = json.dumps({'q': q}).encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'application/json; charset=utf-8')
        self.end_headers()
        self.wfile.write(body)

HTTPServer(('127.0.0.1', 8000), Handler).serve_forever()
```

### Go: net/url

```go
// Run: go run main.go
package main

import (
	"encoding/json"
	"net/http"
)

type Response struct {
	Q string `json:"q"`
}

func handler(w http.ResponseWriter, r *http.Request) {
	q := r.URL.Query().Get("q")
	if len(q) > 200 {
		w.WriteHeader(http.StatusRequestEntityTooLarge)
		w.Write([]byte("Query too long"))
		return
	}

	w.Header().Set("Content-Type", "application/json; charset=utf-8")
	json.NewEncoder(w).Encode(Response{Q: q})
}

func main() {
	http.HandleFunc("/", handler)
	http.ListenAndServe("127.0.0.1:8080", nil)
}
```

### Validation: what I actually check

Decoding gives you a string. It doesn’t give you safety. On endpoints that accept user-controlled text, I usually enforce:

- Length caps (stops abuse and accidental mega URLs)
- Character allow-lists for IDs and slugs
- Normalization rules (trim, collapse whitespace) when it makes sense
- Contextual rules (an email field is not a URL field)

If you accept a “redirect” parameter, treat it as high-risk and validate it as a relative path or a known allow-list of origins.

## Mistakes I still see (and how I avoid them)

### 1) Double-encoding

Symptom: % becomes %25 and your server receives %2520 instead of %20.

Cause: encoding a value that is already encoded.

Fix: define a single boundary where encoding happens.
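A minimal sketch of that boundary, assuming a hypothetical `buildSearchUrl` helper that is the only place in the codebase allowed to encode query values:

```js
// Hypothetical helper: the ONLY place that encodes query values.
// Everything upstream of it passes raw, unencoded strings.
function buildSearchUrl(base, params) {
  const url = new URL(base);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value); // encodes exactly once
  }
  return url.toString();
}

// Raw input goes in; pre-encoded input would get encoded again,
// which is exactly the double-encoding bug this boundary prevents.
const link = buildSearchUrl('https://example.com/search', { q: '50% off' });
console.log(link); // https://example.com/search?q=50%25+off
```

If callers are never allowed to call `encodeURIComponent` themselves, a `%2520` in your logs points at exactly one function to audit.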
For example:

- UI stores raw text
- URL builder encodes once
- Server decodes once

If you must handle both encoded and raw input (migration scenarios), do it explicitly and log which branch was used.

### 2) Encoding the entire URL as a component

Symptom: https://example.com/path?x=1 turns into https%3A%2F%2Fexample.com%2Fpath%3Fx%3D1 and becomes unusable as a navigable link.

Correct use: only encode a full URL when you’re placing it as a parameter value inside another URL:

- /outbound?target=&lt;encoded URL&gt;

### 3) Mixing HTML escaping and URL encoding

Symptom: links look right in source but break, or your HTML becomes invalid.

Rule I follow:

- Build the URL using URL primitives (or component encoding).
- Render it into HTML with normal HTML escaping (so `&` becomes `&amp;`).

If you’re using a template engine or a component framework, it usually escapes attributes for you. Still, you should know what it does so you don’t “escape twice.”

### 4) Treating + as space everywhere

+ means space in application/x-www-form-urlencoded decoding. Outside that context, + is a literal plus.

If you manually decode query strings, you must decide which semantics you’re applying. My advice: don’t manually decode query strings unless you have no alternative.

### 5) Decoding too early (security and routing bugs)

Decoding before routing can create path confusion.

Example: a request path contains %2F (encoded /). If you decode it early, it can turn into an extra path segment, which can:

- bypass route checks
- interact badly with static file serving
- enable traversal-style bugs in poorly designed handlers

I let the framework/router handle path parsing, and I only decode what I need after the route is selected.

## Performance, observability, and debugging

URL encoding itself is rarely your bottleneck.
In typical apps, encoding and decoding costs are usually in the microseconds to low hundreds of microseconds per operation, and still well under 1 ms per request unless you’re doing it in hot loops on very large strings.

Where performance and reliability do show up:

- Avoid repeated encode/decode passes in render loops.
- Prefer built-ins (URL, URLSearchParams, standard library parsers) over custom regex rules.
- Log safely: raw user input in URLs can include control characters once decoded. I keep logs structured (JSON) and log both a decoded value (for humans) and an encoded form (for exact reproduction).

A pattern I like in production is to log three things when a URL bug is reported:

- the raw request target (req.url / r.URL.String() / similar)
- the parsed view (searchParams, path segments)
- a “rebuild” of the URL from parsed parts (to catch double-encoding or normalization surprises)

It sounds obvious, but this catches a ton of issues quickly: if the “raw” and the “rebuilt” diverge, you’ve found your seam.

## Component-by-component encoding playbook

When people say “URL encode it” they usually mean one of four different things. I keep a mental matrix of component → what must be encoded → which tool to use.

### The practical matrix

| Component | Delimiters that have meaning there | What I usually do |
| --- | --- | --- |
| Path segment | / separates segments | encodeURIComponent(segment) per segment |
| Whole path | / and maybe ; params (rare) | generally nothing wholesale |
| Query value | & separates pairs, = separates key/value | URLSearchParams or encodeURIComponent |
| Fragment | # starts the fragment; inside it, rules depend on your app | encode the fragment payload (often app-defined) |

### A concrete example: building a product URL

Say you have: a human-readable product name, and optional filter UI state.

- Path segment: product name (human-readable)
- Query: filters (structured data)

Here’s the strategy I use:

1) Create a stable identifier for the product (ID) and keep that as the primary lookup.
2) Use the product name as a decorative slug if you want SEO/readability, but encode it correctly.
3) Store complex filters as query params (or a compact encoded blob) rather than cramming them into the path.

```js
const productId = 'p10492';
const productName = 'Mug: “Café” / Navy';

const url = new URL('https://shop.example.com/');
url.pathname = `/products/${encodeURIComponent(productId)}/${encodeURIComponent(productName)}`;

url.searchParams.set('sort', 'price:asc');
url.searchParams.set('q', 'rock & roll');

console.log(url.toString());
```

I encode the ID too, not because it’s necessary (it’s usually alphanumeric), but because it makes the rule consistent: path segments get encoded.

## The "+" vs "%20" nuance (and why it keeps surprising people)

Here’s the simplest way I’ve found to remember this without memorizing specs:

- %20 is the generic percent-encoding for a space byte.
- + is a special convention used by form encoding (application/x-www-form-urlencoded).

So you can see all of these in the wild:

- /search?q=coffee%20beans
- /search?q=coffee+beans

Both are commonly accepted by servers that parse query strings as form data.
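You can see both serializations fall straight out of the standard primitives:

```js
// Two standard serializations of the same space character.
const viaParams = new URLSearchParams({ q: 'coffee beans' }).toString();
const viaComponent = 'q=' + encodeURIComponent('coffee beans');

console.log(viaParams);    // q=coffee+beans   (form-style: + means space)
console.log(viaComponent); // q=coffee%20beans (generic percent-encoding)

// A form-style parser maps both back to the same raw value:
console.log(new URLSearchParams('q=coffee+beans').get('q'));   // coffee beans
console.log(new URLSearchParams('q=coffee%20beans').get('q')); // coffee beans
```

The round trip agrees either way when a form-style parser is on the receiving end; the trouble starts when the receiver is not one.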
But not every system interprets + as space, especially if it’s treating the query as a raw string rather than form fields.

My rule of thumb:

- If I’m using URLSearchParams, I accept that it serializes spaces as + and I let it.
- If I’m manually encoding a query value (rare), I use encodeURIComponent, which yields %20.
- If I’m working with a strict downstream system and it treats + literally, I avoid URLSearchParams.toString() as the final step and use a custom serializer (but I treat that as a compatibility shim, not a default).

## A URL encoding decision tree I actually use

If you want one “do this when you’re in a hurry” flow, this is mine:

1) Am I creating a URL from parts?
   – Yes → use new URL(base) + url.pathname + url.searchParams.

2) Am I encoding a single value that will live inside a URL component?
   – Query value, path segment, fragment sub-value → encodeURIComponent(value).

3) Am I encoding an entire URL to send as data inside another URL?
   – Yes → encodeURIComponent(fullUrlString).

4) Am I putting that URL into HTML?
   – Yes → let the template engine escape HTML (attribute escaping). Ensure `&` becomes `&amp;`.

5) Am I decoding data?
   – Only at the boundary, once. If I’m manually decoding, I wrap decodeURIComponent in try/catch because malformed input throws.

## Real-world HTML scenarios (with the tiny details that matter)

### 1) Rendering links with multiple parameters

If I’m generating HTML server-side (or even building an HTML string in a tool), I keep the order: build URL → render attribute.

- Build: ensure query values are encoded
- Render: ensure & separators are HTML-escaped as `&amp;`

Example output I want to see:

```html
<a href="/results?q=coffee%20beans&amp;page=2">Next page</a>
```

If I see this instead:

```html
<a href="/results?q=coffee%20beans&page=2">Next page</a>
```

I treat it as a smell. Some browsers will still navigate. Some HTML parsers will not. Some tools will rewrite it. Some crawlers will misinterpret it.
I prefer boring correctness.

### 2) mailto: and tel: links (still URLs, still need encoding)

People forget these are URLs too. The encoding rules differ a bit because the “payload” has its own mini-syntax.

For mailto:, you’ll often include subject and body as query parameters. Those values must be percent-encoded. Example conceptual output:

- mailto:support@example.com?subject=Order%20Issue&body=Hi%2C%20I%20need%20help...

In HTML, if you include multiple parameters, the & still must become `&amp;`.

For tel:, keep it simple: avoid spaces and punctuation where possible; if you include them, test across mobile browsers and apps. (Many will be forgiving, but you’re at the mercy of OS dialer parsing.)

### 3) Data attributes and deferred navigation

Sometimes I don’t want to put the full target in href (e.g., because I’m attaching analytics, or because navigation is handled by a router). In that case I’ll store raw values in data- attributes and build the URL in JavaScript using URL.

The key is not to store half-encoded values. Either store raw (preferred) or store fully encoded for a specific purpose, but don’t mix.

Example pattern (conceptually):

- data-q="rock & roll" (raw)
- JS: url.searchParams.set('q', el.dataset.q)

## SPA routing, fragments, and "URL-looking" strings

SPAs add a twist: you might be encoding for more than one parser.

### Hash routers: /#/route?x=y

With hash-based routing, everything after # is not sent to the server. That doesn’t mean you can ignore encoding—it just means the browser and your router are the ones parsing it.

Two common footguns I see:

1) Treating the fragment like it’s free-form text
   – It’s not. ?, &, and = might have meaning to your router.
   Encode sub-values.

2) Double-parsing the same substring
   – For example, you parse the fragment as a URL, then parse a nested query inside it again, and accidentally decode twice.

A strategy that keeps me sane: define exactly one grammar for your fragment. Either:

- Fragment is a path only: #/settings/profile
- Or fragment is “path + query”: #/search?q=coffee%20beans

Then use one consistent parser in one place.

### Nested URLs (a URL inside a URL)

This comes up constantly with return-to flows and outbound tracking. Example: “after login, return to this page”.

Correct idea:

- /login?returnTo=&lt;encoded URL&gt;

If you forget to encode the nested URL, the outer query breaks at the first & inside the inner URL, and you get phantom parameters (or truncated values).

On the decoding side, I decode once, validate, and then treat it as a URL/path (not as raw text). More on validation in the redirect section below.

## Redirect parameters: correctness + security in one place

Redirects are where encoding bugs and security bugs meet. If you accept a parameter like next, returnTo, redirect, or continue, you are handling attacker-controlled navigation.

### My safe default: allow only relative redirects

If the user supplies /account/settings, great.
If they supply https://evil.example/, I reject it or fall back to a safe page.

In Node.js, a pattern I use looks like this:

```js
function safeRedirectTarget(input) {
  if (typeof input !== 'string') return '/';
  if (!input.startsWith('/')) return '/';
  // Prevent protocol-relative URLs like //evil.example
  if (input.startsWith('//')) return '/';
  // Optional: constrain to a route prefix
  return input;
}
```

If I truly need to allow absolute URLs (rare), I do an explicit allow-list check by origin:

```js
function safeAbsoluteRedirect(input, allowedOrigins) {
  try {
    const u = new URL(input);
    if (!allowedOrigins.includes(u.origin)) return null;
    return u.toString();
  } catch {
    return null;
  }
}
```

The key: I validate after decoding (so the check sees the true characters) and before redirecting.

### Don’t forget encoding when generating redirect links

When I generate the login link itself, I encode the nested return URL:

```js
const returnTo = '/orders?filter=late&page=2';
const login = new URL('https://app.example.com/login');
login.searchParams.set('returnTo', returnTo);
```

Then when I render into HTML, I let HTML escaping handle & properly.

## International text, IDNs, and Unicode normalization

URL encoding meets internationalization in two main places:

1) Non-ASCII in path/query (percent-encoded as UTF-8 bytes)
2) Non-ASCII in hostnames (IDNs, which are typically represented as punycode internally)

### Query and path: UTF-8 is the practical baseline

If I set a query param to Zoë or café, I expect %C3%AB and %C3%A9 sequences to appear when serialized. That’s normal.

One subtle issue: Unicode normalization. Two strings can look identical but be different byte sequences (for example, composed vs decomposed accents).
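A quick demonstration of the mismatch, using only standard string and encoding APIs:

```js
// Visually identical strings, different code points (and therefore bytes):
const composed = 'caf\u00E9';    // é as a single code point (NFC form)
const decomposed = 'cafe\u0301'; // e + combining acute accent (NFD form)

console.log(composed === decomposed);        // false
console.log(encodeURIComponent(composed));   // caf%C3%A9
console.log(encodeURIComponent(decomposed)); // cafe%CC%81

// Normalizing both sides (here to NFC) makes the encodings agree:
console.log(decomposed.normalize('NFC') === composed); // true
```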
If you use user input as identifiers, you can get mismatches.

What I do in practice:

- For human-facing text: I preserve what the user typed.
- For keys/identifiers: I normalize (often NFC) and apply an allow-list (or use opaque IDs).

### Hostnames: don’t percent-encode the domain

If you’re dealing with international domain names, the right handling is not “percent-encode the host”. The host has its own transformation rules. Modern URL libraries typically handle this when you construct a URL object.

If you’re building URLs as strings by hand, you’re more likely to get this wrong. Yet another reason I prefer URL.

## When NOT to URL-encode (yes, really)

I’m pro-encoding, but I’m also pro-clarity. There are cases where encoding is the wrong move.

### 1) Don’t encode structural delimiters you need

If you encode / in the path, you can break routing semantics. If you encode ? or & in a URL you meant to be navigable, you can turn a query string into literal text.

This is why “encode the whole URL” is almost never what you want. The only common case where I encode a full URL string is when I’m treating it as a value inside another URL.

### 2) Don’t decode and re-encode repeatedly

Each pass is a chance to:

- throw on malformed sequences
- normalize or change representation
- double-encode %

I try to keep values in their raw form in memory and in storage, and only encode at the moment I embed them into a URL.

### 3) Don’t use URLs as a general-purpose data store

Yes, you can store complex state in query strings.
But beyond a point, you pay in:

- readability
- caching complexity
- length limits
- accidental data leaks (URLs end up in logs, referrers, screenshots)

If the state is sensitive or large, I move it server-side and keep the URL as a short opaque key.

## Encoding edge cases that bite in production

These are the ones I see after launches, not in tutorials.

### 1) A literal percent sign (%)

If your value includes % (like “50% off”), it must be encoded as %25 in the URL component. If it isn’t, a decoder may treat it as the start of an escape sequence and throw or corrupt the value.

### 2) Malformed percent-escapes

decodeURIComponent('%E0%A4') throws. Attackers and broken clients will send malformed sequences.

If you manually decode user input, wrap decoding in try/catch and decide your failure mode:

- reject with 400
- replace with a sentinel value
- log and continue with raw (rare; usually unsafe/confusing)

### 3) # vanishing in requests

Fragments are not sent to the server. If a customer says “the server didn’t receive my parameter” and you see it after #, that’s why.

I’ve debugged this exact issue with marketing links: someone puts tracking info in the fragment and expects backend analytics to see it. It won’t, unless you explicitly capture it client-side and forward it.

### 4) Invisible characters

Users can paste tabs, newlines, and non-breaking spaces. In a URL, these can be normalized, dropped, or encoded in surprising ways depending on the tool.

My mitigation is simple:

- normalize/trim user input when it’s meant to be a search query
- reject control characters in identifiers
- keep logs structured so odd whitespace doesn’t break the log format

## Security considerations (URL encoding is not a security boundary)

Encoding prevents parsing ambiguity; it does not automatically prevent injection.
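To make that concrete: percent-encoding a hostile URL only changes its spelling, not its behavior once it is decoded or placed in an href. The sketch below uses a hypothetical `safeHref` helper (not a standard API) to show that scheme validation, not encoding, is the defense:

```js
// Encoding does not neutralize a javascript: URL; it just respells it.
console.log(encodeURIComponent('javascript:alert(1)')); // javascript%3Aalert(1)

// safeHref is a hypothetical validator: only http(s) survives.
function safeHref(input) {
  try {
    const u = new URL(input, 'https://app.example.com/');
    return ['http:', 'https:'].includes(u.protocol) ? u.href : '#';
  } catch {
    return '#'; // unparseable input gets a safe fallback
  }
}

console.log(safeHref('javascript:alert(1)')); // #
console.log(safeHref('/account/settings'));   // https://app.example.com/account/settings
```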
Here are the security-adjacent rules I actually follow.

### 1) Never trust schemes in user-provided URLs

If you accept a URL from a user (like a “website” field) and later render it into an `<a href>`, validate the scheme.

What I allow in most apps:

- http:
- https:

What I reject or handle carefully:

- javascript: (XSS vector)
- data:
- file:
- custom schemes

Even if you encode the string, a browser will still treat javascript:alert(1) as a navigable URL if you place it in href. The fix is validation, not encoding.

### 2) HTML attributes need HTML escaping

If you hand-roll HTML strings, you must HTML-escape attribute values. URL encoding does not replace that.

A safe mental model:

- URL encoding makes values safe for URL syntax.
- HTML escaping makes the final string safe for HTML syntax.

### 3) Open redirects are a business logic bug

I covered this earlier, but it’s worth repeating: if you do res.redirect(req.query.next), you’ve created an open redirect unless you validate. Encoding doesn’t fix it.

### 4) SSRF: when the server fetches URLs

If your server takes a user-supplied URL and fetches it (webhook tester, link preview, import-from-URL), you have SSRF risk. Encoding doesn’t fix it.
You need controls like:

- allow-list of domains
- blocking private IP ranges
- timeouts and size limits
- safe DNS resolution behavior

## Testing URL encoding (what I automate)

I like tests here because bugs are annoying and regressions are common when refactoring routing or link builders.

### A small but high-value test set

I keep a table of “nasty” inputs and expected round-trips:

- spaces: coffee beans
- ampersand: rock & roll
- plus: a+b (must stay plus unless intended as space)
- percent: 50%
- slash in segment: A/B
- unicode: Zoë, café, 🙂
- quotes: "hello"
- hash/question: what?#

Then I assert two properties:

1) Building a URL from parts never produces an ambiguous string (no raw delimiters inside values).
2) Parsing the URL returns exactly the original raw values.

In JavaScript, that property test looks like this in spirit:

```js
const inputs = ['coffee beans', 'rock & roll', 'a+b', '50%', 'A/B', 'Zoë', 'café', '🙂', '"hello"', 'what?#'];

for (const value of inputs) {
  const url = new URL('https://example.test/search');
  url.searchParams.set('q', value);

  const parsed = new URL(url.toString());
  const roundTrip = parsed.searchParams.get('q');

  if (roundTrip !== value) throw new Error(`Mismatch: ${value} -> ${roundTrip}`);
}
```

That’s not fancy, but it catches the core failures.

### Test the HTML boundary too

If you generate HTML strings (emails, server-rendered templates, CMS output), I also test that the rendered href is valid HTML and navigates correctly when parsed.

At minimum, I grep rendered HTML for raw & inside href attributes when I expect query strings. If I see ?a=1&b=2 in raw HTML, I know I’m missing `&amp;`.

## Troubleshooting: how I debug broken URLs fast

When a URL is broken, I do this in roughly this order:

1) Identify the component that’s wrong
   – path? query? fragment?
   nested URL?

2) Check for delimiter leakage
   – If a query value contains raw & or =, that’s the culprit.

3) Check for double-encoding
   – Look for %25 and patterns like %2520.

4) Compare raw vs parsed
   – Raw: the string in HTML or the request target
   – Parsed: what URL / framework sees

5) Look for a stray #
   – Anything after it won’t reach the server.

6) Try decoding once (carefully)
   – If decodeURIComponent throws, you likely have malformed escapes or partial encoding.

7) Reproduce with a minimal example
   – Reduce to one parameter and one special character. Most bugs become obvious when the noise is removed.

## Practical cheat sheet (what I want my past self to memorize)

Here are the rules I actually reach for when I’m tired and shipping:

- If it’s a query value: use URLSearchParams or encodeURIComponent(value).
- If it’s a path segment: encodeURIComponent(segment) per segment (never encode the whole path blindly).
- If it’s an entire URL inside another URL: encodeURIComponent(fullUrlString).
- If it’s going into HTML: still HTML-escape attributes; `&` becomes `&amp;`.
- If it’s a redirect target: validate it (relative-only or allow-list).
  Encoding is not validation.
- If something looks double-encoded: hunt for %25 and trace where encoding happens.

If you take only one habit from all of this, take this one: build URLs from parts using URL primitives, and treat encoding/decoding as boundary logic. That one habit prevents most of the “it works locally” URL bugs I’ve seen in real systems.


