fix: percent-encode x-next-cache-tags header value (non-ASCII dynamic route params)#93167
Closed
ornakash wants to merge 5 commits into
Closed
fix: percent-encode x-next-cache-tags header value (non-ASCII dynamic route params)#93167ornakash wants to merge 5 commits into
ornakash wants to merge 5 commits into
Conversation
Any App Router dynamic route whose matched params value contains a
non-ASCII character (Hebrew, Arabic, Chinese, emoji, ...) returned 500
on every request when served by the minimal-mode render path
(NEXT_PRIVATE_MINIMAL_MODE=1, i.e. Vercel's edge runtime):
TypeError: Invalid character in header content ["x-next-cache-tags"]
code: 'ERR_INVALID_CHAR'
Add encodeCacheTagsHeaderValue which percent-encodes runs of non-ASCII
characters via encodeURIComponent while leaving printable ASCII
(including ',', '/', and '%') unchanged, and apply it at the four HTTP
write sites in the app-page and app-route templates. Internal tag
storage and tag comparison (getImplicitTags, revalidatePath,
areTagsExpired) still see the raw string, so cache invalidation
semantics are unchanged.
Fixes vercel#93142.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
unstubbable
added a commit
that referenced
this pull request
May 7, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …) it gets written into the internal `x-next-cache-tags` HTTP header on ISR responses. Node's `validateHeaderValue` rejects any byte outside `\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On Vercel deploys stale-if-error masks the 500 from clients, but revalidation itself keeps failing and the cache stops refreshing for affected routes. This change introduces a single `encodeCacheTag` helper and applies it at every public boundary — `validateTags` (which `cacheTag()`, `unstable_cache()`, and `fetch` tags all funnel through), `getImplicitTags` for path-derived tags, and `revalidatePath` / `revalidateTag` / `updateTag` for invalidation inputs. The encoder matches runs of out-of-class code units so surrogate pairs reach `encodeURIComponent` intact, and it is idempotent on already-encoded `%xx` sequences, so callers can pass either the raw or the encoded form interchangeably. PR #93139 already encodes path-derived tags at construction, but it misses every user-supplied tag entry point and uses a `decodeURIComponent` round-trip that silently mangles literal `%xx` characters in tag values. PR #93167 encodes only at the `setHeader` sites, which leaves storage and invalidation diverging and requires every new write site to remember the encoding step. The canonical-form-at-the-boundary approach taken here covers all entry points and keeps storage, comparison, and the wire in sync. fixes #93142 closes #93139 closes #93167 Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com> Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable
added a commit
that referenced
this pull request
May 7, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …) it gets written into the internal `x-next-cache-tags` HTTP header on ISR responses. Node's `validateHeaderValue` rejects any byte outside `\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On Vercel deploys stale-if-error masks the 500 from clients, but revalidation itself keeps failing and the cache stops refreshing for affected routes. This change introduces a single `encodeCacheTag` helper and applies it at every public boundary — `validateTags` (which `cacheTag()`, `unstable_cache()`, and `fetch` tags all funnel through), `getImplicitTags` for path-derived tags, and `revalidatePath` / `revalidateTag` / `updateTag` for invalidation inputs. The encoder matches runs of out-of-class code units so surrogate pairs reach `encodeURIComponent` intact, and it is idempotent on already-encoded `%xx` sequences, so callers can pass either the raw or the encoded form interchangeably. PR #93139 already encodes path-derived tags at construction, but it misses every user-supplied tag entry point and uses a `decodeURIComponent` round-trip that silently mangles literal `%xx` characters in tag values. PR #93167 encodes only at the `setHeader` sites, which leaves storage and invalidation diverging and requires every new write site to remember the encoding step. The canonical-form-at-the-boundary approach taken here covers all entry points and keeps storage, comparison, and the wire in sync. fixes #93142 closes #93139 closes #93167 Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com> Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable
added a commit
that referenced
this pull request
May 7, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …) it gets written into the internal `x-next-cache-tags` HTTP header on ISR responses. Node's `validateHeaderValue` rejects any byte outside `\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On Vercel deploys stale-if-error masks the 500 from clients, but revalidation itself keeps failing and the cache stops refreshing for affected routes. This change introduces a single `encodeCacheTag` helper and applies it at every public boundary — `validateTags` (which `cacheTag()`, `unstable_cache()`, and `fetch` tags all funnel through), `getImplicitTags` for path-derived tags, and `revalidatePath` / `revalidateTag` / `updateTag` for invalidation inputs. The encoder matches runs of out-of-class code units so surrogate pairs reach `encodeURIComponent` intact, and it is idempotent on already-encoded `%xx` sequences, so callers can pass either the raw or the encoded form interchangeably. PR #93139 already encodes path-derived tags at construction, but it misses every user-supplied tag entry point and uses a `decodeURIComponent` round-trip that silently mangles literal `%xx` characters in tag values. PR #93167 encodes only at the `setHeader` sites, which leaves storage and invalidation diverging and requires every new write site to remember the encoding step. The canonical-form-at-the-boundary approach taken here covers all entry points and keeps storage, comparison, and the wire in sync. fixes #93142 closes #93139 closes #93167 Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com> Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable
added a commit
that referenced
this pull request
May 7, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …) it gets written into the internal `x-next-cache-tags` HTTP header on ISR responses. Node's `validateHeaderValue` rejects any byte outside `\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On Vercel deploys stale-if-error masks the 500 from clients, but revalidation itself keeps failing and the cache stops refreshing for affected routes. This change introduces a single `encodeCacheTag` helper and applies it at every public boundary — `validateTags` (which `cacheTag()`, `unstable_cache()`, and `fetch` tags all funnel through), `getImplicitTags` for path-derived tags, and `revalidatePath` / `revalidateTag` / `updateTag` for invalidation inputs. The encoder matches runs of out-of-class code units so surrogate pairs reach `encodeURIComponent` intact, and it is idempotent on already-encoded `%xx` sequences, so callers can pass either the raw or the encoded form interchangeably. PR #93139 already encodes path-derived tags at construction, but it misses every user-supplied tag entry point and uses a `decodeURIComponent` round-trip that silently mangles literal `%xx` characters in tag values. PR #93167 encodes only at the `setHeader` sites, which leaves storage and invalidation diverging and requires every new write site to remember the encoding step. The canonical-form-at-the-boundary approach taken here covers all entry points and keeps storage, comparison, and the wire in sync. fixes #93142 closes #93139 closes #93167 Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com> Co-authored-by: Or Nakash <ornakash@gmail.com>
pull Bot
pushed a commit
to mikeyhodl/next.js
that referenced
this pull request
May 7, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …) it gets written into the internal `x-next-cache-tags` HTTP header on ISR responses. Node's `validateHeaderValue` rejects any byte outside `\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On Vercel deploys stale-if-error masks the 500 from clients, but revalidation itself keeps failing and the cache stops refreshing for affected routes. This change introduces a single `encodeCacheTag` helper and applies it at every public boundary — `validateTags` (which `cacheTag()`, `unstable_cache()`, and `fetch` tags all funnel through), `getImplicitTags` for path-derived tags, and `revalidatePath` / `revalidateTag` / `updateTag` for invalidation inputs. The encoder matches runs of out-of-class code units so surrogate pairs reach `encodeURIComponent` intact, and it is idempotent on already-encoded `%xx` sequences, so callers can pass either the raw or the encoded form interchangeably. PR vercel#93139 already encodes path-derived tags at construction, but it misses every user-supplied tag entry point and uses a `decodeURIComponent` round-trip that silently mangles literal `%xx` characters in tag values. PR vercel#93167 encodes only at the `setHeader` sites, which leaves storage and invalidation diverging and requires every new write site to remember the encoding step. The canonical-form-at-the-boundary approach taken here covers all entry points and keeps storage, comparison, and the wire in sync. fixes vercel#93142 closes vercel#93139 closes vercel#93167 Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com> Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable
added a commit
that referenced
this pull request
May 18, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …) it gets written into the internal `x-next-cache-tags` HTTP header on ISR responses. Node's `validateHeaderValue` rejects any byte outside `\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On Vercel deploys stale-if-error masks the 500 from clients, but revalidation itself keeps failing and the cache stops refreshing for affected routes. This change introduces a single `encodeCacheTag` helper and applies it at every public boundary — `validateTags` (which `cacheTag()`, `unstable_cache()`, and `fetch` tags all funnel through), `getImplicitTags` for path-derived tags, and `revalidatePath` / `revalidateTag` / `updateTag` for invalidation inputs. The encoder matches runs of out-of-class code units so surrogate pairs reach `encodeURIComponent` intact, and it is idempotent on already-encoded `%xx` sequences, so callers can pass either the raw or the encoded form interchangeably. PR #93139 already encodes path-derived tags at construction, but it misses every user-supplied tag entry point and uses a `decodeURIComponent` round-trip that silently mangles literal `%xx` characters in tag values. PR #93167 encodes only at the `setHeader` sites, which leaves storage and invalidation diverging and requires every new write site to remember the encoding step. The canonical-form-at-the-boundary approach taken here covers all entry points and keeps storage, comparison, and the wire in sync. fixes #93142 closes #93139 closes #93167 Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com> Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable
added a commit
that referenced
this pull request
May 18, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …) it gets written into the internal `x-next-cache-tags` HTTP header on ISR responses. Node's `validateHeaderValue` rejects any byte outside `\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On Vercel deploys stale-if-error masks the 500 from clients, but revalidation itself keeps failing and the cache stops refreshing for affected routes. This change introduces a single `encodeCacheTag` helper and applies it at every public boundary — `validateTags` (which `cacheTag()`, `unstable_cache()`, and `fetch` tags all funnel through), `getImplicitTags` for path-derived tags, and `revalidatePath` / `revalidateTag` / `updateTag` for invalidation inputs. The encoder matches runs of out-of-class code units so surrogate pairs reach `encodeURIComponent` intact, and it is idempotent on already-encoded `%xx` sequences, so callers can pass either the raw or the encoded form interchangeably. PR #93139 already encodes path-derived tags at construction, but it misses every user-supplied tag entry point and uses a `decodeURIComponent` round-trip that silently mangles literal `%xx` characters in tag values. PR #93167 encodes only at the `setHeader` sites, which leaves storage and invalidation diverging and requires every new write site to remember the encoding step. The canonical-form-at-the-boundary approach taken here covers all entry points and keeps storage, comparison, and the wire in sync. fixes #93142 closes #93139 closes #93167 Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com> Co-authored-by: Or Nakash <ornakash@gmail.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #93142.
What
Any App Router dynamic route whose matched
paramsvalue contains a non-ASCII character (Hebrew, Arabic, Chinese, emoji, …) returns 500 on every request when served by the minimal-mode render path (NEXT_PRIVATE_MINIMAL_MODE=1, which is what Vercel's edge runtime uses):The minimal-mode branch of the app-page template writes the matched route path into the internal
x-next-cache-tagsHTTP header (consumed by the edge cache to index ISR responses by tag). HTTP header values are limited to\t\x20-\x7ein Node, so any byte outside that range triggersvalidateHeaderValueand crashes ISR. The only user-side workaround today isexport const dynamic = 'force-dynamic', which disables ISR for the affected route.Non-minimal
next startis not affected because the tag header is stripped from the response before it's written — the crash is specific to the code path that keeps it for the edge.How
Added
encodeCacheTagsHeaderValueinpackages/next/src/server/lib/cache-tags-header.ts. It percent-encodes runs of non-ASCII characters usingencodeURIComponentwhile leaving the printable-ASCII range (including,,/, and%) byte-for-byte unchanged. This preserves the comma-separated format and the contents of user-supplied tags.Applied the helper at the three minimal-mode write sites inside
packages/next/src/build/templates/app-page.ts(the twores.setHeader(NEXT_CACHE_TAGS_HEADER, tags)calls for PPR and non-PPR branches, plus the shared header-iteration loop that re-emits cached headers viares.appendHeader). Also applied at the equivalent site inpackages/next/src/build/templates/app-route.tswhere the tag list is kept on the clonedHeadersbag beforesendResponse.Deliberately not encoding at the
headers[NEXT_CACHE_TAGS_HEADER] = cacheTagsassignment sites in thebuild/andexport/paths: those bags are stored by the cache handler / in.metafiles and are later read back and fed to the patched write sites. Keeping storage raw preserves the invariant that cache-key comparisons (getImplicitTags,revalidatePath,areTagsExpired) see the same string they always have, so nothing changes for internal cache invalidation semantics.Open question for maintainers
On the invalidation side, when
revalidatePath('/מידע')fires and the tag list is forwarded to downstream cache infrastructure (Vercel edge, customcacheHandler), should that wire representation also be encoded? The failure mode this PR fixes is server-side — the request never reached any cache — so there's no behavior change for existing deployments. If the edge wants consistency between the stored tag (as seen in the header) and the invalidation tag, that's a follow-up. Happy to extend if you point me at the right call site.Reproduction
/מידע, 200 on/hello): https://nextjs-hebrew-slug-repro.vercel.appNEXT_PRIVATE_MINIMAL_MODE=1 pnpm startthencurl -I http://localhost:3000/%D7%9E%D7%99%D7%93%D7%A2Before / after
With
NEXT_PRIVATE_MINIMAL_MODE=1set locally:curl -I /hellocurl -I /%D7%9E%D7%99%D7%93%D7%A2ERR_INVALID_CHARx-next-cache-tags: _N_T_/layout,_N_T_/[slug]/layout,_N_T_/[slug]/page,_N_T_/%D7%9E%D7%99%D7%93%D7%A2Tests
Added
packages/next/src/server/lib/cache-tags-header.test.tscovering: ASCII passes through unchanged,,///%are preserved, Hebrew & CJK are encoded to UTF-8 percent-form, emoji (surrogate pair) round-trips viaencodeURIComponent, mixed ASCII + non-ASCII tag lists, and the output matches Node's header-validation character class (^[\t\x20-\x7e]*$).All 6 unit tests pass.