Skip to content

fix: percent-encode x-next-cache-tags header value (non-ASCII dynamic route params)#93167

Closed
ornakash wants to merge 5 commits into
vercel:canaryfrom
ornakash:fix/x-next-cache-tags-non-ascii
Closed

fix: percent-encode x-next-cache-tags header value (non-ASCII dynamic route params)#93167
ornakash wants to merge 5 commits into
vercel:canaryfrom
ornakash:fix/x-next-cache-tags-non-ascii

Conversation

@ornakash

Copy link
Copy Markdown
Contributor

Fixes #93142.

What

Any App Router dynamic route whose matched params value contains a non-ASCII character (Hebrew, Arabic, Chinese, emoji, …) returns 500 on every request when served by the minimal-mode render path (NEXT_PRIVATE_MINIMAL_MODE=1, which is what Vercel's edge runtime uses):

TypeError: Invalid character in header content ["x-next-cache-tags"]
  code: 'ERR_INVALID_CHAR'

The minimal-mode branch of the app-page template writes the matched route path into the internal x-next-cache-tags HTTP header (consumed by the edge cache to index ISR responses by tag). HTTP header values are limited to \t\x20-\x7e in Node, so any byte outside that range triggers validateHeaderValue and crashes ISR. The only user-side workaround today is export const dynamic = 'force-dynamic', which disables ISR for the affected route.

Non-minimal next start is not affected because the tag header is stripped from the response before it's written — the crash is specific to the code path that keeps it for the edge.

How

Added encodeCacheTagsHeaderValue in packages/next/src/server/lib/cache-tags-header.ts. It percent-encodes runs of non-ASCII characters using encodeURIComponent while leaving the printable-ASCII range (including ,, /, and %) byte-for-byte unchanged. This preserves the comma-separated format and the contents of user-supplied tags.

Applied the helper at the three minimal-mode write sites inside packages/next/src/build/templates/app-page.ts (the two res.setHeader(NEXT_CACHE_TAGS_HEADER, tags) calls for PPR and non-PPR branches, plus the shared header-iteration loop that re-emits cached headers via res.appendHeader). Also applied at the equivalent site in packages/next/src/build/templates/app-route.ts where the tag list is kept on the cloned Headers bag before sendResponse.

Deliberately not encoding at the headers[NEXT_CACHE_TAGS_HEADER] = cacheTags assignment sites in the build/ and export/ paths: those bags are stored by the cache handler / in .meta files and are later read back and fed to the patched write sites. Keeping storage raw preserves the invariant that cache-key comparisons (getImplicitTags, revalidatePath, areTagsExpired) see the same string they always have, so nothing changes for internal cache invalidation semantics.

Open question for maintainers

On the invalidation side, when revalidatePath('/מידע') fires and the tag list is forwarded to downstream cache infrastructure (Vercel edge, custom cacheHandler), should that wire representation also be encoded? The failure mode this PR fixes is server-side — the request never reached any cache — so there's no behavior change for existing deployments. If the edge wants consistency between the stored tag (as seen in the header) and the invalidation tag, that's a follow-up. Happy to extend if you point me at the right call site.

Reproduction

Before / after

With NEXT_PRIVATE_MINIMAL_MODE=1 set locally:

unpatched patched
curl -I /hello 200 200
curl -I /%D7%9E%D7%99%D7%93%D7%A2 500 ERR_INVALID_CHAR 200, x-next-cache-tags: _N_T_/layout,_N_T_/[slug]/layout,_N_T_/[slug]/page,_N_T_/%D7%9E%D7%99%D7%93%D7%A2

Tests

Added packages/next/src/server/lib/cache-tags-header.test.ts covering: ASCII passes through unchanged, ,///% are preserved, Hebrew & CJK are encoded to UTF-8 percent-form, emoji (surrogate pair) round-trips via encodeURIComponent, mixed ASCII + non-ASCII tag lists, and the output matches Node's header-validation character class (^[\t\x20-\x7e]*$).

All 6 unit tests pass.

ornakash and others added 4 commits April 23, 2026 23:27
Any App Router dynamic route whose matched params value contains a
non-ASCII character (Hebrew, Arabic, Chinese, emoji, ...) returned 500
on every request when served by the minimal-mode render path
(NEXT_PRIVATE_MINIMAL_MODE=1, i.e. Vercel's edge runtime):

    TypeError: Invalid character in header content ["x-next-cache-tags"]
      code: 'ERR_INVALID_CHAR'

Add encodeCacheTagsHeaderValue which percent-encodes runs of non-ASCII
characters via encodeURIComponent while leaving printable ASCII
(including ',', '/', and '%') unchanged, and apply it at the four HTTP
write sites in the app-page and app-route templates. Internal tag
storage and tag comparison (getImplicitTags, revalidatePath,
areTagsExpired) still see the raw string, so cache invalidation
semantics are unchanged.

Fixes vercel#93142.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
unstubbable added a commit that referenced this pull request May 7, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …)
it gets written into the internal `x-next-cache-tags` HTTP header on ISR
responses. Node's `validateHeaderValue` rejects any byte outside
`\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On
Vercel deploys stale-if-error masks the 500 from clients, but
revalidation itself keeps failing and the cache stops refreshing for
affected routes.

This change introduces a single `encodeCacheTag` helper and applies it
at every public boundary — `validateTags` (which `cacheTag()`,
`unstable_cache()`, and `fetch` tags all funnel through),
`getImplicitTags` for path-derived tags, and `revalidatePath` /
`revalidateTag` / `updateTag` for invalidation inputs. The encoder
matches runs of out-of-class code units so surrogate pairs reach
`encodeURIComponent` intact, and it is idempotent on already-encoded
`%xx` sequences, so callers can pass either the raw or the encoded form
interchangeably.

PR #93139 already encodes path-derived tags at construction, but it
misses every user-supplied tag entry point and uses a
`decodeURIComponent` round-trip that silently mangles literal `%xx`
characters in tag values. PR #93167 encodes only at the `setHeader`
sites, which leaves storage and invalidation diverging and requires
every new write site to remember the encoding step. The
canonical-form-at-the-boundary approach taken here covers all entry
points and keeps storage, comparison, and the wire in sync.

fixes #93142
closes #93139
closes #93167

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable added a commit that referenced this pull request May 7, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …)
it gets written into the internal `x-next-cache-tags` HTTP header on ISR
responses. Node's `validateHeaderValue` rejects any byte outside
`\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On
Vercel deploys stale-if-error masks the 500 from clients, but
revalidation itself keeps failing and the cache stops refreshing for
affected routes.

This change introduces a single `encodeCacheTag` helper and applies it
at every public boundary — `validateTags` (which `cacheTag()`,
`unstable_cache()`, and `fetch` tags all funnel through),
`getImplicitTags` for path-derived tags, and `revalidatePath` /
`revalidateTag` / `updateTag` for invalidation inputs. The encoder
matches runs of out-of-class code units so surrogate pairs reach
`encodeURIComponent` intact, and it is idempotent on already-encoded
`%xx` sequences, so callers can pass either the raw or the encoded form
interchangeably.

PR #93139 already encodes path-derived tags at construction, but it
misses every user-supplied tag entry point and uses a
`decodeURIComponent` round-trip that silently mangles literal `%xx`
characters in tag values. PR #93167 encodes only at the `setHeader`
sites, which leaves storage and invalidation diverging and requires
every new write site to remember the encoding step. The
canonical-form-at-the-boundary approach taken here covers all entry
points and keeps storage, comparison, and the wire in sync.

fixes #93142
closes #93139
closes #93167

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable added a commit that referenced this pull request May 7, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …)
it gets written into the internal `x-next-cache-tags` HTTP header on ISR
responses. Node's `validateHeaderValue` rejects any byte outside
`\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On
Vercel deploys stale-if-error masks the 500 from clients, but
revalidation itself keeps failing and the cache stops refreshing for
affected routes.

This change introduces a single `encodeCacheTag` helper and applies it
at every public boundary — `validateTags` (which `cacheTag()`,
`unstable_cache()`, and `fetch` tags all funnel through),
`getImplicitTags` for path-derived tags, and `revalidatePath` /
`revalidateTag` / `updateTag` for invalidation inputs. The encoder
matches runs of out-of-class code units so surrogate pairs reach
`encodeURIComponent` intact, and it is idempotent on already-encoded
`%xx` sequences, so callers can pass either the raw or the encoded form
interchangeably.

PR #93139 already encodes path-derived tags at construction, but it
misses every user-supplied tag entry point and uses a
`decodeURIComponent` round-trip that silently mangles literal `%xx`
characters in tag values. PR #93167 encodes only at the `setHeader`
sites, which leaves storage and invalidation diverging and requires
every new write site to remember the encoding step. The
canonical-form-at-the-boundary approach taken here covers all entry
points and keeps storage, comparison, and the wire in sync.

fixes #93142
closes #93139
closes #93167

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable added a commit that referenced this pull request May 7, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …)
it gets written into the internal `x-next-cache-tags` HTTP header on ISR
responses. Node's `validateHeaderValue` rejects any byte outside
`\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On
Vercel deploys stale-if-error masks the 500 from clients, but
revalidation itself keeps failing and the cache stops refreshing for
affected routes.

This change introduces a single `encodeCacheTag` helper and applies it
at every public boundary — `validateTags` (which `cacheTag()`,
`unstable_cache()`, and `fetch` tags all funnel through),
`getImplicitTags` for path-derived tags, and `revalidatePath` /
`revalidateTag` / `updateTag` for invalidation inputs. The encoder
matches runs of out-of-class code units so surrogate pairs reach
`encodeURIComponent` intact, and it is idempotent on already-encoded
`%xx` sequences, so callers can pass either the raw or the encoded form
interchangeably.

PR #93139 already encodes path-derived tags at construction, but it
misses every user-supplied tag entry point and uses a
`decodeURIComponent` round-trip that silently mangles literal `%xx`
characters in tag values. PR #93167 encodes only at the `setHeader`
sites, which leaves storage and invalidation diverging and requires
every new write site to remember the encoding step. The
canonical-form-at-the-boundary approach taken here covers all entry
points and keeps storage, comparison, and the wire in sync.

fixes #93142
closes #93139
closes #93167

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
pull Bot pushed a commit to mikeyhodl/next.js that referenced this pull request May 7, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …)
it gets written into the internal `x-next-cache-tags` HTTP header on ISR
responses. Node's `validateHeaderValue` rejects any byte outside
`\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On
Vercel deploys stale-if-error masks the 500 from clients, but
revalidation itself keeps failing and the cache stops refreshing for
affected routes.

This change introduces a single `encodeCacheTag` helper and applies it
at every public boundary — `validateTags` (which `cacheTag()`,
`unstable_cache()`, and `fetch` tags all funnel through),
`getImplicitTags` for path-derived tags, and `revalidatePath` /
`revalidateTag` / `updateTag` for invalidation inputs. The encoder
matches runs of out-of-class code units so surrogate pairs reach
`encodeURIComponent` intact, and it is idempotent on already-encoded
`%xx` sequences, so callers can pass either the raw or the encoded form
interchangeably.

PR vercel#93139 already encodes path-derived tags at construction, but it
misses every user-supplied tag entry point and uses a
`decodeURIComponent` round-trip that silently mangles literal `%xx`
characters in tag values. PR vercel#93167 encodes only at the `setHeader`
sites, which leaves storage and invalidation diverging and requires
every new write site to remember the encoding step. The
canonical-form-at-the-boundary approach taken here covers all entry
points and keeps storage, comparison, and the wire in sync.

fixes vercel#93142
closes vercel#93139
closes vercel#93167

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
@ornakash ornakash deleted the fix/x-next-cache-tags-non-ascii branch May 7, 2026 23:23
unstubbable added a commit that referenced this pull request May 18, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …)
it gets written into the internal `x-next-cache-tags` HTTP header on ISR
responses. Node's `validateHeaderValue` rejects any byte outside
`\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On
Vercel deploys stale-if-error masks the 500 from clients, but
revalidation itself keeps failing and the cache stops refreshing for
affected routes.

This change introduces a single `encodeCacheTag` helper and applies it
at every public boundary — `validateTags` (which `cacheTag()`,
`unstable_cache()`, and `fetch` tags all funnel through),
`getImplicitTags` for path-derived tags, and `revalidatePath` /
`revalidateTag` / `updateTag` for invalidation inputs. The encoder
matches runs of out-of-class code units so surrogate pairs reach
`encodeURIComponent` intact, and it is idempotent on already-encoded
`%xx` sequences, so callers can pass either the raw or the encoded form
interchangeably.

PR #93139 already encodes path-derived tags at construction, but it
misses every user-supplied tag entry point and uses a
`decodeURIComponent` round-trip that silently mangles literal `%xx`
characters in tag values. PR #93167 encodes only at the `setHeader`
sites, which leaves storage and invalidation diverging and requires
every new write site to remember the encoding step. The
canonical-form-at-the-boundary approach taken here covers all entry
points and keeps storage, comparison, and the wire in sync.

fixes #93142
closes #93139
closes #93167

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable added a commit that referenced this pull request May 18, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …)
it gets written into the internal `x-next-cache-tags` HTTP header on ISR
responses. Node's `validateHeaderValue` rejects any byte outside
`\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On
Vercel deploys stale-if-error masks the 500 from clients, but
revalidation itself keeps failing and the cache stops refreshing for
affected routes.

This change introduces a single `encodeCacheTag` helper and applies it
at every public boundary — `validateTags` (which `cacheTag()`,
`unstable_cache()`, and `fetch` tags all funnel through),
`getImplicitTags` for path-derived tags, and `revalidatePath` /
`revalidateTag` / `updateTag` for invalidation inputs. The encoder
matches runs of out-of-class code units so surrogate pairs reach
`encodeURIComponent` intact, and it is idempotent on already-encoded
`%xx` sequences, so callers can pass either the raw or the encoded form
interchangeably.

PR #93139 already encodes path-derived tags at construction, but it
misses every user-supplied tag entry point and uses a
`decodeURIComponent` round-trip that silently mangles literal `%xx`
characters in tag values. PR #93167 encodes only at the `setHeader`
sites, which leaves storage and invalidation diverging and requires
every new write site to remember the encoding step. The
canonical-form-at-the-boundary approach taken here covers all entry
points and keeps storage, comparison, and the wire in sync.

fixes #93142
closes #93139
closes #93167

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators May 22, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

App Router: non-ASCII dynamic route params crash with ERR_INVALID_CHAR on x-next-cache-tags header

1 participant