Skip to content

Encode non-ASCII characters in cache tags at construction#93601

Merged
unstubbable merged 1 commit into
canaryfrom
hl/non-ascii-cache-tags
May 7, 2026
Merged

Encode non-ASCII characters in cache tags at construction#93601
unstubbable merged 1 commit into
canaryfrom
hl/non-ascii-cache-tags

Conversation

@unstubbable

@unstubbable unstubbable commented May 7, 2026

Copy link
Copy Markdown
Contributor

When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …) it gets written into the internal x-next-cache-tags HTTP header on ISR responses. Node's validateHeaderValue rejects any byte outside \t\x20-\x7e, so the response crashes with ERR_INVALID_CHAR. On Vercel deploys stale-if-error masks the 500 from clients, but revalidation itself keeps failing and the cache stops refreshing for affected routes.

This change introduces a single encodeCacheTag helper and applies it at every public boundary — validateTags (which cacheTag(), unstable_cache(), and fetch tags all funnel through), getImplicitTags for path-derived tags, and revalidatePath / revalidateTag / updateTag for invalidation inputs. The encoder matches runs of out-of-class code units so surrogate pairs reach encodeURIComponent intact, and it is idempotent on already-encoded %xx sequences, so callers can pass either the raw or the encoded form interchangeably.

PR #93139 already encodes path-derived tags at construction, but it misses every user-supplied tag entry point and uses a decodeURIComponent round-trip that silently mangles literal %xx characters in tag values. PR #93167 encodes only at the setHeader sites, which leaves storage and invalidation diverging and requires every new write site to remember the encoding step. The canonical-form-at-the-boundary approach taken here covers all entry points and keeps storage, comparison, and the wire in sync.

fixes #93142
closes #93139
closes #93167


Co-authored-by: @swarnava
Co-authored-by: @ornakash

@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Tests Passed

Commit: 0f96095

@unstubbable unstubbable changed the base branch from canary to graphite-base/93601 May 7, 2026 21:14
@unstubbable unstubbable force-pushed the hl/non-ascii-cache-tags branch from 8606599 to f9de323 Compare May 7, 2026 21:14
@unstubbable unstubbable changed the base branch from graphite-base/93601 to hl/fix-unstable-cache-revalidation May 7, 2026 21:15

unstubbable commented May 7, 2026

Copy link
Copy Markdown
Contributor Author

@unstubbable unstubbable marked this pull request as ready for review May 7, 2026 21:41
@unstubbable unstubbable requested a review from gnoff May 7, 2026 21:44

@gnoff gnoff left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is landable but we should consider perturbing encoded keys so they cannot be matched synonymous key inputs

unstubbable commented May 7, 2026

Copy link
Copy Markdown
Contributor Author

Merge activity

  • May 7, 10:42 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • May 7, 10:43 PM UTC: Graphite couldn't merge this PR because it failed for an unknown reason (unsigned commits detected).

@unstubbable unstubbable changed the base branch from hl/fix-unstable-cache-revalidation to graphite-base/93601 May 7, 2026 22:42
@unstubbable unstubbable changed the base branch from graphite-base/93601 to canary May 7, 2026 22:42
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …)
it gets written into the internal `x-next-cache-tags` HTTP header on ISR
responses. Node's `validateHeaderValue` rejects any byte outside
`\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On
Vercel deploys stale-if-error masks the 500 from clients, but
revalidation itself keeps failing and the cache stops refreshing for
affected routes.

This change introduces a single `encodeCacheTag` helper and applies it
at every public boundary — `validateTags` (which `cacheTag()`,
`unstable_cache()`, and `fetch` tags all funnel through),
`getImplicitTags` for path-derived tags, and `revalidatePath` /
`revalidateTag` / `updateTag` for invalidation inputs. The encoder
matches runs of out-of-class code units so surrogate pairs reach
`encodeURIComponent` intact, and it is idempotent on already-encoded
`%xx` sequences, so callers can pass either the raw or the encoded form
interchangeably.

PR #93139 already encodes path-derived tags at construction, but it
misses every user-supplied tag entry point and uses a
`decodeURIComponent` round-trip that silently mangles literal `%xx`
characters in tag values. PR #93167 encodes only at the `setHeader`
sites, which leaves storage and invalidation diverging and requires
every new write site to remember the encoding step. The
canonical-form-at-the-boundary approach taken here covers all entry
points and keeps storage, comparison, and the wire in sync.

fixes #93142
closes #93139
closes #93167

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
@unstubbable unstubbable force-pushed the hl/non-ascii-cache-tags branch from f9de323 to 0f96095 Compare May 7, 2026 22:46
@unstubbable unstubbable enabled auto-merge (squash) May 7, 2026 22:56
@unstubbable unstubbable merged commit 9e18303 into canary May 7, 2026
184 of 185 checks passed
@unstubbable unstubbable deleted the hl/non-ascii-cache-tags branch May 7, 2026 23:09
james-elicx added a commit to cloudflare/vinext that referenced this pull request May 8, 2026
Tags with non-ASCII characters (Hebrew, Arabic, CJK, emoji) can crash ISR
responses with ERR_INVALID_CHAR when written to x-next-cache-tags, and
silently break invalidation when storage form diverges from the form
revalidateTag/revalidatePath produce.

Add encodeCacheTag/encodeCacheTags helper (matches behavior of
vercel/next.js#93601) and apply it at every public boundary: cacheTag,
revalidateTag, revalidatePath, updateTag, unstable_cache options.tags,
fetch next.tags, and the path-derived _N_T_ tag builders.

Closes #1138
james-elicx added a commit to cloudflare/vinext that referenced this pull request May 8, 2026
…#1143)

Tags with non-ASCII characters (Hebrew, Arabic, CJK, emoji) can crash ISR
responses with ERR_INVALID_CHAR when written to x-next-cache-tags, and
silently break invalidation when storage form diverges from the form
revalidateTag/revalidatePath produce.

Add encodeCacheTag/encodeCacheTags helper (matches behavior of
vercel/next.js#93601) and apply it at every public boundary: cacheTag,
revalidateTag, revalidatePath, updateTag, unstable_cache options.tags,
fetch next.tags, and the path-derived _N_T_ tag builders.

Closes #1138
unstubbable added a commit that referenced this pull request May 18, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …)
it gets written into the internal `x-next-cache-tags` HTTP header on ISR
responses. Node's `validateHeaderValue` rejects any byte outside
`\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On
Vercel deploys stale-if-error masks the 500 from clients, but
revalidation itself keeps failing and the cache stops refreshing for
affected routes.

This change introduces a single `encodeCacheTag` helper and applies it
at every public boundary — `validateTags` (which `cacheTag()`,
`unstable_cache()`, and `fetch` tags all funnel through),
`getImplicitTags` for path-derived tags, and `revalidatePath` /
`revalidateTag` / `updateTag` for invalidation inputs. The encoder
matches runs of out-of-class code units so surrogate pairs reach
`encodeURIComponent` intact, and it is idempotent on already-encoded
`%xx` sequences, so callers can pass either the raw or the encoded form
interchangeably.

PR #93139 already encodes path-derived tags at construction, but it
misses every user-supplied tag entry point and uses a
`decodeURIComponent` round-trip that silently mangles literal `%xx`
characters in tag values. PR #93167 encodes only at the `setHeader`
sites, which leaves storage and invalidation diverging and requires
every new write site to remember the encoding step. The
canonical-form-at-the-boundary approach taken here covers all entry
points and keeps storage, comparison, and the wire in sync.

fixes #93142
closes #93139
closes #93167

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable added a commit that referenced this pull request May 18, 2026
When a cache tag contains a non-ASCII character (Hebrew, CJK, emoji, …)
it gets written into the internal `x-next-cache-tags` HTTP header on ISR
responses. Node's `validateHeaderValue` rejects any byte outside
`\t\x20-\x7e`, so the response crashes with `ERR_INVALID_CHAR`. On
Vercel deploys stale-if-error masks the 500 from clients, but
revalidation itself keeps failing and the cache stops refreshing for
affected routes.

This change introduces a single `encodeCacheTag` helper and applies it
at every public boundary — `validateTags` (which `cacheTag()`,
`unstable_cache()`, and `fetch` tags all funnel through),
`getImplicitTags` for path-derived tags, and `revalidatePath` /
`revalidateTag` / `updateTag` for invalidation inputs. The encoder
matches runs of out-of-class code units so surrogate pairs reach
`encodeURIComponent` intact, and it is idempotent on already-encoded
`%xx` sequences, so callers can pass either the raw or the encoded form
interchangeably.

PR #93139 already encodes path-derived tags at construction, but it
misses every user-supplied tag entry point and uses a
`decodeURIComponent` round-trip that silently mangles literal `%xx`
characters in tag values. PR #93167 encodes only at the `setHeader`
sites, which leaves storage and invalidation diverging and requires
every new write site to remember the encoding step. The
canonical-form-at-the-boundary approach taken here covers all entry
points and keeps storage, comparison, and the wire in sync.

fixes #93142
closes #93139
closes #93167

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
unstubbable added a commit that referenced this pull request May 19, 2026
…93918)

Backports:

- #93617
- #93601

---------

Co-authored-by: Swarnava Sengupta <swarnava.sengupta@vercel.com>
Co-authored-by: Or Nakash <ornakash@gmail.com>
@github-actions github-actions Bot locked as resolved and limited conversation to collaborators May 22, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

App Router: non-ASCII dynamic route params crash with ERR_INVALID_CHAR on x-next-cache-tags header

2 participants