Turbopack: switch chunk/asset hashes from hex to base40 encoding #91137
Merged
Branch updated from 6aa3f4c to a8e5979
Contributor
Tests passed. Merging this PR will not alter performance.
sokra commented on Mar 10, 2026
Branch updated from bced220 to 70a7b11
Contributor
Stats from current PR: ✅ No significant changes detected
- ⚡ Turbopack client main bundles: **408 kB** → **408 kB** (✅ -19 B). 80 files have content-based hashes, so individual files are not comparable between builds.
- 📝 Changed files (2): `pages-api.runtime.dev.js`, `pages.runtime.dev.js` (diffs too large to display)
mischnic reviewed on Mar 12, 2026
mischnic approved these changes on Mar 12, 2026
lukesandberg approved these changes on Mar 12, 2026
Use a base40 alphabet (`0-9 a-z _ - ~ .`) instead of hexadecimal for hash encoding in output filenames. This produces shorter hashes that carry at least as many bits as the hex strings they replace:

- 16 hex chars (64 bits) → 13 base40 chars (~69 bits)
- 8 hex chars (32 bits) → 7 base40 chars (~37 bits)
- 5 hex chars (20 bits) → 4 base40 chars (~21 bits)

The alphabet is URL-safe (RFC 3986 unreserved characters) and filesystem-safe on all OSes, including case-insensitive filesystems, because it contains no uppercase letters. Internal manifests and identifiers remain hex-encoded.
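The hex-to-base40 mappings follow from each base40 character carrying log2(40) ≈ 5.32 bits versus 4 bits per hex character. A minimal Rust sketch that checks them; the helper names `bits_for` and `min_base40_len` are hypothetical and not part of `turbo-tasks-hash`. The length search uses exact integer arithmetic (40^k ≥ 16^n) rather than floating-point logarithms.

```rust
/// Bits carried by `len` characters drawn uniformly from an alphabet of `radix` symbols.
fn bits_for(len: u32, radix: u32) -> f64 {
    f64::from(len) * f64::from(radix).log2()
}

/// Shortest base40 length whose value space (40^k) is at least as large as
/// that of `hex_len` hex characters (16^hex_len).
/// Valid for hex_len <= 31 so that 16^hex_len fits in a u128.
fn min_base40_len(hex_len: u32) -> u32 {
    assert!(hex_len <= 31, "16^hex_len must fit in a u128");
    let target: u128 = 16u128.pow(hex_len);
    let mut space: u128 = 1;
    let mut len = 0;
    // Grow the base40 value space one digit at a time until it covers the hex space.
    while space < target {
        space *= 40;
        len += 1;
    }
    len
}
```

For example, `min_base40_len(16)` is 13 and `bits_for(13, 40)` is about 69.2, while 12 base40 characters (about 63.9 bits) would fall just short of 64.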
- Extract shared `encode_base40_fixed<N>` generic helper to eliminate duplication between `encode_base40` and `encode_base40_128`
- Export `BASE40_LEN_64` and `BASE40_LEN_128` constants for the full hash string widths
- Add comments at truncation sites documenting the approximate bit strength of each truncated hash
- Extract a `short_hash` variable in `asset_path` methods to avoid repeating the truncation slice in both `format!` arms
- Update module doc comment to mention base40 encoding
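A const-generic fixed-width encoder in the spirit of `encode_base40_fixed<N>` could look like the sketch below. It assumes the hash arrives as a `u128` and is written most significant digit first with `'0'` padding; the real helper's signature and digit order may differ, and the names `base40_fixed`, `base40_u64`, and `base40_u128` are hypothetical. The widths 13 and 25 match `BASE40_LEN_64` and `BASE40_LEN_128`.

```rust
/// The base40 alphabet from this PR, in digit order (index = digit value).
const BASE40_ALPHABET: &[u8; 40] = b"0123456789abcdefghijklmnopqrstuvwxyz_-~.";

/// Writes `value` as exactly `N` base40 digits, most significant first,
/// left-padded with '0'. Panics if `value` needs more than `N` digits.
fn base40_fixed<const N: usize>(mut value: u128) -> String {
    let mut buf = [b'0'; N];
    // Fill from the least significant (rightmost) digit leftward.
    for slot in buf.iter_mut().rev() {
        *slot = BASE40_ALPHABET[(value % 40) as usize];
        value /= 40;
    }
    assert!(value == 0, "value does not fit in N base40 digits");
    // Every byte comes from the ASCII alphabet, so this cannot fail.
    String::from_utf8(buf.to_vec()).expect("base40 alphabet is ASCII")
}

/// 13 digits: 40^13 >= 2^64 > 40^12.
fn base40_u64(value: u64) -> String {
    base40_fixed::<13>(u128::from(value))
}

/// 25 digits: 40^25 >= 2^128 > 40^24.
fn base40_u128(value: u128) -> String {
    base40_fixed::<25>(value)
}
```

Design note: with most-significant-first order, the leading digit of a full-width value is always `0` or `1` (2^64 is only about 1.1 × 40^12, and 2^128 about 1.2 × 40^24), so which end of the string a truncation keeps affects how many bits survive. This sketch does not claim which end the real code keeps.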
Match chunk filename hash length (13 base40 chars ≈ 69 bits) for asset filenames instead of the previous 7-char truncation.
…variable names containing invalid identifier characters (`-`, `~`, `.`), causing syntax errors in generated WASM loader code.
This commit fixes the issue reported at turbopack/crates/turbopack-wasm/src/lib.rs:29
**Bug explanation:**
The `wasm_edge_var_name` function generates a JavaScript variable name of the form `wasm_{hash}`. This variable name is interpolated directly into JavaScript code in `loader.rs` at two call sites (lines 51 and 78), appearing in expressions like:
```js
const { exports } = await __turbopack_wasm__(wasmPath, () => wasm_abc123, imports);
```
and:
```js
const mod = await __turbopack_wasm_module__(wasmPath, () => wasm_abc123);
```
The hash algorithm was changed from `Xxh3Hash128Hex` to `Xxh3Hash128Base40`. The base40 alphabet is `0123456789abcdefghijklmnopqrstuvwxyz_-~.`, which includes `-`, `~`, and `.`, none of which are valid JavaScript identifier characters. A full 128-bit base40 hash is 25 characters long, and if its digits are uniformly distributed, roughly 85% of such hashes contain at least one of these three characters, so the generated loader code is invalid for the large majority of WASM modules.
For example, a generated variable name like `wasm_abc-def~ghi.jkl` would produce invalid JavaScript:
```js
() => wasm_abc-def~ghi.jkl // SyntaxError: `-` is subtraction, `~` is bitwise NOT, `.` is property access
```
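The chance that a hash breaks the identifier can be estimated directly. Assuming each base40 digit is independent and uniformly distributed (an idealization), a digit avoids `-`, `~`, and `.` with probability 37/40, so `n` digits contain at least one of them with probability 1 - (37/40)^n. In a 25-character encoding of a 128-bit value the leading digit can only be `0` or `1`, so effectively only 24 digits are free. The function names below are hypothetical.

```rust
/// The base40 alphabet from this PR.
const BASE40_ALPHABET: &str = "0123456789abcdefghijklmnopqrstuvwxyz_-~.";

/// Base40 characters that cannot appear in a JavaScript identifier.
fn is_invalid_in_js_identifier(c: char) -> bool {
    matches!(c, '-' | '~' | '.')
}

/// Probability that `n` independent, uniformly random base40 digits contain
/// at least one character that is invalid in a JS identifier.
fn p_contains_invalid(n: i32) -> f64 {
    let total = BASE40_ALPHABET.chars().count() as f64; // 40
    let safe = BASE40_ALPHABET
        .chars()
        .filter(|&c| !is_invalid_in_js_identifier(c))
        .count() as f64; // 37
    1.0 - (safe / total).powi(n)
}
```

This gives about 0.858 for 25 digits and about 0.846 for 24, i.e. roughly 85%.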
**Fix explanation:**
Reverted the hash algorithm for `wasm_edge_var_name` back to `Xxh3Hash128Hex`, which only produces `0-9a-f` characters — all valid in JavaScript identifiers. Other base40 usages (for filenames, version strings, output asset paths) are correct and left unchanged, as the base40 alphabet is URL-safe and filesystem-safe for those contexts.
Co-authored-by: Vercel <vercel[bot]@users.noreply.github.com>
Co-authored-by: sokra <tobias.koppers@googlemail.com>
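Hex output is safe in this position because every character of `0-9a-f` may continue a JavaScript identifier, and the `wasm_` prefix supplies a valid first character. A minimal sketch with hypothetical function names; it is not the code in `turbopack-wasm/src/lib.rs`, and zero-padding to 32 digits is an assumption. The identifier check is deliberately conservative: it accepts only ASCII and ignores reserved words.

```rust
/// Hypothetical stand-in for `wasm_edge_var_name`: a 128-bit hash as 32 lowercase hex digits.
fn wasm_var_name_hex(hash: u128) -> String {
    format!("wasm_{hash:032x}")
}

/// ASCII-only JavaScript identifier check: first char is a letter, `_`, or `$`;
/// the rest are letters, digits, `_`, or `$`.
fn is_ascii_js_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' || c == '$' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '$')
}
```

Any `u128` produces a name that passes the check, whereas the base40 example `wasm_abc-def~ghi.jkl` does not.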
…to base64
- Move `ContentHashing` enum from turbopack-browser to turbopack-core so both Browser and NodeJs chunking contexts can use it
- Rename `content_hashing` -> `chunk_content_hashing` in `BrowserChunkingContext`
- Add separate `asset_content_hashing` field to both `BrowserChunkingContext` and `NodeJsChunkingContext`
- Rename builder method `use_content_hashing()` -> `chunk_content_hashing()`
- Add builder method `asset_content_hashing()`
- Switch all version hashes from base40 to base64 encoding, since version identifiers don't need URL/filesystem safety
- Add `encode_base64` helper for u64 -> base64 encoding
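The `encode_base64` helper is only described as converting a u64 to base64. A minimal sketch under two assumptions the PR does not state: the standard RFC 4648 alphabet (`A-Z a-z 0-9 + /`), and treating the u64 as a number written in radix 64, most significant digit first, without padding (11 digits, since 64^11 = 2^66 ≥ 2^64). A byte-oriented encoding of the u64's 8 bytes would yield different strings. The name `encode_base64_u64` is hypothetical.

```rust
/// RFC 4648 standard base64 alphabet (assumed; the PR does not name the alphabet).
const BASE64_ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical stand-in for `encode_base64`: writes `value` as 11 radix-64
/// digits, most significant first. 11 digits cover 66 bits >= 64.
fn encode_base64_u64(mut value: u64) -> String {
    let mut buf = [0u8; 11];
    for slot in buf.iter_mut().rev() {
        // The low 6 bits select one of the 64 symbols.
        *slot = BASE64_ALPHABET[(value & 0x3f) as usize];
        value >>= 6;
    }
    String::from_utf8(buf.to_vec()).expect("base64 alphabet is ASCII")
}
```

Because version hashes are used for HMR update comparison rather than in filenames or URLs, characters such as `+`, `/`, and uppercase letters are acceptable here even though they would not be in output paths.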
asset_content_hashing is always needed for asset paths, so make it a
plain ContentHashing instead of Option<ContentHashing>. Defaults to
ContentHashing::Direct { length: 13 } (≈69 bits of hash output).
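The split between an optional chunk setting and a mandatory asset setting can be sketched as follows. Only `ContentHashing::Direct { length }`, the default length of 13, and the builder method names `chunk_content_hashing()` / `asset_content_hashing()` come from this PR. The struct and builder types are hypothetical, heavily reduced stand-ins for `BrowserChunkingContext` and `NodeJsChunkingContext`, and modelling the chunk setting as an `Option` is an assumption.

```rust
/// Reduced stand-in for turbopack-core's `ContentHashing`; the real enum may
/// have more variants or a different field type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ContentHashing {
    /// Embed the first `length` characters of the encoded hash in the filename.
    Direct { length: u8 },
}

impl ContentHashing {
    /// Truncate a full-width encoded hash to the configured length.
    fn apply<'a>(&self, full_hash: &'a str) -> &'a str {
        match *self {
            // Base40 output is ASCII, so any byte index is a char boundary.
            ContentHashing::Direct { length } => {
                &full_hash[..usize::from(length).min(full_hash.len())]
            }
        }
    }
}

/// Hypothetical reduced chunking context.
#[derive(Debug)]
struct ChunkingContextSketch {
    /// Chunks may opt out of content hashing (assumed to be an `Option`).
    chunk_content_hashing: Option<ContentHashing>,
    /// Assets always need a hash in their path, so there is no `Option`.
    asset_content_hashing: ContentHashing,
}

struct ChunkingContextSketchBuilder {
    ctx: ChunkingContextSketch,
}

impl ChunkingContextSketchBuilder {
    fn new() -> Self {
        Self {
            ctx: ChunkingContextSketch {
                chunk_content_hashing: None,
                // Default from this PR: 13 base40 characters.
                asset_content_hashing: ContentHashing::Direct { length: 13 },
            },
        }
    }

    fn chunk_content_hashing(mut self, hashing: ContentHashing) -> Self {
        self.ctx.chunk_content_hashing = Some(hashing);
        self
    }

    fn asset_content_hashing(mut self, hashing: ContentHashing) -> Self {
        self.ctx.asset_content_hashing = hashing;
        self
    }

    fn build(self) -> ChunkingContextSketch {
        self.ctx
    }
}
```

With a plain `ContentHashing` field, code that builds asset paths can truncate unconditionally instead of handling a `None` case.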
Regenerate all Turbopack snapshot test files to match new base40 hash format. Update test helper regexes (stripTestHash, stripVercelPngHash) and inline patterns from [0-9a-f] to [0-9a-z_.~-] to match the base40 character set.
…age hashes
Co-Authored-By: Claude <noreply@anthropic.com>
…oded comment
- Remove `.asset_content_hashing(ContentHashing::Direct { length: 13 })` from
BrowserChunkingContext builder since 13 is already the default
- Remove hardcoded "13 base40 chars" comments from asset_path() in both
BrowserChunkingContext and NodeJsChunkingContext since length is now dynamic
Replace hardcoded `BASE40_LEN_64 = 13` and `BASE40_LEN_128 = 25` with a const fn `digits_for_bits()` that computes the number of base-N digits needed to represent all values of a given bit width. Static assertions verify the computed values match expectations.
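A `const fn` of this kind can count digits by repeatedly dividing the largest `bits`-bit value by the base. A minimal sketch; the parameter order, integer types, and the handling of `bits >= 128` are assumptions, and only the expected values 13 and 25 come from this PR.

```rust
/// Number of base-`base` digits needed to write every value of `bits` bits,
/// i.e. the digit count of 2^bits - 1. Hypothetical sketch.
const fn digits_for_bits(base: u128, bits: u32) -> usize {
    assert!(base >= 2, "base must be at least 2");
    // Largest value that fits in `bits` bits (u128::MAX covers the 128-bit case).
    let mut max = if bits >= 128 { u128::MAX } else { (1u128 << bits) - 1 };
    let mut digits = 1;
    // Each division by `base` strips one digit; stop when a single digit remains.
    while max >= base {
        max /= base;
        digits += 1;
    }
    digits
}

// Compile-time checks in the spirit of the PR's static assertions.
const BASE40_LEN_64: usize = digits_for_bits(40, 64);
const BASE40_LEN_128: usize = digits_for_bits(40, 128);
const _: () = assert!(BASE40_LEN_64 == 13);
const _: () = assert!(BASE40_LEN_128 == 25);
```

Because the checks are `const` items, a wrong width fails the build rather than a test run.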
Branch updated from fe36984 to f37a37f
What?
Switch Turbopack's hash encoding for chunk and asset output filenames from hexadecimal (base16) to base40, using the alphabet `0-9 a-z _ - ~ .`. Version hashes (used for HMR update comparison, not filenames) use base64 instead.
Why?
Base40 encodes the same number of bits in fewer characters than hex, producing shorter output filenames. All 40 characters are RFC 3986 unreserved (URL-safe) and safe on case-insensitive filesystems (macOS HFS+/APFS, Windows NTFS).
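Hash truncation lengths are reduced proportionally to maintain equivalent collision resistance:
- 16 hex chars (64 bits) → 13 base40 chars (~69 bits)
- 8 hex chars (32 bits) → 7 base40 chars (~37 bits)
- 5 hex chars (20 bits) → 4 base40 chars (~21 bits)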
How?
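New encoding module (`turbo-tasks-hash/src/base40.rs`):
- Shared `encode_base40_fixed<N>` helper used by both `encode_base40` and `encode_base40_128`
- Full-width constants `BASE40_LEN_64` (13) and `BASE40_LEN_128` (25), computed by the `digits_for_bits()` const fn and checked with static assertions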
New base64 encoding (`turbo-tasks-hash/src/base64.rs`):
- `encode_base64` helper for u64 → base64 encoding, used for version hashes
New `HashAlgorithm` variants (`turbo-tasks-hash/src/lib.rs`):
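`ContentHashing` moved to `turbopack-core`:
- Moved from `turbopack-browser` so both `BrowserChunkingContext` and `NodeJsChunkingContext` can use it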
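Separate chunk vs asset content hashing:
- `content_hashing` renamed to `chunk_content_hashing` in `BrowserChunkingContext`, and the builder method `use_content_hashing()` renamed to `chunk_content_hashing()`
- New `asset_content_hashing` field (a plain `ContentHashing`, defaulting to `ContentHashing::Direct { length: 13 }`) and builder method `asset_content_hashing()` on both chunking contexts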
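Version hashes switched to base64:
- Version identifiers are used for HMR update comparison, not filenames, so they don't need the URL/filesystem-safe alphabet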
Other callers updated (15 files across turbopack and next-core):
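Exception for `wasm_edge_var_name` (`turbopack-wasm/src/lib.rs`):
- Keeps `Xxh3Hash128Hex`, because the hash becomes part of a JavaScript variable name (`wasm_{hash}`) and `-`, `~`, `.` are not valid identifier characters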
Scope (NOT changed):
- Internal manifests and identifiers remain hex-encoded