Turbopack: switch from base40 to base38 hash encoding (remove ~ and . from charset) #91832
Failing test suites
Commit: 26bf3a1
● next-config-ts-tsconfig-extends-esm › should support tsconfig extends (ESM)
● nx-handling › should work for pages page
● nx-handling › should work for pages API
● nx-handling › should work with app page
● nx-handling › should work with app route
● Test suite failed to run
Merging this PR will improve performance by 4.51%
Stats from current PR: ✅ No significant changes detected
…harset
The `~` and `.` characters in base40-encoded filenames (e.g. `turbopack-0c3o1svijj_~~.js`, `0...f7~att2_2.js`) are blocked by Nginx hardening rules and enterprise WAF configurations, causing 403 Forbidden errors on deployment. Remove these characters to produce filenames that are safe for all common web server configurations.
Fixes #91678
Co-Authored-By: Claude <noreply@anthropic.com>
Regenerated all snapshot output files using UPDATE=1 to reflect the new base38 hash encoding (charset `0-9a-z_-`, dropping `~` and `.`). Co-Authored-By: Claude <noreply@anthropic.com>
Static asset filenames embedded in the inline snapshot tests now use base38-encoded hashes instead of base40. Co-Authored-By: Claude <noreply@anthropic.com>
241d01b to 26bf3a1
  /// information loss.
- pub const BASE40_LEN_64: usize = digits_for_bits(BASE, 64);
+ pub const BASE38_LEN_64: usize = digits_for_bits(BASE, 64);
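The diff only shows the call site of `digits_for_bits`, not its body. As a rough sketch of what such a helper could look like (the signature with a `u128` base and the `BASE38_LEN_128` name are assumptions for illustration, not taken from the PR), the number of digits can be found by repeatedly dividing the largest representable value by the base:

```rust
/// Hypothetical stand-in for the `digits_for_bits` helper referenced in the diff:
/// the number of base-`base` digits needed to represent every `bits`-bit value.
/// Assumes `base >= 2`; widths above 128 bits are treated as 128.
const fn digits_for_bits(base: u128, bits: u32) -> usize {
    // Largest value that fits in `bits` bits.
    let mut max = if bits >= 128 { u128::MAX } else { (1u128 << bits) - 1 };
    let mut digits = 0;
    // Each division strips one base-`base` digit from `max`.
    while max > 0 {
        max /= base;
        digits += 1;
    }
    digits
}

const BASE: u128 = 38;
const BASE38_LEN_64: usize = digits_for_bits(BASE, 64);
// Assumed name, following the `BASE38_*` pattern.
const BASE38_LEN_128: usize = digits_for_bits(BASE, 128);
```

Under these assumptions, `BASE38_LEN_64` evaluates to 13 and `BASE38_LEN_128` to 25, and the same helper gives the same two lengths for base 40.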
In my opinion we should just increase from 64-bit (13 chars) to 84-bit (16 chars).
That would be an increase of 252x, meaning 751 billion IDs until a 1% chance of collision.
Then we could probably omit the 128-bit manifest.
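Collision estimates like this one can be sanity-checked with the standard birthday approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the size of the ID space and p the target collision probability. The sketch below is illustrative only; the function names are made up here, and the exact figures depend on which base, length, and approximation are plugged in, so they will not necessarily match the numbers quoted in the comment.

```rust
/// Birthday approximation: roughly how many uniformly random IDs can be drawn
/// from a space of `space` values before the collision probability reaches `p`.
fn ids_until_collision(space: f64, p: f64) -> f64 {
    (2.0 * space * (1.0 / (1.0 - p)).ln()).sqrt()
}

/// Size of the ID space for `chars` characters over an alphabet of `base` symbols.
fn id_space(base: f64, chars: i32) -> f64 {
    base.powi(chars)
}
```

Because the bound grows with the square root of the space, adding three characters multiplies the number of safe IDs by `base^1.5`, not by `base^3`.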
We avoid uppercase characters because of case-insensitive file systems. But URLs are case-sensitive, so using uppercase characters is fine; we just don't want to produce two outputs that differ only by case. That is essentially a hash collision... do we need to care about it?
What?
Fixes a regression from #91137
Switch Turbopack's hash encoding charset from base40 (`0-9 a-z _ - ~ .`) to base38 (`0-9 a-z _ -`), removing the `~` and `.` characters. Pure rename/charset change; no structural changes.
Why?
The `~` and `.` characters in base40-encoded filenames are blocked by standard Nginx hardening rules (`block_common_exploits.conf`) and enterprise WAF configurations, causing 403 Forbidden errors when applications are deployed behind security-hardened infrastructure.
Examples of problematic filenames:
- `turbopack-0c3o1svijj_~~.js`: `~~` flagged as directory traversal / injection
- `0...f7~att2_2.js`: `...` flagged as path traversal attempt
- `0q~2copru0zy0.css`: `~` filtered by some WAF rulesets
Previous hex-only filenames (e.g. `turbopack-01ca012029ca2e66.js`) had no such issues.
Fixes #91678
How?
Charset change (`turbo-tasks-hash/src/base38.rs`, renamed from `base40.rs`):
- Charset: `0123456789abcdefghijklmnopqrstuvwxyz_-`
- Renames: `BASE40_*` → `BASE38_*`, `encode_base40` → `encode_base38`
- 13 characters cover 64 bits and 25 characters cover 128 bits (`38^13 > 2^64`, `38^25 > 2^128`)
HashAlgorithm enum variants (`turbo-tasks-hash/src/lib.rs`):
- `Xxh3Hash64Base40` → `Xxh3Hash64Base38`
- `Xxh3Hash128Base40` → `Xxh3Hash128Base38`
Bit computation comment (`turbopack-core/src/ident.rs`):
- `7 base38 chars ≈ 37 bits` → `≈ 36 bits` (log2(38) × 7 = 36.7)
Test regex patterns (15 test files):
- `[0-9a-z_.~-]` → `[0-9a-z_-]` to match the new charset
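
To make the charset change concrete, here is a minimal sketch of a fixed-width base38 encoder for a 64-bit hash, plus a check equivalent to the `[0-9a-z_-]` test pattern. This is not the actual `encode_base38` from `turbo-tasks-hash`; the function names, the most-significant-digit-first order, and the `0` padding are assumptions made for illustration.

```rust
/// The base38 charset from the PR description.
const CHARSET: &[u8; 38] = b"0123456789abcdefghijklmnopqrstuvwxyz_-";
/// 13 characters are enough for any 64-bit value, since 38^13 > 2^64.
const LEN_64: usize = 13;

/// Illustrative encoder (assumed digit order and padding, not the real one):
/// writes `value` as exactly 13 base38 digits, most significant digit first.
fn encode_base38_sketch(mut value: u64) -> String {
    let mut out = [b'0'; LEN_64];
    // Fill from the last slot backwards with the least significant digit first.
    for slot in out.iter_mut().rev() {
        *slot = CHARSET[(value % 38) as usize];
        value /= 38;
    }
    out.iter().map(|&b| b as char).collect()
}

/// True if `s` is non-empty and uses only characters from the base38 charset.
fn is_base38(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| CHARSET.contains(&b))
}
```

With this check, the hash portions of the problematic filenames quoted above (for example `0q~2copru0zy0`) are rejected, while any output of the sketch encoder passes.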