[backport] Turbopack: switch from base40 to base38 hash encoding #93932

Merged: mischnic merged 1 commit into `next-16-2` from `mischnic/backport-base38` on May 19, 2026
mischnic commented May 19, 2026

Backport of #91832

For #93790 and #92711

… from charset) (#91832)

Switch Turbopack's hash encoding charset from base40 (`0-9 a-z _ - ~ .`)
to base38 (`0-9 a-z _ -`), removing the `~` and `.` characters. Pure
rename/charset change — no structural changes.

The `~` and `.` characters in base40-encoded filenames are blocked by
standard Nginx hardening rules (`block_common_exploits.conf`) and
enterprise WAF configurations, causing **403 Forbidden errors** when
applications are deployed behind security-hardened infrastructure.

Examples of problematic filenames:
- `turbopack-0c3o1svijj_~~.js` — `~~` flagged as directory traversal /
injection
- `0...f7~att2_2.js` — `...` flagged as path traversal attempt
- `0q~2copru0zy0.css` — `~` filtered by some WAF rulesets

Previous hex-only filenames (e.g. `turbopack-01ca012029ca2e66.js`) had
no such issues.
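
To make the failure mode concrete, here is a minimal sketch of the kind of check a hardening rule might apply to a request path. The function `looks_suspicious` and its two conditions are hypothetical stand-ins; real Nginx and WAF rulesets differ and are usually more elaborate.

```rust
/// Hypothetical stand-in for a hardening rule (an assumption for
/// illustration, not a real Nginx or WAF rule): a file name is flagged
/// if it contains `~` or a run of two or more dots.
fn looks_suspicious(file_name: &str) -> bool {
    file_name.contains('~') || file_name.contains("..")
}
```

Under this simplified rule, the three base40 file names listed above are flagged, while the hex-only name `turbopack-01ca012029ca2e66.js` is not.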

Fixes #91678

**Charset change** (`turbo-tasks-hash/src/base38.rs`, renamed from
`base40.rs`):
- Alphabet reduced from 40 to 38 characters:
`0123456789abcdefghijklmnopqrstuvwxyz_-`
- All constants and functions renamed: `BASE40_*` → `BASE38_*`,
`encode_base40` → `encode_base38`
- Hash lengths unchanged: 13 chars for 64-bit, 25 chars for 128-bit
(`38^13 > 2^64`, `38^25 > 2^128`)
- Content hash length stays at 13 (68.2 bits vs 69.2 bits with base40 —
negligible)
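
The length claims above can be illustrated with a small sketch. The alphabet is the one quoted in this description; the names `encode_u64_base38_sketch` and `min_base38_len`, the digit order, and the zero padding are assumptions made for illustration and may not match the actual code in `base38.rs`.

```rust
/// The 38-character alphabet quoted in the PR description.
const BASE38_CHARS: &[u8; 38] = b"0123456789abcdefghijklmnopqrstuvwxyz_-";

/// Number of base38 characters used for a 64-bit hash.
const LEN_64: usize = 13;

/// Illustrative fixed-width encoder, most significant digit first.
/// Digit order and padding are assumptions, not Turbopack's actual layout.
fn encode_u64_base38_sketch(mut value: u64) -> String {
    let mut buf = [b'0'; LEN_64];
    // Fill from the last slot backwards with successive remainders.
    for slot in buf.iter_mut().rev() {
        *slot = BASE38_CHARS[(value % 38) as usize];
        value /= 38;
    }
    // Every byte comes from BASE38_CHARS, so the buffer is valid ASCII.
    String::from_utf8(buf.to_vec()).expect("alphabet is ASCII")
}

/// Smallest number of base38 digits whose range covers every `bits`-bit value.
fn min_base38_len(bits: u32) -> u32 {
    assert!(bits >= 1 && bits <= 128);
    let max_value = u128::MAX >> (128 - bits); // 2^bits - 1
    let mut len = 0;
    let mut capacity: u128 = 1; // 38^len
    // Grow until 38^len exceeds the largest `bits`-bit value.
    while capacity <= max_value {
        len += 1;
        match capacity.checked_mul(38) {
            Some(next) => capacity = next,
            None => break, // 38^len no longer fits in u128, so it exceeds max_value
        }
    }
    len
}
```

With these definitions, `min_base38_len(64)` evaluates to 13 and `min_base38_len(128)` to 25, matching the lengths stated in the list, and every 64-bit value encodes to exactly 13 characters from the alphabet.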

**`HashAlgorithm` enum variants** (`turbo-tasks-hash/src/lib.rs`):
- `Xxh3Hash64Base40` → `Xxh3Hash64Base38`
- `Xxh3Hash128Base40` → `Xxh3Hash128Base38`

**Bit computation comment** (`turbopack-core/src/ident.rs`):
- Updated `7 base38 chars ≈ 37 bits` → `≈ 36 bits` (log2(38) × 7 ≈ 36.7, rounded down)
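
The bit figures quoted in this description follow from multiplying the character count by log2 of the alphabet size. A minimal sketch, where `approx_bits` is a hypothetical helper and not a function from the Turbopack codebase:

```rust
/// Approximate number of bits carried by `chars` symbols drawn from an
/// alphabet of `alphabet_size` symbols: chars * log2(alphabet_size).
/// Hypothetical helper for illustration only.
fn approx_bits(chars: u32, alphabet_size: u32) -> f64 {
    f64::from(chars) * f64::from(alphabet_size).log2()
}
```

For example, `approx_bits(7, 38)` is about 36.7, and `approx_bits(13, 38)` and `approx_bits(13, 40)` are about 68.2 and 69.2 respectively.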

**Test regex patterns** (15 test files):
- Updated `[0-9a-z_.~-]` → `[0-9a-z_-]` to match the new charset
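
For readers who want to assert on the new charset without a regex engine, the character class `[0-9a-z_-]` can be sketched with the standard library alone. The name `is_base38_text` is hypothetical, and the non-empty requirement is an added assumption (equivalent to anchoring the class as `^[0-9a-z_-]+$`).

```rust
/// Std-only equivalent of the anchored pattern `^[0-9a-z_-]+$`.
/// Hypothetical helper; the affected tests use regex patterns instead.
fn is_base38_text(s: &str) -> bool {
    !s.is_empty()
        && s.bytes()
            .all(|b| b.is_ascii_digit() || b.is_ascii_lowercase() || b == b'_' || b == b'-')
}
```

A hash fragment such as `0q~2copru0zy0` from the old base40 charset is rejected by this check because of the `~`, while a string made only of digits, lowercase letters, `_`, and `-` is accepted.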

---------

Co-authored-by: Tobias Koppers <sokra@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
mischnic changed the title from "Turbopack: switch from base40 to base38 hash encoding (remove ~ and .…" to "[backport] Turbopack: switch from base40 to base38 hash encoding" on May 19, 2026
github-actions Bot commented May 19, 2026

Tests Passed

Commit: 1bc82d0

mischnic requested a review from sokra on May 19, 2026 09:05
mischnic enabled auto-merge (squash) on May 19, 2026 09:12
mischnic merged commit 7e16e07 into next-16-2 on May 19, 2026
250 of 255 checks passed
mischnic deleted the mischnic/backport-base38 branch on May 19, 2026 09:18