perf: replace the url crate with iri-string (-210 KB) #9794
Closed
Boshen wants to merge 2 commits into
`url` pulls in `idna` -> `idna_adapter` -> ICU4X (~21 crates, ~210 KB of the shipped binary) for internationalized-domain-name handling that rolldown never exercises.

`iri-string` is a zero-dependency RFC 3986 URI parser that treats the host as opaque text (no IDNA/ICU), so it covers the three `Url` call sites with a real, spec-compliant parser instead of hand-rolled string handling:

- file_url.rs: parse `file://` components (authority/path/query/fragment).
- normalize_binding_options.rs: validate `sourcemapBaseUrl`, append trailing `/`.
- process_code_and_sourcemap.rs: the normalized base ends with `/`, so joining the relative map filename is a plain append.

Removes `url` from the binary; `idna`/ICU remain only behind the `jsonschema` test dependency.

Measured url -> iri-string delta (aarch64 LTO release): -210 KB (23,675,392 -> 23,460,448 bytes), i.e. ~93% of removing `url` entirely, with a real parser rather than hand-rolled parsing.
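The "trailing `/`, then plain append" idea from the last two call sites can be sketched with the standard library alone. This is only an illustration under stated assumptions: `normalize_base` and `join_map_url` are hypothetical names, not rolldown's functions, and the `iri-string` validation step described above is deliberately left out.

```rust
/// Hypothetical helper (not rolldown's actual code): make sure the base URL
/// ends with `/` so that a relative filename can be appended directly.
/// Validation of the base (done with `iri-string` in the PR) is omitted here.
fn normalize_base(base: &str) -> String {
    let mut out = base.to_owned();
    if !out.ends_with('/') {
        out.push('/');
    }
    out
}

/// Hypothetical helper: because the normalized base ends with `/`,
/// joining a relative map filename reduces to string concatenation.
fn join_map_url(normalized_base: &str, map_filename: &str) -> String {
    format!("{normalized_base}{map_filename}")
}
```

For example, `join_map_url(&normalize_base("https://example.com/assets"), "main.js.map")` yields `https://example.com/assets/main.js.map`. The plain append is only equivalent to reference resolution when the filename is a simple relative path segment, which is the case the description covers.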
This was referenced Jun 17, 2026
graphite-app (bot) pushed a commit that referenced this pull request on Jun 17, 2026:
…29 KB) (#9811)

## Disable `idna`'s ICU backend instead of removing `url`

`url` is pulled in **only** by rolldown crates, and it drags the entire IDNA/Unicode stack along:

```
url -> idna -> idna_adapter -> ICU4X (icu_normalizer / icu_properties + *_data
                                      + zerovec / yoke / tinystr / ..., ~20 crates)
```

That ships **~129 KB** of Unicode tables in `librolldown_binding.node` purely for internationalized-domain-name handling (`münchen.de` → `xn--mnchen-3ya.de`) that rolldown never needs. `url` mandates `idna` for special-scheme hosts (incl. `file:`) and has **no feature** to turn it off.

### This PR: the least-invasive option

[`idna_adapter`](https://docs.rs/crate/idna_adapter) is the indirection layer that lets `idna` pick a Unicode backend. Its **1.0.0** release is the official *no-ICU* backend (zero dependencies); **1.1+** switched to ICU4X. Pinning it to `=1.0.0` makes `idna` resolve to the ASCII-only implementation and drops the entire ICU4X stack from the build, while **`url` and its WHATWG behaviour stay exactly as-is (zero new code)**.

- `Cargo.toml`: `idna_adapter = "=1.0.0"` in `[workspace.dependencies]`, referenced from `rolldown_plugin_vite_resolve` (the crate behind the `url` edge).
- The exact `=1.0.0` requirement makes any ICU-backed version **unresolvable**, so a blanket `cargo update` can't silently pull it back in. Verified: `cargo update -p idna_adapter --precise 1.2.2` is rejected with `failed to select a version for the requirement idna_adapter = "=1.0.0"`.
- Excluded from `cargo-shear` (`[package.metadata.cargo-shear]`) since it's a build-only pin we never import.

### Trade-off

Per the [`idna_adapter` docs](https://docs.rs/crate/idna_adapter/latest#:~:text=Turning%20off%20IDNA%20support), the no-Unicode backend rejects non-ASCII domain inputs and skips UTS-46 enforcement on Punycode labels. rolldown only ever feeds `url` ASCII (file paths / sourcemap URLs), so this is a no-op in practice.

### Impact (darwin-arm64, fat-LTO release, same-machine A/B)

| build | `rolldown-binding.node` | saved |
|---|---|---|
| baseline (`url` + `idna` + ICU4X) | 23,409,856 | - |
| **this PR (no-ICU `idna_adapter`)** | **23,277,696** | **−132,160 bytes (−129 KB)** |

The `icu_*` / `zerovec` / `yoke` / `tinystr` / … crates are gone from the build (`idna` + ICU remain in `Cargo.lock` only behind the `jsonschema` **test** dependency). The −132,160-byte saving is byte-for-byte the same as the "keep `url`" row measured in #9794, confirming the official `idna_adapter 1.0.0` matches a vendored ASCII stub.

### Relationship to #9790 / #9794

Both #9790 (hand-rolled parser, −226 KB) and #9794 (`iri-string`, −210 KB) removed `url` entirely and explicitly flagged this as the safe, minimal alternative. This captures ~57% of the maximum saving with **no behaviour change and no URL-parsing code to own**, so it supersedes both; they're closed in favour of this.
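The size figures quoted above can be re-derived with a few lines of integer arithmetic. The sketch below is only a sanity check, assuming "KB" means 1024 bytes and that percentages are rounded to the nearest integer; the helper names are made up for illustration.

```rust
/// Bytes saved going from `baseline` to `candidate` (0 if the candidate is larger).
fn saved_bytes(baseline: u64, candidate: u64) -> u64 {
    baseline.saturating_sub(candidate)
}

/// Bytes -> KiB, rounded to the nearest integer (assumes "KB" means 1024 bytes).
fn to_kib(bytes: u64) -> u64 {
    (bytes + 512) / 1024
}

/// `part` as a percentage of `whole`, rounded to the nearest integer.
fn share_percent(part: u64, whole: u64) -> u64 {
    (part * 100 + whole / 2) / whole
}
```

With the numbers from the table, `saved_bytes(23_409_856, 23_277_696)` is 132,160 bytes, `to_kib` of that is 129, and `share_percent(129, 226)` is 57, matching the "~57% of the maximum saving" claim. The same helper gives 93 for `share_percent(210, 226)`, the `iri-string` figure.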
## Do we need the `url` crate? No: a 0-dep RFC 3986 parser covers it.

`url` is pulled in only by rolldown crates, and it drags the entire IDNA/Unicode stack along (`url` -> `idna` -> `idna_adapter` -> ICU4X).

That's ~210 KB of the shipped `librolldown_binding.node` for internationalized-domain-name handling (`münchen.de` -> punycode) that rolldown never needs, and `url` has no feature to disable `idna`.

`iri-string` is a zero-dependency RFC 3986/3987 URI parser. It treats the host as opaque text, so it has no IDNA/ICU at all, while still being a real, spec-compliant parser (validation + relative resolution) rather than hand-rolled string slicing.

### What we use `url` for, and the replacement

| file | `url` use | `iri-string` |
|---|---|---|
| `file_url.rs` | `file://` -> path | `UriStr` components (authority/path/query/fragment) |
| `normalize_binding_options.rs` | `sourcemapBaseUrl` + trailing `/` | `UriStr::new` validation + append `/` |
| `process_code_and_sourcemap.rs` | join the relative map filename | normalized base ends with `/`, so plain append |

10 new unit tests, all passing; matches the `output/sourcemap-base-url` fixture.

### Impact (aarch64, LTO release, same-worktree A/B)

Builds compared:

- baseline (`url` + `idna` + ICU4X)
- `idna_adapter` ASCII stub (keeps `url`), `[patch]`
- `iri-string` (this PR)

`iri-string` captures 93% of removing `url` entirely (−210 of −226 KB), only 16 KB behind hand-rolling, but the tricky parts (`file://` parsing, RFC 3986 resolution) are done by a real parser. It removes `url` + `idna` + the ICU4X stack from the binary (`idna`/ICU remain only behind the `jsonschema` test dep).

### Trade-off

`iri-string` is RFC 3986 (stricter than `url`'s WHATWG): it rejects a few malformed inputs that `url` would auto-normalize (e.g. raw unencoded spaces). For the well-formed `file:///` and sourcemap URLs rolldown actually sees, the two agree, and `sourcemapBaseUrl` validation just gets marginally stricter.

### Relationship to #9790

This is the recommended alternative to #9790 (which hand-rolls the parsing): near-identical savings, but no hand-written URL parser to own. Draft to pick between the two approaches.

Not yet validated here: the JS integration fixtures (need a full napi build + vitest) and the Windows `file://` path.
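To make the strictness trade-off concrete, here is a rough std-only sketch of the kind of input RFC 3986 disallows. It is not `iri-string`'s implementation and checks only a necessary condition (the character repertoire and well-formed percent-escapes), not the full grammar; `is_uri_char` and `passes_char_check` are hypothetical names.

```rust
/// RFC 3986 unreserved + reserved characters (percent-escapes are handled separately).
fn is_uri_char(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b"-._~:/?#[]@!$&'()*+,;=".contains(&b)
}

/// Hypothetical pre-check: true if every byte is an RFC 3986 URI character or
/// part of a well-formed `%XX` escape. A raw space or a non-ASCII byte fails.
/// This is weaker than real parsing: it does not validate URI structure.
fn passes_char_check(s: &str) -> bool {
    let bytes = s.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'%' => match bytes.get(i + 1..i + 3) {
                // A percent sign must be followed by exactly two hex digits.
                Some(hex) if hex.iter().all(|h| h.is_ascii_hexdigit()) => i += 3,
                _ => return false,
            },
            b if is_uri_char(b) => i += 1,
            _ => return false,
        }
    }
    true
}
```

Under this check, `file:///tmp/my dir/main.js.map` fails because of the raw space, while `file:///tmp/my%20dir/main.js.map` passes, which mirrors the "raw unencoded spaces" example in the trade-off.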