perf: replace the url crate with iri-string (-210 KB) #9794

Closed. Boshen wants to merge 2 commits into main from perf/replace-url-with-iri-string

Boshen (Member) commented Jun 16, 2026

Do we need the url crate? No — a 0-dep RFC 3986 parser covers it.

url is pulled in only by rolldown crates, and it drags the entire IDNA/Unicode stack along:

url -> idna -> idna_adapter -> ICU4X (icu_normalizer / icu_properties + *_data + zerovec / yoke / ... — ~21 crates)

That's ~210 KB of the shipped librolldown_binding.node for internationalized-domain-name handling (münchen.de -> punycode) that rolldown never needs — and url has no feature to disable idna.

iri-string is a zero-dependency, RFC 3986/3987 URI parser. It treats the host as opaque text, so it has no IDNA/ICU at all, while still being a real, spec-compliant parser (validation + relative resolution) rather than hand-rolled string slicing.
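
To make "treats the host as opaque text" concrete, here is a minimal std-only sketch of the RFC 3986 component split (scheme, authority, path, query, fragment). It is not iri-string's API, and it performs no validation or percent-decoding; `Components` and `split_components` are hypothetical names used only for illustration.

```rust
/// The five RFC 3986 components of a URI reference, borrowed from the input.
/// (Hypothetical type for illustration; not part of iri-string or rolldown.)
#[derive(Debug, PartialEq)]
struct Components<'a> {
    scheme: Option<&'a str>,
    authority: Option<&'a str>,
    path: &'a str,
    query: Option<&'a str>,
    fragment: Option<&'a str>,
}

/// Splits on the RFC 3986 delimiters only. No validation, no decoding.
fn split_components(input: &str) -> Components<'_> {
    // Fragment: everything after the first '#'.
    let (rest, fragment) = match input.split_once('#') {
        Some((r, f)) => (r, Some(f)),
        None => (input, None),
    };
    // Query: everything after the first '?' in what remains.
    let (rest, query) = match rest.split_once('?') {
        Some((r, q)) => (r, Some(q)),
        None => (rest, None),
    };
    // Scheme: non-empty text before the first ':' with no '/' in it.
    let (scheme, rest) = match rest.split_once(':') {
        Some((s, r)) if !s.is_empty() && !s.contains('/') => (Some(s), r),
        _ => (None, rest),
    };
    // Authority: present only after a leading "//"; it runs to the next '/'.
    // The host inside it is kept exactly as written (no IDNA processing).
    let (authority, path) = match rest.strip_prefix("//") {
        Some(after) => {
            let end = after.find('/').unwrap_or(after.len());
            (Some(&after[..end]), &after[end..])
        }
        None => (None, rest),
    };
    Components { scheme, authority, path, query, fragment }
}
```

Under this sketch, `file:///tmp/a.js.map` splits into an empty authority and the path `/tmp/a.js.map`, and an authority such as `münchen.de` comes back byte-for-byte as written. Validation, which the PR delegates to iri-string, is deliberately left out.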

What we use url for, and the replacement

| site | usage | with iri-string |
|---|---|---|
| file_url.rs | parse file:// -> path | UriStr components (authority/path/query/fragment) |
| normalize_binding_options.rs | validate sourcemapBaseUrl + trailing / | UriStr::new validation + append / |
| process_code_and_sourcemap.rs | join map filename onto base | base is normalized to end in /, so plain append |

10 new unit tests, all passing; matches the output/sourcemap-base-url fixture.
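
The base-URL handling in the last two rows can be sketched with plain string operations, assuming validation has already happened elsewhere (the PR uses UriStr::new for that step). `normalize_base` and `join_map_url` are hypothetical names, not functions from the rolldown codebase.

```rust
/// Ensure the (already validated) base URL ends with '/'.
/// Hypothetical helper for illustration only.
fn normalize_base(base: &str) -> String {
    if base.ends_with('/') {
        base.to_owned()
    } else {
        format!("{base}/")
    }
}

/// Join a plain map filename onto a normalized base by appending it.
/// Assumes `map_filename` is a bare filename with no dot segments.
fn join_map_url(normalized_base: &str, map_filename: &str) -> String {
    debug_assert!(normalized_base.ends_with('/'));
    format!("{normalized_base}{map_filename}")
}
```

The trailing slash matters because RFC 3986 reference resolution replaces the last path segment of the base. When the base ends in /, that last segment is empty, so resolving a plain filename gives the same result as appending it.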

Impact (aarch64, LTO release, same-worktree A/B)

| approach | size (bytes) | saved | parser |
|---|---|---|---|
| baseline (url + idna + ICU4X) | 23,675,392 | | WHATWG |
| idna_adapter ASCII stub (keeps url) | 23,543,232 | −129 KB | WHATWG, needs a vendored crate + [patch] |
| iri-string (this PR) | 23,460,448 | −210 KB | real RFC 3986, no ICU, 0 deps |
| full hand-roll (#9790) | 23,443,904 | −226 KB | hand-written |

iri-string captures 93% of the saving from removing url entirely (−210 of −226 KB). That is only 16 KB behind hand-rolling, and the tricky parts (file:// parsing, RFC 3986 resolution) are handled by a real parser. It removes url + idna + the ICU4X stack from the binary (idna/ICU remain only behind the jsonschema test dep).
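
The percentages follow directly from the byte sizes in the table above. A small sketch of the arithmetic, assuming "KB" in the table means 1,024 bytes (the figures only line up under that reading); the function names are illustrative.

```rust
/// Baseline binary size in bytes (url + idna + ICU4X), from the table above.
const BASELINE_BYTES: u64 = 23_675_392;

/// Bytes saved relative to the baseline build.
fn saved_bytes(size_bytes: u64) -> u64 {
    BASELINE_BYTES - size_bytes
}

/// The same saving expressed in units of 1,024 bytes.
fn saved_kib(size_bytes: u64) -> f64 {
    saved_bytes(size_bytes) as f64 / 1024.0
}

/// Share of the largest saving (the smallest build) that a given build
/// captures, in percent.
fn share_of_max_percent(size_bytes: u64, smallest_bytes: u64) -> f64 {
    100.0 * saved_bytes(size_bytes) as f64 / saved_bytes(smallest_bytes) as f64
}
```

For the iri-string build, `saved_bytes(23_460_448)` is 214,944 bytes, about 210 KB, which is about 93% of the 231,488 bytes saved by the hand-rolled build.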

Trade-off

iri-string implements RFC 3986, which is stricter than url's WHATWG parsing: it rejects a few malformed inputs that url would auto-normalize (e.g. raw unencoded spaces). For the well-formed file:// and sourcemap URLs rolldown actually sees, the two agree, and sourcemapBaseUrl validation just gets marginally stricter.
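
The "raw unencoded spaces" case can be illustrated with a character-level check against the RFC 3986 character set. This is only a necessary condition for validity, not the full grammar and not how iri-string is implemented; both function names are hypothetical.

```rust
/// True if `c` may appear anywhere in an RFC 3986 URI:
/// unreserved, gen-delims, sub-delims, or the '%' of a percent-encoding.
fn is_rfc3986_char(c: char) -> bool {
    c.is_ascii_alphanumeric()
        || "-._~".contains(c) // unreserved punctuation
        || ":/?#[]@".contains(c) // gen-delims
        || "!$&'()*+,;=".contains(c) // sub-delims
        || c == '%' // introduces a percent-encoded octet
}

/// Necessary (not sufficient) condition for a string to be a valid URI.
fn has_only_rfc3986_chars(s: &str) -> bool {
    s.chars().all(is_rfc3986_char)
}
```

A path such as `file:///out/my map.js.map` fails this check because of the raw space, while `file:///out/my%20map.js.map` passes.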

Relationship to #9790

This is the recommended alternative to #9790 (which hand-rolls the parsing): near-identical savings, but no hand-written URL parser to own. Opened as a draft so the two approaches can be compared and one picked.

Not yet validated here: the JS integration fixtures (need a full napi build + vitest) and the Windows file:// path.

`url` pulls in `idna` -> `idna_adapter` -> ICU4X (~21 crates, ~210 KB of the
shipped binary) for internationalized-domain-name handling that rolldown never
exercises. `iri-string` is a zero-dependency RFC 3986 URI parser that treats the
host as opaque text (no IDNA/ICU), so it covers the three `Url` call sites with a
real, spec-compliant parser instead of hand-rolled string handling:

- file_url.rs: parse `file://` components (authority/path/query/fragment).
- normalize_binding_options.rs: validate `sourcemapBaseUrl`, append trailing `/`.
- process_code_and_sourcemap.rs: the normalized base ends with `/`, so joining
  the relative map filename is a plain append.

Removes `url` from the binary; `idna`/ICU remain only behind the `jsonschema`
test dependency.

Measured url -> iri-string delta (aarch64 LTO release): -210 KB
(23,675,392 -> 23,460,448 bytes), i.e. ~93% of removing `url` entirely, with a
real parser rather than hand-rolled parsing.

netlify Bot commented Jun 16, 2026

Deploy Preview for rolldown-rs canceled.

Name Link
🔨 Latest commit b88a931
🔍 Latest deploy log https://app.netlify.com/projects/rolldown-rs/deploys/6a312f396060a70008fa65eb

socket-security Bot commented Jun 16, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addedcargo/​iri-string@​0.7.127210093100100

View full report

Boshen (Member, Author) commented Jun 17, 2026

Superseded by #9811. Rather than swapping the parser, #9811 keeps url as-is and disables idna's ICU backend via the idna_adapter = "=1.0.0" pin — the −129 KB "keep url" option from this PR's table, but using the official no-ICU idna_adapter release instead of a vendored stub + [patch].

Boshen closed this Jun 17, 2026
graphite-app Bot pushed a commit that referenced this pull request Jun 17, 2026
…29 KB) (#9811)

## Disable `idna`'s ICU backend instead of removing `url`

`url` is pulled in **only** by rolldown crates, and it drags the entire IDNA/Unicode stack along:

```
url -> idna -> idna_adapter -> ICU4X
                               (icu_normalizer / icu_properties + *_data + zerovec / yoke / tinystr / ... — ~20 crates)
```

That ships **~129 KB** of Unicode tables in `librolldown_binding.node` purely for internationalized-domain-name handling (`münchen.de` → `xn--mnchen-3ya.de`) that rolldown never needs. `url` mandates `idna` for special-scheme hosts (incl. `file:`) and has **no feature** to turn it off.

### This PR — the least-invasive option

[`idna_adapter`](https://docs.rs/crate/idna_adapter) is the indirection layer that lets `idna` pick a Unicode backend. Its **1.0.0** release is the official *no-ICU* backend (zero dependencies); **1.1+** switched to ICU4X. Pinning it to `=1.0.0` makes `idna` resolve to the ASCII-only implementation and drops the entire ICU4X stack from the build — while **`url` and its WHATWG behaviour stay exactly as-is (zero new code)**.

- `Cargo.toml`: `idna_adapter = "=1.0.0"` in `[workspace.dependencies]`, referenced from `rolldown_plugin_vite_resolve` (the crate behind the `url` edge).
- The exact `=1.0.0` requirement makes any ICU-backed version **unresolvable**, so a blanket `cargo update` can't silently pull it back in — verified: `cargo update -p idna_adapter --precise 1.2.2` is rejected with `failed to select a version for the requirement idna_adapter = "=1.0.0"`.
- Excluded from `cargo-shear` (`[package.metadata.cargo-shear]`) since it's a build-only pin we never import.

### Trade-off

Per the [`idna_adapter` docs](https://docs.rs/crate/idna_adapter/latest#:~:text=Turning%20off%20IDNA%20support), the no-Unicode backend rejects non-ASCII domain inputs and skips UTS-46 enforcement on Punycode labels. rolldown only ever feeds `url` ASCII (file paths / sourcemap URLs), so this is a no-op in practice.

### Impact (darwin-arm64, fat-LTO release, same-machine A/B)

| build | `rolldown-binding.node` | saved |
|---|---|---|
| baseline (`url` + `idna` + ICU4X) | 23,409,856 | — |
| **this PR (no-ICU `idna_adapter`)** | **23,277,696** | **−132,160 bytes (−129 KB)** |

The `icu_*` / `zerovec` / `yoke` / `tinystr` / … crates are gone from the build (`idna` + ICU remain in `Cargo.lock` only behind the `jsonschema` **test** dependency). The −132,160-byte saving is byte-for-byte the same as the "keep `url`" row measured in #9794, confirming the official `idna_adapter 1.0.0` matches a vendored ASCII stub.

### Relationship to #9790 / #9794

Both #9790 (hand-rolled parser, −226 KB) and #9794 (`iri-string`, −210 KB) removed `url` entirely and explicitly flagged this as the safe, minimal alternative. This captures ~57% of the maximum saving with **no behaviour change and no URL-parsing code to own**, so it supersedes both — they're closed in favour of this.
shulaoda deleted the perf/replace-url-with-iri-string branch June 19, 2026 15:40