Skip to content

perf: remove the url crate (-226 KB)#9790

Closed
Boshen wants to merge 2 commits into
mainfrom
perf/remove-url-crate
Closed

perf: remove the url crate (-226 KB)#9790
Boshen wants to merge 2 commits into
mainfrom
perf/remove-url-crate

Conversation

@Boshen

@Boshen Boshen commented Jun 16, 2026

Copy link
Copy Markdown
Member

Do we really need the url crate?

This is a draft to ask exactly that. url is one of our heavier dependencies relative to what it actually does for us.

What it costs

url is pulled in only by rolldown crates, and it drags the entire IDNA/Unicode stack along:

url -> idna -> idna_adapter -> ICU4X
                               (icu_normalizer / icu_properties + *_data + zerovec / yoke / tinystr / ... — ~21 crates)

That is ~226 KB of the shipped librolldown_binding.node (__const Unicode tables + url/idna code), all for internationalized-domain-name handling (münchen.de -> xn--mnchen-3ya.de) that rolldown never needs. url mandates idna for special-scheme hosts (incl. file:) and has no feature to turn it off.

What we actually use url for (3 sites)

site usage
rolldown_plugin_vite_resolve/src/file_url.rs parse file:// URLs -> path (Vite's fileURLToPath)
rolldown_binding/.../normalize_binding_options.rs validate sourcemapBaseUrl + ensure trailing /
rolldown/.../process_code_and_sourcemap.rs join the map filename onto sourcemapBaseUrl

All three only ever see ASCII (file paths / sourcemap URLs), never IDN hosts.

This PR

Replaces those three sites with targeted parsing and drops url:

  • a hand file:// splitter (host / path / query / fragment), keeping the exact posix/windows fileURLToPath rules and the %2F / hostname rejections
  • normalize_sourcemap_base_url(): validate scheme://… and append a trailing /
  • the base is normalized to end in /, so the join becomes a plain append

11 new unit tests, all passing; matches the existing output/sourcemap-base-url fixture's expected //# sourceMappingURL=… output.

Impact (aarch64, LTO release, same worktree A/B)

build size
baseline (url + idna + ICU4X) 23,675,392
this PR (url removed) 23,443,904 — −226 KB

url / idna / idna_adapter / icu_* are gone from the binary (nm confirms 0 symbols; they remain in Cargo.lock only behind the jsonschema test dependency).

Open questions (why this is a draft)

  • Correctness trade-off: we'd now own URL parsing / joining / validation at these 3 sites instead of url's WHATWG-compliant implementation. Unit tests cover the common cases, but url's edge cases (path ./.. normalization, exotic percent-encoding) are no longer guaranteed. Acceptable for sourcemapBaseUrl, a public option that mirrors Rollup?
  • Not yet validated here: the JS integration fixtures (need a full napi build + vitest) and the Windows file:// path.
  • Less invasive alternative: keep url and only swap idna's ICU backend for a tiny ASCII-only idna_adapter via [patch.crates-io] — saves ~129 KB (57% of this) with zero change to url's behavior. Easy to switch to if we'd rather not bring URL handling in-house.

Opening as a draft to get a call on the size-vs-correctness trade before polishing.

`url` pulls in `idna` -> `idna_adapter` -> ICU4X purely for
internationalized-domain-name handling that rolldown never exercises (it
only parses `file://`, sourcemap and option URLs, all ASCII). Replace the
three `Url` call sites with targeted parsing and drop the dependency:

- file_url.rs: parse `file://` URLs by hand (host/path/query/fragment),
  keeping the existing fileURLToPath posix/windows logic and rejection
  rules. Adds unit tests.
- normalize_binding_options.rs: validate `sourcemapBaseUrl` is an absolute
  URL and append a trailing `/`, without `Url::set_path`. Adds unit tests.
- process_code_and_sourcemap.rs: the base is normalized to end with `/`, so
  joining the relative map filename is a plain append.

Removes `url` + `idna` + `idna_adapter` + the ICU4X stack from the shipped
binary (idna/ICU remain only for the jsonschema test-dependency).

Measured (aarch64 LTO release): librolldown_binding.dylib
23,675,392 -> 23,443,904 bytes (-226 KB).
@netlify

netlify Bot commented Jun 16, 2026

Copy link
Copy Markdown

Deploy Preview for rolldown-rs ready!

Name Link
🔨 Latest commit ba14c9e
🔍 Latest deploy log https://app.netlify.com/projects/rolldown-rs/deploys/6a30ed91a2f1a400086b7702
😎 Deploy Preview https://deploy-preview-9790--rolldown-rs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify

netlify Bot commented Jun 16, 2026

Copy link
Copy Markdown

Deploy Preview for rolldown-rs ready!

Name Link
🔨 Latest commit 3f6441c
🔍 Latest deploy log https://app.netlify.com/projects/rolldown-rs/deploys/6a30edd218c0270008c6af4f
😎 Deploy Preview https://deploy-preview-9790--rolldown-rs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

@Boshen

Boshen commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

Superseded by #9811, which takes the least-invasive route both this PR and #9794 identified: keep url untouched and just disable idna's ICU backend by pinning idna_adapter to its no-ICU 1.0.0 release (−129 KB, no hand-rolled URL parser to own).

@Boshen Boshen closed this Jun 17, 2026
graphite-app Bot pushed a commit that referenced this pull request Jun 17, 2026
…29 KB) (#9811)

## Disable `idna`'s ICU backend instead of removing `url`

`url` is pulled in **only** by rolldown crates, and it drags the entire IDNA/Unicode stack along:

```
url -> idna -> idna_adapter -> ICU4X
                               (icu_normalizer / icu_properties + *_data + zerovec / yoke / tinystr / ... — ~20 crates)
```

That ships **~129 KB** of Unicode tables in `librolldown_binding.node` purely for internationalized-domain-name handling (`münchen.de` → `xn--mnchen-3ya.de`) that rolldown never needs. `url` mandates `idna` for special-scheme hosts (incl. `file:`) and has **no feature** to turn it off.

### This PR — the least-invasive option

[`idna_adapter`](https://docs.rs/crate/idna_adapter) is the indirection layer that lets `idna` pick a Unicode backend. Its **1.0.0** release is the official *no-ICU* backend (zero dependencies); **1.1+** switched to ICU4X. Pinning it to `=1.0.0` makes `idna` resolve to the ASCII-only implementation and drops the entire ICU4X stack from the build — while **`url` and its WHATWG behaviour stay exactly as-is (zero new code)**.

- `Cargo.toml`: `idna_adapter = "=1.0.0"` in `[workspace.dependencies]`, referenced from `rolldown_plugin_vite_resolve` (the crate behind the `url` edge).
- The exact `=1.0.0` requirement makes any ICU-backed version **unresolvable**, so a blanket `cargo update` can't silently pull it back in — verified: `cargo update -p idna_adapter --precise 1.2.2` is rejected with `failed to select a version for the requirement idna_adapter = "=1.0.0"`.
- Excluded from `cargo-shear` (`[package.metadata.cargo-shear]`) since it's a build-only pin we never import.

### Trade-off

Per the [`idna_adapter` docs](https://docs.rs/crate/idna_adapter/latest#:~:text=Turning%20off%20IDNA%20support), the no-Unicode backend rejects non-ASCII domain inputs and skips UTS-46 enforcement on Punycode labels. rolldown only ever feeds `url` ASCII (file paths / sourcemap URLs), so this is a no-op in practice.

### Impact (darwin-arm64, fat-LTO release, same-machine A/B)

| build | `rolldown-binding.node` | saved |
|---|---|---|
| baseline (`url` + `idna` + ICU4X) | 23,409,856 | — |
| **this PR (no-ICU `idna_adapter`)** | **23,277,696** | **−132,160 bytes (−129 KB)** |

The `icu_*` / `zerovec` / `yoke` / `tinystr` / … crates are gone from the build (`idna` + ICU remain in `Cargo.lock` only behind the `jsonschema` **test** dependency). The −132,160-byte saving is byte-for-byte the same as the "keep `url`" row measured in #9794, confirming the official `idna_adapter 1.0.0` matches a vendored ASCII stub.

### Relationship to #9790 / #9794

Both #9790 (hand-rolled parser, −226 KB) and #9794 (`iri-string`, −210 KB) removed `url` entirely and explicitly flagged this as the safe, minimal alternative. This captures ~57% of the maximum saving with **no behaviour change and no URL-parsing code to own**, so it supersedes both — they're closed in favour of this.
@shulaoda shulaoda deleted the perf/remove-url-crate branch June 19, 2026 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant