perf(utils): avoid allocation in default_sanitize_file_name for clean names #9928

Merged
graphite-app[bot] merged 1 commit into main from perf/sanitize-file-name-cow on Jun 23, 2026

Boshen commented Jun 22, 2026 (Member)

Summary

default_sanitize_file_name (called once per output chunk/asset filename) unconditionally allocated String::with_capacity(str.len()) and rebuilt the name char-by-char — even though the common case is a filename that contains no invalid characters (index.js, react.production.min.js, any path without shell/NTFS-unsafe chars) and needs no rewriting at all. This is the same wasted-allocation pattern recently fixed in legitimize_identifier_name (#9926).

The function now returns Cow<str>:

  • Clean path (common): scan for the first invalid char; if none, return Cow::Borrowed(str) — zero allocation, zero copy.
  • Dirty path: allocate only when a replacement is actually needed, and bulk-copy the valid prefix (drive letter included) with push_str instead of one char at a time.
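The clean/dirty split described above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `sanitize_sketch` and `is_invalid_byte` are hypothetical names, the invalid-character set is an assumed stand-in, and Windows drive-letter handling is deliberately left out.

```rust
use std::borrow::Cow;
use std::convert::TryFrom;

/// Assumed stand-in for the real invalid-character set (ASCII only).
fn is_invalid_byte(b: u8) -> bool {
    b < 0x20
        || b == 0x7F
        || matches!(
            b,
            b'"' | b'#' | b'$' | b'&' | b'*' | b'+' | b',' | b':' | b';' | b'<'
                | b'=' | b'>' | b'?' | b'[' | b']' | b'^' | b'`' | b'{' | b'|' | b'}'
        )
}

/// Hypothetical sketch: borrow when the name is clean, allocate only when dirty.
fn sanitize_sketch(name: &str) -> Cow<'_, str> {
    // Clean path: no invalid byte anywhere, so return the input untouched.
    let Some(first_bad) = name.bytes().position(is_invalid_byte) else {
        return Cow::Borrowed(name);
    };
    // Dirty path: allocate once and bulk-copy the valid prefix.
    // `first_bad` indexes an ASCII byte, so it is always a char boundary.
    let mut out = String::with_capacity(name.len());
    out.push_str(&name[..first_bad]);
    for c in name[first_bad..].chars() {
        // Checked conversion: chars above U+00FF can never match an ASCII byte.
        let invalid = u8::try_from(c).is_ok_and(is_invalid_byte);
        out.push(if invalid { '_' } else { c });
    }
    Cow::Owned(out)
}
```

Callers that only read the result can treat the `Cow<str>` as a `&str` through deref, so the borrowed case costs nothing beyond the scan.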

The scan is done over bytes, not chars. Every invalid character is ASCII (≤ 0x7F), and UTF-8 guarantees that no byte of a multi-byte character is < 0x80, so a byte scan finds exactly the same positions as a char scan without per-char UTF-8 decoding, and every match lands on a char boundary. The dirty-path rewrite uses u8::try_from(char) rather than char as u8 so a non-ASCII char (e.g. 😀, whose low byte is 0x00) is never truncated into a false match.
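The two properties this paragraph relies on can be checked directly with the standard library. The helper names below are hypothetical and exist only for illustration.

```rust
use std::convert::TryFrom;

/// Truncating cast: keeps only the low 8 bits of the code point.
fn low_byte(c: char) -> u8 {
    c as u8
}

/// Checked conversion: `None` for any char above U+00FF.
fn checked_byte(c: char) -> Option<u8> {
    u8::try_from(c).ok()
}

/// True when every byte of `s` is >= 0x80, which holds for any string
/// made up only of multi-byte UTF-8 characters.
fn all_bytes_high(s: &str) -> bool {
    s.bytes().all(|b| b >= 0x80)
}

/// Byte offset of the first occurrence of an ASCII byte, found without decoding chars.
fn find_ascii(s: &str, needle: u8) -> Option<usize> {
    s.bytes().position(|b| b == needle)
}
```

For `'😀'` (U+1F600), `low_byte` yields `0x00` while `checked_byte` yields `None`, which is why the checked form cannot produce a false match. In `"日本?語"`, `find_ascii(s, b'?')` returns byte offset 6, which is a valid char boundary because each of the two preceding characters occupies three bytes.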

Both call sites in rolldown_common already do .into() into ArcStr (which implements From<Cow<str>>), so they are unchanged.
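As a rough analogy for why the call sites need no change, the sketch below uses the standard library's `Arc<str>` in place of `ArcStr`. That substitution is an assumption made to keep the example dependency-free, and `to_shared` is a hypothetical helper. `Arc<str>` implements `From<Cow<'_, str>>`, so the same `.into()` call accepts a borrowed or an owned value.

```rust
use std::borrow::Cow;
use std::sync::Arc;

/// Hypothetical call-site shape: `.into()` works for either `Cow` variant.
/// `Arc<str>` stands in for `ArcStr` here (assumption).
fn to_shared(name: Cow<'_, str>) -> Arc<str> {
    name.into()
}
```

In this stand-in the shared string copies the bytes exactly once in either case, so returning `Cow::Borrowed` from the sanitizer removes the intermediate `String` rather than the final copy.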

Measured impact

Measured locally with a Criterion microbench (not committed), original String version vs this change:

| input | before | after | change |
|---|---|---|---|
| clean_short (index.js) | 21.4 ns | 6.7 ns | −69% |
| clean_long (78-char ASCII path) | 122 ns | 46 ns | −63% |
| clean_unicode (30-char Cyrillic path) | 85 ns | 32 ns | −63% |
| dirty (needs rewriting) | 60 ns | 49 ns | −18% |

All changes statistically significant (p < 0.05). The byte scan is what unlocks the gains on longer and non-ASCII paths (no per-char decoding).

Correctness

Output is byte-for-byte identical to the previous implementation, including Windows drive-letter semantics (C:/foo.js is preserved, while a later : is still replaced: C:/a:b.js → C:/a_b.js). All rolldown_utils tests pass.

  • test_sanitize_file_name: the borrowed/owned split, empty string, and the Windows-drive paths.
  • test_sanitize_unicode: clean multi-byte names (2-, 3-, and 4-byte sequences: café.js, 日本語.js, компоненты/Кнопка.js, emoji_😀.js) are returned borrowed, and multi-byte chars survive the rewrite path verbatim (a?é → a_é, a?😀 → a_😀, 日本?語 → 日本_語, café:dir.js → café_dir.js).
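The drive-letter rule from the Correctness paragraph can be isolated in a small sketch that handles only `:`. The names `starts_with_drive` and `replace_colons` are hypothetical, and the drive-prefix test (one ASCII letter followed by `:`) is an assumption about what counts as a Windows drive prefix.

```rust
use std::borrow::Cow;

/// Assumed drive-prefix check: one ASCII letter followed by ':'.
fn starts_with_drive(name: &str) -> bool {
    let b = name.as_bytes();
    b.len() >= 2 && b[0].is_ascii_alphabetic() && b[1] == b':'
}

/// Replace every ':' with '_' except the one in a leading drive prefix.
fn replace_colons(name: &str) -> Cow<'_, str> {
    // Skip the two ASCII bytes of the drive prefix so its ':' is never scanned.
    let skip = if starts_with_drive(name) { 2 } else { 0 };
    match name.as_bytes()[skip..].iter().position(|&b| b == b':') {
        // No other ':' present: hand the input back untouched.
        None => Cow::Borrowed(name),
        Some(i) => {
            let at = skip + i;
            let mut out = String::with_capacity(name.len());
            // Bulk-copy everything before the first offending ':' (drive prefix included).
            out.push_str(&name[..at]);
            for c in name[at..].chars() {
                out.push(if c == ':' { '_' } else { c });
            }
            Cow::Owned(out)
        }
    }
}
```

Under these assumptions `C:/foo.js` comes back borrowed, `C:/a:b.js` becomes `C:/a_b.js`, and `café:dir.js` becomes `café_dir.js`, matching the cases listed above.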

🤖 Generated with Claude Code

netlify Bot commented Jun 22, 2026

Deploy Preview for rolldown-rs ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | cffb2b2 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/rolldown-rs/deploys/6a3a4934ae40e300087636b7 |
| 😎 Deploy Preview | https://deploy-preview-9928--rolldown-rs.netlify.app |

Boshen force-pushed the perf/sanitize-file-name-cow branch 3 times, most recently from d407fa2 to c4bd423 (June 23, 2026 01:17)
Boshen marked this pull request as ready for review (June 23, 2026 02:54)
shulaoda requested a review from Copilot (June 23, 2026 02:56)

Copilot AI left a comment (Contributor)

Pull request overview

Optimizes rolldown_utils’s filename sanitization hot path by avoiding an unconditional allocation/copy for already-valid output filenames, which are the common case during chunk/asset emission.

Changes:

  • Change default_sanitize_file_name to return Cow<'_, str> and early-return Cow::Borrowed when no invalid characters are present.
  • Replace the previous per-char rebuild with a fast byte scan to find the first invalid ASCII byte, then bulk-copy the valid prefix and rewrite only the remainder.
  • Add targeted tests covering the borrowed/owned split, Windows drive-letter : semantics, and Unicode (multi-byte UTF-8) correctness on both clean and rewrite paths.

codspeed-hq Bot commented Jun 23, 2026

Merging this PR will not alter performance

✅ 7 untouched benchmarks
⏩ 10 skipped benchmarks [1]


Comparing perf/sanitize-file-name-cow (ff13c26) with main (9f960eb)

Footnotes

  1. 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

Boshen commented Jun 23, 2026 (Member, Author)

Merge activity

  • Jun 23, 8:51 AM UTC: The merge label 'graphite: merge-when-ready' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Jun 23, 8:51 AM UTC: Boshen added this pull request to the Graphite merge queue.
  • Jun 23, 8:56 AM UTC: Merged by the Graphite merge queue.

graphite-app Bot force-pushed the perf/sanitize-file-name-cow branch from ff13c26 to cffb2b2 (June 23, 2026 08:52)
graphite-app Bot merged commit cffb2b2 into main (Jun 23, 2026)
34 checks passed
graphite-app Bot deleted the perf/sanitize-file-name-cow branch (June 23, 2026 08:56)
rolldown-guard Bot mentioned this pull request (Jun 24, 2026)
shulaoda added a commit that referenced this pull request (Jun 24, 2026)
## [1.1.3] - 2026-06-24

### 🐛 Bug Fixes

- `defer_drop` crashes the browser main thread (#9942) by @shulaoda
- camel-case: correct camel case for nested values (#9933) by @kb019
- cli: display --help options in camelCase (#9941) by @IWANABETHATGUY
- preserve used re-exports under preserveModules (#9122) (#9934) by @IWANABETHATGUY
- watch: make close reentrant in event callbacks (#9904) by @hyf0
- git for windows treats symlink files as regular files (#9915) by @AliceLanniste
- dev: cancel pending full reload on build error (#9903) by @h-a-n-a
- chunking: pass plugin meta to codeSplitting groups name function (#9267) by @Kyujenius
- dev: serve assets emitted during HMR/lazy compile (vite#22596) (#9815) by @h-a-n-a
- release: dry-run step no longer publishes binding packages (#9866) by @Boshen

### 🚜 Refactor

- rolldown_common: model ModuleId as a classified Path/Virtual/Bare enum (#9927) by @Boshen
- remove unused LegacyModuleIdx (#9872) by @shulaoda
- remove unused StmtInfos::get_namespace_stmt_info (#9870) by @shulaoda
- remove unused Module::as_external_mut (#9871) by @shulaoda
- remove unused EcmaAst::is_body_empty (#9869) by @shulaoda
- drop dead is_css_module handling in resolve_dependencies (#9867) by @shulaoda
- drop redundant with_commonjs on cjs source type (#9868) by @shulaoda

### 📚 Documentation

- clarify on drafting PRs (#9952) by @h-a-n-a
- update contribution guidelines (#9944) by @fubhy
- note Rust crates don't follow semver in AGENTS.md (#9905) by @IWANABETHATGUY
- add feedback form (#9159) by @TheAlexLichter

### ⚡ Performance

- utils: avoid allocation in default_sanitize_file_name for clean names (#9928) by @Boshen
- binding: box once-per-build futures before spawn_future (#9864) by @Boshen
- utils: avoid wasted allocation in legitimize_identifier_name (#9926) by @Boshen
- rolldown: fuse the canonical-name dedup and insert in the renamer (#9900) by @Boshen
- rolldown: probe the name map once in ConflictResolver::resolve (#9899) by @Boshen
- cut two heap allocations from wrapped ESM init finalize (#9901) by @Boshen
- rolldown_plugin_vite_reporter: hoist invariant out_dir prefix out of reporter loop (#9873) by @shulaoda
- drop throwaway Vec in wrapped esm init stmt (#9878) by @shulaoda
- borrow owner_filename in build-import-analysis AddDeps (#9874) by @shulaoda

### 🧪 Testing

- cover preserveModules named export via namespace re-export (#6010) (#9937) by @IWANABETHATGUY

### ⚙️ Miscellaneous Tasks

- deps: update napi to v3.9.4 (#9954) by @shulaoda
- reduce noise from CODEOWNERS for trivial changes (#9953) by @h-a-n-a
- deps: update mimalloc-safe to 0.1.64 (#9950) by @shulaoda
- deps: update rollup submodule for tests to v4.62.2 (#9931) by @rolldown-guard[bot]
- deps: test mimalloc-safe upstream-mimalloc switch in CI (#9930) by @shulaoda
- rolldown_plugin_vite_build_import_analysis: remove unused v2 code path (#9917) by @shulaoda
- rolldown_plugin_vite_manifest: remove unused is_enable_v2 code path (#9916) by @shulaoda
- rolldown_plugin_vite_asset_import_meta_url: remove unexposed native vite plugin (#9896) by @shulaoda
- rolldown_plugin_vite_asset: remove unexposed native vite plugin (#9895) by @shulaoda
- rolldown_plugin_vite_css_post: remove unexposed native vite plugin (#9894) by @shulaoda
- rolldown_plugin_vite_css: remove unexposed native vite plugin (#9893) by @shulaoda
- rolldown_plugin_vite_html_inline_proxy: remove unexposed native vite plugin (#9892) by @shulaoda
- rolldown_plugin_vite_html: remove unexposed native vite plugin (#9891) by @shulaoda
- deps: update github actions (#9909) by @renovate[bot]
- deps: update rust crate oxc_sourcemap to v8.0.2 (#9910) by @renovate[bot]
- deps: update npm packages (#9912) by @renovate[bot]
- deps: update github actions to v7 (#9913) by @renovate[bot]
- deps: update rolldown-plugin-dts to ^0.26.0 (#9897) by @renovate[bot]
- remove rolldown_filter_analyzer crate (#9865) by @Boshen

### ❤️ New Contributors

* @fubhy made their first contribution in [#9944](#9944)

Co-authored-by: shulaoda <165626830+shulaoda@users.noreply.github.com>