perf(string_wizard): reduce allocations and add ASCII fast paths#8541
Conversation
How to use the Graphite Merge QueueAdd the label graphite: merge-when-ready to this PR to add it to the merge queue. You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
✅ Deploy Preview for rolldown-rs canceled.
|
Merge activity
|
There was a problem hiding this comment.
Pull request overview
This PR optimizes string_wizard hot paths by reducing iterator work and allocations, primarily around UTF-16 column/offset calculations for sourcemap generation and some string manipulation helpers.
Changes:
- Added ASCII fast paths for UTF-16 length computation in sourcemap-related code.
- Reduced allocations by pre-sizing
StringandFxHashMap, and switched tosort_unstable()where stability isn’t needed. - Optimized
MagicString::last_char()by usingchars().next_back().
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| crates/string_wizard/src/source_map/sourcemap_builder.rs | Adds ASCII fast path when advancing generated column offsets (UTF-16). |
| crates/string_wizard/src/source_map/locator.rs | Adds ASCII fast path when building UTF-16-based line offsets. |
| crates/string_wizard/src/magic_string/source_map.rs | Uses sort_unstable(), preallocates FxHashMap, and adds ASCII fast path during UTF-16 index map precompute. |
| crates/string_wizard/src/magic_string/mod.rs | Switches last_char() from last() to next_back() for faster last-character lookup. |
| crates/string_wizard/src/magic_string/indent.rs | Preallocates indentation buffer capacity to reduce reallocations. |
Benchmarks Rust
|
### ### 1\. `last_char()` — `next_back()` vs `last()` **Before:** `.chars().last()` consumes the iterator front-to-back to find the last element — O(n) where n is the string length. **Why faster:** `.chars().next_back()` uses `DoubleEndedIterator` to read the last character directly from the end of the UTF-8 byte sequence — O(1). For a 10,000-character string, this avoids iterating 9,999 characters. ### 2\. `indent_frag()` — pre-allocated capacity **Before:** `String::new()` starts with zero capacity. As characters are pushed, the string reallocates and copies its buffer multiple times (typically ~log₂(n) reallocations). **Why faster:** `String::with_capacity(frag.len() + indentor.len())` allocates enough space upfront. The string grows without any reallocation, avoiding repeated memcpy of the growing buffer. ### 3\. `precompute_utf16_index_map()` — `sort_unstable()` + pre-allocated HashMap + ASCII fast path - **`sort_unstable()`:** Unlike `sort()`, `sort_unstable()` does not allocate temporary storage for merge sort. For `u32` values (no meaningful stability), this is purely free savings. - **Pre-allocated HashMap:** `FxHashMap::with_capacity_and_hasher(n, ...)` allocates the hash table at the right size upfront. Without this, the map starts small and rehashes (recomputes all hashes and copies all entries) as it grows. - **ASCII fast path:** `slice.is_ascii()` is a single SIMD-accelerated scan. When true, `slice.len()` gives the UTF-16 length directly (1 byte = 1 UTF-16 unit for ASCII). This skips the per-character `.chars().map(|c| c.len_utf16())` iteration. Since source code is predominantly ASCII, this fast path hits almost always. ### 4\. `Locator::new()` and `SourcemapBuilder::advance()` — ASCII fast path Same principle as above. Both methods compute UTF-16 lengths by iterating every character calling `len_utf16()`. For ASCII lines (the vast majority of source code), `line.is_ascii()` lets us use `line.len()` directly, replacing per-character iteration with a single length read.
c188c37 to
360667f
Compare
## [1.0.0-rc.7] - 2026-03-05 ⚡ Smarter Code Generation Defaults - DCE-only minification and smart constant inlining are now enabled by default - Produces cleaner, smaller output bundles without requiring explicit configuration 💡 LLM-Friendly Bundle Analyzer Reports - New markdown output format for the bundle analyzer plugin with bundle summaries, module graphs, dependency chains, and optimization suggestions - Optimization suggestions now also recommend using the entriesAware option when common chunks contain modules only reachable from specific entries ### 💥 BREAKING CHANGES - enable minify: 'dce-only' by default (#8465) by @IWANABETHATGUY - settings `inlineConst: { mode: 'smart', pass: 1}` by default (#8444) by @IWANABETHATGUY ### 🚀 Features - binding: add original getter to BindingMagicString (#8533) by @IWANABETHATGUY - native-magic-string: add `offset` property support (#8531) by @IWANABETHATGUY - add `output.strict` option to control `"use strict"` directive emission (#8489) by @Copilot - watch: expose `watcher.compareContentsForPolling` (#8526) by @hyf0 - watch: use new watcher to support watch mode (#8475) by @hyf0 - rust/watch: handle bulk-change (#8466) by @hyf0 - add LLM-friendly markdown output format to bundle analyzer plugin (#8242) by @IWANABETHATGUY ### 🐛 Bug Fixes - expose `plugins` on `NormalizedInputOptions` for `buildStart` hook (#8521) by @Copilot - only uppercase facade symbols in JSX preserve mode (#8519) by @IWANABETHATGUY - binding: export BindingResult in generated dts header (#8537) by @minsoo-web - pre-resolve paths option to avoid `invoke_sync` deadlock (#8518) by @IWANABETHATGUY - remove debug-only jsx_preset and UntranspiledSyntaxError (#8511) by @IWANABETHATGUY - apply `topLevelVar` to exported `const`/`let` declarations (#8507) by @IWANABETHATGUY - rolldown_plugin_vite_web_worker_post: avoid replacing `new.target` (#8488) by @sapphi-red - update copyright year to 2026 (#8486) by @maciekzygmunt ### 🚜 Refactor - rust: use Oxc's SymbolFlags::ConstVariable instead of custom IsConst flag (#8543) by @Dunqing - rust: remove FacadeScoping, use Scoping::create_symbol for facade symbols (#8540) by @Dunqing - rust/watch: remove hacky `reset_closed_for_watch_mode` (#8530) by @hyf0 - binding: return &str instead of String in filename() getter (#8534) by @IWANABETHATGUY - rust: remove old watch mode implementation (#8525) by @hyf0 - rust/watch: simply watch logic in the binding layer (#8516) by @hyf0 - rust/watch: tweak struct/function names (#8464) by @hyf0 ### 📚 Documentation - explain how external modules work in rolldown (#8457) by @sapphi-red - add some diagrams using graphviz (#8499) by @sapphi-red - use `vitepress-plugin-graphviz` (#8498) by @sapphi-red - list s390x/ppc64le prebuilt binaries (#8495) by @crusty-voidzero - fix error type for `RolldownBuild.generate` and others (#8490) by @sapphi-red ### ⚡ Performance - string_wizard: reduce allocations and add ASCII fast paths (#8541) by @IWANABETHATGUY - use IndexBitSet to replace IndexVec<XXXIdx, bool> for module/stmt inclusion tracking (#8503) by @IWANABETHATGUY - plugin: use IndexBitSet to optimize skipped plugins checking (#8497) by @ShroXd - rust/tla: skip compute_tla if there is no module use TLA (#8487) by @ShroXd ### 🧪 Testing - node/watch: make watch tests run in concurrent and retry-able (#8512) by @hyf0 - add test case for static flag tree-shaking (#8476) by @IWANABETHATGUY - migrate post-banner sourcemap-with-shebang to Rust (#8477) by @Copilot ### ⚙️ Miscellaneous Tasks - vscode: `formatOnSave` for markdown files using oxc formatter (#8536) by @minsoo-web - deps: update test262 submodule for tests (#8528) by @sapphi-red - remove `retry` workaround from output paths test fixtures (#8520) by @Copilot - docs: add Shuyuan Wang (h-a-n-a) and remove from acknowledgements (#8509) by @Copilot - consolidate top_level_var test cases using configVariants (#8508) by @IWANABETHATGUY - add s390x and ppc64le linux gnu targets (#8493) by @Brooooooklyn ###◀️ Revert - fix(rolldown): increase tokio blocking threads size for watch mode (#8517) by @hyf0 ### ❤️ New Contributors * @minsoo-web made their first contribution in [#8536](#8536) * @crusty-voidzero made their first contribution in [#8495](#8495) * @maciekzygmunt made their first contribution in [#8486](#8486) Co-authored-by: shulaoda <165626830+shulaoda@users.noreply.github.com>

1.
last_char()—next_back()vslast()Before:
.chars().last()consumes the iterator front-to-back to find the last element — O(n) where n is the string length.Why faster:
.chars().next_back()usesDoubleEndedIteratorto read the last character directly from the end of the UTF-8 byte sequence — O(1). For a 10,000-character string, this avoids iterating 9,999 characters.2.
indent_frag()— pre-allocated capacityBefore:
String::new()starts with zero capacity. As characters are pushed, the string reallocates and copies its buffer multiple times (typically ~log₂(n) reallocations).Why faster:
String::with_capacity(frag.len() + indentor.len())allocates enough space upfront. The string grows without any reallocation, avoiding repeated memcpy of the growing buffer.3.
precompute_utf16_index_map()—sort_unstable()+ pre-allocated HashMap + ASCII fast pathsort_unstable(): Unlikesort(),sort_unstable()does not allocate temporary storage for merge sort. Foru32values (no meaningful stability), this is purely free savings.FxHashMap::with_capacity_and_hasher(n, ...)allocates the hash table at the right size upfront. Without this, the map starts small and rehashes (recomputes all hashes and copies all entries) as it grows.slice.is_ascii()is a single SIMD-accelerated scan. When true,slice.len()gives the UTF-16 length directly (1 byte = 1 UTF-16 unit for ASCII). This skips the per-character.chars().map(|c| c.len_utf16())iteration. Since source code is predominantly ASCII, this fast path hits almost always.4.
Locator::new()andSourcemapBuilder::advance()— ASCII fast pathSame principle as above. Both methods compute UTF-16 lengths by iterating every character calling
len_utf16(). For ASCII lines (the vast majority of source code),line.is_ascii()lets us useline.len()directly, replacing per-character iteration with a single length read.