Skip to content

perf(string_wizard): reduce allocations and add ASCII fast paths#8541

Merged
IWANABETHATGUY merged 1 commit intomainfrom
03-04-perf_string_wizard_reduce_allocations_and_add_ascii_fast_paths
Mar 4, 2026
Merged

perf(string_wizard): reduce allocations and add ASCII fast paths#8541
IWANABETHATGUY merged 1 commit intomainfrom
03-04-perf_string_wizard_reduce_allocations_and_add_ascii_fast_paths

Conversation

@IWANABETHATGUY
Copy link
Member

@IWANABETHATGUY IWANABETHATGUY commented Mar 4, 2026

1. last_char()next_back() vs last()

Before: .chars().last() consumes the iterator front-to-back to find the last element — O(n) where n is the string length.
Why faster: .chars().next_back() uses DoubleEndedIterator to read the last character directly from the end of the UTF-8 byte sequence — O(1). For a 10,000-character string, this avoids iterating 9,999 characters.

2. indent_frag() — pre-allocated capacity

Before: String::new() starts with zero capacity. As characters are pushed, the string reallocates and copies its buffer multiple times (typically ~log₂(n) reallocations).
Why faster: String::with_capacity(frag.len() + indentor.len()) allocates enough space upfront. The string grows without any reallocation, avoiding repeated memcpy of the growing buffer.

3. precompute_utf16_index_map()sort_unstable() + pre-allocated HashMap + ASCII fast path

  • sort_unstable(): Unlike sort(), sort_unstable() does not allocate temporary storage for merge sort. For u32 values (no meaningful stability), this is purely free savings.
  • Pre-allocated HashMap: FxHashMap::with_capacity_and_hasher(n, ...) allocates the hash table at the right size upfront. Without this, the map starts small and rehashes (recomputes all hashes and copies all entries) as it grows.
  • ASCII fast path: slice.is_ascii() is a single SIMD-accelerated scan. When true, slice.len() gives the UTF-16 length directly (1 byte = 1 UTF-16 unit for ASCII). This skips the per-character .chars().map(|c| c.len_utf16()) iteration. Since source code is predominantly ASCII, this fast path hits almost always.

4. Locator::new() and SourcemapBuilder::advance() — ASCII fast path

Same principle as above. Both methods compute UTF-16 lengths by iterating every character calling len_utf16(). For ASCII lines (the vast majority of source code), line.is_ascii() lets us use line.len() directly, replacing per-character iteration with a single length read.

Copy link
Member Author


How to use the Graphite Merge Queue

Add the label graphite: merge-when-ready to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@netlify
Copy link

netlify bot commented Mar 4, 2026

Deploy Preview for rolldown-rs canceled.

Name Link
🔨 Latest commit 360667f
🔍 Latest deploy log https://app.netlify.com/projects/rolldown-rs/deploys/69a7db0b62d5bb00085168c4

@IWANABETHATGUY IWANABETHATGUY marked this pull request as ready for review March 4, 2026 06:22
Copilot AI review requested due to automatic review settings March 4, 2026 06:22
Copy link
Member Author

IWANABETHATGUY commented Mar 4, 2026

Merge activity

  • Mar 4, 6:22 AM UTC: The merge label 'graphite: merge-when-ready' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Mar 4, 7:10 AM UTC: IWANABETHATGUY added this pull request to the Graphite merge queue.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes string_wizard hot paths by reducing iterator work and allocations, primarily around UTF-16 column/offset calculations for sourcemap generation and some string manipulation helpers.

Changes:

  • Added ASCII fast paths for UTF-16 length computation in sourcemap-related code.
  • Reduced allocations by pre-sizing String and FxHashMap, and switched to sort_unstable() where stability isn’t needed.
  • Optimized MagicString::last_char() by using chars().next_back().

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
crates/string_wizard/src/source_map/sourcemap_builder.rs Adds ASCII fast path when advancing generated column offsets (UTF-16).
crates/string_wizard/src/source_map/locator.rs Adds ASCII fast path when building UTF-16-based line offsets.
crates/string_wizard/src/magic_string/source_map.rs Uses sort_unstable(), preallocates FxHashMap, and adds ASCII fast path during UTF-16 index map precompute.
crates/string_wizard/src/magic_string/mod.rs Switches last_char() from last() to next_back() for faster last-character lookup.
crates/string_wizard/src/magic_string/indent.rs Preallocates indentation buffer capacity to reduce reallocations.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 4, 2026

Benchmarks Rust

  • target: main(22c9344)
  • pr: 03-04-perf_string_wizard_reduce_allocations_and_add_ascii_fast_paths(c188c37)
group                                                        pr                                     target
-----                                                        --                                     ------
bundle/bundle@multi-duplicated-top-level-symbol              1.00     74.0±2.61ms        ? ?/sec    1.07     79.2±1.75ms        ? ?/sec
bundle/bundle@multi-duplicated-top-level-symbol-sourcemap    1.00     80.5±2.12ms        ? ?/sec    1.10     88.3±1.85ms        ? ?/sec
bundle/bundle@rome_ts                                        1.00    156.7±4.50ms        ? ?/sec    1.10    172.1±7.44ms        ? ?/sec
bundle/bundle@rome_ts-sourcemap                              1.00    172.1±4.98ms        ? ?/sec    1.11    191.5±4.42ms        ? ?/sec
bundle/bundle@threejs                                        1.00     70.2±3.26ms        ? ?/sec    1.08     76.1±2.07ms        ? ?/sec
bundle/bundle@threejs-sourcemap                              1.00     78.0±2.49ms        ? ?/sec    1.08     84.3±2.12ms        ? ?/sec
bundle/bundle@threejs10x                                     1.00   737.0±10.90ms        ? ?/sec    1.07   789.8±24.21ms        ? ?/sec
bundle/bundle@threejs10x-sourcemap                           1.00   867.9±18.67ms        ? ?/sec    1.06   921.6±15.15ms        ? ?/sec
scan/scan@rome_ts                                            1.04     81.6±1.98ms        ? ?/sec    1.00     78.5±1.51ms        ? ?/sec
scan/scan@threejs                                            1.00     28.0±0.75ms        ? ?/sec    1.01     28.3±1.67ms        ? ?/sec
scan/scan@threejs10x                                         1.02    275.8±4.25ms        ? ?/sec    1.00    271.5±4.99ms        ? ?/sec

###

### 1\. `last_char()` — `next_back()` vs `last()`

**Before:** `.chars().last()` consumes the iterator front-to-back to find the last element — O(n) where n is the string length.
**Why faster:** `.chars().next_back()` uses `DoubleEndedIterator` to read the last character directly from the end of the UTF-8 byte sequence — O(1). For a 10,000-character string, this avoids iterating 9,999 characters.

### 2\. `indent_frag()` — pre-allocated capacity

**Before:** `String::new()` starts with zero capacity. As characters are pushed, the string reallocates and copies its buffer multiple times (typically ~log₂(n) reallocations).
**Why faster:** `String::with_capacity(frag.len() + indentor.len())` allocates enough space upfront. The string grows without any reallocation, avoiding repeated memcpy of the growing buffer.

### 3\. `precompute_utf16_index_map()` — `sort_unstable()` + pre-allocated HashMap + ASCII fast path

- **`sort_unstable()`:** Unlike `sort()`, `sort_unstable()` does not allocate temporary storage for merge sort. For `u32` values (no meaningful stability), this is purely free savings.
- **Pre-allocated HashMap:** `FxHashMap::with_capacity_and_hasher(n, ...)` allocates the hash table at the right size upfront. Without this, the map starts small and rehashes (recomputes all hashes and copies all entries) as it grows.
- **ASCII fast path:** `slice.is_ascii()` is a single SIMD-accelerated scan. When true, `slice.len()` gives the UTF-16 length directly (1 byte = 1 UTF-16 unit for ASCII). This skips the per-character `.chars().map(|c| c.len_utf16())` iteration. Since source code is predominantly ASCII, this fast path hits almost always.

### 4\. `Locator::new()` and `SourcemapBuilder::advance()` — ASCII fast path

Same principle as above. Both methods compute UTF-16 lengths by iterating every character calling `len_utf16()`. For ASCII lines (the vast majority of source code), `line.is_ascii()` lets us use `line.len()` directly, replacing per-character iteration with a single length read.
@graphite-app graphite-app bot force-pushed the 03-04-perf_string_wizard_reduce_allocations_and_add_ascii_fast_paths branch from c188c37 to 360667f Compare March 4, 2026 07:11
@IWANABETHATGUY IWANABETHATGUY merged commit b47a18a into main Mar 4, 2026
34 checks passed
@IWANABETHATGUY IWANABETHATGUY deleted the 03-04-perf_string_wizard_reduce_allocations_and_add_ascii_fast_paths branch March 4, 2026 07:20
@github-actions github-actions bot mentioned this pull request Mar 5, 2026
shulaoda added a commit that referenced this pull request Mar 5, 2026
## [1.0.0-rc.7] - 2026-03-05

⚡ Smarter Code Generation Defaults
- DCE-only minification and smart constant inlining are now enabled by default
- Produces cleaner, smaller output bundles without requiring explicit configuration

💡 LLM-Friendly Bundle Analyzer Reports
- New markdown output format for the bundle analyzer plugin with bundle summaries, module graphs, dependency chains, and optimization suggestions
- Optimization suggestions now also recommend using the entriesAware option when common chunks contain modules only reachable from specific entries


### 💥 BREAKING CHANGES

- enable minify: 'dce-only' by default (#8465) by @IWANABETHATGUY
- settings `inlineConst: { mode: 'smart', pass: 1}`  by default (#8444) by @IWANABETHATGUY

### 🚀 Features

- binding: add original getter to BindingMagicString (#8533) by @IWANABETHATGUY
- native-magic-string: add `offset` property support (#8531) by @IWANABETHATGUY
- add `output.strict` option to control `"use strict"` directive emission (#8489) by @Copilot
- watch: expose `watcher.compareContentsForPolling` (#8526) by @hyf0
- watch: use new watcher to support watch mode (#8475) by @hyf0
- rust/watch: handle bulk-change (#8466) by @hyf0
- add LLM-friendly markdown output format to bundle analyzer plugin (#8242) by @IWANABETHATGUY

### 🐛 Bug Fixes

- expose `plugins` on `NormalizedInputOptions` for `buildStart` hook (#8521) by @Copilot
- only uppercase facade symbols in JSX preserve mode (#8519) by @IWANABETHATGUY
- binding: export BindingResult in generated dts header (#8537) by @minsoo-web
- pre-resolve paths option to avoid `invoke_sync` deadlock (#8518) by @IWANABETHATGUY
- remove debug-only jsx_preset and UntranspiledSyntaxError (#8511) by @IWANABETHATGUY
- apply `topLevelVar` to exported `const`/`let` declarations (#8507) by @IWANABETHATGUY
- rolldown_plugin_vite_web_worker_post: avoid replacing `new.target` (#8488) by @sapphi-red
- update copyright year to 2026 (#8486) by @maciekzygmunt

### 🚜 Refactor

- rust: use Oxc's SymbolFlags::ConstVariable instead of custom IsConst flag (#8543) by @Dunqing
- rust: remove FacadeScoping, use Scoping::create_symbol for facade symbols (#8540) by @Dunqing
- rust/watch: remove hacky `reset_closed_for_watch_mode` (#8530) by @hyf0
- binding: return &str instead of String in filename() getter (#8534) by @IWANABETHATGUY
- rust: remove old watch mode implementation (#8525) by @hyf0
- rust/watch: simply watch logic in the binding layer (#8516) by @hyf0
- rust/watch: tweak struct/function names (#8464) by @hyf0

### 📚 Documentation

- explain how external modules work in rolldown (#8457) by @sapphi-red
- add some diagrams using graphviz (#8499) by @sapphi-red
- use `vitepress-plugin-graphviz` (#8498) by @sapphi-red
- list s390x/ppc64le prebuilt binaries (#8495) by @crusty-voidzero
- fix error type for `RolldownBuild.generate` and others (#8490) by @sapphi-red

### ⚡ Performance

- string_wizard: reduce allocations and add ASCII fast paths (#8541) by @IWANABETHATGUY
- use IndexBitSet to replace IndexVec<XXXIdx, bool> for module/stmt inclusion tracking (#8503) by @IWANABETHATGUY
- plugin: use IndexBitSet to optimize skipped plugins checking (#8497) by @ShroXd
- rust/tla: skip compute_tla if there is no module use TLA (#8487) by @ShroXd

### 🧪 Testing

- node/watch: make watch tests run in concurrent and retry-able (#8512) by @hyf0
- add test case for static flag tree-shaking (#8476) by @IWANABETHATGUY
- migrate post-banner sourcemap-with-shebang to Rust (#8477) by @Copilot

### ⚙️ Miscellaneous Tasks

- vscode: `formatOnSave` for markdown files using oxc formatter (#8536) by @minsoo-web
- deps: update test262 submodule for tests (#8528) by @sapphi-red
- remove `retry` workaround from output paths test fixtures (#8520) by @Copilot
- docs: add Shuyuan Wang (h-a-n-a) and remove from acknowledgements (#8509) by @Copilot
- consolidate top_level_var test cases using configVariants (#8508) by @IWANABETHATGUY
- add s390x and ppc64le linux gnu targets (#8493) by @Brooooooklyn

### ◀️ Revert

- fix(rolldown): increase tokio blocking threads size for watch mode (#8517) by @hyf0

### ❤️ New Contributors

* @minsoo-web made their first contribution in [#8536](#8536)
* @crusty-voidzero made their first contribution in [#8495](#8495)
* @maciekzygmunt made their first contribution in [#8486](#8486)

Co-authored-by: shulaoda <165626830+shulaoda@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants