feat(mcs): maxSize will split the oversized chunk with taking file relevance into account #8277
Pull request overview
Updates rolldown’s manual code splitting behavior so maxSize-driven splitting considers module “relevance” (via stable module IDs), and extends the integration fixtures/snapshots to cover the new splitting behavior (including a new similarity-based maxSize test case).
Changes:
- Adjust manual code splitting to split oversized groups using stable-id ordering plus a relevance-based split-point heuristic.
- Add a new `similarity_max_size` fixture asserting vendor-like clustering behavior under `maxSize`.
- Update affected artifact and filename-hash snapshots for the new chunk outputs.
Reviewed changes
Copilot reviewed 8 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| crates/rolldown/src/stages/generate_stage/manual_code_splitting.rs | Reworks maxSize splitting to choose a split index based on stable-id “relevance” and adds a unit test for the split picker. |
| crates/rolldown/tests/snapshots/integration_rolldown__filename_with_hash.snap | Updates expected hashed filenames and adds the new similarity_max_size case outputs. |
| crates/rolldown/tests/rolldown/function/advanced_chunks/similarity_max_size/_config.json | New fixture config to exercise relevance-aware splitting with maxSize. |
| crates/rolldown/tests/rolldown/function/advanced_chunks/similarity_max_size/main.js | New entry importing app + vendor modules to form a realistic mixed group. |
| crates/rolldown/tests/rolldown/function/advanced_chunks/similarity_max_size/src/app-shell.js | New fixture module. |
| crates/rolldown/tests/rolldown/function/advanced_chunks/similarity_max_size/node_modules/react/index.js | New fixture vendor module. |
| crates/rolldown/tests/rolldown/function/advanced_chunks/similarity_max_size/node_modules/react/jsx-runtime.js | New fixture vendor module. |
| crates/rolldown/tests/rolldown/function/advanced_chunks/similarity_max_size/node_modules/date-fns/format.js | New fixture vendor module. |
| crates/rolldown/tests/rolldown/function/advanced_chunks/similarity_max_size/node_modules/date-fns/parseISO.js | New fixture vendor module. |
| crates/rolldown/tests/rolldown/function/advanced_chunks/similarity_max_size/artifacts.snap | New expected artifact snapshot for the similarity-based fixture. |
| crates/rolldown/tests/rolldown/function/advanced_chunks/max_size/artifacts.snap | Updates expected artifacts to match the new splitting behavior. |
| crates/rolldown/tests/rolldown/function/advanced_chunks/max_size3/artifacts.snap | Updates expected artifacts to match the new splitting behavior. |
A prior fix for issues found in #8277 (comment).
…relevance into account (#8277)

## Summary

When a manual chunk splitting (MCS) group exceeds `maxSize`, the old greedy loop split at the first position where the left half met `minSize`. This was **position-dependent and unaware of file relevance**: related files (e.g. all of `date-fns/format*`) could be torn apart simply because they happened to straddle the size boundary.

This PR replaces that logic with a **similarity-based split point selection** algorithm.

## How it works

### 1. Sort by `stable_id`, then pick the best split point

Modules in the oversized group are sorted lexicographically by `stable_id` (file path). The algorithm then scans every candidate split index that satisfies `minSize` on both sides, and scores the boundary between `modules[i-1]` and `modules[i]` using `stable_id_similarity`.

### 2. `stable_id_similarity`: character-level path similarity

```rust
fn stable_id_similarity(lhs: &str, rhs: &str) -> i32 {
  lhs.as_bytes().iter().zip(rhs.as_bytes()).fold(0, |acc, (lhs_char, rhs_char)| {
    acc + (10 - (i32::from(*lhs_char) - i32::from(*rhs_char)).abs()).max(0)
  })
}
```

Each aligned byte pair contributes 0–10 points (10 = identical). Paths sharing a long common prefix (same directory) accumulate a high score; paths diverging early score low. **Lower similarity = better split point** (the boundary between unrelated files).

### 3. Noise reduction: `SIMILARITY_SIGNIFICANCE_THRESHOLD` (= 10)

Digit-level ASCII differences (e.g. `util1.js` vs `util2.js`) produce tiny similarity fluctuations. The threshold of 10 (one character position's max score) absorbs this noise: two candidates whose similarity differs by ≤ 10 are treated as a **tie**, and the tie-breaker criteria decide instead.

### 4. 3-tier comparison for picking the best split

```
if |best_similarity − similarity| > THRESHOLD → prefer lower similarity
else if oversized_side_count differs          → prefer fewer oversized halves
else                                          → prefer smaller max_side_size
```

- **Tier 1 (similarity)**: Split between the least-related files (e.g. between `react/*` and `date-fns/*`).
- **Tier 2 (oversized side count)**: Among ties, prefer a split that keeps both halves under `maxSize`.
- **Tier 3 (max side size)**: Among remaining ties, prefer the most balanced split.

### 5. Recursive splitting

If either half is still oversized, it re-enters the same loop and splits again, so the algorithm naturally recurses until all chunks satisfy `maxSize` (or no valid split exists).

## Example

Given modules sorted by `stable_id`:

```
date-fns/addDays.js
date-fns/format.js
date-fns/parse.js      ← high similarity across these
lodash/merge.js
react/Component.js
react/hooks.js         ← high similarity across these
```

The algorithm picks the boundary with the **lowest** similarity (e.g. between `date-fns/parse.js` and `lodash/merge.js`, or between `lodash/merge.js` and `react/Component.js`), keeping related library files together in the same chunk.
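The selection rule described above can be sketched as a small standalone function. This is a minimal illustration under stated assumptions, not the code merged in the PR: `pick_split`, the `(stable_id, size)` tuple representation, and the parameter names are hypothetical, and only `stable_id_similarity` and the threshold value are taken from the description. The input is assumed to be already sorted by `stable_id`.

```rust
/// Byte-level path similarity, as given in the PR description.
fn stable_id_similarity(lhs: &str, rhs: &str) -> i32 {
  lhs.as_bytes().iter().zip(rhs.as_bytes()).fold(0, |acc, (l, r)| {
    acc + (10 - (i32::from(*l) - i32::from(*r)).abs()).max(0)
  })
}

const SIMILARITY_SIGNIFICANCE_THRESHOLD: i32 = 10;

/// Hypothetical helper: `modules` holds `(stable_id, size_in_bytes)` pairs,
/// already sorted by `stable_id`. Returns the index `i` that splits the group
/// into `[..i]` and `[i..]`, or `None` when no split leaves at least
/// `min_size` bytes on both sides.
fn pick_split(modules: &[(&str, usize)], min_size: usize, max_size: usize) -> Option<usize> {
  let total: usize = modules.iter().map(|m| m.1).sum();
  let mut left = 0usize;
  // Best candidate so far: (similarity, oversized_side_count, max_side_size, index).
  let mut best: Option<(i32, u8, usize, usize)> = None;

  for i in 1..modules.len() {
    left += modules[i - 1].1;
    let right = total - left;
    if left < min_size || right < min_size {
      continue; // both halves must satisfy minSize
    }
    let similarity = stable_id_similarity(modules[i - 1].0, modules[i].0);
    let oversized = u8::from(left > max_size) + u8::from(right > max_size);
    let max_side = left.max(right);

    let better = match best {
      None => true,
      Some((best_sim, best_over, best_max, _)) => {
        if (best_sim - similarity).abs() > SIMILARITY_SIGNIFICANCE_THRESHOLD {
          similarity < best_sim // tier 1: clearly less related neighbours
        } else if oversized != best_over {
          oversized < best_over // tier 2: fewer halves still above maxSize
        } else {
          max_side < best_max // tier 3: more balanced split
        }
      }
    };
    if better {
      best = Some((similarity, oversized, max_side, i));
    }
  }
  best.map(|candidate| candidate.3)
}
```

For six 100-byte modules (`date-fns/addDays.js`, `date-fns/format.js`, `date-fns/parse.js`, `lodash/merge.js`, `react/Component.js`, `react/hooks.js`) with `min_size = 100` and `max_size = 300`, this sketch returns `Some(3)`, so the three `date-fns` files stay together. A caller would then apply the same function to any half that is still above `max_size`.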
## [1.0.0-rc.5] - 2026-02-18
💡 Smarter `entriesAware` Manual Code Splitting
New `entriesAware` and `entriesAwareMergeThreshold` options for `manualCodeSplitting.groups[]` enable
entry-reachability-based chunk splitting with automatic small chunk merging.
- `entriesAware: true` splits matched modules by entry reachability — modules reached by the same set of entries are grouped together, providing the most precise loading behavior with less over-fetching
- Chunks now get more readable names reflecting their entry associations (e.g. `vendor-entry-a-entry-b.js`) instead of opaque hashes
- `entriesAwareMergeThreshold` sets a byte-size threshold to merge tiny subgroups into the closest sibling with the fewest extra entries, reducing micro-chunk fragmentation while preserving precision
- Recommended to use together with `maxSize`: merge tiny chunks first to reduce request overhead, then cap large chunks to control payload size
```js
manualCodeSplitting: {
  groups: [{
    name: 'vendor',
    test: /node_modules/,
    entriesAware: true,
    entriesAwareMergeThreshold: 28000, // bytes
  }]
}
```
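To make the two options more concrete, here is a rough Rust sketch of the idea: group matched modules by the exact set of entries that reach them, then fold subgroups below the threshold into a sibling. Everything in it is hypothetical (the `Module` and `Group` types, the bitmask encoding of entries, and both function names), and the merge rule is an assumption: "closest sibling with the fewest extra entries" is read here as the sibling whose entry set is a superset with the fewest additional entries. The real implementation may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical module record. `entries` is a bitmask of the entries that
/// reach the module (bit 0 = entry-a, bit 1 = entry-b, ...).
struct Module {
  name: &'static str,
  size: usize,
  entries: u32,
}

/// Hypothetical subgroup: all modules reached by exactly the same entries.
struct Group {
  entries: u32,
  size: usize,
  modules: Vec<&'static str>,
}

/// Group modules by entry reachability (one group per distinct entry set).
fn group_by_entries(modules: &[Module]) -> Vec<Group> {
  let mut map: BTreeMap<u32, Group> = BTreeMap::new();
  for m in modules {
    let group = map
      .entry(m.entries)
      .or_insert_with(|| Group { entries: m.entries, size: 0, modules: Vec::new() });
    group.size += m.size;
    group.modules.push(m.name);
  }
  map.into_values().collect()
}

/// Assumed merge rule: a group smaller than `threshold` bytes is merged into
/// the sibling whose entry set contains its own and adds the fewest entries.
fn merge_small_groups(mut groups: Vec<Group>, threshold: usize) -> Vec<Group> {
  loop {
    let mut found = None;
    for (i, small) in groups.iter().enumerate() {
      if small.size >= threshold {
        continue;
      }
      let target = groups
        .iter()
        .enumerate()
        .filter(|(j, other)| *j != i && other.entries & small.entries == small.entries)
        .min_by_key(|(_, other)| (other.entries & !small.entries).count_ones())
        .map(|(j, _)| j);
      if let Some(j) = target {
        found = Some((i, j));
        break;
      }
    }
    // Stop once no undersized group has a sibling it can merge into.
    let Some((i, j)) = found else { return groups };
    let small = groups.remove(i);
    let j = if j > i { j - 1 } else { j }; // account for the removed element
    groups[j].size += small.size;
    groups[j].modules.extend(small.modules);
  }
}
```

In this sketch a 1,000-byte module reached only by `entry-b` would be folded into the group shared by `entry-a` and `entry-b` when the threshold is 28,000 bytes, trading a small amount of over-fetching for one fewer request.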
### 🚀 Features
- add `Visitor` to `rolldown/utils` (#8373) by @sapphi-red
- module-info: add `inputFormat` property to `ModuleInfo` (#8329) by @shulaoda
- default `treeshake.invalid_import_side_effects` to `false` (#8357) by @sapphi-red
- rolldown_utils: add `IndexBitSet` (#8343) by @sapphi-red
- rolldown_utils: add more methods and trait impls to BitSet (#8342) by @sapphi-red
- rolldown_plugin_vite_build_import_analysis: add support for `await import().then((m) => m.prop)` (#8328) by @sapphi-red
- rolldown_plugin_vite_reporter: support custom logger for build infos (#7652) by @shulaoda
- rust/mcs: support `entriesAwareMergeThreshold` (#8312) by @hyf0
- mcs: `maxSize` will split the oversized chunk with taking file relevance into account (#8277) by @hyf0
- rolldown_plugin_vite_import_glob: support template literal in glob import patterns (#8298) by @shulaoda
- rolldown_plugin_chunk_import_map: output importmap without spaces (#8297) by @sapphi-red
- add INEFFECTIVE_DYNAMIC_IMPORT warning in core (#8284) by @shulaoda
- mcs: generate more readable name for `entriesAware` chunks (#8275) by @hyf0
- mcs: support `entriesAware` (#8274) by @hyf0
### 🐛 Bug Fixes
- improve circular dependency detection in chunk optimizer (#8371) by @IWANABETHATGUY
- align `minify.compress: true` and `minify.mangle: true` with `minify: true` (#8367) by @sapphi-red
- rolldown_plugin_esm_external_require: apply conversion to UMD and IIFE outputs (#8359) by @sapphi-red
- cjs: bailout treeshaking on cjs modules that have multiple re-exports (#8348) by @hyf0
- handle member expression and this expression in JSX element name rewriting (#8323) by @IWANABETHATGUY
- pad `encode_hash_with_base` output to fixed length to prevent slice panics (#8320) by @shulaoda
- `xxhash_with_base` skips hashing when input is exactly 16 bytes (#8319) by @shulaoda
- complete `ImportKind::try_from` with missing variants and correct `url-import` to `url-token` (#8310) by @shulaoda
- mark Node.js builtin modules as side-effect-free when resolved via `external` config (#8304) by @IWANABETHATGUY
- mcs: `maxSize` should split chunks correctly based on sizes (#8289) by @hyf0
### 🚜 Refactor
- introduce `RawMangleOptions` and `RawCompressOptions` (#8366) by @sapphi-red
- mcs: refactor `apply_manual_code_splitting` into `ManualSplitter` (#8346) by @hyf0
- rolldown_plugin_vite_reporter: simplify hook registration and remove redundant state (#8322) by @shulaoda
- use set to store user defined entry modules (#8315) by @IWANABETHATGUY
- rust/mcs: collect groups into map at first for having clean and performant operations (#8313) by @hyf0
- mcs: introduce newtype `ModuleGroupOrigin` and `ModuleGroupId` (#8311) by @hyf0
- remove unnecessary `FinalizerMutableState` struct (#8303) by @shulaoda
- move module finalization into `finalize_modules` (#8302) by @shulaoda
- extract `apply_transfer_parts_mutation` into its own module (#8301) by @shulaoda
- move ESM format check into `determine_export_mode` (#8294) by @shulaoda
- remove `warnings` field from `GenerateContext` (#8293) by @shulaoda
- extract util function remove clippy suppression (#8290) by @IWANABETHATGUY
- move `is_in_node_modules` to `PathExt` trait in `rolldown_std_utils` (#8286) by @shulaoda
- rolldown_plugin_vite_reporter: remove unnecessary ineffective dynamic import detection logic (#8285) by @shulaoda
- dev: inject hmr runtime to `\0rolldown/runtime.js` (#8234) by @hyf0
- improve naming in chunk_optimizer (#8287) by @IWANABETHATGUY
- simplify PostChunkOptimizationOperation from bitflags to enum (#8283) by @IWANABETHATGUY
- optimize BitSet.index_of_one to return iterator instead of Vec (#8282) by @IWANABETHATGUY
### 📚 Documentation
- change default value in `format` JSDoc from `'esm'` to `'es'` (#8372) by @shulaoda
- in-depth: remove `invalidImportSideEffects` option mention from lazy barrel optimization doc (#8355) by @sapphi-red
- mcs: clarify `minSize` constraints (#8279) by @ShroXd
### ⚡ Performance
- use IndexVec for chunk TLA detection (#8341) by @sapphi-red
- only invoke single resolve call for the same specifier and import kind (#8332) by @sapphi-red
- rolldown_plugin_vite_reporter: skip gzip computation when `report_compressed_size` is disabled (#8321) by @shulaoda
### 🧪 Testing
- use `vi.waitFor` and `expect.poll` instead of custom `waitUtil` function (#8369) by @sapphi-red
- rolldown_plugin_esm_external_require_plugin: add tests (#8358) by @sapphi-red
- add watch file tests (#8330) by @sapphi-red
- rolldown_plugin_vite_build_import_analysis: add test for dynamic import treeshaking (#8327) by @sapphi-red
### ⚙️ Miscellaneous Tasks
- prepare-release: skip workflow on forked repositories (#8368) by @shulaoda
- format more files (#8360) by @sapphi-red
- deps: update oxc to v0.114.0 (#8347) by @camc314
- deps: update test262 submodule for tests (#8354) by @sapphi-red
- deps: update crate-ci/typos action to v1.43.5 (#8350) by @renovate[bot]
- deps: update oxc apps (#8351) by @renovate[bot]
- rolldown_plugin_vite_reporter: remove unnecessary README.md (#8334) by @shulaoda
- deps: update npm packages (#8338) by @renovate[bot]
- deps: update rust crates (#8339) by @renovate[bot]
- deps: update dependency oxlint-tsgolint to v0.13.0 (#8337) by @renovate[bot]
- deps: update github-actions (#8336) by @renovate[bot]
- deps: update napi to v3.8.3 (#8331) by @renovate[bot]
- deps: update dependency oxlint-tsgolint to v0.12.2 (#8325) by @renovate[bot]
- remove unnecessary transform.decorator (#8314) by @IWANABETHATGUY
- deps: update dependency rust to v1.93.1 (#8305) by @renovate[bot]
- deps: update dependency oxlint-tsgolint to v0.12.1 (#8300) by @renovate[bot]
- deps: update oxc apps (#8296) by @renovate[bot]
- docs: don't skip for build runs without cache (#8281) by @sapphi-red
