fix(dev): make register_modules async #7289
Pull request overview
This PR fixes a deadlock issue in the dev engine by making the register_modules method asynchronous. The deadlock occurred when the dev engine held a lock on the clients DashMap during HMR update generation while Node.js synchronously waited to acquire the same lock in register_modules, preventing the event loop from resolving pending promises needed by the HMR generation process.
Key Changes:
- Changed `register_modules` from synchronous to async to prevent blocking the Node.js thread when acquiring the DashMap lock
- Added clippy allow attributes with an explanatory comment documenting the deadlock prevention rationale
The issue is: "when the dev engine is in the process of generating an HMR update, a new file change causes Node.js to hang during the second HMR update generation."

In short, the hang is caused by a deadlock.

## Details

Rolldown stores the client information in the `client` variable using `DashMap`, which is basically equivalent to `Mutex<HashMap>`.

During the process of generating an HMR update,

https://github.com/rolldown/rolldown/blob/5ef49ad615cfef1a9ebc97368546e1b9adbaf48d/crates/rolldown_dev/src/bundling_task.rs#L140

this line of code is equivalent to `client.lock()`, which means the lock on `client` is held during the whole generation process.

The first HMR update is generated successfully and is sent to Vite to trigger the HMR process.

The browser loads the HMR patch and sends a message to the Vite Node.js process to register the newly loaded module via

https://github.com/rolldown/rolldown/blob/0ce4a17c5ae1a95e331f8d38c5230742b09d1fd3/crates/rolldown_binding/src/binding_dev_engine.rs#L182-L184

Node.js calls `register_modules`:

https://github.com/rolldown/rolldown/blob/0ce4a17c5ae1a95e331f8d38c5230742b09d1fd3/crates/rolldown_binding/src/binding_dev_engine.rs#L183

This line is also equivalent to `client.lock()`.

In the meantime, the dev engine is in the second HMR update generation, which already holds the `client` lock.

So Node.js needs to wait for this lock to be released, and this wait is a **SYNCHRONOUS** wait. Node.js itself no longer has a chance to resolve pending promises (such as resolving a timeout timer), while the HMR generation is awaiting the JS `transform` hook to finish.

All these things together cause a deadlock.

---

If we had #7287 in the first place, it would have been much easier to locate the problem, because we could easily find out which function was the last one called from `binding.js`.
---

# Visual Explanation of the Deadlock

### The Circular Dependency

```
      ┌────────────────────────────────────────────────────────┐
      │                                                        │
      ▼                                                        │
┌────────────┐         ┌────────────┐         ┌────────────┐   │
│   Task 2   │ awaits  │  Node.js   │  waits  │    Lock    │   │
│   (Rust)   │────────►│ transform  │────────►│  (held by  │───┘
│            │         │    hook    │  SYNC   │   Task 2)  │
└────────────┘         └────────────┘         └────────────┘
                             │                      ▲
                             │  register_modules    │
                             │  called by browser   │
                             │  (from Task 1's      │
                             │  HMR patch)          │
                             └──────────────────────┘
```

### Sequence of Events

```
1. Task 1: Generate HMR #1 → Send to browser → Complete ✓
2. File changes again → Task 2 starts
3. Task 2: Acquire clients lock
4. Task 2: Call JS transform hook (await)
5. Browser: Receives HMR #1, loads patch, calls register_modules
6. Node.js: register_modules tries to acquire lock (SYNC)
7. Node.js: BLOCKED waiting for lock (Task 2 has it)
8. Task 2: Waiting for JS promise to resolve
9. Node.js: Can't process promises (event loop blocked)
10. DEADLOCK!
```

### The Fix

Making `register_modules` async allows Node.js to **yield control** when waiting for the lock, keeping the event loop alive so it can resolve the transform hook promise.

| Before (sync) | After (async) |
|---------------|---------------|
| Node.js **blocks entirely** waiting for lock | Node.js **yields** control back to event loop |
| Event loop frozen, can't process callbacks | Event loop continues, can process transform callback |
| **DEADLOCK** | **Works correctly** |
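The difference between the two columns can be modelled without Node.js or napi. The following is a minimal sketch using only the Rust standard library, not the actual rolldown code. It rests on several assumptions: a single-threaded job queue stands in for the Node.js event loop, a `Mutex<Vec<String>>` stands in for the `DashMap`, and `try_lock` plus re-queueing stands in for an async function yielding back to the loop. `Job`, `run_event_loop_without_blocking`, and `"module-a"` are hypothetical names.

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::sync::{Arc, Mutex, TryLockError};
use std::thread;

/// Work items the single-threaded "event loop" can run (a stand-in for Node.js).
enum Job {
    /// Resolve the promise that the Rust task is awaiting (the `transform` hook result).
    ResolveTransform,
    /// Register modules, which needs the shared lock.
    RegisterModules,
}

/// Runs the loop with a non-blocking `RegisterModules` job and
/// returns the order in which the jobs completed.
fn run_event_loop_without_blocking() -> Vec<&'static str> {
    let clients = Arc::new(Mutex::new(Vec::<String>::new()));
    let (transform_done_tx, transform_done_rx) = mpsc::channel::<()>();
    let (lock_held_tx, lock_held_rx) = mpsc::channel::<()>();

    // "Task 2": takes the lock, then waits for the loop to finish the transform hook.
    let task_clients = Arc::clone(&clients);
    let task = thread::spawn(move || {
        let _guard = task_clients.lock().unwrap();
        lock_held_tx.send(()).unwrap();
        transform_done_rx.recv().unwrap(); // the lock stays held while waiting
    });

    // Make sure the lock is already held before the loop starts.
    lock_held_rx.recv().unwrap();

    // `RegisterModules` arrives first, as in the sequence of events above.
    let mut queue = VecDeque::from([Job::RegisterModules, Job::ResolveTransform]);
    let mut completed = Vec::new();

    while let Some(job) = queue.pop_front() {
        match job {
            Job::ResolveTransform => {
                transform_done_tx.send(()).unwrap();
                completed.push("transform");
            }
            Job::RegisterModules => match clients.try_lock() {
                Ok(mut guard) => {
                    guard.push("module-a".to_string());
                    completed.push("register_modules");
                }
                // Lock is busy: put the job back and let other jobs run first.
                Err(TryLockError::WouldBlock) => {
                    queue.push_back(Job::RegisterModules);
                    thread::yield_now();
                }
                Err(TryLockError::Poisoned(e)) => panic!("lock poisoned: {e}"),
            },
        }
    }

    task.join().unwrap();
    completed
}
```

In this model, replacing `try_lock` with a blocking `lock()` in the `RegisterModules` arm reproduces the hang: the loop never reaches `ResolveTransform`, so the task never releases the lock.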
## [1.0.0-beta.53] - 2025-12-03

💥 Breaking Changes

- Drop `i686-pc-windows-msvc` target support

🚀 Chunk Merging Optimization

- Rolldown now automatically merges shared chunks when entries import each other (when `preserveEntrySignature` is not `strict`)

```shell
Before:
  entry.js → imports → shared.js (common chunk)
  entry2.js → imports → shared.js
  Output: 3 chunks (entry.js, entry2.js, shared.js)

After:
  entry.js → contains shared code
  entry2.js → imports → entry.js
  Output: 2 chunks (entry.js, entry2.js)
```

### 💥 BREAKING CHANGES

- drop `i686-pc-windows-msvc` target support (#7230) by @sapphi-red

### 🚀 Features

- rolldown_plugin_vite_manifest: pass normalized options to `isLegacy` callback (#7321) by @shulaoda
- plugin/vite-resolve: add `disableCache` option (#6763) by @sapphi-red
- rolldown: export `createTokioRuntime` for tsdown (#7264) by @shulaoda
- rolldown_plugin_vite_html: sync `moduleSideEffects` for already loaded modules (#7254) by @shulaoda
- rolldown_plugin_vite_html: load module scripts with side effects to prevent tree-shaking (#7244) by @shulaoda
- rolldown_plugin_vite_css_post: implement `cssScopeTo` for scoped CSS tree-shaking (#7240) by @shulaoda

### 🐛 Bug Fixes

- export default class decl __name runtime insertion (#7316) by @IWANABETHATGUY
- chunk side effects calculation (#7273) by @IWANABETHATGUY
- node: `output.generateCode.preset: 'es2015'` should set `output.generateCode.symbols: true` by default (#7314) by @sapphi-red
- skip name helper for classes with static name property (#7312) by @IWANABETHATGUY
- preserve chunk imports relationship after chunk merging (#7303) by @shulaoda
- dev: make `register_modules` async (#7289) by @hyf0
- preserve computed property in object destructuring (#7288) by @IWANABETHATGUY
- support dynamic imports with shared dependencies (#7261) by @IWANABETHATGUY
- call `defer_sync_scan_data` in non-incremental build mode (#7255) by @shulaoda
- optimize chunk merging for shared entry points (#7194) by @IWANABETHATGUY
- add indentation for UMD format output (#7263) by @IWANABETHATGUY
- rolldown_plugin_vite_css_post: pass options to `isLegacy` callback for proper legacy detection (#7260) by @shulaoda
- rolldown_plugin_vite_css_post: also detect `?inline=true` query for inlined CSS (#7245) by @shulaoda
- rolldown_plugin_vite_css_post: distinguish empty CSS from no CSS (#7241) by @shulaoda
- add Windows support for t-run command (#7242) by @IWANABETHATGUY
- cjs: prevent duplicate require declarations for external modules with preserveModules (#7234) by @logaretm
- rolldown_plugin_vite_resolve: resolve from root for virtual modules (#7236) by @sapphi-red
- include entry level external modules in chunk exports (#7218) by @IWANABETHATGUY

### 🚜 Refactor

- dev: make `removeClient` async (#7313) by @hyf0
- move chunk merging code out of code_splitting.rs (#7285) by @IWANABETHATGUY
- extract common function util for chunk merging (#7271) by @IWANABETHATGUY
- use iterative method to merge chunks (#7256) by @IWANABETHATGUY
- use concat_string! instead of string replace for generating chunk level exports (#7247) by @IWANABETHATGUY

### 📚 Documentation

- add warning to experimental.resolveNewUrlToAsset about JS/TS files (#7300) by @Copilot
- add sequential hook execution difference in plugin-api.md (#7308) by @Copilot
- add migration example from onwarn to onLog (#7299) by @Copilot
- add migration example for manualChunks to advancedChunks (#7298) by @Copilot
- deps: bump vitepress to fix build (#7307) by @sapphi-red
- examples & text for experimental.resolveNewUrlToAsset (#7259) by @TheAlexLichter

### ⚡ Performance

- rolldown_plugin_vite_css_post: lazily load `cssScopeTo` from JS module options (#7253) by @shulaoda
- rolldown_plugin_vite_css_post: avoid unnecessary string clones in `resolve_asset_urls_in_css` (#7250) by @shulaoda

### 🧪 Testing

- generate relative path like name in advanced chunks (#7267) by @IWANABETHATGUY
- add test case for preserveEntrySignatures with re-exports (#7279) by @IWANABETHATGUY
- add test262 integration tests (#7196) by @sapphi-red

### ⚙️ Miscellaneous Tasks

- deps: update dependency rolldown-plugin-dts to v0.18.1 (#7304) by @renovate[bot]
- enable tracing feature for napi (#7322) by @sapphi-red
- deps: update napi (#7320) by @renovate[bot]
- deps: update oxc (#7318) by @renovate[bot]
- deps: update napi (#7317) by @renovate[bot]
- deps: update oxc to v0.100.0 (#7301) by @renovate[bot]
- deps: downgrade pnpm to 10.23.0 to fix Netlify build (#7306) by @shulaoda
- add `trustPolicyExclude` for chokidar and semver (#7302) by @sapphi-red
- update pnpm lockfile (#7291) by @IWANABETHATGUY
- deps: update npm packages (#7272) by @renovate[bot]
- deps: update rust crates (#7270) by @renovate[bot]
- deps: update oxc (#7262) by @renovate[bot]
- deps: update github-actions (#7269) by @renovate[bot]
- deps: update dependency dprint-typescript to v0.95.13 (#7268) by @renovate[bot]
- deps: update `html5gum` to 0.8.1 (#7265) by @shulaoda
- rolldown: remove unused `getModuleOptions` from `PluginContext` (#7266) by @shulaoda
- remove unnecessary justfile ignore (#7243) by @IWANABETHATGUY
- deps: update oxc apps (#7238) by @renovate[bot]
- add `nul` to workaround https://github.com/anthropics/claude-c… (#7237) by @IWANABETHATGUY
- deps: update dependency valibot to v1.2.0 [security] (#7231) by @renovate[bot]
- deps: update crate-ci/typos action to v1.40.0 (#7232) by @renovate[bot]

### ❤️ New Contributors

* @logaretm made their first contribution in [#7234](#7234)

Co-authored-by: shulaoda <165626830+shulaoda@users.noreply.github.com>
…9031)

Closes the `emit_chunk` item from the sync-NAPI deadlock audit in **#7311**.

### Symptom

Plugins that call `this.emitFile({ type: 'chunk', ... })` from a `transform` (or any hook, really) hang rolldown indefinitely once enough emits accumulate. The hang is deterministic for tight emit loops (~1025 emits from a single hook) and **non-deterministic under parallelism: it can trigger with as few as ~400 emits** spread across concurrent transforms.

There is no error, no log, no stack trace from rolldown; the build just stops making progress. The main JS thread ends up parked inside `_pthread_cond_wait` under `napi_call_function → ... → block_on`, and every tokio worker is parked at the same `pthread_cond_wait`.

This was reproducible against `rolldown@1.0.0-rc.13` on a real RSC plugin ([@lazarv/react-server](https://github.com/lazarv/react-server)) building a Mantine-sized app (~3300 `"use client"` modules, each emitted as an entry chunk), and then minimised to 30 lines of plugin code.

### Root cause

`PluginContext.emitFile({ type: 'chunk' })` is a **sync** napi binding. Until this PR it called `napi::bindgen_prelude::block_on(...)` on the JS thread to drive an `async fn emit_chunk` whose two await points were:

1. `self.tx.lock().await` on a `tokio::sync::Mutex<Option<Sender<ModuleLoaderMsg>>>`
2. `send(AddEntryModule(...)).await` on a bounded `mpsc::channel(1024)` shared with the module loader

The bounded channel is the trap. Once it fills, the send future can only be unblocked by the module loader draining it. But **draining each `AddEntryModule` message dispatches plugin hooks (`resolveId`, `load`, further `transform`s) back to the JS thread via TSFN**, and the JS thread is pinned inside `block_on` servicing the current `emit_chunk`. The only consumer that can free capacity is waiting on the only thread that is blocked producing. Classic producer ⇄ consumer deadlock through TSFN.
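The capacity trap described above can be illustrated in isolation. This is a minimal sketch using `std::sync::mpsc` rather than the tokio channel in rolldown, under the assumption that a consumer which never drains is a fair stand-in for a module loader stuck waiting on the JS thread. `AddEntryModule` here is a hypothetical one-field stand-in for the real message, and both function names are made up for the example.

```rust
use std::sync::mpsc::{self, TrySendError};

/// Hypothetical stand-in for the loader message; the field is the emit index.
#[derive(Debug)]
struct AddEntryModule(usize);

/// Tries to enqueue `emits` messages into a bounded channel whose consumer
/// never drains, and returns how many were accepted before it reported "full".
fn accepted_before_full(capacity: usize, emits: usize) -> usize {
    // `_rx` keeps the receiver alive but nothing ever reads from it.
    let (tx, _rx) = mpsc::sync_channel::<AddEntryModule>(capacity);
    for i in 0..emits {
        match tx.try_send(AddEntryModule(i)) {
            Ok(()) => {}
            // A blocking `send` at this point would park the calling thread
            // until the consumer frees a slot, which never happens here.
            Err(TrySendError::Full(AddEntryModule(rejected))) => return rejected,
            Err(TrySendError::Disconnected(_)) => unreachable!("receiver is kept alive"),
        }
    }
    emits
}

/// Same loop with an unbounded channel: `send` never waits for capacity.
fn accepted_unbounded(emits: usize) -> usize {
    let (tx, rx) = mpsc::channel::<AddEntryModule>();
    for i in 0..emits {
        tx.send(AddEntryModule(i)).expect("receiver is kept alive");
    }
    // Count what is sitting in the queue without blocking.
    rx.try_iter().count()
}
```

With a capacity of 1024, the 1025th emit is the first one that cannot be queued, which lines up with the "~1025 emits from a single hook" figure for tight loops.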
A couple of subtleties worth calling out, because they explain why the bug was easy to miss:

- **The effective capacity is much lower than 1024.** Under parallelism, in-flight module tasks spawned for earlier emits are already waiting on TSFN responses from the blocked JS thread. Those pending tasks keep the loader effectively frozen while the channel stays saturated, so the producer hits `tx.send().await` and hangs at a much smaller emit count. In local testing against unpatched `rc.13`:
  - 200 parallel inputs × 1 emit each → ✓ 62 ms
  - 500 → ✓ 154 ms
  - **600 → hang** (after ~400 transforms processed)
  - 2000 → hang (never even reaches `buildStart`)
- **Parallel transforms without `emitFile` scale fine.** A control experiment (12 000 virtual inputs each going through a `transform` hook that does *not* call `emitFile`) completes cleanly. So the bottleneck is not transform dispatch, TSFN throughput, `fetch_modules`, or the loader loop. It is specifically `emit_chunk`'s `block_on` + bounded-channel interaction, exactly as #7311 predicted when it flagged `emit_chunk` as MEDIUM risk.
- **Yielding inside the JS transform does not help.** `setImmediate` / `process.nextTick` / `Promise.resolve` yields let libuv service TSFN callbacks, but at the point the producer is stuck in `block_on` the consumer cannot hand the JS thread anything to do: the loader is either blocked on the current transform's TSFN response, or has no work ready that does not depend on the blocked thread. The deadlock is structural, not a scheduling race.
- **This is not an "insanely large build" problem.** Each `"use client"` file in a normal RSC project emits exactly one chunk from its own `transform`. 3300 client components is a reasonable size for an app. No plugin is doing anything pathological; the rolldown primitive simply does not survive its documented usage at scale.

### Fix

Collapse the entire `emit_chunk` path to synchronous code and drop `block_on` from the napi binding.
There is no reason any of it was async:

- `FileEmitter::emit_chunk` is now a sync `fn`. Its `tx` is a `std::sync::Mutex<Option<UnboundedSender<ModuleLoaderMsg>>>`. The critical section is `Option::clone()` (the `Sender` is cheaply cloneable), then the lock is dropped before `send`. The install/clear path runs only at scan boundaries and is never contended with build traffic; the per-emit path's contention is bounded to nanoseconds (lock-free CAS on the clone).
- `BindingPluginContext::emit_chunk` no longer enters the tokio runtime. It is now a plain sync binding with the same shape as the adjacent `emit_file`, marked `// SYNC-SAFE` per the convention introduced by #7289 / #7311.
- `PluginContext::emit_chunk` and `NativePluginContextImpl::emit_chunk` are de-async'd to match.
- `PluginDriver.tx` is unified on `std::sync::Mutex` for consistency with `FileEmitter.tx`; the `load` hook holds it only to clone the sender and drop the lock before awaiting module-load completion.
- As defense in depth, the module loader's message channel is switched to `unbounded_channel()`. `UnboundedSender::send` is synchronous and infallible, so even if a future refactor reintroduces a sync wait on this path, there is no `.await` that can park the JS thread. All `tx.send(...).await` call sites in `module_task.rs`, `external_module_task.rs`, `runtime_module_task.rs`, and `native_plugin_context.rs::load` are updated accordingly.

### JS API impact

**None.** `this.emitFile({ type: 'chunk' })` remains synchronous and returns `string` directly, matching Rollup's `PluginContext.emitFile` contract. No plugin needs to change.

### Tests

Two new deterministic regression fixtures under `packages/rolldown/tests/fixtures/plugin/context/`:

- **`emit-chunk-many-from-transform`**: a single `transform` hook emits 2000 chunks in a tight loop. Exercises the "tight emit loop" path. Deadlocks on `main`, passes in ~470 ms with this fix.
- **`emit-chunk-many-parallel-inputs`**: 1500 virtual inputs, each with its own `transform` emitting exactly one chunk. Exercises the realistic "large plugin at scale" path. Deadlocks on `main`, passes in ~600 ms with this fix.

Both are marked `sequential: true` and live alongside the existing `plugin/context/emit-file` fixtures. Full `fixture.test.ts` run is green (99/99).

### Relation to #7311

#7311 audited sync NAPI bindings after the dev-engine deadlock fixed in #7289. It explicitly flagged `emit_chunk` as a **MEDIUM-risk** sync binding with deadlock potential, listed two possible resolutions (Option A: make async; Option B: add `SYNC-SAFE` comment with justification), and left the task unchecked when the issue was closed as completed.

This PR resolves that task by taking a third path: **make the entire underlying implementation synchronous so the binding can stay sync without `block_on`**. The `SYNC-SAFE` comment is added per convention. This avoids the breaking-change implications of making `this.emitFile` return a `Promise`.

---------

Co-authored-by: 翠 <green@sapphi.red>
Co-authored-by: dalaoshu <165626830+shulaoda@users.noreply.github.com>
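The "clone the sender under the lock, then send outside it" shape described in the Fix section might look roughly like the following. This is a sketch under stated assumptions, not the rolldown implementation: it uses `std::sync::mpsc::channel` in place of tokio's `unbounded_channel()`, and `Emitter`, `LoaderMsg`, `install`, and the error strings are hypothetical names introduced only for illustration.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::Mutex;

/// Hypothetical message type standing in for `ModuleLoaderMsg`.
#[derive(Debug, PartialEq)]
enum LoaderMsg {
    AddEntryModule(String),
}

/// Hypothetical, simplified emitter: the sender is present only while a scan runs.
struct Emitter {
    tx: Mutex<Option<Sender<LoaderMsg>>>,
}

impl Emitter {
    fn new() -> Self {
        Self { tx: Mutex::new(None) }
    }

    /// Installs a fresh unbounded channel and hands back its receiving end.
    fn install(&self) -> Receiver<LoaderMsg> {
        let (tx, rx) = mpsc::channel();
        *self.tx.lock().unwrap() = Some(tx);
        rx
    }

    /// Fully synchronous emit: no runtime, no `.await`, no wait for capacity.
    fn emit_chunk(&self, id: &str) -> Result<(), String> {
        // Critical section is just the clone; the guard is a temporary that
        // is dropped at the end of this statement, before `send` runs.
        let sender = self.tx.lock().unwrap().clone();
        let sender = sender.ok_or_else(|| "no scan in progress".to_string())?;
        sender
            .send(LoaderMsg::AddEntryModule(id.to_string()))
            .map_err(|_| "module loader has shut down".to_string())
    }
}
```

A possible usage: call `install()` once, then `emit_chunk` any number of times from a hook without ever blocking on the consumer. Note that `std`'s `Sender::send` does return an error when the receiver has been dropped, which the sketch surfaces as a string.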
