Bug type
Regression
Summary
Each pi-agent invocation triggers a synchronous "mirror" walk over every bundled plugin, blocking the gateway main thread. On the first agent run after gateway start, four full sweeps stack up and produce ~80–90 seconds of near-contiguous main-thread blocking (cold fs page cache). On subsequent agent runs the block is shorter (roughly 15 s per sweep on a warm fs cache), but it never stops happening because the per-plugin work is not memoized.
Steps to reproduce
- Install OpenClaw 2026.4.26 globally via npm. The bundled plugin runtime install root lands at ~/.openclaw/plugin-runtime-deps/openclaw-2026.4.26-<hash>/ with the default 114 bundled plugins on disk.
- Cold-start the gateway. Wait for it to be idle.
- Attach a CPU profile or a 200 ms tick main-thread lag probe inside the gateway process.
- Send the first agent RPC after gateway start.
- Observe that the lag probe reports ~89 s of cumulative main-thread blocking before the first model event arrives.
A self-contained reproduction that does not need a running gateway is described under "Reproduction" below.
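For anyone who wants to repeat the measurement, here is a minimal sketch of a 200 ms tick lag probe. It is not CoClaw's actual probe: the function name, the options, and the meaning we give the summary fields (tick count, worst lateness, ticks later than a threshold) are assumptions chosen to resemble the lag.summary line in this report.

```javascript
// Hypothetical lag probe: every name here is illustrative, not an OpenClaw or CoClaw API.
function createLagProbe({ tickMs = 200, thresholdMs = 100, now = () => performance.now() } = {}) {
  const startedAt = now();
  let expectedAt = startedAt + tickMs;
  let ticks = 0;
  let max = 0;
  let over = 0;

  // Called once per timer tick. Lag is how late the tick fired relative to schedule.
  function onTick() {
    const t = now();
    const lag = Math.max(0, t - expectedAt);
    ticks += 1;
    if (lag > max) max = lag;
    if (lag > thresholdMs) over += 1;
    expectedAt = t + tickMs;
  }

  // Snapshot of the counters collected so far.
  function summary() {
    return { dur: now() - startedAt, ticks, max, over };
  }

  // Wires onTick to a real timer; unref() keeps the probe from holding the process open.
  function start() {
    const timer = setInterval(onTick, tickMs);
    timer.unref();
    return () => clearInterval(timer);
  }

  return { onTick, summary, start };
}
```

Inside the gateway process one would call start() before sending the agent RPC and log summary() once the first model event arrives. While the main thread is blocked the timer cannot fire at all, which would explain a very low tick count over a long window.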
Expected behavior
Plugin runtime mirror preparation must not block the event loop for more than a few hundred milliseconds at a time, regardless of how many plugins are installed.
Actual behavior
Main thread is fully blocked. Lag probe summary from one observed first send:
lag.summary dur=89500ms ticks=5 max=49860ms over100=4
Three large spikes inside that 89.5 s: 5.9 s, 32.5 s, 49.9 s. CPU profile attribution (10 ms sampling, the 187 s..277 s window of a 9-minute capture, which is the lag window) places >99% of the time inside the mirror call chain:
TOP SELF time in 89.5s window:
8177.7ms (garbage collector)
2666.8ms readFileUtf8
2625.2ms readFileUtf8
2513.1ms readFileUtf8
2466.5ms readFileUtf8
...
1488.4ms lstat
1356.8ms lstat
1336.9ms readFileSync (node:fs:433)
...
1064.0ms existsSync
954.3ms existsSync
...
415.5ms RegExp: (?:^|\n)\/\/#region extensions\/[^/\s]+(?:\/|$)
TOP TOTAL time in 89.5s window:
45576.6ms runEmbeddedAttempt
45262.5ms createOpenClawCodingTools / createOpenClawTools
14000.9ms createVideoGenerateTool / resolveVideoGenerationModelConfigForTool
13953.3ms createImageGenerateTool / resolveImageGenerationModelConfigForTool
12876.2ms createMusicGenerateTool / resolveMusicGenerationModelConfigForTool
12721.4ms runWithModelFallback → runFallbackAttempt → runAgentAttempt → runEmbeddedPiAgent
Time inside os.networkInterfaces() in window: 0.0 ms (0.0%)
The 89.5 s splits into two contiguous synchronous runs:
- 39.3 s starting at offset 187 s, root frame loadOpenClawPlugins (one full plugin-registry resolution).
- 50.5 s starting at offset 226 s, root frame createOpenClawCodingTools. Three generate-tool factories each spin up a runEmbeddedPiAgent, and each pi-agent re-resolves the registry, which calls prepareBundledPluginRuntimeRoot once per plugin all over again.
Hot stack at the deepest sample:
... ← shouldMaterializeBundledRuntimeMirrorDistFile / materializeBundledRuntimeMirrorDistFile
← mirrorBundledRuntimeDistRootEntries
← prepareBundledPluginRuntimeDistMirror
← (anonymous)
← withBundledRuntimeDepsFilesystemLock
← mirrorBundledPluginRuntimeRoot
← prepareBundledPluginRuntimeRoot (per plugin, ×114)
← loadOpenClawPlugins | createVideoGenerateTool / createImageGenerateTool / createMusicGenerateTool
Root cause analysis
The agent-run path triggers loadOpenClawPlugins (a synchronous, per-plugin loop) four times during a single first send: once via resolveRuntimePluginRegistry for the runtime context, and three more times via the video/image/music tool factories where each tool spawns a runEmbeddedPiAgent that re-resolves the registry. Each of those four resolutions calls prepareBundledPluginRuntimeRoot once per plugin (~114 plugins on a default install).
Inside prepareBundledPluginRuntimeRoot, mirrorBundledRuntimeDistRootEntries walks the entire dist/ top level (~2760 .js files, ~24 MB on disk) every time. There is no "this install root has already been mirrored in this process" guard.
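To make the missing guard concrete, here is a minimal sketch of a per-process "already mirrored" check. The map, the function name, and the sourceStamp argument (for example the source dist mtime) are hypothetical; nothing here is taken from the OpenClaw source.

```javascript
// Hypothetical per-process guard; names are illustrative, not OpenClaw APIs.
const mirroredDistRoots = new Map();

function mirrorDistRootOnce(installRoot, sourceDistRoot, sourceStamp, mirrorFn) {
  const key = JSON.stringify([installRoot, sourceDistRoot, sourceStamp]);
  if (!mirroredDistRoots.has(key)) {
    // The entry is stored only after mirrorFn returns, so a failed walk is retried next time.
    mirroredDistRoots.set(key, mirrorFn(installRoot, sourceDistRoot));
  }
  return mirroredDistRoots.get(key);
}

// Simulate one first send: 4 registry resolutions x 114 plugins.
let walks = 0;
const fakeMirror = () => {
  walks += 1; // stands in for one full dist/ walk
  return { mirrored: true };
};
for (let resolution = 0; resolution < 4; resolution += 1) {
  for (let plugin = 0; plugin < 114; plugin += 1) {
    mirrorDistRootOnce("/install-root", "/source/dist", 1, fakeMirror);
  }
}
console.log(walks);
```

With the guard in place the simulated first send performs one walk instead of 456. A changed stamp produces a new key, so an upgraded source dist would still be re-walked.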
Specific issues we identified while reading the source and reproducing the slowness:
1. Everything in the chain is function, not async function. All file IO is *Sync:
readFileSync, lstatSync, realpathSync, linkSync, rmSync, mkdirSync, existsSync, readdirSync, readlinkSync, copyFileSync, symlinkSync, renameSync, writeFileSync, chmodSync, statSync.
- Even the file-system lock waits use a hard thread block:
Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms) (src/plugins/bundled-runtime-deps.ts:426).
A single prepareBundledPluginRuntimeRoot call therefore parks the event loop for the entire wall-clock cost of all its syscalls combined.
2. No process-level dedup across plugins for the dist-root walk. The same (installRoot, sourceDistRoot) pair gets walked once per plugin. With 114 plugins × 4 invocations per first send, that is ~456 full dist/ sweeps per first send, even though the walk is logically idempotent for a fixed source dist.
3. materializeBundledRuntimeMirrorDistFile's early-return is broken in the steady state. At src/plugins/bundled-runtime-deps.ts:171:
try {
if (
fs.realpathSync(sourcePath) === fs.realpathSync(targetPath) &&
!fs.lstatSync(targetPath).isSymbolicLink()
) {
return;
}
} catch {}
fs.mkdirSync(path.dirname(targetPath), { recursive: true, mode: 0o755 });
fs.rmSync(targetPath, { recursive: true, force: true });
try {
fs.linkSync(sourcePath, targetPath);
...
}
For a hardlink target, realpathSync(target) returns the target's own canonical path, not the source path — realpath does not "deduplicate" hardlinks. So the equality fails even when the target is already the correct hardlink that points at the same inode as the source. We always fall through to rmSync(target) + linkSync(source, target), rewriting the same hardlink with no functional change. On a default install this happens to ~462 dist-root files on every single sweep.
The intent is "skip if target already mirrors source"; the actual condition should compare (dev, ino) from lstatSync, since two paths that share (dev, ino) already point to the same on-disk content.
4. The shipped 2026.4.26 build has no shouldMaterializeBundledRuntimeMirrorDistFile cache. The main branch adds bundledRuntimeMirrorMaterializeCache keyed by stat signature, which would short-circuit subsequent calls in the same process. The shipped compiled module reads + regex-tests every file every time:
function shouldMaterializeBundledRuntimeMirrorDistFile(sourcePath) {
if (!BUNDLED_RUNTIME_MIRROR_MATERIALIZED_EXTENSIONS.has(path.extname(sourcePath))) return false;
try {
return BUNDLED_RUNTIME_MIRROR_PLUGIN_REGION_RE.test(fs.readFileSync(sourcePath, "utf8"));
} catch { return false; }
}
This alone accounts for ~24 MB of readFileSync per sweep, repeated per plugin. After the cache lands in a future release this cost goes away, but issues 1, 2, and 3 above still produce 462 unconditional unlink + linkSync per plugin per sweep.
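For issue 4, a stat-keyed cache could look roughly like the sketch below. This is not the bundledRuntimeMirrorMaterializeCache from main, which we have not reproduced here. The regex is the one visible in the CPU profile; the extension set, the signature fields, and all function and variable names are assumptions.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Regex as it appears in the CPU profile; the extension set is an assumption.
const PLUGIN_REGION_RE = /(?:^|\n)\/\/#region extensions\/[^/\s]+(?:\/|$)/;
const MATERIALIZED_EXTENSIONS = new Set([".js"]);

// Hypothetical cache: source path -> { signature, result }.
const materializeCache = new Map();
let fileReads = 0; // instrumentation for the demo only

function shouldMaterializeCached(sourcePath) {
  if (!MATERIALIZED_EXTENSIONS.has(path.extname(sourcePath))) return false;
  let stat;
  try {
    stat = fs.statSync(sourcePath);
  } catch {
    return false;
  }
  // Assumed signature: normally changes when the file is replaced or rewritten.
  const signature = `${stat.dev}:${stat.ino}:${stat.size}:${stat.mtimeMs}`;
  const cached = materializeCache.get(sourcePath);
  if (cached && cached.signature === signature) return cached.result;

  let result = false;
  try {
    fileReads += 1;
    result = PLUGIN_REGION_RE.test(fs.readFileSync(sourcePath, "utf8"));
  } catch {
    result = false;
  }
  materializeCache.set(sourcePath, { signature, result });
  return result;
}

// Demo against a throwaway file: the second call is answered from the cache.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "materialize-cache-"));
const file = path.join(dir, "chunk.js");
fs.writeFileSync(file, "//#region extensions/demo/index.js\nexport {};\n");
const first = shouldMaterializeCached(file);
const second = shouldMaterializeCached(file);
console.log({ first, second, fileReads });
fs.rmSync(dir, { recursive: true, force: true });
```

In the steady state each file then costs one stat instead of a full read plus a regex test. The unconditional relink described in issue 3 is unaffected by this cache.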
Reproduction (independent bench)
Three Node ESM scripts, no gateway needed. They directly import the compiled prepareBundledPluginRuntimeRoot from the installed openclaw and run sweeps against the real install root. Mirror is idempotent so the install root state after the bench is functionally identical to before it.
bench-mirror-real.mjs — restores the dist-root hardlinks back to symlinks (mimicking just-installed state) and runs prepareBundledPluginRuntimeRoot for several plugins in sequence. Reports per-call duration and a 200 ms tick lag probe summary.
bench-mirror-classes.mjs — runs four full 114-plugin sweeps in a row, mirroring the four sweeps observed in the CPU profile. Outputs per-round totals and per-plugin top-N timing.
bench-mirror-direct.mjs — independent reimplementation of the algorithm, tests sub-steps in isolation against a /tmp staging dir.
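The timing loop in scripts of this kind can be very small. The sketch below is not one of the three bench scripts; it only illustrates the measurement shape, with prepare as a hypothetical stand-in for the compiled prepareBundledPluginRuntimeRoot and an injectable clock so the harness itself can be checked.

```javascript
// Hypothetical harness; `prepare` stands in for the real per-plugin call.
function benchSweep(pluginIds, prepare, now = () => performance.now()) {
  const timings = [];
  const sweepStart = now();
  for (const pluginId of pluginIds) {
    const start = now();
    prepare(pluginId); // synchronous, so this blocks exactly like the gateway path
    timings.push({ pluginId, ms: now() - start });
  }
  const totalMs = now() - sweepStart;
  // Sort a copy so the per-plugin order of `timings` is preserved.
  const slowest = [...timings].sort((a, b) => b.ms - a.ms).slice(0, 3);
  const meanMs = timings.length > 0 ? totalMs / timings.length : 0;
  return { totalMs, meanMs, slowest };
}
```

Calling benchSweep four times in a row over the same plugin list would mimic the four-sweep first-send pattern.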
Steady-state warm-fs results on a WSL2 ext4 host (114 bundled plugins on disk):
| What | Time |
| --- | --- |
| One full 114-plugin sweep (steady state) | ~15 seconds |
| Average per-plugin prepareBundledPluginRuntimeRoot | ~130 ms |
| Four sweeps in a row (matches the four-sweep first-send pattern) | ~60 seconds |
| Hottest sub-step inside a sweep | mirrorBundledRuntimeDistRootEntries (462 unlink+linkSync + 2298 existsSync + per-call regex over 2760 files) |
| Slowest per-plugin call (steady state, plugin with the largest dependency closure) | ~290 ms |
Cold fs page cache (the production gateway scenario) inflates the first sweep from ~15 s to ~39 s and adds ~8 s of GC pressure across the four-sweep run. That gets us from 60 s warm to the 89.5 s observed.
The bench scripts are short (~150 lines each) and self-contained; happy to attach as a gist if useful.
Suggested fix
Convert the entire chain to fs.promises.* + await, and add per-process memoization for the dist-root mirror operation. Concretely:
1. Async-ify the hot path. Make prepareBundledPluginRuntimeRoot / mirrorBundledPluginRuntimeRoot / prepareBundledPluginRuntimeDistMirror / mirrorBundledRuntimeDistRootEntries / refreshBundledPluginRuntimeMirrorRoot / copyBundledPluginRuntimeRoot / fingerprintBundledRuntimeMirrorSourceRoot / hashBundledRuntimeMirrorDirectory async and replace every *Sync call with the promise variant.
2. Replace the synchronous lock with an async-friendly one. withBundledRuntimeDepsFilesystemLock can wrap the existing fs.mkdirSync(lockDir) acquisition behind an in-process Mutex (e.g., async-mutex) and await the work inside. The lock-dir acquisition itself can stay sync; the long-running work inside it must not.
3. Yield to the event loop at directory boundaries. Inside hashBundledRuntimeMirrorDirectory and copyBundledPluginRuntimeRoot, await new Promise(setImmediate) once per directory (or per N entries) so a single large plugin tree cannot starve the loop. SHA-256 hashing has no async fs primitive, so this pattern is what keeps the hot loop cooperative.
4. Memoize the dist-root mirror result by (installRoot, sourceDistRoot, source dist mtime). Once the dist root is mirrored in the current process, all subsequent plugins on the same source dist skip the per-plugin re-walk entirely. This collapses ~456 sweeps per first send to one.
5. Fix materializeBundledRuntimeMirrorDistFile's early-return. Compare (dev, ino) from lstatSync(source) and lstatSync(target) instead of realpathSync equality. Two paths that share (dev, ino) are already the same file on disk; that is the actual condition the rewrite is trying to avoid.
6. Land the existing bundledRuntimeMirrorMaterializeCache from main in a patch release, even ahead of the async migration. It cuts ~24 MB of readFileSync per sweep down to one stat per file in the steady state.
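The (dev, ino) comparison proposed for the early-return fix can be checked in isolation. The sketch below creates a hardlink in a temp directory and compares the two checks side by side. isSameFile is a hypothetical helper name, and the bigint option is our addition to avoid precision loss on large inode numbers.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: true when target is already the same on-disk file as source.
function isSameFile(sourcePath, targetPath) {
  try {
    const source = fs.lstatSync(sourcePath, { bigint: true });
    const target = fs.lstatSync(targetPath, { bigint: true });
    return !target.isSymbolicLink() && source.dev === target.dev && source.ino === target.ino;
  } catch {
    return false; // a missing path means the target still needs materializing
  }
}

// Build a source file and a hardlink to it in a throwaway directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "hardlink-check-"));
const sourcePath = path.join(dir, "source.js");
const targetPath = path.join(dir, "target.js");
fs.writeFileSync(sourcePath, "export {};\n");
fs.linkSync(sourcePath, targetPath);

// realpath resolves symlinks only, so two hardlinked paths keep distinct realpaths.
const realpathsEqual = fs.realpathSync(sourcePath) === fs.realpathSync(targetPath);
const sameFile = isSameFile(sourcePath, targetPath);
console.log({ realpathsEqual, sameFile });
fs.rmSync(dir, { recursive: true, force: true });
```

On a filesystem that supports hardlinks, realpathsEqual is false while sameFile is true, which is why the shipped check never takes the early return for an already-correct hardlink.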
Items 4 and 5 alone collapse the 89.5 s observed block to roughly the cost of a single sweep (~15 s warm / ~39 s cold). Adding async + setImmediate yield (items 1–3) makes the remaining work non-blocking — events and other RPCs on the same gateway can interleave.
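The yield pattern is small enough to sketch. The helper below hashes a sequence of buffers and hands control back to the event loop every few entries. The name hashEntriesCooperatively and the yieldEvery option are hypothetical. It uses setImmediate from node:timers/promises, which has the same effect as await new Promise(setImmediate).

```javascript
const { createHash } = require("node:crypto");
const { setImmediate: yieldToEventLoop } = require("node:timers/promises");

// Hypothetical helper: SHA-256 over a sequence of buffers, yielding every `yieldEvery` entries.
async function hashEntriesCooperatively(entries, { yieldEvery = 64 } = {}) {
  const hash = createHash("sha256");
  let sinceYield = 0;
  for (const entry of entries) {
    hash.update(entry); // synchronous CPU work
    sinceYield += 1;
    if (sinceYield >= yieldEvery) {
      sinceYield = 0;
      // Lets queued immediates, timers, and I/O callbacks run before continuing.
      await yieldToEventLoop();
    }
  }
  return hash.digest("hex");
}
```

The digest is identical to hashing the same buffers in one synchronous pass; only the scheduling changes. A smaller yieldEvery gives other RPCs more chances to interleave at the cost of a longer total run.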
Regression evidence
src/plugins/bundled-runtime-root.ts was added on 2026-04-22 in commit 9c733956c0 ("fix(plugins): repair bundled deps on activation"), and src/plugins/bundled-runtime-mirror.ts on 2026-04-27 in commit 6f09039b0c ("fix(plugins): reuse unchanged runtime mirrors"). Together with ~8 follow-up fix(plugins): ... commits over the next two days, this code path replaced a much lighter pre-existing approach.
Earlier OpenClaw versions blocked the main thread on gateway restart but not for 80+ seconds, and openclaw logs --follow could still attach during the block. Both regressed in 2026.4.22+: on 2026.4.26, openclaw logs --follow cannot attach for the full duration of the first agent-run block.
This issue is related to but distinct from #74325 (gateway restart 75 s block). #74325 is dominated by mDNS / os.networkInterfaces() polling during gateway startup; this issue covers the bundled-plugin-mirror path that fires per agent run, with os.networkInterfaces() accounting for 0% of the lag window measured here.
OpenClaw version
2026.4.26
Operating system
Ubuntu 22.04 on WSL2 (kernel 6.6.87.2-microsoft-standard-WSL2), Node.js 22.21.1, ext4. The cold-fs amplification is platform-dependent (WSL2's seccomp_do_user_notification adds ~10 ms per os.networkInterfaces() call, but that has no impact on the mirror path here); the warm-fs ~15 s/sweep baseline reproduces on any platform with a default 114-plugin install.
Model
N/A — reproduces before any model call; the lag is between phase-1 ack and the first model event.
Provider / routing chain
N/A
Install method
npm global (npm install -g openclaw)
Logs, screenshots, and evidence
Plugin-side lag probe summary from one observed first send (timestamps from CoClaw's RPC/main-thread tracing):
gw.send agent
gw.recv accepted ← phase-1 ack 15ms
... (5.9s) lag.spike +5903ms
dc.recv coclaw.sessions.getById ← unrelated RPC slips through during a brief unblock
... (32.5s) lag.spike +32487ms
... (50.0s) lag.spike +49860ms
gw.recv event agent ← first model event finally arrives
lag.summary dur=89500ms ticks=5 max=49860ms over100=4
CPU profile call attribution is in the "Actual behavior" section above. Full .cpuprofile file (10 ms sampling, ~6 MB) and the bench scripts available on request.
Additional information
Reproduction was done by directly importing the compiled prepareBundledPluginRuntimeRoot from ~/.nvm/.../openclaw/dist/bundled-runtime-root-DEMD7-O_.js — same code path the gateway runs, just isolated from network and pi-agent overhead. Steady-state warm-fs numbers (~15 s/sweep) are a hard lower bound; production blocking will always be at least this much per sweep.
We are happy to provide the cpuprofile, the bench scripts, and any further measurements that help. We would also be happy to test a candidate fix on our setup.
Reported by the CoClaw team.
This issue was discovered while developing @coclaw/openclaw-coclaw, a CoClaw channel plugin for OpenClaw.