
[Bug]: Bundled plugin runtime mirror runs synchronously on every pi-agent invocation, blocking the gateway main thread for tens of seconds (regression in 2026.4.22+) #75069

@xiaohuaxi

Bug type

Regression

Summary

Each pi-agent invocation triggers a synchronous "mirror" walk over every bundled plugin, blocking the gateway main thread. On the first agent run after gateway start, four full sweeps stack up and produce ~80–90 seconds of contiguous main-thread blocking (cold fs page cache). On subsequent agent runs the block is shorter — roughly 15 s per sweep on warm fs cache — but it never stops happening because the per-plugin work is not memoized.

Steps to reproduce

  1. Install OpenClaw 2026.4.26 globally via npm. The bundled plugin runtime install root lands at ~/.openclaw/plugin-runtime-deps/openclaw-2026.4.26-<hash>/ with the default 114 bundled plugins on disk.
  2. Cold-start the gateway. Wait for it to be idle.
  3. Attach a CPU profile or a 200 ms tick main-thread lag probe inside the gateway process.
  4. Send the first agent RPC after gateway start.
  5. Observe that the lag probe reports ~89 s of cumulative main-thread blocking before the first model event arrives.
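The probe used in step 3 is not shown in this report. A minimal sketch of a 200 ms tick lag probe follows, assuming a plain setInterval design. The names createLagProbe and summarizeLag, and the shape of the summary object, are hypothetical and are not CoClaw's tracing code.

```javascript
const { performance } = require("node:perf_hooks");

// Pure helper: reduce a list of per-tick lag values (ms) to a summary.
function summarizeLag(lags, thresholdMs = 100) {
  return {
    ticks: lags.length,
    maxMs: lags.reduce((max, lag) => Math.max(max, lag), 0),
    overThreshold: lags.filter((lag) => lag > thresholdMs).length,
  };
}

// Hypothetical probe: a timer that should fire every `intervalMs`.
// Any extra delay between ticks is time the main thread was unavailable.
function createLagProbe(intervalMs = 200, thresholdMs = 100) {
  const lags = [];
  let last = performance.now();
  const timer = setInterval(() => {
    const now = performance.now();
    lags.push(Math.max(0, now - last - intervalMs));
    last = now;
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for the probe
  return {
    stop() {
      clearInterval(timer);
      return summarizeLag(lags, thresholdMs);
    },
  };
}
```

Calling createLagProbe() before sending the agent RPC and stop() after the first model event would give a summary comparable in spirit to the lag.summary line in this report, although the exact field semantics there may differ.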

A self-contained reproduction that does not need a running gateway is described under "Reproduction (independent bench)" below.

Expected behavior

Plugin runtime mirror preparation must not block the event loop for more than a few hundred milliseconds at a time, regardless of how many plugins are installed.

Actual behavior

Main thread is fully blocked. Lag probe summary from one observed first send:

lag.summary dur=89500ms ticks=5 max=49860ms over100=4

Three large spikes inside that 89.5 s: 5.9 s, 32.5 s, 49.9 s. CPU profile attribution (10 ms sampling, the 187 s..277 s window of a 9-minute capture, which is the lag window) places >99% of the time inside the mirror call chain:

TOP SELF time in 89.5s window:
  8177.7ms  (garbage collector)
  2666.8ms  readFileUtf8
  2625.2ms  readFileUtf8
  2513.1ms  readFileUtf8
  2466.5ms  readFileUtf8
  ...
  1488.4ms  lstat
  1356.8ms  lstat
  1336.9ms  readFileSync (node:fs:433)
  ...
  1064.0ms  existsSync
   954.3ms  existsSync
   ...
   415.5ms  RegExp: (?:^|\n)\/\/#region extensions\/[^/\s]+(?:\/|$)

TOP TOTAL time in 89.5s window:
 45576.6ms  runEmbeddedAttempt
 45262.5ms  createOpenClawCodingTools / createOpenClawTools
 14000.9ms  createVideoGenerateTool / resolveVideoGenerationModelConfigForTool
 13953.3ms  createImageGenerateTool / resolveImageGenerationModelConfigForTool
 12876.2ms  createMusicGenerateTool / resolveMusicGenerationModelConfigForTool
 12721.4ms  runWithModelFallback → runFallbackAttempt → runAgentAttempt → runEmbeddedPiAgent

Time inside os.networkInterfaces() in window: 0.0 ms (0.0%)

The 89.5 s splits into two contiguous synchronous runs:

  • 39.3 s starting at offset 187 s — root frame loadOpenClawPlugins (one full plugin-registry resolution).
  • 50.5 s starting at offset 226 s — root frame createOpenClawCodingTools. Three generate-tool factories each spin up a runEmbeddedPiAgent, and each pi-agent re-resolves the registry, which calls prepareBundledPluginRuntimeRoot once per plugin all over again.

Hot stack at the deepest sample:

... ← shouldMaterializeBundledRuntimeMirrorDistFile / materializeBundledRuntimeMirrorDistFile
    ← mirrorBundledRuntimeDistRootEntries
    ← prepareBundledPluginRuntimeDistMirror
    ← (anonymous)
    ← withBundledRuntimeDepsFilesystemLock
    ← mirrorBundledPluginRuntimeRoot
    ← prepareBundledPluginRuntimeRoot   (per plugin, ×114)
    ← loadOpenClawPlugins | createVideoGenerateTool / createImageGenerateTool / createMusicGenerateTool

Root cause analysis

The agent-run path triggers loadOpenClawPlugins (a synchronous, per-plugin loop) four times during a single first send: once via resolveRuntimePluginRegistry for the runtime context, and three more times via the video/image/music tool factories where each tool spawns a runEmbeddedPiAgent that re-resolves the registry. Each of those four resolutions calls prepareBundledPluginRuntimeRoot once per plugin (~114 plugins on a default install).

Inside prepareBundledPluginRuntimeRoot, mirrorBundledRuntimeDistRootEntries walks the entire dist/ top level (~2760 .js files, ~24 MB on disk) every time. There is no "this install root has already been mirrored in this process" guard.

Specific issues we identified while reading the source and reproducing the slowness:

1. Every function in the chain is a plain function, not an async function, and all file I/O uses the *Sync variants:

  • readFileSync, lstatSync, realpathSync, linkSync, rmSync, mkdirSync, existsSync, readdirSync, readlinkSync, copyFileSync, symlinkSync, renameSync, writeFileSync, chmodSync, statSync.
  • Even the file-system lock waits use a hard thread block: Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms) (src/plugins/bundled-runtime-deps.ts:426).

A single prepareBundledPluginRuntimeRoot call therefore parks the event loop for the entire wall-clock cost of all of its syscalls combined.
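To illustrate why the Atomics.wait lock wait is a hard block, here is a small sketch. The Atomics.wait expression is the one quoted above; blockingSleep and measureTimerDelay are hypothetical helper names introduced only for this demonstration.

```javascript
// Blocks the calling thread for `ms`. Nothing ever notifies the cell,
// so the wait always ends by timing out.
function blockingSleep(ms) {
  const cell = new Int32Array(new SharedArrayBuffer(4));
  return Atomics.wait(cell, 0, 0, ms);
}

// Schedules a 0 ms timer, then blocks the thread. The timer callback
// cannot run until the block ends, so the measured delay is roughly
// `blockMs` rather than ~0 ms.
function measureTimerDelay(blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockingSleep(blockMs);
  });
}
```

Any timer, socket read, or RPC handler queued on the same thread is delayed in the same way for as long as the wait lasts.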

2. No process-level dedup across plugins for the dist-root walk. The same (installRoot, sourceDistRoot) pair gets walked once per plugin. With 114 plugins × 4 invocations per first send, that is ~456 full dist/ sweeps per first send, even though the walk is logically idempotent for a fixed source dist.

3. materializeBundledRuntimeMirrorDistFile's early-return is broken in the steady state. At src/plugins/bundled-runtime-deps.ts:171:

try {
  if (
    fs.realpathSync(sourcePath) === fs.realpathSync(targetPath) &&
    !fs.lstatSync(targetPath).isSymbolicLink()
  ) {
    return;
  }
} catch {}
fs.mkdirSync(path.dirname(targetPath), { recursive: true, mode: 0o755 });
fs.rmSync(targetPath, { recursive: true, force: true });
try {
  fs.linkSync(sourcePath, targetPath);
  ...
}

For a hardlink target, realpathSync(target) returns the target's own canonical path, not the source path — realpath does not "deduplicate" hardlinks. So the equality fails even when the target is already the correct hardlink that points at the same inode as the source. We always fall through to rmSync(target) + linkSync(source, target), rewriting the same hardlink with no functional change. On a default install this happens to ~462 dist-root files on every single sweep.

The intent is "skip if target already mirrors source"; the actual condition should compare (dev, ino) from lstatSync, since two paths that share (dev, ino) already point to the same on-disk content.
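A minimal sketch of that comparison, assuming a POSIX filesystem that supports hardlinks. isSameFileOnDisk and demoHardlinkCheck are hypothetical names, not functions from the OpenClaw source.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// True when target is a non-symlink entry that shares (dev, ino) with source,
// i.e. it is already a hardlink to the same on-disk content.
function isSameFileOnDisk(sourcePath, targetPath) {
  try {
    const source = fs.lstatSync(sourcePath);
    const target = fs.lstatSync(targetPath);
    return (
      !target.isSymbolicLink() &&
      source.dev === target.dev &&
      source.ino === target.ino
    );
  } catch {
    return false; // missing source or target: not mirrored yet
  }
}

// Builds a source file plus a hardlink in a temp dir and compares both checks.
function demoHardlinkCheck() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "mirror-demo-"));
  try {
    const source = path.join(dir, "source.js");
    const target = path.join(dir, "target.js");
    fs.writeFileSync(source, "export {};\n");
    fs.linkSync(source, target);
    return {
      realpathEqual: fs.realpathSync(source) === fs.realpathSync(target),
      sameInode: isSameFileOnDisk(source, target),
    };
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}
```

On a filesystem with hardlink support, demoHardlinkCheck() should report realpathEqual as false and sameInode as true, which is the mismatch described above.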

4. The shipped 2026.4.26 build has no shouldMaterializeBundledRuntimeMirrorDistFile cache. The main branch adds bundledRuntimeMirrorMaterializeCache keyed by stat signature, which would short-circuit subsequent calls in the same process. The shipped compiled module reads + regex-tests every file every time:

function shouldMaterializeBundledRuntimeMirrorDistFile(sourcePath) {
  if (!BUNDLED_RUNTIME_MIRROR_MATERIALIZED_EXTENSIONS.has(path.extname(sourcePath))) return false;
  try {
    return BUNDLED_RUNTIME_MIRROR_PLUGIN_REGION_RE.test(fs.readFileSync(sourcePath, "utf8"));
  } catch { return false; }
}

This alone accounts for ~24 MB of readFileSync per sweep, repeated per plugin. After the cache lands in a future release this cost goes away, but issues 1, 2, and 3 above still produce 462 unconditional unlink + linkSync per plugin per sweep.
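For illustration only, a stat-signature cache around the check above could look like the sketch below. This is not the bundledRuntimeMirrorMaterializeCache implementation from main, which this report does not reproduce. The extension set, the signature fields, and the name shouldMaterializeCached are assumptions; the regex is the one visible in the CPU profile.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Assumption: the real extension set may differ.
const MATERIALIZED_EXTENSIONS = new Set([".js", ".mjs", ".cjs"]);
// Regex as it appears in the CPU profile above.
const PLUGIN_REGION_RE = /(?:^|\n)\/\/#region extensions\/[^/\s]+(?:\/|$)/;

// path -> { signature, result }; lives for the lifetime of the process.
const materializeCache = new Map();

function shouldMaterializeCached(sourcePath) {
  if (!MATERIALIZED_EXTENSIONS.has(path.extname(sourcePath))) return false;
  let stat;
  try {
    stat = fs.statSync(sourcePath);
  } catch {
    return false;
  }
  // Hypothetical signature: any change to these fields invalidates the entry.
  const signature = `${stat.dev}:${stat.ino}:${stat.size}:${stat.mtimeMs}`;
  const cached = materializeCache.get(sourcePath);
  if (cached && cached.signature === signature) return cached.result;
  let result;
  try {
    result = PLUGIN_REGION_RE.test(fs.readFileSync(sourcePath, "utf8"));
  } catch {
    return false;
  }
  materializeCache.set(sourcePath, { signature, result });
  return result;
}
```

With this shape, repeated calls on an unchanged file cost one statSync instead of a full read plus regex test.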

Reproduction (independent bench)

Three Node ESM scripts, no gateway needed. They directly import the compiled prepareBundledPluginRuntimeRoot from the installed openclaw and run sweeps against the real install root. The mirror operation is idempotent, so the install root state after the bench is functionally identical to the state before it.

  • bench-mirror-real.mjs — restores the dist-root hardlinks back to symlinks (mimicking just-installed state) and runs prepareBundledPluginRuntimeRoot for several plugins in sequence. Reports per-call duration and a 200 ms tick lag probe summary.
  • bench-mirror-classes.mjs — runs four full 114-plugin sweeps in a row, mirroring the four sweeps observed in the CPU profile. Outputs per-round totals and per-plugin top-N timing.
  • bench-mirror-direct.mjs — independent reimplementation of the algorithm, tests sub-steps in isolation against a /tmp staging dir.

Steady-state warm-fs results on a WSL2 ext4 host (114 bundled plugins on disk):

| What | Time |
| --- | --- |
| One full 114-plugin sweep (steady state) | ~15 seconds |
| Average per-plugin prepareBundledPluginRuntimeRoot | ~130 ms |
| Four sweeps in a row (matches the four-sweep first-send pattern) | ~60 seconds |
| Hottest sub-step inside a sweep | mirrorBundledRuntimeDistRootEntries (462 unlink+linkSync + 2298 existsSync + per-call regex over 2760 files) |
| Slowest per-plugin call (steady state, plugin with the largest dependency closure) | ~290 ms |

Cold fs page cache (the production gateway scenario) inflates the first sweep from ~15 s to ~39 s and adds ~8 s of GC pressure across the four-sweep run. That gets us from 60 s warm to the 89.5 s observed.

The bench scripts are short (~150 lines each) and self-contained; happy to attach as a gist if useful.

Suggested fix

Convert the entire chain to fs.promises.* + await, and add per-process memoization for the dist-root mirror operation. Concretely:

  1. Async-ify the hot path. Make prepareBundledPluginRuntimeRoot / mirrorBundledPluginRuntimeRoot / prepareBundledPluginRuntimeDistMirror / mirrorBundledRuntimeDistRootEntries / refreshBundledPluginRuntimeMirrorRoot / copyBundledPluginRuntimeRoot / fingerprintBundledRuntimeMirrorSourceRoot / hashBundledRuntimeMirrorDirectory async and replace every *Sync call with the promise variant.

  2. Replace the synchronous lock with an async-friendly one. withBundledRuntimeDepsFilesystemLock can wrap the existing fs.mkdirSync(lockDir) acquisition behind an in-process Mutex (e.g., async-mutex) and await the work inside. The lock-dir acquisition itself can stay sync; the long-running work inside it must not.

  3. Yield to the event loop at directory boundaries. Inside hashBundledRuntimeMirrorDirectory and copyBundledPluginRuntimeRoot, await new Promise(setImmediate) once per directory (or per N entries) so a single large plugin tree cannot starve the loop. There is no single async fs primitive for SHA-256 hashing a directory tree, so this pattern is what keeps the hot loop cooperative.

  4. Memoize the dist-root mirror result by (installRoot, sourceDistRoot, source dist mtime). Once the dist root is mirrored in the current process, all subsequent plugins on the same source dist skip the per-plugin re-walk entirely. This collapses ~456 sweeps per first send to one.

  5. Fix materializeBundledRuntimeMirrorDistFile's early-return. Compare (dev, ino) from lstatSync(source) and lstatSync(target) instead of realpathSync equality. Two paths that share (dev, ino) are already the same file on disk; that is the actual condition the rewrite is trying to avoid.

  6. Land the existing bundledRuntimeMirrorMaterializeCache from main in a patch release, even ahead of the async migration. It cuts ~24 MB of readFileSync per sweep down to one stat per file in the steady state.
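Item 4 could be sketched as follows, assuming the caller can supply the source dist mtime and a function that performs the actual mirror. createDistMirrorMemo and mirrorOnce are hypothetical names, and the key layout is only one possible choice.

```javascript
// Returns a per-process memoizing wrapper around a dist-root mirror function.
function createDistMirrorMemo() {
  const done = new Map(); // key -> result of the first mirror call
  return function mirrorOnce(installRoot, sourceDistRoot, sourceMtimeMs, mirrorFn) {
    // JSON.stringify avoids ambiguity if a path contains the separator.
    const key = JSON.stringify([installRoot, sourceDistRoot, sourceMtimeMs]);
    if (!done.has(key)) {
      // If mirrorFn throws, nothing is stored and the next caller retries.
      done.set(key, mirrorFn(installRoot, sourceDistRoot));
    }
    return done.get(key);
  };
}

// Simulates `rounds` registry resolutions over `pluginCount` plugins and
// returns how many times the (expensive) mirror function actually ran.
function countMirrorCalls(pluginCount, rounds) {
  const mirrorOnce = createDistMirrorMemo();
  let calls = 0;
  const fakeMirror = () => {
    calls += 1;
    return "mirrored";
  };
  for (let round = 0; round < rounds; round += 1) {
    for (let plugin = 0; plugin < pluginCount; plugin += 1) {
      mirrorOnce("/install-root", "/install-root/dist", 1, fakeMirror);
    }
  }
  return calls;
}
```

Under these assumptions, countMirrorCalls(114, 4) runs the mirror function once rather than 456 times. If mirrorFn is async, storing the returned promise also lets concurrent callers share a single in-flight walk.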

Items 4 and 5 alone collapse the 89.5 s observed block to roughly the cost of a single sweep (~15 s warm / ~39 s cold). Adding async + setImmediate yield (items 1–3) makes the remaining work non-blocking — events and other RPCs on the same gateway can interleave.
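The cooperative pattern from items 1 and 3 might look like the sketch below. It is a standalone illustration, not a port of hashBundledRuntimeMirrorDirectory: hashDirectory, yieldToEventLoop, and the choice to hash relative paths plus file contents are assumptions.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");
const { createHash } = require("node:crypto");

// Lets pending timers, I/O callbacks, and other RPCs run before continuing.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Hashes every regular file under `root` (relative path + contents) with
// SHA-256, yielding to the event loop once per directory.
async function hashDirectory(root) {
  const hash = createHash("sha256");
  const pending = [root];
  while (pending.length > 0) {
    const dir = pending.pop();
    const entries = await fs.readdir(dir, { withFileTypes: true });
    // Sort by name so the digest does not depend on readdir order.
    entries.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
    for (const entry of entries) {
      const fullPath = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        pending.push(fullPath);
      } else if (entry.isFile()) {
        hash.update(path.relative(root, fullPath));
        hash.update(await fs.readFile(fullPath));
      }
    }
    await yieldToEventLoop(); // directory boundary
  }
  return hash.digest("hex");
}
```

Each await fs.* call already releases the thread while the kernel works; the explicit setImmediate yield additionally bounds how long CPU-side work such as hash.update can run back to back.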

Regression evidence

src/plugins/bundled-runtime-root.ts was added on 2026-04-22 in commit 9c733956c0 ("fix(plugins): repair bundled deps on activation"), and src/plugins/bundled-runtime-mirror.ts on 2026-04-27 in commit 6f09039b0c ("fix(plugins): reuse unchanged runtime mirrors"). Together with ~8 follow-up fix(plugins): ... commits over the next two days, this code path replaced a much lighter pre-existing approach.

Earlier OpenClaw versions blocked the main thread on gateway restart but not for 80+ seconds, and openclaw logs --follow could still attach during the block. Both regressed in 2026.4.22+: on 2026.4.26, openclaw logs --follow cannot attach for the full duration of the first agent-run block.

This issue is related to but distinct from #74325 (gateway restart 75 s block). #74325 is dominated by mDNS / os.networkInterfaces() polling during gateway startup; this issue covers the bundled-plugin-mirror path that fires per agent run, with os.networkInterfaces() accounting for 0% of the lag window measured here.

OpenClaw version

2026.4.26

Operating system

Ubuntu 22.04 on WSL2 (kernel 6.6.87.2-microsoft-standard-WSL2), Node.js 22.21.1, ext4. The cold-fs amplification is platform-dependent. (WSL2's seccomp_do_user_notification adds ~10 ms per os.networkInterfaces() call, but that has no impact on the mirror path measured here.) The warm-fs ~15 s/sweep baseline reproduces on any platform with a default 114-plugin install.

Model

N/A — reproduces before any model call; the lag is between phase-1 ack and the first model event.

Provider / routing chain

N/A

Install method

npm global (npm install -g openclaw)

Logs, screenshots, and evidence

Plugin-side lag probe summary from one observed first send (timestamps from CoClaw's RPC/main-thread tracing):

gw.send agent
gw.recv accepted                       ← phase-1 ack 15ms
... (5.9s)   lag.spike +5903ms
dc.recv coclaw.sessions.getById        ← unrelated RPC slips through during a brief unblock
... (32.5s)  lag.spike +32487ms
... (50.0s)  lag.spike +49860ms
gw.recv event agent                    ← first model event finally arrives
lag.summary dur=89500ms ticks=5 max=49860ms over100=4

CPU profile call attribution is in the "Actual behavior" section above. Full .cpuprofile file (10 ms sampling, ~6 MB) and the bench scripts available on request.

Additional information

Reproduction was done by directly importing the compiled prepareBundledPluginRuntimeRoot from ~/.nvm/.../openclaw/dist/bundled-runtime-root-DEMD7-O_.js — same code path the gateway runs, just isolated from network and pi-agent overhead. Steady-state warm-fs numbers (~15 s/sweep) are a hard lower bound; production blocking will always be at least this much per sweep.

We are happy to provide the cpuprofile, the bench scripts, and any further measurements that help. We would also be happy to test a candidate fix on our setup.


Reported by the CoClaw team.
This issue was discovered while developing @coclaw/openclaw-coclaw, a CoClaw channel plugin for OpenClaw.
