Gateway blocks event loop 30s+ per message; bundled runtime deps re-stage every startup, manifest never persists
Summary
On a clean Linux install (Ubuntu / Linux 6.8, Node 22.22.2), `openclaw gateway run` blocks the Node event loop for 30+ seconds per inbound message. Reproduced across 2026.4.15, 2026.4.24, 2026.4.25, 2026.4.26, 2026.4.27, 2026.4.29. The bundled runtime deps re-stage on every gateway startup and every message. The manifest never persists into a form that prevents the next staging.
The symptom for end users on Telegram: message → "typing…" → stops → starts again → reply, with 30 s – 3 min latency. Confirmed gateway-wide (Telegram + MS Teams aspire-help bot both slow).
Cleanest repro (no host state, no plugins enabled)
A brand-new `--dev` profile on a clean droplet:

```
$ openclaw --version
OpenClaw 2026.4.29 (a448042)

$ ls /root/.openclaw-dev   # confirms no prior state
ls: cannot access '/root/.openclaw-dev': No such file or directory

$ openclaw --dev gateway run
2026-05-01T04:38:58 [gateway] loading configuration…
2026-05-01T04:38:58 [gateway] starting...
2026-05-01T04:39:05 [gateway] [plugins] staging bundled runtime deps before gateway startup (35 specs): ...
2026-05-01T04:40:07 [gateway] [plugins] installed bundled runtime deps before gateway startup in 61827ms
2026-05-01T04:40:07 [plugins] acpx staging bundled runtime deps (42 specs): ...
2026-05-01T04:40:18 [plugins] acpx installed bundled runtime deps in 10657ms
2026-05-01T04:40:35 [diagnostic] liveness warning: reasons=event_loop_delay eventLoopDelayP99Ms=1370.5 eventLoopDelayMaxMs=26659 eventLoopUtilization=0.919
2026-05-01T04:40:35 [gateway] http server listening (8 plugins; 96.5s)
2026-05-01T04:40:36 [gateway] ready
```
96.5 seconds to "ready" with 8 default plugins, no user config, no Telegram/MSTeams. A 26-second event-loop block during the staging phase.
Production trace (one Telegram message)
Embedded fallback trace from a single inbound agent run with 5 agents configured (Anthropic + Google + Telegram + MSTeams + memory-core):
```
[agent/embedded] [trace:embedded-run] startup stages: phase=attempt-dispatch
totalMs=13904 stages=
  workspace:2ms,
  runtime-plugins:3498ms,
  hooks:2ms,
  model-resolution:2621ms,
  auth:3886ms,
  context-engine:3ms,
  attempt-dispatch:3891ms

[agent/embedded] [trace:embedded-run] prep stages: phase=stream-ready
totalMs=39033 stages=
  workspace-sandbox:33ms,
  skills:1ms,
  core-plugin-tools:16352ms,   ← per-message tool re-load
  bootstrap-context:85ms,
  bundle-tools:1374ms,
  system-prompt:7277ms,
  session-resource-loader:7536ms,
  agent-session:7ms,
  stream-setup:6368ms
```
A concurrent liveness warning fired during this run: eventLoopDelayMaxMs=39795.6, utilization=1.0.

`core-plugin-tools` costs ≈ 16 s on every message, regardless of whether the system prompt is 36 KB or 0.4 KB (verified by stripping all per-agent context files to 44-byte stubs and re-running — total prep time changed by under 5 %).
Why this is upstream, not host state
We ran a controlled diagnostic to rule out host-state corruption:
| Variable | Production `~/.openclaw/` | Fresh `~/.openclaw-dev/` |
| --- | --- | --- |
| Existing state | 5.7 GB, 6 stacked versions | None — directory did not exist |
| Custom config | 5 agents, 4 channels, 3 MCP servers | None (wizard defaults) |
| Plugins enabled | 38 specs across anthropic/google/telegram/msteams/memory-core | 35 specs default + 8 plugins |
| Time to "ready" | n/a (always running) | 96.5 s, single 26 s loop block |
| Per-message `core-plugin-tools` | 16 s | not measurable without agent (no auth) |
A clean profile reproduces the same multi-tens-of-seconds runtime-deps install at startup. The bug is not host-state-dependent.
Root-cause hypothesis (CPU profile evidence)
Live V8 CPU profile over a 181 s window during a confirmed stall (file: `cpu-profile-live-2026-04-30T04-09-35-052Z.cpuprofile`, attachable):

By total wall time:
- `loadOpenClawPlugins` — 96.9 s (53.6 %)
- `withBundledRuntimeDepsFilesystemLock` — 53.2 s (29.4 %)
- `ensureBundledPluginRuntimeDeps` — 36.7 s (20.3 %)
- `withBundledRuntimeDepsInstallRootLock` — 35.2 s (19.4 %)

By self time:
- `child_process.spawn` — 30.2 s (16.7 % of busy CPU) — the actual `npm install` subprocess invocations.
- `normalizePluginLoaderAliasMapForJiti` (`dist/sdk-alias-DIhpBBl1.js:320`) — 24 % of all samples; called from `getCachedPluginJitiLoader` (`dist/bundled-plugin-metadata-VxOxTVqO.js:99`). The "cached" function is called per message and re-normalizes the alias map each time: the map sits in V8's NameDictionary representation, so every call pays a dictionary key sort plus a per-entry `path.resolve()`. Hot V8 symbols around it confirm this: `EnumIndexComparator<NameDictionary>`, `GetOwnEnumPropertyDictionaryKeys`, `Builtins_ForInFilter`, `String::WriteToFlat`.
- `ConcurrentMarking::RunMajor` (V8 GC) — 21 % of samples; a secondary symptom of the allocation churn.
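The shape of the fix for the alias-map hot spot can be sketched independently of OpenClaw internals. The function names come from the profile above; everything else (the map type, the normalization body) is a hypothetical stand-in. The point is to cache by map identity so repeat calls skip the dictionary walk and `path.resolve()` entirely:

```typescript
import * as path from "node:path";

type AliasMap = Record<string, string>;

// Cache keyed by object identity: the same alias map object is normalized once.
// (A hypothetical sketch of what getCachedPluginJitiLoader's caching could do;
// not the actual OpenClaw implementation.)
const normalizedCache = new WeakMap<AliasMap, AliasMap>();

function normalizeAliasMapOnce(aliases: AliasMap): AliasMap {
  const hit = normalizedCache.get(aliases);
  if (hit) return hit; // repeat calls skip the sort + per-entry path.resolve()

  const normalized: AliasMap = Object.create(null);
  for (const key of Object.keys(aliases).sort()) {
    normalized[key] = path.resolve(aliases[key]);
  }
  normalizedCache.set(aliases, normalized);
  return normalized;
}
```

A `WeakMap` also avoids pinning maps for unloaded plugins, which matters given the 21 % GC share in the same profile.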
The cycle that doesn't break
1. A plugin requests a runtime dep not in `/root/.openclaw/plugin-runtime-deps/openclaw-<version>-<hash>/.openclaw-runtime-deps.json` (the static manifest).
2. Gateway logs `staging bundled runtime deps... N missing`.
3. `npm install --ignore-scripts <missingSpecs>` runs in `installExecutionRoot`.
4. The install reports success in NN ms — but the specs never make it back to the on-disk manifest.
5. Additionally, `pruneRetainedRuntimeDepsManifestSpecs` deletes anything in `node_modules/` not in the manifest.
6. Next startup / next message → re-detect "missing" → respawn npm → goto 1.
Note: in 2026.4.29, `/root/.openclaw/plugin-runtime-deps/openclaw-2026.4.29-<hash>/.openclaw-runtime-deps.json` does not exist at all — older versions had it (e.g. 2026.4.27 had 16 baked-in specs). Either the manifest format changed and the writer stopped emitting it, or it is newly absent. Plugins still want 38 specs, so 100 % are treated as missing on every cycle.
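To make step 4 concrete: breaking the cycle only requires that a successful install merges its specs back into the manifest before the next staging pass reads it. The sketch below is hypothetical (the manifest file name is from this report; its schema and the surrounding install code are assumptions), and uses a tmp-file-plus-rename write so a crash cannot leave a truncated manifest that re-triggers a full re-stage:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical sketch: merge newly installed specs into the on-disk manifest
// so the next staging pass sees them as present. Schema ({ specs: string[] })
// is an assumption, not the actual OpenClaw format.
function persistInstalledSpecs(installRoot: string, installedSpecs: string[]): void {
  const manifestPath = path.join(installRoot, ".openclaw-runtime-deps.json");

  // Tolerate a missing or unreadable manifest (the 2026.4.29 case).
  let specs: string[] = [];
  try {
    specs = JSON.parse(fs.readFileSync(manifestPath, "utf8")).specs ?? [];
  } catch {
    /* no manifest yet — start empty */
  }

  const merged = [...new Set([...specs, ...installedSpecs])].sort();

  // Atomic write: write a tmp file, then rename over the manifest.
  const tmp = manifestPath + ".tmp";
  fs.writeFileSync(tmp, JSON.stringify({ specs: merged }, null, 2));
  fs.renameSync(tmp, manifestPath);
}
```

With something like this in place, step 5's pruning would also stop deleting freshly installed deps, since they would now be in the manifest.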
Things ruled out
- Memory cgroup / swap: raising `MemoryHigh` 1G→3G eliminated 337K throttle events and 972 MB of swap usage but did not change the symptom.
- Trajectory bloat: archived a 9.3 MB Telegram trajectory; no improvement.
- Auth flow / network: stalls reproduce without an outbound model call (during gateway boot itself).
- Droplet sizing: DigitalOcean Basic Regular 4GB / 2vCPU NYC3 — plenty of RAM headroom.
- Host filesystem state: a clean `--dev` profile reproduces the issue (above).
- Per-message context size: stripping SOUL/MEMORY/TOOLS files from 36 KB to 0.4 KB stubs changed total prep by <5 %.
- Downgrading: 2026.4.26 has the same buggy `getCachedPluginJitiLoader` cache-key shape (`${jitiFilename}::${params.cacheScopeKey ?? cacheKey}`); only the function name moved between versions. 2026.4.15 had a simpler-keyed cache that may have hit more often, but the cycle is fundamentally the same.
Environment
- OpenClaw: 2026.4.29 (a448042)
- Node: v22.22.2
- Platform: Linux 6.8.0-110-generic (Ubuntu 24.04)
- Host: DigitalOcean Basic Regular Intel, 4GB / 2vCPU, NYC3
- Filesystem: ext4 on /dev/vda1 (no overlayfs, not WSL/Docker Desktop)
- Configured: 5 agents (anthropic+google primaries), telegram+msteams channels, 3 stdio MCP servers, ~30 `plugins.allow` keys
Evidence (attachable on request)
In `/root/projects/openclaw-perf-evidence-2026-04-30/`:
- `cpu-profile-live-2026-04-30T04-09-35-052Z.cpuprofile` (42 MB) — live V8 profile during a confirmed stall.
- `perf-stall2.data` (2.4 MB) + `perf-153801.map` (V8 JIT symbols) + `perf-report.txt` + `perf-script.txt` — Linux `perf record` capture with full call stacks.
- `analyze-profile.py` + `capture-live-profile.mjs` — the capture/analysis scripts used.
Repro recipe for a maintainer
```
# Reproduces the 96 s startup + 26 s loop block on any Linux Node 22 host:
openclaw --dev gateway run
# Watch for: [diagnostic] liveness warning ... eventLoopDelayMaxMs=N (N > 5000)
# Watch for: installed bundled runtime deps before gateway startup in NNNNNms (N > 5000)
```
Possible directions
- Persist the install into the on-disk manifest so the next staging is a no-op. The 12 ms install times we see indicate `npm install` itself is fine when nothing is missing — the cycle is about the manifest write being lost.
- True caching in `getCachedPluginJitiLoader` — the function name implies a cache, but `${params.cacheScopeKey ?? cacheKey}` with different callers passing different `cacheScopeKey` values effectively bypasses it.
- Hoist `ensureBundledPluginRuntimeDeps` out of the per-message path entirely. It belongs at startup or after a config change, not on every Telegram message.
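The hoisting direction can be as small as a process-wide guard promise. The sketch below is hypothetical (`ensureBundledPluginRuntimeDeps` is the profiled name; its signature here is assumed): the first caller starts staging, and every concurrent message awaits the same in-flight promise instead of respawning `npm`:

```typescript
// Hypothetical once-per-process guard for dep staging.
let stagingOnce: Promise<void> | undefined;

function ensureDepsOncePerProcess(run: () => Promise<void>): Promise<void> {
  // ??= only invokes run() on the first call; later callers share the promise.
  stagingOnce ??= run();
  return stagingOnce;
}
```

A config-change hook would reset `stagingOnce` to `undefined` to force one fresh staging pass, which keeps the check out of the per-message path without making it startup-only.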
Happy to capture more profiles, run patched builds, or test a candidate fix on this droplet.