Environment
- OpenClaw: 2026.4.26 (be8c246)
- Node: v24.15.0
- OS: Ubuntu 24.04 (6.8.0-110-generic, x86_64)
- Deployment: openclaw-gateway.service via systemd user unit
Symptom
openclaw gateway start takes ~67s from systemd start to [gateway] ready. For roughly 55-60s of that time nothing is logged, and the app is unusable until startup completes.
Reproduction
Any cold start of the gateway with channels configured:
time systemctl --user restart openclaw-gateway
# watch journalctl; ~55s gap between "[hooks] loaded" and "[gateway] ready"
Config that triggers it: channels.telegram.enabled = true (though telegram is not the root cause — see bisect below).
Instrumentation
Built-in startup trace (OPENCLAW_GATEWAY_STARTUP_TRACE=1)
| Stage | Duration | eventLoopMax |
| --- | --- | --- |
| plugins.bootstrap | 2721ms | n/a |
| sidecars.session-locks | 4.5ms | 0ms |
| sidecars.gmail-watch | 0.1ms | 0ms |
| sidecars.gmail-model | 0.2ms | 0ms |
| sidecars.internal-hooks | 1882ms | 36ms |
| sidecars.channels | 54 829ms | 22 029ms |
| sidecars.plugin-services | 379ms | 372ms |
| sidecars.memory | 0.1ms | 0ms |
| sidecars.total | 57 128ms | n/a |
| ready | 1.3ms | 0ms |
eventLoopMax = 22 029ms means the JS event loop was synchronously blocked for 22 seconds at one point — not a network timeout.
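To reproduce a number like eventLoopMax outside the built-in trace, the following is a minimal sketch built on Node's `perf_hooks.monitorEventLoopDelay`. Whether OpenClaw derives eventLoopMax this way is an assumption; the report does not show that code. `blockFor` and `measureMaxStallMs` are hypothetical helper names introduced only for illustration.

```javascript
"use strict";
const { monitorEventLoopDelay } = require("node:perf_hooks");

// Hypothetical helper: busy-wait to simulate synchronous work
// (for example a tight parse loop) that never yields to the event loop.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin on purpose
  }
}

// Hypothetical helper: run `work` while sampling event-loop delay and
// return the largest observed delay in milliseconds.
async function measureMaxStallMs(work) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  // Let the sampling timer fire a few times before the work starts.
  await new Promise((resolve) => setTimeout(resolve, 50));
  work();
  // Give the delayed sample a chance to be recorded.
  await new Promise((resolve) => setTimeout(resolve, 50));
  histogram.disable();
  return histogram.max / 1e6; // histogram values are in nanoseconds
}
```

A call such as `measureMaxStallMs(() => blockFor(200))` should report a stall close to the blocked duration, whereas a slow network call that awaits I/O would leave the maximum near the sampling resolution. That difference is why a large eventLoopMax points at synchronous work rather than a timeout.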
Bisect
| Run | sidecars.channels | eventLoopMax |
| --- | --- | --- |
| baseline | 54 829ms | 22 029ms |
| OPENCLAW_SKIP_CHANNELS=1 | 1.9ms | 0ms |
| OPENCLAW_TELEGRAM_DISABLE_AUTO_SELECT_FAMILY=1 + OPENCLAW_TELEGRAM_DNS_RESULT_ORDER=ipv4first | 54 456ms | 22 029ms |
| channels.telegram.enabled = false (in config) | 54 406ms | 21 760ms |
Telegram is not the cause. Neither disabling telegram nor applying the DNS hardening flags changes sidecars.channels by more than 1%. Skipping the entire channels block (OPENCLAW_SKIP_CHANNELS=1) eliminates the hang.
V8 CPU profile (node --prof)
Top JavaScript hot frames (ticks = share of 74s profiling window):
2877 ticks (4.3%) json5/lib/parse.js *parse
2670 ticks (4.0%) json5/lib/parse.js *beforePropertyValue
2328 ticks (3.5%) json5/lib/parse.js *string
Bottom-up callchain through the hot json5 frames:
loadPluginManifest dist/manifest-DkU_xlZi.js:1166
← loadPluginManifestRegistry dist/manifest-registry-CXpW6f0a.js:341 (57.7%)
← discoverInDirectory dist/discovery-CRcfnviq.js:481
← loadOpenClawPlugins dist/loader--FR-1ZCZ.js:2903
Also significant:
- collectRuntimePackageWildcardImportTargets / isPathInside / boundary-path: synchronous path resolution inside the discovery loop
- 2275 ticks in node:path resolve driven by boundary checks
Top C++ (syscall view)
| Syscall | Ticks | % of C++ |
| --- | --- | --- |
| syscall | 5639 | 12.0% |
| __open | 2189 | 4.7% |
| access | 2126 | 4.5% |
| __read | 1805 | 3.8% |
| getdents64 | 198 | 0.4% |
Heavy synchronous filesystem walk — opening, statting, and reading many files on the critical path.
Root Cause Hypothesis
sidecars.channels calls prewarmConfiguredPrimaryModel before startChannels(). prewarmConfiguredPrimaryModel calls ensureOpenClawModelsJson → getCurrentPluginMetadataSnapshot → triggers a full plugin manifest discovery walk (the same work plugins.bootstrap already did 50s earlier). Discovery synchronously opens every plugin's package.json/manifest, json5-parses it, and canonicalizes paths — blocking the event loop for ~22s and taking ~55s wall time.
The prewarm is also active even when the primary model (google/gemini-3.1-flash-lite-preview) passes through a non-pi harness. The three early-exit guards (isConfiguredCliBackendPrimary, isCliProvider, selectAgentHarness().id !== "pi") are checked after the 7-module Promise.all import and the discovery-triggering ensureOpenClawModelsJson, so non-pi models still pay the full cost.
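The ordering problem described above can be sketched as follows. This is not the actual OpenClaw source: the `deps` object, `shouldPrewarm`, `prewarmWithEarlyGates`, and `loadPrewarmModules` (standing in for the 7-module Promise.all import) are hypothetical, and the real guard signatures may differ. The only point illustrated is that the cheap checks run before any import or discovery work.

```javascript
"use strict";

// Hypothetical predicate: evaluate the three cheap guards named in the
// report before doing anything expensive.
function shouldPrewarm(deps) {
  if (deps.isConfiguredCliBackendPrimary()) return false;
  if (deps.isCliProvider()) return false;
  if (deps.selectAgentHarness().id !== "pi") return false;
  return true;
}

// Hypothetical wrapper: the expensive steps are only reached when all
// guards pass, so non-pi models return without triggering discovery.
async function prewarmWithEarlyGates(deps) {
  if (!shouldPrewarm(deps)) {
    return "skipped";
  }
  await deps.loadPrewarmModules(); // stand-in for the Promise.all import block
  await deps.ensureOpenClawModelsJson(); // the discovery-triggering call
  return "prewarmed";
}
```

Passing the guards and the expensive steps in as `deps` is purely a choice for this sketch, so the ordering can be exercised with stubs. In the real module they would presumably be direct calls.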
Things That Didn't Help
- OPENCLAW_TELEGRAM_DISABLE_AUTO_SELECT_FAMILY=1 / OPENCLAW_TELEGRAM_DNS_RESULT_ORDER=ipv4first: no effect
- Disabling the telegram channel in config: no effect
- Node 24 (upgraded from v22): no effect
Workaround
Add to the systemd unit:
Environment=OPENCLAW_SKIP_CHANNELS=1
This drops sidecars.channels from 54 829ms to 1.9ms; total cold start goes from ~68s to ~14s. The trade-off is that all channels, including telegram, are disabled.
Suggested Fixes
- Re-order gates in prewarmConfiguredPrimaryModel (server.impl-*:8428): check isConfiguredCliBackendPrimary / isCliProvider / selectAgentHarness().id !== "pi" before the Promise.all import block and before calling ensureOpenClawModelsJson. Non-pi providers (google, openai, custom) should return immediately with zero discovery work.
- Reuse the plugins.bootstrap snapshot in getCurrentPluginMetadataSnapshot: the full discovery already ran once (2.7s at plugins.bootstrap). The result should be cached in a process-singleton that ensureOpenClawModelsJson reads rather than re-discovering. The MODELS_JSON_STATE.readyCache fingerprint cache is keyed per targetPath, but the underlying plugin metadata scan runs unconditionally on a cache miss.
- Break the sync discovery loop: discoverInDirectory + loadPluginManifest pin the event loop for 22s in a tight synchronous loop. Inserting await new Promise(r => setImmediate(r)) between manifest reads, or moving discovery to a worker thread, would allow the rest of startup to interleave and would prevent starving incoming WS connections.
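The last two suggestions can be combined in one rough sketch: a process-wide cached snapshot whose discovery loop yields to the event loop between batches of manifest reads. All names here (`readManifest`, `discoverManifests`, `getPluginSnapshot`, `yieldEvery`) are hypothetical stand-ins rather than OpenClaw APIs. The sketch also assumes the plugin set does not change between plugins.bootstrap and the prewarm, so no cache invalidation is shown.

```javascript
"use strict";

// Hypothetical stand-in for opening and parsing one plugin manifest.
function readManifest(id) {
  return { id, name: `plugin-${id}` };
}

let snapshotPromise = null; // process-wide singleton (assumed design)
let discoveryRuns = 0; // counts how often the full walk actually ran

// Walk all manifests, yielding to the event loop every `yieldEvery` reads
// so timers, I/O callbacks, and incoming connections can be serviced.
async function discoverManifests(ids, yieldEvery = 8) {
  discoveryRuns += 1;
  const manifests = [];
  for (let i = 0; i < ids.length; i += 1) {
    manifests.push(readManifest(ids[i]));
    if ((i + 1) % yieldEvery === 0) {
      await new Promise((resolve) => setImmediate(resolve));
    }
  }
  return manifests;
}

// Return the cached snapshot, starting discovery only on the first call.
// Caching the promise (not the result) also dedupes concurrent callers.
function getPluginSnapshot(ids) {
  if (snapshotPromise === null) {
    snapshotPromise = discoverManifests(ids);
  }
  return snapshotPromise;
}
```

Yielding does not reduce the total CPU cost of discovery. It only caps how long any single stall lasts, which is why the cache (or the early gates) matters more for wall time, while the yield matters for responsiveness.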