Bug type
Behavior bug
Summary
On cold start, dashboard / UI clients issue 9–10 RPCs concurrently against the gateway. Two independent issues cause this fanout to take 1.3–2.7 s instead of completing in parallel:
- (A) "tts.status" is declared async but contains zero await expressions. It runs ~1.5 s of synchronous code (TTS config resolution, provider scanning, plus a synchronous readFileSync inside readPrefs) before returning, monopolizing the event loop and starving every sibling handler on the same connection.
- (B) applyPluginAutoEnable(...) is invoked 8 times per fanout with the same config object reference and the same process.env: ~75 ms × 8 ≈ 600 ms of redundant pure-CPU work.
Together these account for ~2.1 s of avoidable main-thread occupancy on every cold-start fanout. They are logically independent and can be addressed in separate PRs.
Steps to reproduce
- Fresh-start the gateway: openclaw gateway restart.
- Load the gateway dashboard (or any UI/MCP client that issues the standard read-only fanout).
- The dashboard typically issues: sessions.list, status, models.list, usage.cost, tts.status, channels.status, tools.catalog × N agents, all in the same WebSocket frame batch.
- Measure per-RPC latency in the client (or instrument the handlers with hrtime probes).
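For anyone reproducing the measurement, a probe along the following lines is enough. This is a minimal sketch, not the instrumentation used for the numbers in this report: createProbe, mark and report are hypothetical names, and the only real API used is Node's process.hrtime.

```typescript
// Hypothetical probe helper; names are illustrative and not part of OpenClaw.
type Mark = { label: string; atMs: number; deltaMs: number };

function createProbe(name: string) {
  const start = process.hrtime(); // monotonic [seconds, nanoseconds] tuple
  const marks: Mark[] = [];
  let lastMs = 0;

  return {
    // Record time elapsed since probe creation and since the previous mark.
    mark(label: string): Mark {
      const [sec, nsec] = process.hrtime(start);
      const atMs = sec * 1e3 + nsec / 1e6;
      const m: Mark = { label, atMs, deltaMs: atMs - lastMs };
      lastMs = atMs;
      marks.push(m);
      return m;
    },
    // One formatted line per recorded mark.
    report(): string[] {
      return marks.map(
        (m) => `${name} ${m.label} @${m.atMs.toFixed(1)} ms (+${m.deltaMs.toFixed(1)} ms)`,
      );
    },
  };
}
```

Calling mark() after each helper inside a handler yields per-segment timings in the same shape as the probe data shown later in this report.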
Expected behavior
- Independent read-only RPCs should complete concurrently; no single handler should block sibling RPCs sharing the same connection.
- Pure, deterministic helpers like applyPluginAutoEnable should not recompute the same answer 8 times for the same input within a single fanout.
Actual behavior
Measured cold-start fanout on v2026.5.7, gateway freshly restarted, single dashboard load:
| Handler | RESP time |
| --- | --- |
| tts.status | 1566 ms |
| channels.status | 646 ms (handler ENTER deferred ~1.5 s after WS frame arrival) |
| models.list | 2177 ms |
| status | 2296 ms |
| usage.cost | 2592 ms |
| sessions.list | 2662 ms |
| tools.catalog × 3 | 186 / 216 / 224 ms (serialized, back-to-back) |
Total wall time ~2.7 s. A main-thread heartbeat probe (5 ms setTimeout, alerts when the gap exceeds 80 ms) fires continuously across the entire 2.7 s window — the event loop never yields.
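The heartbeat probe described above can be approximated as follows. This is a sketch under assumptions: findStalls and startHeartbeat are hypothetical names, and the 5 ms interval and 80 ms threshold are passed in by the caller rather than hard-coded. The gap detection is kept as a pure function so it can be checked without timers.

```typescript
// Pure core: given the times (ms) at which the heartbeat actually fired,
// return every gap between consecutive firings that exceeded the threshold.
function findStalls(
  firedAtMs: number[],
  thresholdMs: number,
): { at: number; gapMs: number }[] {
  const stalls: { at: number; gapMs: number }[] = [];
  for (let i = 1; i < firedAtMs.length; i++) {
    const gapMs = firedAtMs[i] - firedAtMs[i - 1];
    if (gapMs > thresholdMs) stalls.push({ at: firedAtMs[i], gapMs });
  }
  return stalls;
}

// Timer wrapper: re-arms a setTimeout every intervalMs and reports late wake-ups.
// Returns a function that stops the heartbeat.
function startHeartbeat(
  intervalMs: number,
  thresholdMs: number,
  onStall: (gapMs: number) => void,
): () => void {
  let last = performance.now();
  const tick = (): void => {
    const now = performance.now();
    const gap = now - last;
    if (gap > thresholdMs) onStall(gap); // the loop was busy for roughly `gap` ms
    last = now;
    timer = setTimeout(tick, intervalMs);
  };
  let timer: ReturnType<typeof setTimeout> = setTimeout(tick, intervalMs);
  return () => clearTimeout(timer);
}
```

For example, startHeartbeat(5, 80, (gap) => console.warn(`event loop stalled ${gap.toFixed(0)} ms`)) would log once per late wake-up until the returned stop function is called.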
Bug (A) — tts.status handler synchronously blocks the event loop ~1.5 s
Source: src/gateway/server-methods/tts.ts:29

```ts
"tts.status": async ({ respond, context }) => {
  try {
    const cfg = context.getRuntimeConfig();
    const config = resolveTtsConfig(cfg);                  // ~200 ms
    const prefsPath = resolveTtsPrefsPath(config);
    const provider = getTtsProvider(config, prefsPath);    // ~347 ms (readPrefs → readFileSync)
    const persona = getTtsPersona(config, prefsPath);
    const autoMode = resolveTtsAutoMode({ config, prefsPath });
    const fallbackProviders = resolveTtsProviderOrder(provider, cfg)
      .slice(1)
      .filter((c) => isTtsProviderConfigured(config, c, cfg)); // ~905 ms (15 providers × isConfigured)
    const providerStates = listSpeechProviders(cfg).map(/* isConfigured per provider */); // ~114 ms
    respond(true, { /* ... */ });
  } catch (err) { /* ... */ }
}
```
The handler is async, but the body contains no await expression. Every helper invoked is synchronous; several call readFileSync (readPrefs in extensions/speech-core/runtime-api.ts) or do synchronous provider enumeration via isConfigured. The handler therefore executes ~1.5 s of pure synchronous CPU + sync I/O on the event-loop thread before returning — no microtask interleaves during this window.
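The effect can be reproduced in isolation. The following sketch uses hypothetical handlers, not the OpenClaw code: an async function with no await runs its entire body in the caller's tick, so a sibling that has already reached its first await cannot resume until the blocking body has finished.

```typescript
const order: string[] = [];

// Busy-wait stand-in for synchronous work such as readFileSync or config scans.
function busyWork(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    /* spin */
  }
}

// Declared async, but with no await the whole body runs synchronously.
async function blockingHandler(): Promise<void> {
  order.push("blocking: enter");
  busyWork(20);
  order.push("blocking: respond");
}

// A sibling that yields once at its first await.
async function siblingHandler(): Promise<void> {
  order.push("sibling: enter");
  await Promise.resolve();
  order.push("sibling: resumed");
}

// Dispatch both in the same tick, as a frame batch would.
void siblingHandler();
void blockingHandler();
order.push("dispatch loop done");
```

Although the sibling was dispatched first, "sibling: resumed" is appended to order only after "blocking: respond" and "dispatch loop done", because its continuation is a microtask that cannot run while synchronous code holds the thread.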
Per-segment probe data (cold-start, gateway-restarted run):
```
HND tts.status ENTER            @0.0 ms
TS  after getRuntimeConfig      @0.1 ms
TS  after resolveTtsConfig      @198.8 ms   ← 199 ms
TS  after resolveTtsPrefsPath   @199.0 ms
TS  after getTtsProvider        @546.0 ms   ← 347 ms (readFileSync inside readPrefs)
TS  after getTtsPersona         @546.1 ms
TS  after resolveTtsAutoMode    @546.3 ms
TS  after fallbackProviders     @1451.3 ms  ← 905 ms (slowest segment)
TS  after providerStates        @1565.8 ms  ← 114 ms
HND tts.status RESP             +1565.8 ms
```
Because tts.status enters its handler in the same tick as four sibling handlers (sessions.list, status, models.list, usage.cost) but never yields, all sibling handlers' awaits resolve only after tts.status returns. The dashboard's channels.status request, which arrived in the same WS frame batch, does not even enter its handler until 1.5 s after the others. This single handler accounts for the entire "front-block" segment of the cold-start fanout.
Suggested fixes (any subset would help, in roughly descending impact):
- Convert the synchronous I/O helpers to async (readPrefs → fs.promises.readFile) and await them, so the handler yields several times during its execution.
- Parallelize isConfigured across providers via Promise.all (each call is independent of the others). This only overlaps work if the checks themselves become asynchronous; Promise.all over synchronous CPU work still runs serially. The current .filter(...isTtsProviderConfigured) is the single largest segment (~900 ms across 15 providers).
- Cache isConfigured(provider, cfg) for the lifetime of a single cfg reference. This helps because both fallbackProviders and providerStates enumerate the same providers back-to-back.
- Even as a stopgap, insert await Promise.resolve() between the heavy synchronous segments to let sibling handlers interleave. Note that this yields only to the microtask queue; awaiting a setImmediate-based promise additionally lets pending I/O callbacks and timers run.
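Two of the suggestions above, the per-cfg cache and the explicit yield, could look roughly like this. It is a sketch under assumptions: Cfg, IsConfigured, memoizeIsConfigured, yieldToEventLoop and filterConfigured are all hypothetical names, and the real isConfigured signature in OpenClaw may differ.

```typescript
// Hypothetical types; the real signatures may differ.
type Cfg = object;
type IsConfigured = (providerId: string, cfg: Cfg) => boolean;

// Per-cfg memo: cfg identity -> providerId -> result.
// Entries become collectable together with the cfg object.
function memoizeIsConfigured(isConfigured: IsConfigured): IsConfigured {
  const byCfg = new WeakMap<Cfg, Map<string, boolean>>();
  return (providerId, cfg) => {
    let perProvider = byCfg.get(cfg);
    if (!perProvider) {
      perProvider = new Map<string, boolean>();
      byCfg.set(cfg, perProvider);
    }
    const cached = perProvider.get(providerId);
    if (cached !== undefined) return cached;
    const result = isConfigured(providerId, cfg);
    perProvider.set(providerId, result);
    return result;
  };
}

// Yield a full event-loop turn so pending I/O callbacks and timers can run.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => setImmediate(() => resolve()));
}

// Filter providers, yielding after each check so the longest uninterrupted
// stretch of synchronous work is a single provider's check.
async function filterConfigured(
  providerIds: string[],
  cfg: Cfg,
  isConfigured: IsConfigured,
): Promise<string[]> {
  const configured: string[] = [];
  for (const id of providerIds) {
    if (isConfigured(id, cfg)) configured.push(id);
    await yieldToEventLoop();
  }
  return configured;
}
```

The cache removes the duplicate enumeration between fallbackProviders and providerStates; the yield does not reduce total CPU time but bounds how long sibling handlers wait.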
Bug (B) — applyPluginAutoEnable recomputes the same result 8× per fanout
Source: src/config/plugin-auto-enable.apply.ts:34
```ts
export function applyPluginAutoEnable(params: {
  config?: OpenClawConfig;
  env?: NodeJS.ProcessEnv;
  manifestRegistry?: PluginManifestRegistry;
}): PluginAutoEnableResult {
  const candidates = detectPluginAutoEnableCandidates(params);
  return materializePluginAutoEnableCandidates({
    config: params.config,
    candidates,
    env: params.env,
    manifestRegistry: params.manifestRegistry,
  });
}
```
The function is pure on its inputs (config, env, manifestRegistry). During one dashboard fanout, it is invoked 8 times across the read-only RPC paths:
| Caller | Call count |
| --- | --- |
| channels.status (entry + getRuntimeSnapshot inside the handler) | 2 |
| tools.catalog × 3 agents (each calls it twice via ensureStandalonePluginToolRegistryLoaded + resolvePluginTools) | 6 |
| Total per fanout | 8 |
Identity check via WeakMap instrumentation on the inputs:
- All 8 calls during a fanout receive the same config object reference: context.getRuntimeConfig() returns an identity-stable snapshot within a fanout window.
- All 8 calls receive params.env === process.env (same identity).
So every call recomputes an answer that already exists. Single-call cost is ~75 ms (≈55 ms detect + ≈22 ms materialize), giving 8 × 75 ms ≈ 600 ms of redundant synchronous CPU per fanout.
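An identity check of this kind can be done with a few lines of instrumentation. The sketch below is illustrative only: createIdentityTracker is a hypothetical helper, not the code used for the measurements above.

```typescript
// Hypothetical instrumentation; call observe() with each input of interest.
function createIdentityTracker() {
  const ids = new WeakMap<object, number>();
  let nextId = 1;
  const seen: number[] = [];

  return {
    // Assign a stable small integer to each distinct object reference.
    observe(obj: object): number {
      let id = ids.get(obj);
      if (id === undefined) {
        id = nextId++;
        ids.set(obj, id);
      }
      seen.push(id);
      return id;
    },
    // Number of observations and number of distinct identities among them.
    summary(): { calls: number; distinct: number } {
      return { calls: seen.length, distinct: new Set(seen).size };
    },
  };
}
```

A summary of 8 calls with 1 distinct identity corresponds to the finding above; structurally equal but separately allocated objects would count as distinct.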
Suggested fix — two-level WeakMap keyed on object identity:
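```ts
// Outer key: config snapshot; inner key: env object. Both are held weakly.
const cache = new WeakMap<object, WeakMap<object, PluginAutoEnableResult>>();

export function applyPluginAutoEnable(params: {
  config?: OpenClawConfig;
  env?: NodeJS.ProcessEnv;
  manifestRegistry?: PluginManifestRegistry;
}): PluginAutoEnableResult {
  const { config, env } = params;
  if (config && env) {
    let inner = cache.get(config);
    if (!inner) {
      inner = new WeakMap();
      cache.set(config, inner);
    }
    const hit = inner.get(env);
    if (hit) return hit;
    // computeAutoEnable is assumed to be the current uncached body (detect + materialize).
    const result = computeAutoEnable(params);
    inner.set(env, result);
    return result;
  }
  // Without both keys there is no stable identity to cache on.
  return computeAutoEnable(params);
}
```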
Because both keys are WeakMap-able objects, entries are collected automatically when a new runtime config snapshot rotates in. manifestRegistry is identity-stable for the same config in our measurements, so the two-level key on (config, env) is sufficient; a single-level WeakMap<config, result> would also work in practice and is even simpler.
Measured hit rate on a real fanout: 7 of 8 calls become cache hits, saving ~525 ms.
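The simpler single-level variant mentioned above can be sketched generically. memoizeByIdentity is a hypothetical helper, and the sketch assumes that env and manifestRegistry never change for a given config object; if that assumption does not hold, the two-level key is the safer choice.

```typescript
// Generic identity-keyed memo with hit/miss counters for verification.
function memoizeByIdentity<K extends object, R>(compute: (key: K) => R) {
  const cache = new WeakMap<K, R>();
  let hits = 0;
  let misses = 0;

  return {
    get(key: K): R {
      // has() rather than a truthiness check, so falsy results are cached too.
      if (cache.has(key)) {
        hits++;
        return cache.get(key) as R;
      }
      misses++;
      const result = compute(key);
      cache.set(key, result);
      return result;
    },
    stats: (): { hits: number; misses: number } => ({ hits, misses }),
  };
}
```

With one config snapshot and eight lookups, the counters read 7 hits and 1 miss, which matches the hit rate measured on a real fanout.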
OpenClaw version
2026.5.7 (commit eeef486449)
Operating system
WSL2 (Ubuntu 24.04 on Windows 11), Node.js v22.21.1
Model
N/A
Provider / routing chain
N/A
Install method
npm install -g openclaw (running as a systemd user service)
Logs, screenshots, and evidence
All latency numbers above come from hrtime probes inserted at the handler call sites in a freshly restarted gateway during a single dashboard load. No sensitive paths or credentials are included.
Additional information
The two bugs compound: while tts.status holds the event loop for ~1.5 s, sibling handlers' lazy-import I/O (status → loadStatusSummaryRuntimeModule, models.list → loadModelsListCatalog, etc.) can resolve I/O in the background, but their resumed microtasks queue up behind tts.status. Once tts.status returns, the siblings all resolve nearly simultaneously and immediately encounter the redundant applyPluginAutoEnable work along the channels.status and tools.catalog paths.
Estimated impact of fixing both bugs (extrapolated from the probe data, not measured under a patched build):
- Fix (A) alone: cold-start fanout total drops from ~2.7 s to ~1.2 s (siblings can finally overlap).
- Fix (A) + (B): drops to ~500–700 ms.
These two issues are logically independent — they share only the surface symptom ("dashboard cold start feels slow"), not their root cause. We are happy to split them into separate issues if that better fits OpenClaw's triage workflow.
Reported by the CoClaw team.
This issue was discovered while developing @coclaw/openclaw-coclaw, a CoClaw channel plugin for OpenClaw.