
[Bug]: 4.29 dispatch prep stages take ~73s of synchronous CPU work, blocking event loop #75999

@zackchiutw

Description


Bug type

Performance regression (introduced in 4.29; not present in 4.27)

Summary

Upgrading from 4.24/4.27 → 4.29 caused every agent dispatch to take 2–5 minutes to first reply. The gateway log shows new prep stages instrumentation in 4.29 that reports each dispatch spending ~73 s of synchronous CPU work before the LLM is even called, with single operations blocking the Node.js event loop for over 30 seconds.

The same 13-agent workspace setup on 4.27 returns replies in <1 minute.

A separate Python-based agent runtime (Hermes) on the same machine, using the same Z.AI/MiniMax/DeepSeek API keys and same glm-5-turbo model, returns replies in <10 seconds — confirming the bottleneck is inside the OpenClaw runtime, not the LLM provider, network, or model.

Evidence

Stage breakdown from a real 4.29 dispatch (commander, glm-5-turbo, ~5 min total)

[trace:embedded-run] startup stages totalMs=28630
  workspace:1ms, runtime-plugins:3ms, hooks:0ms,
  model-resolution:6794ms, auth:12471ms,
  context-engine:0ms, attempt-dispatch:11612ms

[trace:embedded-run] prep stages totalMs=73394
  workspace-sandbox:610ms, skills:0ms,
  core-plugin-tools:8765ms, bootstrap-context:8821ms,
  bundle-tools:3532ms,
  system-prompt:23317ms,            ← largest contributor
  session-resource-loader:7546ms,
  agent-session:5ms,
  stream-setup:20798ms              ← second-largest

[diagnostic] liveness warning:
  eventLoopDelayMaxMs=34024.2 ← single 34-second event-loop block
  eventLoopUtilization=1
  cpuCoreRatio=1.013

The prep stages total ~73 s and the startup stages add another ~28 s, so each dispatch consumes ~100 seconds of CPU time before the model even starts streaming. With the CPU saturated, the fallback chain then trips cascading fetch timeouts for another 1–3 minutes.

408 occurrences of the [fetch-timeout] fetch timeout reached log line were observed in a 2-hour window of typical use.
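The stalls can be confirmed independently of OpenClaw's own instrumentation with Node's built-in perf_hooks histogram. A minimal sketch (hypothetical, not OpenClaw code; it must run inside the gateway process to observe its loop, e.g. preloaded via NODE_OPTIONS="--import /path/to/loop-monitor.mjs"):

  // loop-monitor.mjs (hypothetical preload, not part of OpenClaw)
  import { monitorEventLoopDelay } from 'node:perf_hooks';

  const h = monitorEventLoopDelay({ resolution: 20 }); // sample every 20 ms
  h.enable();

  // Print the worst stall seen in each 30 s window; histogram values are
  // nanoseconds, so divide by 1e6 to compare with eventLoopDelayMaxMs.
  setInterval(() => {
    console.error(`[loop-monitor] maxDelayMs=${(h.max / 1e6).toFixed(1)}`);
    h.reset();
  }, 30_000).unref();

On 4.29 a single dispatch should push maxDelayMs into the tens of thousands, matching the eventLoopDelayMaxMs=34024.2 line above.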

4.27 vs 4.29 instrumentation diff

grep prepStages.mark returns:

  • 4.29 dist/selection-CwAy0mf2.js: 9 hits (workspace-sandbox, skills, core-plugin-tools, bootstrap-context, bundle-tools, system-prompt, session-resource-loader, agent-session, stream-setup)
  • 4.27 dist/selection-*.js: 0 hits

The new prep stages instrumentation is the most visible signal that dispatch flow was substantially reworked in 4.29.

Cross-runtime baseline (same machine, same provider, same model)

  Runtime            Reply latency   Notes
  Hermes (Python)    <10 s           Same glm-5-turbo, same Z.AI Coding Plan key
  OpenClaw 4.27      <60 s           Production agents, 13 Telegram channels
  OpenClaw 4.29      2–5 min         Same workspace, same config

Reproduction steps

  1. Install openclaw@2026.4.29 with a non-trivial workspace (≥10 skills under workspace-*/skills/) and a Z.AI / MiniMax / DeepSeek primary model.
  2. Bind a Telegram channel to one of the agents.
  3. Send any short prompt (e.g. hi).
  4. Observe in journalctl --user -u openclaw-gateway:
    • prep stages totalMs >= 60000
    • eventLoopDelayMaxMs > 5000
    • Reply latency 2–5 minutes
  5. Downgrade to openclaw@2026.4.27 (set OPENCLAW_ALLOW_OLDER_BINARY_DESTRUCTIVE_ACTIONS=1), restart gateway, repeat step 3 — reply now <60 s.

Suspected hot paths

dist/selection-CwAy0mf2.js regions between the new prep stage marks:

  • system-prompt stage (23 s): buildEmbeddedSystemPrompt → buildAgentSystemPrompt (in system-prompt-DZrkA5Mv.js:282-648) does large synchronous string concatenation + XML escaping + conditional rendering of all skill metadata, with no cache keyed on (skills hash + workspace files hash). bootstrap-cache-CmO66T4a.js only caches per-session and is invalidated on each dispatch. A cache sketch follows this list.
  • stream-setup stage (21 s): covers selection-CwAy0mf2.js:6934-7148, including applyExtraParamsToAgent calls into provider runtime deps. (Not the new Google prompt cache path — isGooglePromptCacheEligible early-returns for non-Gemini models.)
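If the missing cache is indeed the culprit, the shape of the fix is small. A hedged sketch of a content-addressed prompt cache (every identifier here, getSystemPrompt, hashInputs, the build callback, is hypothetical; the real symbols live in the minified bundles):

  import { createHash } from 'node:crypto';
  import { readFileSync } from 'node:fs';

  // Key the cache on the content of every file that feeds the prompt
  // (SKILL.md files, AGENTS.md, SOUL.md, ...), not on the session.
  const promptCache = new Map<string, string>();

  function hashInputs(files: string[]): string {
    const h = createHash('sha256');
    for (const file of files) {
      h.update(file).update('\0');
      try {
        h.update(readFileSync(file));
      } catch {
        h.update('<missing>'); // absent files still contribute to the key
      }
    }
    return h.digest('hex');
  }

  // build() is the expensive synchronous builder (~23 s today); it only
  // runs when one of the input files actually changed.
  function getSystemPrompt(files: string[], build: () => string): string {
    const key = hashInputs(files);
    let prompt = promptCache.get(key);
    if (prompt === undefined) {
      prompt = build();
      promptCache.set(key, prompt);
    }
    return prompt;
  }

Unlike the per-session cache in bootstrap-cache-CmO66T4a.js, this survives across dispatches and only invalidates when file content changes.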

Impact

  • Telegram bots become unusable (>2 min reply means users assume the bot is broken).
  • Per-dispatch CPU saturation cascades: gateway can only handle a single request at a time without queueing.
  • The log lines [telegram] sendChatAction failed and typing TTL reached (2m); stopping typing indicator appear consistently.

Workaround in production

Pinned to openclaw@2026.4.27 and disabled weekly-openclaw-update.timer to prevent auto-upgrade. This required:

  • An Environment=OPENCLAW_ALLOW_OLDER_BINARY_DESTRUCTIVE_ACTIONS=1 systemd drop-in (since 4.27 refuses to start against a config last written by 4.29); see the drop-in after this list.
  • Stripping plugins.entries.active-memory.config (4.27 schema rejects it as additional properties).
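For reference, the drop-in looks roughly like this (path assumed from a default systemctl --user setup; the unit name matches the journalctl command above):

  # ~/.config/systemd/user/openclaw-gateway.service.d/override.conf
  [Service]
  Environment=OPENCLAW_ALLOW_OLDER_BINARY_DESTRUCTIVE_ACTIONS=1

followed by systemctl --user daemon-reload and a gateway restart.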

Environment

  • openclaw 2026.4.29 (regression) vs 2026.4.27 (baseline working)
  • Node.js v22.22.2 (managed via nvm)
  • Ubuntu 25.10 (Linux 6.17.0-22-generic)
  • Gateway run via user systemd unit (systemctl --user)
  • 13 agents, average workspace skills/ size ~3 MB, several glm-5-turbo / MiniMax-M2.7 / deepseek-v4-flash models in fallback chains

Suggested fix direction

  1. Cache the built system prompt keyed on (skills SKILL.md hash + AGENTS.md/SOUL.md/IDENTITY.md/USER.md/MEMORY.md hashes); invalidate only when those files change. Skip buildEmbeddedSystemPrompt on cache hit (see the cache sketch under "Suspected hot paths").
  2. Move CPU-bound prep work off the main event loop (worker thread or chunked yield); see the worker sketch after this list.
  3. Reduce per-dispatch work in stream-setup if possible (verify wrapper layers don't re-initialize per dispatch).
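For (2), the standard Node pattern is worker_threads. A minimal single-worker sketch; file names and buildPrompt are placeholders, not OpenClaw internals:

  // prompt-worker.mjs: runs the CPU-heavy build off the gateway's loop.
  import { parentPort, workerData } from 'node:worker_threads';
  import { buildPrompt } from './build-prompt.mjs'; // hypothetical module

  parentPort?.postMessage(buildPrompt(workerData));

  // Dispatch side (separate module): the event loop stays free to serve
  // other requests while the worker burns CPU.
  import { Worker } from 'node:worker_threads';

  export function buildPromptOffLoop(input: unknown): Promise<string> {
    return new Promise((resolve, reject) => {
      const worker = new Worker(new URL('./prompt-worker.mjs', import.meta.url), {
        workerData: input,
      });
      worker.once('message', resolve);
      worker.once('error', reject);
    });
  }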

Happy to provide additional traces or test patches against affected files.
