
Agent bootstrap takes 3+ minutes per turn; core-plugin-tools, system-prompt, stream-setup each 45-75s #77532

@Maxtonairay


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Every agent turn takes 3-5 minutes to start streaming on a clean install. Trace logs show core-plugin-tools (~73s), system-prompt (~45s), and stream-setup (~45s) as the dominant costs on every turn, and reducing workspace file sizes has no effect on them.

Steps to reproduce

  1. On Windows 10, install OpenClaw globally via npm (npm install -g openclaw@latest, which installed 2026.4.29).
  2. Configure a single Anthropic agent (claude-sonnet-4-6) with a file-provider secret resolving from a local ~100-byte key file.
  3. Disable Telegram channel; no other channels configured.
  4. Start the gateway with openclaw gateway and wait for the [gateway] ready log line.
  5. Open the dashboard and send any message (even a single "hi").
  6. Observe time-to-first-token in the chat and the [trace:embedded-run] lines in the gateway logs.

Expected behavior

Startup and prep stages complete in sub-second time on a clean install with default config and minimal workspace state, so time-to-first-token stays in the low single-digit seconds, not minutes.

Actual behavior

Every message produces a [trace:embedded-run] entry showing prep stages totaling ~188,000ms (3 min 8 sec) before stream-setup completes. Representative trace:

prep stages totalMs=188133
workspace-sandbox: 449ms
skills: 0ms
core-plugin-tools: 72,705ms
bootstrap-context: 5,032ms
bundle-tools: 9,564ms
system-prompt: 44,471ms
session-resource-loader: 11,335ms
agent-session: 1ms
stream-setup: 44,576ms

Gateway logs also show liveness warnings with the event loop blocked for 60-160s while one CPU core runs at ~99%, and every run ends with agent cleanup timed out: step=pi-trajectory-flush timeoutMs=10000.

OpenClaw version

2026.4.29 (a448042)

Operating system

Windows 10 (10.0.19045.6466)

Install method

npm global

Model

anthropic/claude-sonnet-4-6

Provider / routing chain

openclaw -> anthropic

Additional provider/model setup details

Single Anthropic profile via file-provider secrets (singleValue mode). Secret resolves correctly: openclaw secrets audit reports unresolved=0. The auth stage in trace logs nonetheless takes 24-30s for the ~100-byte file read, which appears pathological.
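To check whether the 24-30s auth stage is actually disk-bound, a minimal TypeScript sketch like the one below can time raw reads of a small file with Node's built-in `fs` module, outside OpenClaw. It assumes Node 18+ and does not go through OpenClaw's secret provider; `timeSmallRead` is a hypothetical helper, and the temporary probe file stands in for the real key file, whose path is not given here. If raw reads come back in well under a millisecond, the auth-stage time is presumably spent somewhere other than the read itself, for example waiting on a blocked event loop.

```typescript
import { mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { performance } from "node:perf_hooks";

// Hypothetical helper: reads `path` `iterations` times and reports the
// worst-case and mean latency in milliseconds, plus the bytes read.
function timeSmallRead(path: string, iterations = 50) {
  let worstMs = 0;
  let totalMs = 0;
  let bytes = 0;
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    bytes = readFileSync(path).length;
    const elapsed = performance.now() - start;
    totalMs += elapsed;
    worstMs = Math.max(worstMs, elapsed);
  }
  return { bytes, worstMs, meanMs: totalMs / iterations };
}

// Self-check against a throwaway 100-byte file; for a real check, pass the
// path of the file-provider key file instead.
const dir = mkdtempSync(join(tmpdir(), "read-timing-"));
const probe = join(dir, "key.txt");
writeFileSync(probe, "x".repeat(100));
const result = timeSmallRead(probe);
rmSync(dir, { recursive: true, force: true });
console.log(result);
```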

Logs, screenshots, and evidence

Two representative trace entries from the gateway log, both from the same run:

Startup stages:
runId=4978127e-... totalMs=125689
  workspace:1ms
  runtime-plugins:51,684ms
  hooks:1ms
  model-resolution:18,856ms
  auth:24,592ms
  context-engine:1ms
  attempt-dispatch:30,554ms

Prep stages (same run):
totalMs=187,321
  workspace-sandbox:704ms
  skills:1ms
  core-plugin-tools:62,824ms
  bootstrap-context:4,823ms
  bundle-tools:8,814ms
  system-prompt:47,996ms
  session-resource-loader:13,988ms
  agent-session:5ms
  stream-setup:48,166ms
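A small parser makes these traces easier to compare across runs. The sketch below is a hypothetical helper, not part of OpenClaw; it assumes the `name:12,345ms` stage format shown above. Run against the two blocks from this report, the stage durations add up exactly to each `totalMs` (so stream-setup is counted inside the prep total), and it ranks the slowest stages.

```typescript
interface Stage {
  name: string;
  ms: number;
}

// Extracts totalMs and "name:12,345ms" stage lines from a pasted trace block.
function parseStages(block: string): { totalMs: number | null; stages: Stage[] } {
  let totalMs: number | null = null;
  const stages: Stage[] = [];
  for (const raw of block.split("\n")) {
    const line = raw.trim();
    const total = /totalMs=([\d,]+)/.exec(line);
    if (total) {
      totalMs = Number(total[1].replace(/,/g, ""));
      continue;
    }
    const stage = /^([a-z-]+):\s*([\d,]+)ms$/.exec(line);
    if (stage) stages.push({ name: stage[1], ms: Number(stage[2].replace(/,/g, "")) });
  }
  return { totalMs, stages };
}

// Sums the stages and sorts them from slowest to fastest.
function summarize(block: string) {
  const { totalMs, stages } = parseStages(block);
  const sumMs = stages.reduce((acc, s) => acc + s.ms, 0);
  const ranked = [...stages].sort((a, b) => b.ms - a.ms);
  return { totalMs, sumMs, ranked };
}

const startupTrace = `runId=4978127e-... totalMs=125689
  workspace:1ms
  runtime-plugins:51,684ms
  hooks:1ms
  model-resolution:18,856ms
  auth:24,592ms
  context-engine:1ms
  attempt-dispatch:30,554ms`;

const prepTrace = `totalMs=187,321
  workspace-sandbox:704ms
  skills:1ms
  core-plugin-tools:62,824ms
  bootstrap-context:4,823ms
  bundle-tools:8,814ms
  system-prompt:47,996ms
  session-resource-loader:13,988ms
  agent-session:5ms
  stream-setup:48,166ms`;

const startupSummary = summarize(startupTrace);
const prepSummary = summarize(prepTrace);

// → true, then core-plugin-tools, stream-setup, system-prompt
console.log(prepSummary.sumMs === prepSummary.totalMs, prepSummary.ranked.slice(0, 3).map((s) => s.name));
```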

Liveness warning examples:
  eventLoopDelayMaxMs=84,020 cpuCoreRatio=0.985
  eventLoopDelayMaxMs=143,344 cpuCoreRatio=0.985
  eventLoopDelayMaxMs=162,000 cpuCoreRatio=0.988
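The `eventLoopDelayMaxMs` values above resemble the maximum reported by Node's built-in `perf_hooks.monitorEventLoopDelay` histogram; whether OpenClaw's liveness check actually uses that API is an assumption. The sketch below reproduces the symptom in isolation: a synchronous busy-wait, standing in for any long blocking task, shows up as a large max delay. `blockFor` and `measureBlockedLoop` are hypothetical helpers.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";
import { setTimeout as sleep } from "node:timers/promises";

// Busy-waits for `ms` milliseconds; no timers or I/O callbacks run meanwhile.
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin
  }
}

async function measureBlockedLoop(blockMs: number): Promise<number> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  await sleep(50); // let the sampler take baseline readings first
  blockFor(blockMs); // the sampler's timer cannot fire during this call
  await sleep(50); // give the sampler a chance to record the stall
  histogram.disable();
  return histogram.max / 1e6; // histogram values are in nanoseconds
}

const demoDelay = measureBlockedLoop(250);
demoDelay.then((maxMs) => {
  console.log(`max event loop delay ≈ ${maxMs.toFixed(0)} ms`);
});
```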

The warning `workspace bootstrap file TOOLS.md is 12423 chars (limit 12000); truncating in injected context` appeared on every run before the workspace slimming listed below.

Full logs are available on request.

Impact and severity

Affected: Single-user local install; usable but degraded.
Severity: Blocks workflow; every chat turn requires a 3-5 minute wait, making interactive sessions impractical.
Frequency: Always; reproduced on every turn including fresh "hi" messages.
Consequence: Cannot use OpenClaw for real-time agent interaction; cron/heartbeat tasks complete but each consumes 3+ minutes of agent time per run.

Additional information

Mitigations attempted that did NOT change prep-stage timings:

  • Full uninstall and reinstall of 2026.4.29 (this resolved a separate "missing module" corruption issue)
  • Cleared plugin-runtime-deps cache
  • Disabled Telegram channel (had separate polling-stall issue)
  • Removed AVG antivirus and rebooted; added Windows Defender exclusions for .openclaw, the npm modules path, and node.exe
  • Stopped and disabled OneDrive
  • Removed extraneous agents from agents.list
  • Confirmed file-provider secret resolution works (secrets audit unresolved=0)
  • openclaw doctor archived 371 orphan transcripts, cleared stale locks, refreshed plugin registry (69 plugins, no errors)
  • Slimmed AGENTS.md (12,369→4,425 chars) and MEMORY.md (5,132→3,856 chars), and split PROJECTS.md (11,493→2,832 chars, with details moved to a non-injected file)

After all of the above, prep stages still total ~188,000ms. The bottleneck therefore appears to be internal to plugin loading and prompt composition, not workspace file sizes.

Hypothesis: the core-plugin-tools and system-prompt stages perform extensive synchronous file walks over workspace and per-agent state on every turn, regardless of message complexity. The auth stage taking 24-30s to read a 100-byte file also looks pathological. Happy to provide full logs, a redacted config, or the output of any diagnostic commands.
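To illustrate why synchronous walks would fit the liveness warnings, the sketch below contrasts a synchronous recursive directory walk with an asynchronous one on a throwaway tree of 1,000 files, counting how many 1 ms interval ticks fire during each. It is not OpenClaw's code; `walkSync`, `walkAsync`, and `ticksDuring` are hypothetical helpers. The synchronous walk always records zero ticks, because nothing else on the event loop can run until it returns; if the hypothesis is right, repeating that over a large workspace on every turn would stall the loop in the same way.

```typescript
import { mkdirSync, mkdtempSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { readdir } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Synchronous recursive walk: every readdirSync call blocks the event loop.
function walkSync(dir: string): number {
  let files = 0;
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    files += entry.isDirectory() ? walkSync(join(dir, entry.name)) : 1;
  }
  return files;
}

// Asynchronous walk: each await yields, so timers and I/O can interleave.
async function walkAsync(dir: string): Promise<number> {
  let files = 0;
  for (const entry of await readdir(dir, { withFileTypes: true })) {
    files += entry.isDirectory() ? await walkAsync(join(dir, entry.name)) : 1;
  }
  return files;
}

// Counts how many 1 ms interval ticks fire while `work` runs.
async function ticksDuring(
  work: () => number | Promise<number>,
): Promise<{ files: number; ticks: number }> {
  let ticks = 0;
  const timer = setInterval(() => {
    ticks += 1;
  }, 1);
  try {
    const files = await work();
    return { files, ticks };
  } finally {
    clearInterval(timer);
  }
}

async function main() {
  // Build a throwaway tree: 20 directories x 50 files.
  const root = mkdtempSync(join(tmpdir(), "walk-demo-"));
  for (let d = 0; d < 20; d++) {
    const sub = join(root, `dir${d}`);
    mkdirSync(sub);
    for (let f = 0; f < 50; f++) writeFileSync(join(sub, `f${f}.txt`), "x");
  }
  const syncRun = await ticksDuring(() => walkSync(root));
  const asyncRun = await ticksDuring(() => walkAsync(root));
  rmSync(root, { recursive: true, force: true });
  return { syncRun, asyncRun };
}

const demoWalk = main();
demoWalk.then((r) => console.log(r));
```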
