[perf] system-prompt rebuild blocks event loop 16-29s per run; idle keeps 1 CPU core fully utilized #75887

@highfly-hi

Environment

  • openclaw 2026.4.29 (npm: openclaw)
  • Node.js (system), Linux 6.17.0-20-generic
  • Profile: secretary (gateway port 18790)
  • Memory backend: embeddinggemma-300m-qat-Q8_0.gguf (sqlite-vec + FTS5 hybrid + MMR + temporalDecay)
  • DB: 16,464 chunks across 28 files

Symptoms

  1. Every [trace:embedded-run] prep stages trace shows system-prompt taking 16,527-29,185 ms, consistently one of the largest stages (in the sample below, only stream-setup is longer).
  2. liveness warning reports eventLoopUtilization=1, cpuCoreRatio≈1.06-1.08 even when active=0 waiting=0 queued=0 — main thread is fully saturated by background work.
  3. eventLoopDelayMaxMs peaks (27,363 ms, 21,323 ms, 14,621 ms) coincide with prep-stage rebuilds, causing client-visible timeouts (embedded run failover decision: stage=assistant decision=surface_error reason=timeout).
  4. Reducing bootstrapTotalMaxChars from 150,000 → 60,000 shortens the stalls but does not eliminate them, because the synchronous embedding search + MMR re-rank still runs on the main thread.
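
For anyone who wants to confirm symptoms 2 and 3 independently of the built-in liveness warning, here is a minimal sketch using Node's perf_hooks. The helper name sampleLoop and the LoopSample shape are made up for illustration and are not part of openclaw; only performance.eventLoopUtilization() and monitorEventLoopDelay() are real Node APIs.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

// Hypothetical result shape, not an openclaw type.
export interface LoopSample {
  utilization: number; // 0..1, share of the sampled window the loop was busy
  maxDelayMs: number; // largest delay recorded by the histogram, in ms
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: measures event-loop health while `work` runs.
export async function sampleLoop(
  work: () => Promise<void> | void,
): Promise<LoopSample> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  // Let the monitor's timer tick a few times before the measured work starts.
  await sleep(50);

  const before = performance.eventLoopUtilization();
  await work();
  // Give the loop one more turn so an overdue tick lands in the histogram.
  await sleep(20);
  const delta = performance.eventLoopUtilization(before);
  histogram.disable();

  // Histogram values are reported in nanoseconds.
  return { utilization: delta.utilization, maxDelayMs: histogram.max / 1e6 };
}
```

Wrapping the suspected call (for example the system-prompt build) in sampleLoop should show utilization close to 1 and a maxDelayMs in the same range as eventLoopDelayMaxMs if the work really is synchronous on the main thread.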

Sample trace

prep stages: totalMs=57374 stages=
  workspace-sandbox:9ms@9ms,
  skills:1ms@10ms,
  core-plugin-tools:3602ms@3612ms,
  bootstrap-context:5ms@3617ms,
  bundle-tools:764ms@4381ms,
  system-prompt:19242ms@23623ms,             ← 19s on main thread
  session-resource-loader:12290ms@35913ms,
  agent-session:2ms@35915ms,
  stream-setup:21459ms@57374ms

liveness warning: reasons=event_loop_delay,event_loop_utilization
  interval=35s eventLoopDelayP99Ms=27363.6 eventLoopDelayMaxMs=27363.6
  eventLoopUtilization=1 cpuCoreRatio=0.254 active=1 waiting=0 queued=0
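
To make the prep stages line easier to compare across runs, here is a small parsing sketch. It assumes each entry reads as name:duration@elapsed-at-end-of-stage, which the numbers above are consistent with (the durations sum to totalMs). parseStages and slowest are illustrative names, not openclaw functions.

```typescript
export interface Stage {
  name: string;
  durationMs: number; // time spent in this stage
  endMs: number; // elapsed time when the stage finished
}

// Extracts "name:123ms@456ms" entries from a prep-stages trace line.
export function parseStages(trace: string): Stage[] {
  const pattern = /([\w-]+):(\d+)ms@(\d+)ms/g;
  const stages: Stage[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(trace)) !== null) {
    stages.push({
      name: m[1],
      durationMs: Number(m[2]),
      endMs: Number(m[3]),
    });
  }
  return stages;
}

// Returns the n longest stages, longest first, without mutating the input.
export function slowest(stages: Stage[], n: number): Stage[] {
  return [...stages].sort((a, b) => b.durationMs - a.durationMs).slice(0, n);
}
```

On the trace above, slowest(parseStages(trace), 3) gives stream-setup, system-prompt, and session-resource-loader, which together account for 52,991 of the 57,374 ms.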

Root-cause hypothesis

  • node-llama-cpp embedding inference is invoked synchronously from the main event loop during system-prompt build.
  • memorySearch hybrid (vector + FTS) + MMR re-rank + temporal-decay sort over 16K chunks runs in-process.
  • No incremental cache: identical adjacent runs rebuild the prompt from scratch even when memory hasn't changed.
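
On the last point, a rough sketch of what a single-entry prompt cache might look like. The key fields mirror the (workspaceState, memoryRevision, bootstrapBudget) tuple suggested under the mitigations; PromptInputs, PromptCache, and the build callback are hypothetical and do not reflect openclaw's actual internals.

```typescript
import { createHash } from "node:crypto";

// Hypothetical inputs that determine the rendered prompt.
export interface PromptInputs {
  workspaceState: string;
  memoryRevision: number;
  bootstrapBudget: number;
}

// Keeps only the most recent prompt; rebuilds when any key field changes.
export class PromptCache {
  private lastKey: string | null = null;
  private lastPrompt = "";

  constructor(
    private readonly build: (inputs: PromptInputs) => Promise<string>,
  ) {}

  private static keyOf(inputs: PromptInputs): string {
    const material = JSON.stringify([
      inputs.workspaceState,
      inputs.memoryRevision,
      inputs.bootstrapBudget,
    ]);
    return createHash("sha256").update(material).digest("hex");
  }

  async get(inputs: PromptInputs): Promise<string> {
    const key = PromptCache.keyOf(inputs);
    if (key !== this.lastKey) {
      // Cache miss: pay the full build cost once, then remember the result.
      this.lastPrompt = await this.build(inputs);
      this.lastKey = key;
    }
    return this.lastPrompt;
  }
}
```

This only helps if memoryRevision (or an equivalent counter) is bumped on every memory write; otherwise the cache would serve stale prompts.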

Proposed mitigations (in order of impact)

  1. Move embedding inference to a Worker thread (node-llama-cpp supports nThreads / off-main inference).
  2. Cache system-prompt by (workspaceState, memoryRevision, bootstrapBudget) hash so repeated runs reuse the rendered prompt.
  3. Run the MMR re-rank in chunks, yielding to the event loop via setImmediate every N items (a yieldEvery setting), so it doesn't monopolize the loop.
  4. Optionally expose agents.defaults.memorySearch.workerThread = true.
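
To illustrate mitigation 3, here is a sketch of a greedy MMR re-rank that hands control back to the event loop periodically. This is a generic MMR over cosine similarity, not openclaw's actual implementation; Candidate, mmrRerank, lambda, and yieldEvery are all names assumed for the example.

```typescript
// Hypothetical candidate shape: a relevance score plus its embedding.
export interface Candidate {
  id: string;
  relevance: number;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Resolves on the next check phase, letting pending I/O and timers run first.
const yieldToLoop = () =>
  new Promise<void>((resolve) => setImmediate(resolve));

export async function mmrRerank(
  candidates: Candidate[],
  k: number,
  lambda = 0.7,
  yieldEvery = 256,
): Promise<Candidate[]> {
  const remaining = [...candidates];
  const selected: Candidate[] = [];
  // maxSim[i]: highest similarity of remaining[i] to any pick so far, floored at 0.
  const maxSim = remaining.map(() => 0);
  let sinceYield = 0;

  while (selected.length < k && remaining.length > 0) {
    // Pick the candidate with the best relevance/redundancy trade-off.
    let best = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < remaining.length; i++) {
      const score = lambda * remaining[i].relevance - (1 - lambda) * maxSim[i];
      if (score > bestScore) {
        bestScore = score;
        best = i;
      }
    }
    const [picked] = remaining.splice(best, 1);
    maxSim.splice(best, 1);
    selected.push(picked);

    // Update redundancy penalties; this is the expensive part, so yield here.
    for (let i = 0; i < remaining.length; i++) {
      maxSim[i] = Math.max(maxSim[i], cosine(remaining[i].vector, picked.vector));
      if (++sinceYield >= yieldEvery) {
        sinceYield = 0;
        await yieldToLoop();
      }
    }
  }
  return selected;
}
```

Yielding does not reduce total CPU time, it only caps how long the loop is held in one stretch, so it complements rather than replaces moving the work to a Worker thread.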

Workaround applied locally

  • Reduced bootstrapMaxChars 20000 → 10000, bootstrapTotalMaxChars 150000 → 60000.
  • Effect: memory peak 5.0 GB → 0.75 GB, but cpuCoreRatio≈1.07 persists when idle.

Repro

  1. Configure agents.defaults.contextInjection: "always" with bootstrapTotalMaxChars: 150000 and a populated memory DB (>10K chunks).
  2. Trigger any embedded-run and observe [trace:embedded-run] prep stages system-prompt ≥15s.
  3. With agent idle, observe [diagnostic] liveness warning showing eventLoopUtilization=1 while active=0.
