Event-loop starvation during context compaction causes fetch timeouts (16.9s timer delay) #86358
Closed
Labels
- P1: High-priority user-facing bug, regression, or broken workflow.
- clawsweeper:needs-live-repro: ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
- clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
- clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
- clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
- impact:crash-loop: Crash, hang, restart loop, or process-level availability failure.
- impact:message-loss: Channel message delivery can be lost, duplicated, or misrouted.
- impact:session-state: Session, memory, transcript, context, or agent state can drift or corrupt.
- issue-rating: 🐚 platinum hermit: Good issue quality with a plausible reproduction path needing some confirmation.
Summary
During context overflow auto-compaction, the Node.js event loop stalls for ~17 seconds, causing pending fetch operations (e.g. Telegram API calls) to time out — even when their timeout is set to 10s. This is consistent with CPU-synchronous work blocking the event loop during compaction.
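The mechanism described above can be reproduced in isolation. The following is a minimal sketch, not OpenClaw code: `busyWait` is a hypothetical stand-in for whatever synchronous work compaction performs, and the numbers are scaled down roughly 100x from the report (a 100 ms timer with about 270 ms of blocking instead of a 10s timer with 26.9s).

```javascript
const { performance } = require('node:perf_hooks');

// Hypothetical stand-in for CPU-synchronous work: spins without
// ever returning control to the event loop.
function busyWait(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) { /* block the loop */ }
}

// Arm a timer, block the loop, and report how late the timer fired.
function measureTimerDelay(timeoutMs, blockMs) {
  return new Promise((resolve) => {
    const armedAt = performance.now();
    setTimeout(() => {
      const elapsedMs = performance.now() - armedAt;
      resolve({ elapsedMs, timerDelayMs: elapsedMs - timeoutMs });
    }, timeoutMs);
    // The timer callback cannot run until this call returns.
    busyWait(blockMs);
  });
}

measureTimerDelay(100, 270).then(({ elapsedMs, timerDelayMs }) => {
  console.log(
    `timer fired after ${elapsedMs.toFixed(0)} ms ` +
    `(late by ${timerDelayMs.toFixed(0)} ms)`
  );
});
```

The timer is armed for 100 ms but can only fire once the blocking call returns, which mirrors a 10s fetch timeout that fires only after the stall ends.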
Environment
npm info openclaw → git+https://github.com/openclaw/openclaw.git)
openai-codex/gpt-5.5
Observed sequence
The `timerDelayMs=16963` in your own `[fetch-timeout]` log confirms the event loop was blocked for 16.9s during compaction: the 10s fetch timer couldn't fire until 26.9s elapsed.
Cascading effect
After compaction the agent resumed but then ran two web search tool calls that both hit MCP -32001 timeout:
These may also be caused by the event loop being saturated post-compaction, or by MCP server state after the stall.
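One way to tell these two explanations apart during a live repro is to sample event-loop delay around the suspect step. The sketch below uses Node's built-in `monitorEventLoopDelay` from `perf_hooks`; `maxLoopDelayDuring`, `busyWait`, and `sleep` are hypothetical helpers for illustration, and `busyWait(200)` merely stands in for the real workload.

```javascript
const { monitorEventLoopDelay, performance } = require('node:perf_hooks');

// Hypothetical stand-in for a CPU-bound step such as compaction.
function busyWait(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) { /* spin */ }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `fn` while sampling event-loop delay; resolve with the worst
// observed delay in milliseconds.
async function maxLoopDelayDuring(fn, resolutionMs = 10) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  await sleep(resolutionMs * 3); // let the sampler record a baseline first
  await fn();
  await sleep(resolutionMs * 3); // let the sampler observe the stall
  histogram.disable();
  return histogram.max / 1e6; // the histogram reports nanoseconds
}

maxLoopDelayDuring(() => busyWait(200)).then((maxMs) => {
  console.log(`worst event-loop delay: ${maxMs.toFixed(0)} ms`);
});
```

If the worst delay measured around the web search calls stays small, the MCP timeouts are more likely caused by server-side state than by a saturated event loop.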
Expected behaviour
Compaction should not block the event loop. If it involves heavy JSON serialisation / summarisation API calls, those should be done in a worker thread or with `setImmediate` yields so pending timers can fire normally.
Suggested fix direction
`setImmediate` to yield the event loop
Impact
In our setup (agent-chat-telegram orchestrator driving OpenClaw as a subprocess), the stall causes the orchestrator's own 300s timeout to eventually fire and terminate the OpenClaw call, surfacing as a generic failure to the end user.
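The `setImmediate` yielding suggested under "Suggested fix direction" could look roughly like the sketch below. It assumes the compaction work can be split into independent items; `processInChunks`, `processItem`, and `chunkSize` are hypothetical names, since the actual compaction code path is not shown in this issue.

```javascript
// Resolve on the next check phase so timers and I/O callbacks that
// became due in the meantime get a chance to run.
const yieldToEventLoop = () =>
  new Promise((resolve) => setImmediate(resolve));

// Process `items` in slices, yielding to the event loop between slices.
// `processItem` is a hypothetical synchronous per-item step.
async function processInChunks(items, processItem, chunkSize = 100) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      results.push(processItem(item));
    }
    await yieldToEventLoop();
  }
  return results;
}

// Usage: serialise many messages without monopolising the loop.
const messages = Array.from({ length: 1000 }, (_, id) => ({ id, text: 'hi' }));
processInChunks(messages, (m) => JSON.stringify(m), 100).then((out) => {
  console.log(`serialised ${out.length} messages`);
});
```

Yielding only helps if each slice is short; a single item that takes seconds to process would still stall the loop, which is where the worker-thread option becomes the better fit.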