[Bug]: Gateway liveness reports severe event-loop stalls under subagent load #82936
Closed
Labels
P1 - High-priority user-facing bug, regression, or broken workflow.
bug - Something isn't working
clawsweeper:linked-pr-open - ClawSweeper found an open linked pull request for this issue.
clawsweeper:needs-live-repro - ClawSweeper needs live local, crabbox, or manual validation to confirm this issue.
clawsweeper:no-new-fix-pr - ClawSweeper does not recommend queueing a new automated fix PR for this issue.
impact:crash-loop - Crash, hang, restart loop, or process-level availability failure.
maintainer - Maintainer-authored PR
Bug type
Behavior bug (incorrect output/state without crash)
Beta release blocker
No
Summary
Gateway diagnostics reported repeated multi-second event-loop stalls while several agent/subagent runs were active.
Steps to reproduce
Expected behavior
The gateway diagnostic event path should not monopolize the main event loop during bursts from concurrent agent/subagent activity.
Actual behavior
The captured gateway log contains repeated liveness warnings with event-loop delay measured in seconds, including samples with active agent/subagent work and queued work.
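For readers unfamiliar with how a p99 event-loop delay figure can be produced, here is a minimal sketch using Node's built-in `monitorEventLoopDelay` histogram. It is not the gateway's actual liveness code: `summarizeDelay`, `startLivenessProbe`, the thresholds, and the log line layout are all hypothetical, and only the `eventLoopDelayP99Ms=` field name comes from this report.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Minimal view of the histogram we need; Node's IntervalHistogram satisfies it.
// Values are reported in nanoseconds.
interface DelayHistogram {
  percentile(p: number): number;
  readonly max: number;
}

// Hypothetical sample shape, for illustration only.
interface LivenessSample {
  p99Ms: number;
  maxMs: number;
  stalled: boolean;
}

const NS_PER_MS = 1e6;

// Convert the histogram to milliseconds and flag a stall when p99 reaches the threshold.
function summarizeDelay(h: DelayHistogram, warnAtMs: number): LivenessSample {
  const p99Ms = h.percentile(99) / NS_PER_MS;
  const maxMs = h.max / NS_PER_MS;
  return { p99Ms, maxMs, stalled: p99Ms >= warnAtMs };
}

// Assumed defaults: warn at 1 s of p99 delay, sample every 5 s.
function startLivenessProbe(warnAtMs = 1000, everyMs = 5000): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  const timer = setInterval(() => {
    const sample = summarizeDelay(histogram, warnAtMs);
    if (sample.stalled) {
      // Illustrative line layout; the real gateway log format is not shown in this issue.
      console.warn(`liveness warning eventLoopDelayP99Ms=${sample.p99Ms.toFixed(1)}`);
    }
    histogram.reset(); // start a fresh window for the next sample
  }, everyMs);
  // Return a stop function so callers can shut the probe down cleanly.
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

A p99 measured in seconds in such a probe means some single turn of the event loop ran for that long, which matches the symptom described above.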
OpenClaw version
NOT_ENOUGH_INFO
Operating system
NOT_ENOUGH_INFO
Install method
pnpm dev
Model
NOT_ENOUGH_INFO
Provider / routing chain
NOT_ENOUGH_INFO
Additional provider/model setup details
NOT_ENOUGH_INFO
Logs, screenshots, and evidence
Impact and severity
Affected: Gateway users running concurrent agent/subagent workloads with diagnostics enabled.
Severity: High, because seconds-long event-loop stalls can delay polling, streaming, queue handling, and cleanup.
Frequency: 154 captured liveness lines matched eventLoopDelayP99Ms= in the observed log.
Consequence: Gateway responsiveness degrades while active agent/subagent work is running.
Additional information
The implicated source path is the diagnostic liveness and diagnostic event dispatch path. High-frequency diagnostic events were deferred, but the async queue drained the whole backlog in a single setImmediate turn, which can starve other gateway work during concurrent agent/subagent bursts.
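One common way to avoid this kind of starvation is to cap how many queued events are handled per turn and reschedule the remainder. The sketch below illustrates that idea only; it is not the fix from the linked pull request, and `BoundedDrainQueue`, `maxPerTurn`, and the injectable `schedule` parameter are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch: drain a deferred event queue in bounded batches,
// yielding back to the event loop between batches.
type Scheduler = (fn: () => void) => void;

class BoundedDrainQueue<T> {
  private readonly items: T[] = [];
  private scheduled = false;
  private readonly handle: (item: T) => void;
  private readonly maxPerTurn: number;
  private readonly schedule: Scheduler;

  constructor(
    handle: (item: T) => void,
    maxPerTurn = 64, // assumed cap; a real value would need tuning
    schedule: Scheduler = (fn) => { setImmediate(fn); },
  ) {
    this.handle = handle;
    this.maxPerTurn = maxPerTurn;
    this.schedule = schedule;
  }

  get pending(): number {
    return this.items.length;
  }

  enqueue(item: T): void {
    this.items.push(item);
    this.scheduleDrain();
  }

  // Schedule at most one drain turn at a time.
  private scheduleDrain(): void {
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => this.drainOnce());
  }

  private drainOnce(): void {
    this.scheduled = false;
    // Handle at most maxPerTurn items, not the whole backlog.
    const batch = this.items.splice(0, this.maxPerTurn);
    for (const item of batch) {
      try {
        this.handle(item);
      } catch {
        // A failing listener should not block the remaining events.
      }
    }
    // Leftover items wait for a later turn, so timers and I/O can run in between.
    if (this.items.length > 0) this.scheduleDrain();
  }
}
```

With the default scheduler, each batch runs in its own setImmediate turn, so a burst of N events is spread across roughly N / maxPerTurn turns instead of one. The scheduler is injectable only so the batching behavior can be exercised deterministically in a test.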