[Bug]: Session stuck in "running" status persists in v2026.4.9 — phaseBeforeAbort fix no longer sufficient #63819
Closed
Labels
P1 (High-priority user-facing bug, regression, or broken workflow.)
bug (Something isn't working)
clawsweeper:fix-shape-clear (ClawSweeper found a clear likely implementation shape for this issue.)
clawsweeper:queueable-fix (ClawSweeper marked this issue as an existing queue_fix_pr work candidate.)
clawsweeper:source-repro (ClawSweeper found a high-confidence source-level issue reproduction.)
impact:message-loss (Channel message delivery can be lost, duplicated, or misrouted.)
impact:session-state (Session, memory, transcript, context, or agent state can drift or corrupt.)
issue-rating: 🦞 diamond lobster (Very strong issue quality with high-confidence source-level or clear reproduction.)
Bug type
Regression (worked before, now fails)
Beta release blocker
No
Summary
In v2026.4.9, sessions get stuck in status: running after any request timeout, even after the phaseBeforeAbort fix from #14228 is confirmed applied (phaseBeforeAbort:0, clearState:5 in runs-D-shWMaO.js).
Steps to reproduce
Expected behavior
Session returns to idle after abort, as in 2026.4.8 with phaseBeforeAbort fix applied.
Actual behavior
Session stays stuck in status: running. Watchdog clears it every ~2-3 minutes but it immediately gets stuck again on the next request.
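The expected behavior can be stated as an invariant: whatever happens to the request (success, timeout, or abort), the session status must leave running. The following is a minimal Python sketch of that invariant only. It is not OpenClaw code; the `Session` class, `run_request` function, and `timeout_s` parameter are hypothetical names invented for illustration.

```python
import asyncio


class Session:
    """Hypothetical stand-in for a session record; not an OpenClaw type."""

    def __init__(self):
        self.status = "idle"


async def run_request(session, request, timeout_s):
    """Run `request` with a timeout and always restore the idle status.

    The `finally` clause runs on success, on timeout, and on cancellation,
    so the session cannot remain in "running" once this coroutine exits.
    """
    session.status = "running"
    try:
        return await asyncio.wait_for(request(), timeout=timeout_s)
    finally:
        session.status = "idle"


async def slow_request():
    # Simulates a model call that takes longer than the timeout allows.
    await asyncio.sleep(1.0)
    return "done"


def demo():
    session = Session()
    try:
        asyncio.run(run_request(session, slow_request, timeout_s=0.05))
    except asyncio.TimeoutError:
        pass  # the timeout itself is expected in this demo
    return session.status
```

Calling `demo()` exercises the timeout path. If a cleanup step like this exists but is bypassed on some path in 2026.4.9, the symptom would match what is reported here, though that is a guess and not something confirmed from the source.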
OpenClaw version
2026.4.9 (0512059)
Operating system
macOS Darwin 25.4.0 x64
Install method
npm global
Model
qwen3-5-27b-8110/qwen3.5-27b (local llama.cpp Vulkan)
Provider / routing chain
openclaw → llama-server 127.0.0.1:8110/v1
Additional provider/model setup details
The phaseBeforeAbort fix is confirmed present in both runs-D-shWMaO.js and pi-embedded-Vw-lS5ti.js. The fix worked in 2026.4.8 but no longer works in 2026.4.9, which suggests the root cause has moved to a new code path. Related: #14228, #9405, #57617
Logs, screenshots, and evidence
[2026-04-09 10:39:24] Cleared stuck session: agent:main:telegram:direct:8317843287 (stuck 175s)
[2026-04-09 10:49:25] Cleared stuck session: agent:main:telegram:direct:8317843287 (stuck 158s)
[2026-04-09 10:53:26] Cleared stuck session: agent:main:telegram:direct:8317843287 (stuck 159s)
2026-04-09T14:32:36 embedded_run_agent_end isError:true error:"LLM request failed: network connection error" failoverReason:timeout
Impact and severity
High — blocks all messages after any timeout. Frequency: every ~10 min. Workaround: launchd session watchdog clears stuck sessions every 60s.
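To quantify how often the session gets stuck, the watchdog lines quoted under "Logs, screenshots, and evidence" can be summarized with a short script. This is a rough sketch: the regular expression is inferred only from the three sample lines in this report, and the function names are made up for illustration.

```python
import re
from datetime import datetime

# Format inferred from the three watchdog lines quoted in this report.
CLEARED = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"Cleared stuck session: (?P<key>\S+) \(stuck (?P<secs>\d+)s\)"
)

SAMPLE = """\
[2026-04-09 10:39:24] Cleared stuck session: agent:main:telegram:direct:8317843287 (stuck 175s)
[2026-04-09 10:49:25] Cleared stuck session: agent:main:telegram:direct:8317843287 (stuck 158s)
[2026-04-09 10:53:26] Cleared stuck session: agent:main:telegram:direct:8317843287 (stuck 159s)
"""


def parse_cleared(text):
    """Return (timestamp, session_key, stuck_seconds) for each watchdog line."""
    events = []
    for m in CLEARED.finditer(text):
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        events.append((ts, m["key"], int(m["secs"])))
    return events


def summarize(events):
    """Compute mean stuck duration and the gaps between consecutive clears."""
    if not events:
        return {"count": 0, "mean_stuck_s": None, "gaps_s": []}
    durations = [secs for _, _, secs in events]
    gaps = [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(events, events[1:])
    ]
    return {
        "count": len(events),
        "mean_stuck_s": sum(durations) / len(durations),
        "gaps_s": gaps,
    }


summary = summarize(parse_cleared(SAMPLE))
# summary["mean_stuck_s"] is 164.0 and summary["gaps_s"] is [601.0, 241.0]
```

For the three quoted lines, the mean stuck time is 164 seconds and the clears are 601 and 241 seconds apart.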
Additional information
No response