OpenClaw Diagnostic Recovery Aborts Active Embedded Runs
Date: 2026-05-23
Reporter environment: macOS 26.4.1 arm64, OpenClaw 2026.5.20 (e510042), Node v24.13.0, OpenAI Codex embedded runner.
Summary
OpenClaw core diagnostics repeatedly classify embedded Codex app-server runs as stalled after terminal-looking progress events such as codex_app_server:notification:rawResponseItem/completed, then run the core recovery action abort_embedded_run. This releases the session lane, but it can also cut off long-running direct/group work that was still producing useful navigation, browser, tool, or assistant progress.
This is not Taskdash, raw-Codex watch, or our custom outcome-supervisor making a decision. The exact warning and recovery strings are emitted by the installed OpenClaw runtime:
/usr/local/lib/node_modules/openclaw/dist/diagnostic-CgdFvhDv.js
/usr/local/lib/node_modules/openclaw/dist/diagnostic-stuck-session-recovery.runtime-C6DQkhmb.js
The user-visible failure is severe: direct work can appear "done" or simply stop responding after CAPTCHA/browser/navigation work, while the task outcome is incomplete and there is no clear terminal delivery. The operator then has to manually reconcile whether the work finished, was blocked, or was system-aborted.
Impact
- Long direct/group tasks are aborted by core recovery even when the app-server stream still alternates between terminal-looking events and real progress events.
- Queued user turns can resume, but the original task outcome is left uncertain.
- The session/run can look terminal in dashboards even when task-level work did not finish.
- Operators must manually inspect transcripts, browser state, logs, and local files to determine whether work succeeded.
Observed Evidence
From local gateway diagnostics, sanitized:
- 14 "stuck session recovery: ... action=abort_embedded_run aborted=true" events in the retained gateway diagnostic log.
  - 11 of those were direct Telegram sessions.
  - 3 were group Telegram sessions.
- 59 queued_behind_terminal_active_work stall warnings in the same retained diagnostic log.
Representative sanitized sequence:
2026-05-19T20:07:58.897+03:00 [diagnostic] stalled session: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> state=processing age=389s queueDepth=1 reason=queued_behind_terminal_active_work classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=codex_app_server:notification:rawResponseItem/completed lastProgressAge=3s terminalProgressStale=true recovery=checking
2026-05-19T20:07:58.940+03:00 [diagnostic] stuck session recovery: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> age=389s action=abort_embedded_run aborted=true drained=true released=0
2026-05-19T20:07:58.942+03:00 [diagnostic] stuck session recovery outcome: status=aborted action=abort_embedded_run sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> activeSessionId=<redacted-session-id> activeWorkKind=embedded_run lane=session:agent:main:telegram:default:direct:<redacted-chat> aborted=true drained=true forceCleared=false released=0
Later examples show the same pattern:
2026-05-21T19:49:35.284+03:00 [diagnostic] stalled session: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> state=processing age=435s queueDepth=1 reason=queued_behind_terminal_active_work classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=codex_app_server:notification:rawResponseItem/completed lastProgressAge=1s terminalProgressStale=true recovery=checking
2026-05-21T19:49:35.331+03:00 [diagnostic] stuck session recovery: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> age=435s action=abort_embedded_run aborted=true drained=true released=0
2026-05-21T23:54:05.945+03:00 [diagnostic] stalled session: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> state=processing age=417s queueDepth=1 reason=queued_behind_terminal_active_work classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=codex_app_server:notification:rawResponseItem/completed lastProgressAge=2s terminalProgressStale=true recovery=checking
2026-05-21T23:54:05.987+03:00 [diagnostic] stuck session recovery: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> age=417s action=abort_embedded_run aborted=true drained=true released=0
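Counts like the ones reported above can be tallied from lines in this format with a small script. The following is a minimal sketch, assuming the key=value layout seen in the excerpts and assuming that the fifth colon-separated segment of sessionKey carries the session kind (direct in the excerpts; a group value is a guess). The helper names parseFields, sessionKind, and tally are hypothetical and are not part of OpenClaw.

```javascript
// Sketch: tally diagnostic recovery and stall events from gateway log lines.
// Helper names are hypothetical; only the log layout comes from the excerpts.

function parseFields(line) {
  // Collect key=value tokens; each value runs up to the next whitespace.
  const fields = {};
  for (const match of line.matchAll(/(\w+)=(\S+)/g)) {
    fields[match[1]] = match[2];
  }
  return fields;
}

function sessionKind(sessionKey) {
  // Assumed shape: agent:main:telegram:default:<kind>:<chat>
  const parts = (sessionKey ?? "").split(":");
  return parts[4] ?? "unknown";
}

function tally(lines) {
  const result = { aborts: {}, terminalStalls: 0 };
  for (const line of lines) {
    const f = parseFields(line);
    // "stuck session recovery:" does not match the separate
    // "stuck session recovery outcome:" line, so aborts are counted once.
    if (
      line.includes("stuck session recovery:") &&
      f.action === "abort_embedded_run" &&
      f.aborted === "true"
    ) {
      const kind = sessionKind(f.sessionKey);
      result.aborts[kind] = (result.aborts[kind] ?? 0) + 1;
    } else if (
      line.includes("stalled session:") &&
      f.reason === "queued_behind_terminal_active_work"
    ) {
      result.terminalStalls += 1;
    }
  }
  return result;
}
```

Feeding the retained log to tally(text.split("\n")) gives per-kind abort counts and the number of terminal-active-work stall warnings, which makes it easier to compare retained logs across hosts.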
An operator-provided fresh excerpt from 2026-05-23 showed the same session repeatedly alternating between:
long-running session ... lastProgress=codex_app_server:notification:item/agentMessage/delta
long-running session ... lastProgress=codex_app_server:notification:turn/diff/updated
stalled session ... lastProgress=codex_app_server:notification:rawResponseItem/completed ... recovery=checking
stuck session recovery ... action=abort_embedded_run aborted=true drained=true
That run had recently solved a CAPTCHA and navigated pages, then stopped without a visible final direct response. Live state later showed no active direct run, which is consistent with core recovery having ended the embedded run while task-level work remained unresolved.
Source-Level Suspect
In the installed build, classification treats terminal-looking Codex app-server notifications as stale/terminal active work when queued work exists:
if (params.queueDepth > 0 && params.activity.activeWorkKind === "embedded_run" && isTerminalDiagnosticProgressReason(params.activity.lastProgressReason)) return {
  eventType: "session.stalled",
  reason: "queued_behind_terminal_active_work",
  classification: "stalled_agent_run",
  activeWorkKind: params.activity.activeWorkKind,
  recoveryEligible: false
};
A separate recovery-eligibility check then permits an active abort for stalled embedded runs once the abort threshold is reached:
return params.classification?.eventType === "session.stalled" &&
  params.classification.classification === "stalled_agent_run" &&
  params.classification.activeWorkKind === "embedded_run" &&
  params.ageMs >= params.stuckSessionAbortMs;
The recovery runtime then calls abortAndDrainEmbeddedPiRun and emits:
action=abort_embedded_run aborted=true drained=true
This means a notification such as rawResponseItem/completed can become a recovery trigger even when the larger app-server turn/session still has useful later progress or task-level obligations.
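To make the suspected interaction concrete, the two excerpts can be combined into a small standalone model. This is a sketch under stated assumptions, not the OpenClaw source: the function names classify and isAbortEligible, the contents of the terminal-looking reason set, and the 360-second threshold are all hypothetical. Only the field names and the comparison logic follow the excerpts above.

```javascript
// Standalone model of the decision path quoted above (names are hypothetical).

const TERMINAL_LOOKING_REASONS = new Set([
  "codex_app_server:notification:rawResponseItem/completed",
]);

function classify({ queueDepth, activity }) {
  if (
    queueDepth > 0 &&
    activity.activeWorkKind === "embedded_run" &&
    TERMINAL_LOOKING_REASONS.has(activity.lastProgressReason)
  ) {
    return {
      eventType: "session.stalled",
      reason: "queued_behind_terminal_active_work",
      classification: "stalled_agent_run",
      activeWorkKind: activity.activeWorkKind,
      recoveryEligible: false,
    };
  }
  return null;
}

function isAbortEligible({ classification, ageMs, stuckSessionAbortMs }) {
  // Mirrors the quoted check: it reads neither recoveryEligible nor the
  // age of the last progress event.
  return (
    classification?.eventType === "session.stalled" &&
    classification.classification === "stalled_agent_run" &&
    classification.activeWorkKind === "embedded_run" &&
    ageMs >= stuckSessionAbortMs
  );
}

// Values from the first representative log line: age=389s, lastProgressAge=3s.
const classification = classify({
  queueDepth: 1,
  activity: {
    activeWorkKind: "embedded_run",
    lastProgressReason: "codex_app_server:notification:rawResponseItem/completed",
    lastProgressAgeMs: 3_000, // present in the log, unused by either check
  },
});

const wouldAbort = isAbortEligible({
  classification,
  ageMs: 389_000,
  stuckSessionAbortMs: 360_000, // hypothetical threshold
});
```

In this model the run becomes abort-eligible even though its last progress event is three seconds old, which matches the logged sequence. Whether other code in the real runtime consults recoveryEligible or lastProgressAge before aborting is not visible from the excerpts.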
Expected Behavior
OpenClaw should not abort a direct/group embedded run solely because the last low-level app-server event looks terminal while queued work exists.
Safer behavior:
- Distinguish "terminal response item" from "terminal run/session/task".
- Require a durable run/session terminal event, or a stronger no-progress invariant, before aborting active embedded work.
- If recovery is necessary, mark the session/run outcome distinctly as system_aborted or equivalent, with enough evidence for UI/API consumers to avoid showing normal done.
- Preserve and surface whether a final assistant response was delivered to the original channel.
- Prefer lane release or queue backpressure mechanisms that do not interrupt active browser/tool/model work unless active work is proven orphaned.
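The safer behavior listed above could be expressed as a stricter recovery gate. The sketch below is one possible shape, not a proposed patch: decideRecovery, its input fields (runTerminalSeen, orphaned, lastProgressAgeMs, noProgressMs), and the returned action names other than abort_embedded_run are hypothetical.

```javascript
// Sketch of a stricter recovery gate (all names and thresholds hypothetical).

function decideRecovery({ runTerminalSeen, orphaned, lastProgressAgeMs, noProgressMs }) {
  // A durable run/session terminal event means there is nothing left to
  // interrupt, so releasing the lane is enough.
  if (runTerminalSeen) {
    return { action: "release_lane", outcome: null };
  }
  // Abort only when the run is proven orphaned or has been silent for
  // longer than the no-progress invariant allows.
  if (orphaned || lastProgressAgeMs >= noProgressMs) {
    return { action: "abort_embedded_run", outcome: "system_aborted" };
  }
  // The run is still producing progress: keep it and hold the queue.
  return { action: "backpressure", outcome: null };
}

// The logged case: last progress 3 s ago, no run-level terminal event.
const decision = decideRecovery({
  runTerminalSeen: false,
  orphaned: false,
  lastProgressAgeMs: 3_000,
  noProgressMs: 300_000, // hypothetical invariant
});
```

With these inputs the gate keeps the run alive and applies queue backpressure. An abort still happens when the stream has been silent past the invariant, and it carries a distinct system_aborted outcome.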
Actual Behavior
OpenClaw core emits recovery=checking, calls abort_embedded_run, reports aborted=true drained=true, and the original user task can become outcome-ambiguous. Downstream tools that reconstruct task state from session rows can then flatten the row to done/completed because they see terminal timestamps or clean model completion fragments without the diagnostic recovery context.
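On the consumer side, the flattening described above can be avoided if outcome reconstruction consults the recovery record before falling back to timestamps. The following sketch assumes a hypothetical session row shape (sessionId, endedAt, finalResponseDelivered) and a list of parsed "stuck session recovery outcome" records. None of these names come from OpenClaw or any existing dashboard.

```javascript
// Sketch: derive a task outcome without hiding core recovery aborts.
// Row and record shapes are hypothetical.

function resolveOutcome(row, recoveryOutcomes) {
  const systemAborted = recoveryOutcomes.some(
    (o) =>
      o.sessionId === row.sessionId &&
      o.status === "aborted" &&
      o.action === "abort_embedded_run"
  );
  if (systemAborted) return "system_aborted";
  if (row.endedAt && row.finalResponseDelivered) return "done";
  if (row.endedAt) return "ended_without_delivery";
  return "in_progress";
}
```

The ordering is the point of the sketch: the recovery record is checked first, so a terminal timestamp alone never yields done.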
Related Issues
Potentially related but not identical:
This report is specifically about diagnostic recovery using terminal-looking app-server notification reasons to abort embedded direct/group runs, causing task outcome loss or ambiguity.
Suggested Fix Shape
- Treat rawResponseItem/completed, response.completed, output_item.done, and similar item-level events as terminal only for the item/span they describe, not for the whole embedded run.
- Reset or downgrade terminalProgressStale when newer non-terminal progress follows, including item/started, item/agentMessage/delta, turn/diff/updated, tool activity, browser activity, or assistant delta.
- Add a "system aborted" terminal classification to session/run state when core recovery does abort, so dashboard/API consumers can distinguish core recovery from normal completion and operator abort.
- Add regression coverage for a sequence that alternates:
  - rawResponseItem/completed
  - later assistant/tool/progress events
  - queued follow-up work
  - no durable run/session completion
The expected result for that sequence should not be abort_embedded_run unless the embedded run is independently proven orphaned.
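The regression sequence could be exercised against a small progress tracker along the following lines. This is a sketch only: ProgressTracker, the run/completed event name, and the release_lane and none action names are hypothetical. The item-level reasons are the ones named in the list above.

```javascript
// Sketch: item-level completions never authorize an abort on their own.
// Class, run-level event name, and non-abort actions are hypothetical.

const ITEM_TERMINAL = new Set([
  "rawResponseItem/completed",
  "response.completed",
  "output_item.done",
]);
const RUN_TERMINAL = new Set(["run/completed"]); // hypothetical event name

class ProgressTracker {
  constructor() {
    this.terminalProgressStale = false;
    this.runTerminal = false;
  }

  observe(reason) {
    if (RUN_TERMINAL.has(reason)) {
      this.runTerminal = true;
      return;
    }
    // The flag is set by an item-level completion and reset by any newer
    // non-terminal progress such as a delta or diff update.
    this.terminalProgressStale = ITEM_TERMINAL.has(reason);
  }

  recoveryAction({ queueDepth, orphaned }) {
    if (orphaned) return "abort_embedded_run";
    if (this.runTerminal && queueDepth > 0) return "release_lane";
    return "none";
  }
}

// Regression sequence: alternating item completions and real progress,
// queued follow-up work, no durable run/session completion.
const tracker = new ProgressTracker();
for (const reason of [
  "rawResponseItem/completed",
  "item/agentMessage/delta",
  "rawResponseItem/completed",
  "turn/diff/updated",
  "rawResponseItem/completed",
]) {
  tracker.observe(reason);
}
const action = tracker.recoveryAction({ queueDepth: 1, orphaned: false });
```

Here terminalProgressStale ends up set because the last event is an item completion, yet the action is not abort_embedded_run. In this sketch the flag is diagnostic only, and an abort requires independent proof that the run is orphaned.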