Diagnostic recovery aborts active embedded runs after terminal-looking app-server progress

# OpenClaw Diagnostic Recovery Aborts Active Embedded Runs

Date: 2026-05-23
Reporter environment: macOS 26.4.1 arm64, OpenClaw 2026.5.20 (`e510042`), Node v24.13.0, OpenAI Codex embedded runner.

## Summary

OpenClaw core diagnostics repeatedly classify embedded Codex app-server runs as stalled after terminal-looking progress events such as `codex_app_server:notification:rawResponseItem/completed`, then run the core recovery action `abort_embedded_run`. This releases the session lane, but it can also cut off long-running direct/group work that was still producing useful navigation, browser, tool, or assistant progress.

This is not Taskdash, raw-Codex watch, or our custom outcome-supervisor making a decision. The exact warning and recovery strings are emitted by the installed OpenClaw runtime:

- `/usr/local/lib/node_modules/openclaw/dist/diagnostic-CgdFvhDv.js`
- `/usr/local/lib/node_modules/openclaw/dist/diagnostic-stuck-session-recovery.runtime-C6DQkhmb.js`

The user-visible failure is severe: direct work can appear "done" or simply stop responding after CAPTCHA/browser/navigation work, while the task outcome is incomplete and there is no clear terminal delivery. The operator then has to manually reconcile whether the work finished, was blocked, or was system-aborted.

## Impact

- Long direct/group tasks are aborted by core recovery even when the app-server stream still alternates between terminal-looking events and real progress events.
- Queued user turns can resume, but the original task outcome is left uncertain.
- The session/run can look terminal in dashboards even when task-level work did not finish.
- Operators must manually inspect transcripts, browser state, logs, and local files to determine whether work succeeded.

## Observed Evidence

From local gateway diagnostics, sanitized:

- `14` `stuck session recovery: ... action=abort_embedded_run aborted=true` events in the retained gateway diagnostic log.
- `11` of those were direct Telegram sessions.
- `3` were group Telegram sessions.
- `59` `queued_behind_terminal_active_work` stall warnings in the same retained diagnostic log.

Representative sanitized sequence:

```text
2026-05-19T20:07:58.897+03:00 [diagnostic] stalled session: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> state=processing age=389s queueDepth=1 reason=queued_behind_terminal_active_work classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=codex_app_server:notification:rawResponseItem/completed lastProgressAge=3s terminalProgressStale=true recovery=checking
2026-05-19T20:07:58.940+03:00 [diagnostic] stuck session recovery: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> age=389s action=abort_embedded_run aborted=true drained=true released=0
2026-05-19T20:07:58.942+03:00 [diagnostic] stuck session recovery outcome: status=aborted action=abort_embedded_run sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> activeSessionId=<redacted-session-id> activeWorkKind=embedded_run lane=session:agent:main:telegram:default:direct:<redacted-chat> aborted=true drained=true forceCleared=false released=0
```

Later examples show the same pattern:

```text
2026-05-21T19:49:35.284+03:00 [diagnostic] stalled session: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> state=processing age=435s queueDepth=1 reason=queued_behind_terminal_active_work classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=codex_app_server:notification:rawResponseItem/completed lastProgressAge=1s terminalProgressStale=true recovery=checking
2026-05-21T19:49:35.331+03:00 [diagnostic] stuck session recovery: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> age=435s action=abort_embedded_run aborted=true drained=true released=0

2026-05-21T23:54:05.945+03:00 [diagnostic] stalled session: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> state=processing age=417s queueDepth=1 reason=queued_behind_terminal_active_work classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=codex_app_server:notification:rawResponseItem/completed lastProgressAge=2s terminalProgressStale=true recovery=checking
2026-05-21T23:54:05.987+03:00 [diagnostic] stuck session recovery: sessionId=<redacted-session-id> sessionKey=agent:main:telegram:default:direct:<redacted-chat> age=417s action=abort_embedded_run aborted=true drained=true released=0
```
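Log lines in this format can be tallied mechanically. The following is a minimal sketch of one way to do that, not the method used to produce the counts above. `parseDiagnosticLine` and `tallyRecoveries` are hypothetical helper names, and the `:group:` session-key segment is an assumption (only `:direct:` keys appear in the excerpts).

```javascript
// Hypothetical helper: split one "[diagnostic]" line into its kind and key=value fields.
function parseDiagnosticLine(line) {
  const match = line.match(/^(\S+) \[diagnostic\] ([^:]+): (.*)$/);
  if (!match) return null;
  const [, timestamp, kind, rest] = match;
  const fields = {};
  for (const pair of rest.split(/\s+/)) {
    const eq = pair.indexOf("=");
    // Split on the first "=" only; values such as sessionKey contain ":" but no spaces.
    if (eq > 0) fields[pair.slice(0, eq)] = pair.slice(eq + 1);
  }
  return { timestamp, kind, fields };
}

// Hypothetical helper: count recovery aborts and terminal-progress stall warnings.
function tallyRecoveries(lines) {
  const tally = { aborts: 0, direct: 0, group: 0, terminalStalls: 0 };
  for (const line of lines) {
    const entry = parseDiagnosticLine(line);
    if (!entry) continue;
    const { kind, fields } = entry;
    if (
      kind === "stuck session recovery" &&
      fields.action === "abort_embedded_run" &&
      fields.aborted === "true"
    ) {
      tally.aborts += 1;
      if (fields.sessionKey?.includes(":direct:")) tally.direct += 1;
      // ASSUMPTION: group session keys contain ":group:"; not shown in the excerpts.
      else if (fields.sessionKey?.includes(":group:")) tally.group += 1;
    }
    if (
      kind === "stalled session" &&
      fields.reason === "queued_behind_terminal_active_work"
    ) {
      tally.terminalStalls += 1;
    }
  }
  return tally;
}

// Trimmed copies of the first sequence above, keeping only the fields used here.
const sampleLines = [
  "2026-05-19T20:07:58.897+03:00 [diagnostic] stalled session: sessionKey=agent:main:telegram:default:direct:<redacted-chat> queueDepth=1 reason=queued_behind_terminal_active_work lastProgress=codex_app_server:notification:rawResponseItem/completed recovery=checking",
  "2026-05-19T20:07:58.940+03:00 [diagnostic] stuck session recovery: sessionKey=agent:main:telegram:default:direct:<redacted-chat> age=389s action=abort_embedded_run aborted=true drained=true released=0",
  "2026-05-19T20:07:58.942+03:00 [diagnostic] stuck session recovery outcome: status=aborted action=abort_embedded_run sessionKey=agent:main:telegram:default:direct:<redacted-chat> aborted=true drained=true forceCleared=false released=0",
];

const sampleTally = tallyRecoveries(sampleLines);
```

The `stuck session recovery outcome` line is deliberately not counted as a second abort, since it reports the same recovery action.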

An operator-provided fresh excerpt from 2026-05-23 showed the same session repeatedly alternating between:

- `long-running session ... lastProgress=codex_app_server:notification:item/agentMessage/delta`
- `long-running session ... lastProgress=codex_app_server:notification:turn/diff/updated`
- `stalled session ... lastProgress=codex_app_server:notification:rawResponseItem/completed ... recovery=checking`
- `stuck session recovery ... action=abort_embedded_run aborted=true drained=true`

That run had recently solved a CAPTCHA and navigated pages, then stopped without a visible final direct response. Live state later showed no active direct run, which is consistent with core recovery having ended the embedded run while task-level work remained unresolved.

## Source-Level Suspect

In the installed build, classification treats terminal-looking Codex app-server notifications as stale/terminal active work when queued work exists:

```js
if (params.queueDepth > 0 && params.activity.activeWorkKind === "embedded_run" && isTerminalDiagnosticProgressReason(params.activity.lastProgressReason)) return {
  eventType: "session.stalled",
  reason: "queued_behind_terminal_active_work",
  classification: "stalled_agent_run",
  activeWorkKind: params.activity.activeWorkKind,
  recoveryEligible: false
};
```

A separate recovery-eligibility check then permits an active abort for stalled embedded runs once the session age reaches the abort threshold:

```js
return params.classification?.eventType === "session.stalled" &&
  params.classification.classification === "stalled_agent_run" &&
  params.classification.activeWorkKind === "embedded_run" &&
  params.ageMs >= params.stuckSessionAbortMs;
```

The recovery runtime then calls `abortAndDrainEmbeddedPiRun` and emits:

```text
action=abort_embedded_run aborted=true drained=true
```

This means a notification such as `rawResponseItem/completed` can become a recovery trigger even when the larger app-server turn/session still has useful later progress or task-level obligations.
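To make the interaction concrete, the two quoted conditions can be wrapped in standalone functions and run against values from the first log sequence. This is a sketch under stated assumptions: `classify` and `isAbortEligible` are hypothetical wrapper names, the body of `isTerminalDiagnosticProgressReason` is a stand-in (the real predicate is not quoted in this report), and the `300000` ms threshold is an illustrative value, not the configured `stuckSessionAbortMs`.

```javascript
// ASSUMPTION: stand-in body; the real predicate in the installed build is not quoted here.
const ASSUMED_TERMINAL_SUFFIXES = ["rawResponseItem/completed"];
function isTerminalDiagnosticProgressReason(reason) {
  return ASSUMED_TERMINAL_SUFFIXES.some((suffix) => String(reason).endsWith(suffix));
}

// Hypothetical wrapper around the quoted classification branch.
function classify(params) {
  if (
    params.queueDepth > 0 &&
    params.activity.activeWorkKind === "embedded_run" &&
    isTerminalDiagnosticProgressReason(params.activity.lastProgressReason)
  ) {
    return {
      eventType: "session.stalled",
      reason: "queued_behind_terminal_active_work",
      classification: "stalled_agent_run",
      activeWorkKind: params.activity.activeWorkKind,
      recoveryEligible: false,
    };
  }
  return null; // Other classification branches are omitted from this sketch.
}

// Hypothetical wrapper around the quoted eligibility expression.
function isAbortEligible(params) {
  return (
    params.classification?.eventType === "session.stalled" &&
    params.classification.classification === "stalled_agent_run" &&
    params.classification.activeWorkKind === "embedded_run" &&
    params.ageMs >= params.stuckSessionAbortMs
  );
}

// Values taken from the 2026-05-19 sequence: age=389s, queueDepth=1.
const classification = classify({
  queueDepth: 1,
  activity: {
    activeWorkKind: "embedded_run",
    lastProgressReason: "codex_app_server:notification:rawResponseItem/completed",
  },
});

const eligible = isAbortEligible({
  classification,
  ageMs: 389000,
  stuckSessionAbortMs: 300000, // ASSUMPTION: illustrative threshold only.
});
```

In this sketch the classification carries `recoveryEligible: false`, yet `eligible` evaluates to `true`, because the eligibility expression as quoted does not read that field. Progress that arrived only seconds earlier does not enter the decision either; only the session age does.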

## Expected Behavior

OpenClaw should not abort a direct/group embedded run solely because the last low-level app-server event looks terminal while queued work exists.

Safer behavior:

1. Distinguish "terminal response item" from "terminal run/session/task".
2. Require a durable run/session terminal event, or a stronger no-progress invariant, before aborting active embedded work.
3. If recovery is necessary, mark the session/run outcome distinctly as `system_aborted` or equivalent, with enough evidence for UI/API consumers to avoid showing normal `done`.
4. Preserve and surface whether a final assistant response was delivered to the original channel.
5. Prefer lane release or queue backpressure mechanisms that do not interrupt active browser/tool/model work unless active work is proven orphaned.
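Points 1 and 2 above could be sketched as a guard that scopes each terminal-looking event before deciding on an abort. This is illustrative only and does not reflect OpenClaw's actual API: `terminalScope`, `shouldAbortEmbeddedRun`, and `noProgressAbortMs` are hypothetical names, and `turn/completed` as a run-level terminal event is an assumption.

```javascript
// Item-level events end one response item, not the whole run.
const ITEM_LEVEL_TERMINAL = new Set(["rawResponseItem/completed"]);
// ASSUMPTION: which notification counts as a durable run-level terminal event.
const RUN_LEVEL_TERMINAL = new Set(["turn/completed"]);

// Hypothetical helper: classify how far a progress reason's "terminal" meaning reaches.
function terminalScope(reason) {
  // Strip the "codex_app_server:notification:" style prefix.
  const name = String(reason).split(":").pop();
  if (RUN_LEVEL_TERMINAL.has(name)) return "run";
  if (ITEM_LEVEL_TERMINAL.has(name)) return "item";
  return "none";
}

// Hypothetical guard: abort only on a run-level terminal event with the lane
// still held, or when no progress of any kind has arrived for a long window.
function shouldAbortEmbeddedRun({ queueDepth, lastProgressReason, lastProgressAgeMs, noProgressAbortMs }) {
  if (queueDepth === 0) return false; // Nothing is waiting on the lane.
  if (terminalScope(lastProgressReason) === "run") return true;
  return lastProgressAgeMs >= noProgressAbortMs;
}

// The 2026-05-19 case: item-level event 3s ago, queued work present.
const observedCase = shouldAbortEmbeddedRun({
  queueDepth: 1,
  lastProgressReason: "codex_app_server:notification:rawResponseItem/completed",
  lastProgressAgeMs: 3000,
  noProgressAbortMs: 300000, // ASSUMPTION: illustrative window.
});
```

With this guard the observed case evaluates to `false`: an item-level completion three seconds old is treated as recent progress rather than as evidence that the run has ended.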

## Actual Behavior

OpenClaw core emits `recovery=checking`, runs the `abort_embedded_run` recovery action, reports `aborted=true drained=true`, and the original user task can become outcome-ambiguous. Downstream tools that reconstruct task state from session rows can then flatten the row to done/completed, because they see terminal timestamps or clean model completion fragments without the diagnostic recovery context.
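A downstream consumer could avoid this flattening if the recovery context travelled with the session row. The sketch below assumes a hypothetical row shape (`endedAt`, `finalResponseDelivered`, `recovery`) and hypothetical status names; none of these are confirmed OpenClaw fields.

```javascript
// Hypothetical outcome derivation: recovery context takes precedence over
// terminal timestamps, so a core abort is never reported as a normal "done".
function deriveOutcome(row) {
  const delivered = Boolean(row.finalResponseDelivered);
  if (row.recovery?.action === "abort_embedded_run" && row.recovery.aborted === true) {
    return { status: "system_aborted", finalResponseDelivered: delivered, evidence: row.recovery };
  }
  if (row.endedAt != null) {
    // Ended without a recovery record: distinguish delivered from undelivered.
    return { status: delivered ? "done" : "ended_without_delivery", finalResponseDelivered: delivered };
  }
  return { status: "running", finalResponseDelivered: delivered };
}

// A row that looks terminal by timestamp but was ended by core recovery.
const abortedRow = {
  endedAt: "2026-05-19T20:07:58.942+03:00",
  finalResponseDelivered: false,
  recovery: { action: "abort_embedded_run", aborted: true, drained: true },
};

const abortedOutcome = deriveOutcome(abortedRow);
```

Here `abortedOutcome.status` is `system_aborted` even though `endedAt` is set, which is the distinction a dashboard needs to avoid showing the task as completed.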

## Related Issues

Potentially related but not identical:

- https://github.com/openclaw/openclaw/issues/85251
- https://github.com/openclaw/openclaw/issues/84903
- https://github.com/openclaw/openclaw/issues/84569

This report is specifically about diagnostic recovery using terminal-looking app-server notification reasons to abort embedded direct/group runs, causing task outcome loss or ambiguity.

## Suggested Fix Shape

- Treat `rawResponseItem/completed`, `response.completed`, `output_item.done`, and similar item-level events as terminal only for the item/span they describe, not for the whole embedded run.
- Reset or downgrade `terminalProgressStale` when newer non-terminal progress follows, including `item/started`, `item/agentMessage/delta`, `turn/diff/updated`, tool activity, browser activity, or assistant delta.
- Add a "system aborted" terminal classification to session/run state when core recovery does abort, so dashboard/API consumers can distinguish core recovery from normal completion and operator abort.
- Add regression coverage for a sequence that combines:
  - `rawResponseItem/completed`
  - later assistant/tool/progress events
  - queued follow-up work
  - no durable run/session completion

The expected result for that sequence should not be `abort_embedded_run` unless the embedded run is independently proven orphaned.
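The regression sequence can be sketched as a timeline replay. This is a self-contained model, not a test against OpenClaw's real internals: `replay`, the timeline shape, and `quietMs` are hypothetical, and "independently proven orphaned" is approximated here as prolonged silence with queued work.

```javascript
// Item-level terminal events named in the fix shape above.
const ITEM_TERMINAL = new Set(["rawResponseItem/completed", "response.completed", "output_item.done"]);

// Hypothetical replay: feed progress events and diagnostic checks in time order
// and record what each check would decide.
function replay(timeline, { queueDepth, quietMs }) {
  let lastProgressAt = null;
  let terminalProgressStale = false;
  const decisions = [];
  for (const step of timeline) {
    if (step.check) {
      const quietFor = lastProgressAt === null ? Infinity : step.at - lastProgressAt;
      // ASSUMPTION: orphaned means queued work plus no progress of any kind for quietMs.
      const orphaned = queueDepth > 0 && quietFor >= quietMs;
      decisions.push({
        at: step.at,
        terminalProgressStale,
        action: orphaned ? "abort_embedded_run" : "none",
      });
      continue;
    }
    const name = step.reason.split(":").pop();
    lastProgressAt = step.at; // Item completions still count as progress.
    // The flag is scoped to the latest item and reset by newer non-terminal progress.
    terminalProgressStale = ITEM_TERMINAL.has(name);
  }
  return decisions;
}

// Alternating sequence with queued follow-up work and no durable run completion.
const alternating = [
  { at: 0, reason: "codex_app_server:notification:item/agentMessage/delta" },
  { at: 1000, reason: "codex_app_server:notification:rawResponseItem/completed" },
  { at: 2000, check: true },
  { at: 3000, reason: "codex_app_server:notification:turn/diff/updated" },
  { at: 4000, check: true },
  { at: 5000, reason: "codex_app_server:notification:rawResponseItem/completed" },
  { at: 8000, check: true },
];

const decisions = replay(alternating, { queueDepth: 1, quietMs: 300000 });
```

In this model none of the three checks selects `abort_embedded_run`, and the flag is cleared at the second check because `turn/diff/updated` followed the item completion. A timeline with one item completion followed by silence longer than `quietMs` does select the abort, which keeps the recovery path available for runs that have actually gone quiet.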

