Skip to content

Fix stuck-session recovery aborts for active reply runs#88925

Open
SnowSky1 wants to merge 2 commits into
openclaw:mainfrom
SnowSky1:codex/88870-reply-run-stuck-recovery
Open

Fix stuck-session recovery aborts for active reply runs#88925
SnowSky1 wants to merge 2 commits into
openclaw:mainfrom
SnowSky1:codex/88870-reply-run-stuck-recovery

Conversation

@SnowSky1

@SnowSky1 SnowSky1 commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Summary

  • preserve stuck_recovery through reply-run session-id fallback and embedded-run cancel plumbing so recovery no longer degrades into a fake user abort
  • require stale tracked progress before active embedded/reply work is abort-drained, even when active-abort recovery is enabled
  • add regressions for fresh terminal progress, fresh active reply work, and streamed model progress so long-but-active runs stay alive

Testing

  • node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply-reply.config.ts src/auto-reply/reply/reply-run-registry.test.ts
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.logging.config.ts src/logging/diagnostic.test.ts src/logging/diagnostic-stuck-session-recovery.integration.test.ts
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.agents-embedded-agent.config.ts src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts

Real behavior proof

  • Behavior or issue addressed: Verify that this patch preserves stuck_recovery through the reply-run session-id fallback and does not abort a live embedded run that still has fresh tracked progress.
  • Real environment tested: Local OpenClaw checkout on Windows 11 using the real modules from this PR branch via a Node/tsx runtime harness after commit 159943a6c0.
  • Exact steps or command run after this patch: Ran pnpm exec tsx - against the real repo modules in this branch with a runtime harness that (1) creates a reply operation and aborts it through abortReplyRunBySessionId(..., { reason: "stuck_recovery" }), then (2) registers an active embedded run, records fresh progress with markDiagnosticRunProgressForTest(...), and calls recoverStuckDiagnosticSession(...) with allowActiveAbort: true.
  • Evidence after fix: Terminal output from the local OpenClaw runtime harness after the fix:
    [reply-registry]
    {"aborted":true,"result":{"kind":"aborted","code":"aborted_by_system","reason":"stuck_recovery"},"abortSignalReason":"Reply operation aborted by system reason=stuck_recovery"}
    [stuck-recovery]
    {"snapshotBefore":{"activeWorkKind":"embedded_run","hasActiveEmbeddedRun":true,"lastProgressAgeMs":1,"lastProgressReason":"model chunk observed from runtime harness"},"outcome":{"status":"skipped","action":"observe_only","reason":"active_embedded_run","sessionId":"session-live-progress","sessionKey":"agent:main:main","activeSessionId":"session-live-progress","activeWorkKind":"embedded_run"},"abortCount":0,"cancelReasons":[],"snapshotAfter":{"activeWorkKind":"embedded_run","hasActiveEmbeddedRun":true,"lastProgressAgeMs":84,"lastProgressReason":"model chunk observed from runtime harness"}}
    [diagnostic] stuck session recovery skipped: sessionId=session-live-progress sessionKey=agent:main:main age=180s queueDepth=1 activeSessionId=session-live-progress
    [diagnostic] stuck session recovery outcome: status=skipped action=observe_only sessionId=session-live-progress sessionKey=agent:main:main activeSessionId=session-live-progress activeWorkKind=embedded_run reason=active_embedded_run
  • Observed result after fix: The reply-run fallback keeps reason: "stuck_recovery" instead of degrading into a user-abort shape. The live-progress recovery path returns status: "skipped", leaves the embedded run active, and does not call abort or cancel on the active handle.
  • What was not tested: I did not rerun a multi-minute end-to-end Feishu + claude-cli channel repro in this environment, and I did not capture UI/video proof from a live chat surface.

Notes

  • diagnostics.stuckSessionWarnMs / diagnostics.stuckSessionAbortMs and the default warnMs x 3 / minimum 5 minute behavior are already documented in the current docs/config help, so this PR focuses on the runtime regressions from #88870.

Closes #88870

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: M triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 1, 2026
@clawsweeper

clawsweeper Bot commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 11, 2026, 12:57 AM ET / 04:57 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +129, Tests +153. Total +282 across 10 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] No close action taken because the review did not complete.

Maintainer options:

  1. Decide the mitigation before merge
    Retry the Codex review after fixing the execution failure.
  2. Pause or close
    Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

  • [P1] Review did not complete, so no work-lane recommendation was made.
Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model internal, reasoning high; reviewed against 418d7e1e83c5.

Label changes

Label changes:

  • add rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
  • remove P2: Current review triage priority is none.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🌊 off-meta tidepool, so this older rating label is no longer current.
  • remove merge-risk: 🚨 session-state: Current PR review selected no merge-risk labels.
  • remove merge-risk: 🚨 message-delivery: Current PR review selected no merge-risk labels.
  • remove status: 👀 ready for maintainer look: Current PR status no longer selects a status label.

Label justifications:

  • rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.
Evidence reviewed

PR surface:

Source +129, Tests +153. Total +282 across 10 files.

View PR surface stats
Area Files Added Removed Net
Source 6 167 38 +129
Tests 4 154 1 +153
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 10 321 39 +282

What I checked:

  • failure reason: codex execution failed.
  • codex failure detail: Codex review failed for this PR with exit 1.
  • codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 11, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. proof: sufficient ClawSweeper judged the real behavior proof convincing. labels Jun 11, 2026
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P2 Normal backlog priority with limited blast radius. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stuck-session recovery aborts long-but-active agent runs at warnMs×3 (~6min) with misleading "Reply operation aborted by user" reason

1 participant