fix: avoid replaying channel restart recovery turns by levineam · Pull Request #74687 · openclaw/openclaw

levineam · 2026-04-30T00:00:59Z

Summary

Describe the problem and fix in 2–5 bullets:

Problem: restart-aborted main-session recovery can auto-replay a synthetic recovery turn into channel-backed sessions.
Why it matters: channel-backed sessions have external delivery state; hidden replay can re-enter delivery/recovery paths after a restart when manual recovery is safer.
What changed: channel-backed restart-aborted sessions are marked failed for manual recovery instead of replaying a synthetic recovery turn.
What did NOT change (scope boundary): restart recovery for non-channel resumable main sessions remains intact; no config, cron, memory, or QMD behavior is changed.
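The change described above can be sketched as a small pure decision function. This is a minimal illustration only: the `AbortedSession` shape, the `channelBacked` flag, and `decideRecovery` are hypothetical names, not the actual code in src/agents/main-session-restart-recovery.ts, and the sketch only models sessions that were already considered resumable.

```typescript
// Hypothetical shape for illustration; the real session types may differ.
interface AbortedSession {
  id: string;
  // Assumed to be derived from the session's channel metadata.
  channelBacked: boolean;
}

type RecoveryDecision =
  | { kind: "manual"; sessionId: string; reason: string }
  | { kind: "replay"; sessionId: string };

function decideRecovery(session: AbortedSession): RecoveryDecision {
  if (session.channelBacked) {
    // External delivery state is involved, so no hidden synthetic turn
    // is enqueued; the session is marked failed for manual recovery.
    return {
      kind: "manual",
      sessionId: session.id,
      reason: "channel-backed session requires manual recovery",
    };
  }
  // Non-channel resumable sessions keep the existing automatic path.
  return { kind: "replay", sessionId: session.id };
}
```

Returning a tagged decision rather than performing the side effect directly keeps the branch easy to assert on in a unit test.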

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related: Bug: reopen can leave unresolved tail tool calls and relies on local transcript compensation #64530
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: main-session restart recovery treats channel-backed sessions like ordinary resumable local sessions and may enqueue a synthetic hidden agent turn after restart. For externally delivered sessions, that can re-enter channel/session delivery state when the safer behavior is explicit manual recovery.
Missing detection / guardrail: coverage did not assert that channel-backed restart-aborted sessions avoid synthetic replay.
Contributing context (if known): externally backed sessions carry channel metadata and delivery context that should not be implicitly retried as hidden local recovery turns.
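One way the missing guardrail could look is a predicate that treats any session carrying channel metadata or a delivery context as externally backed. The field names below (`channel`, `deliveryContext`) are assumptions made for this sketch, not the project's actual session schema.

```typescript
// Hypothetical session entry; field names are assumptions for illustration.
interface SessionEntry {
  id: string;
  channel?: string; // e.g. "discord"
  deliveryContext?: { to?: string };
}

// True when the entry carries channel metadata or a delivery context,
// which this sketch treats as "externally backed".
function isChannelBacked(entry: SessionEntry): boolean {
  const hasChannel =
    typeof entry.channel === "string" && entry.channel.trim() !== "";
  const hasDeliveryContext = entry.deliveryContext !== undefined;
  return hasChannel || hasDeliveryContext;
}
```

Checking both fields errs on the side of manual recovery: a session with only a delivery context is still kept off the synthetic replay path.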

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/agents/main-session-restart-recovery.test.ts
Scenario the test should lock in: Discord/channel-backed restart-aborted sessions are marked failed for manual recovery and do not enqueue synthetic recovery turns; non-channel restart recovery remains resumable.
Why this is the smallest reliable guardrail: the behavior is decided in restart recovery before any external channel send is attempted.
Existing test that already covers this (if any): existing restart-recovery tests cover resumable non-channel behavior; this PR adds channel-backed coverage.
If no new test is added, why not: N/A
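The scenario in this test plan can be locked in with recording fakes, since the decision is made before any channel send. The following is a hedged sketch of that test shape, not the contents of src/agents/main-session-restart-recovery.test.ts; `recoverRestartAborted`, `RecoveryDeps`, and the session fields are hypothetical.

```typescript
// Hypothetical types; the real recovery module may be structured differently.
interface Session {
  id: string;
  channelBacked: boolean;
  status: "aborted" | "idle";
}

interface RecoveryDeps {
  enqueueSyntheticTurn(sessionId: string): void;
  markFailed(sessionId: string, reason: string): void;
}

function recoverRestartAborted(sessions: Session[], deps: RecoveryDeps): void {
  for (const session of sessions) {
    if (session.status !== "aborted") continue;
    if (session.channelBacked) {
      deps.markFailed(session.id, "manual recovery required");
    } else {
      deps.enqueueSyntheticTurn(session.id);
    }
  }
}

// Recording fakes stand in for the real queue and session store.
function makeFakeDeps() {
  const enqueued: string[] = [];
  const failed: string[] = [];
  const deps: RecoveryDeps = {
    enqueueSyntheticTurn: (id) => {
      enqueued.push(id);
    },
    markFailed: (id) => {
      failed.push(id);
    },
  };
  return { deps, enqueued, failed };
}

function runExample(): { enqueued: string[]; failed: string[] } {
  const { deps, enqueued, failed } = makeFakeDeps();
  recoverRestartAborted(
    [
      { id: "discord-1", channelBacked: true, status: "aborted" },
      { id: "local-1", channelBacked: false, status: "aborted" },
      { id: "local-2", channelBacked: false, status: "idle" },
    ],
    deps,
  );
  return { enqueued, failed };
}
```

Injecting the two side effects keeps the assertion at the seam named above: the test never needs a live Discord connection.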

User-visible / Behavior Changes

Channel-backed sessions interrupted by a restart are now marked failed and require manual recovery; they no longer receive a hidden automatic replay. Recovery for non-channel resumable sessions is unchanged.

Diagram (if applicable)

Before:
[channel-backed interrupted session] -> [restart recovery] -> [synthetic hidden agent turn] -> [delivery/session path re-entered]

After:
[channel-backed interrupted session] -> [restart recovery] -> [failed/manual recovery marker]

Non-channel:
[local resumable interrupted session] -> [restart recovery] -> [existing synthetic recovery path]

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: macOS
Runtime/container: local gateway / Node runtime
Model/provider: N/A
Integration/channel (if any): Discord-backed channel session
Relevant config (redacted): channel-backed main session with delivery context

Steps

1. Have a channel-backed main session that is restart-aborted.
2. Restart/reopen through main-session recovery.
3. Observe whether recovery enqueues a synthetic hidden agent turn or marks the session for manual recovery.

Expected

Channel-backed restart-aborted sessions should not auto-replay a synthetic recovery turn.
Non-channel resumable sessions should keep existing restart recovery behavior.

Actual

Before this PR, channel-backed sessions could follow the same synthetic recovery path as non-channel sessions.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Verification run locally:

node scripts/run-vitest.mjs run --config test/vitest/vitest.agents-core.config.ts src/agents/main-session-restart-recovery.test.ts --maxWorkers=1 --reporter dot — 6 passed
node scripts/run-vitest.mjs run --config test/vitest/vitest.agents-core.config.ts src/agents/session-write-lock.test.ts --maxWorkers=1 --reporter dot — 24 passed
pnpm exec oxlint src/agents/main-session-restart-recovery.ts src/agents/main-session-restart-recovery.test.ts — 0 warnings/errors

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: channel-backed restart-aborted recovery path; non-channel resumable recovery path; nearby session write-lock tests.
Edge cases checked: restart-aborted sessions with channel metadata; existing non-channel resumable recovery behavior.
What you did not verify: full live Discord end-to-end restart recovery in production.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: some channel-backed interrupted sessions that previously attempted automatic recovery will now require manual recovery.
- Mitigation: this is intentional for externally delivered sessions; non-channel resumable sessions keep automatic recovery.

clawsweeper · 2026-04-30T00:05:07Z

Codex review: needs real behavior proof before merge. Reviewed May 30, 2026, 12:59 AM ET / 04:59 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +21, Tests +31. Total +52 across 2 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

[P1] No close action taken because the review did not complete.

Maintainer options:

- Decide the mitigation before merge: Retry the Codex review after fixing the execution failure.
- Pause or close: Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

[P1] Review did not complete, so no work-lane recommendation was made.

Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model gpt-5.5, reasoning high; reviewed against b352cb2d8e7f.

Label changes

Label justifications:

rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.

Evidence reviewed

PR surface:

Source +21, Tests +31. Total +52 across 2 files.


Area	Files	Added	Net
Source	1	21	+21
Tests	1	31	+31
Docs	0	0	0
Config	0	0	0
Generated	0	0	0
Other	0	0	0
Total	2	52	+52

What I checked:

failure reason: codex execution failed.
codex failure detail: Codex review failed for this PR with exit 1.
codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

openclaw-barnacle · 2026-05-30T04:48:27Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

fix: avoid replaying channel restart recovery turns

da09972

openclaw-barnacle Bot added the labels agents (Agent runtime and tooling) and size: S on Apr 30, 2026

coolmanns mentioned this pull request May 3, 2026

[Bug]: Telegram context-overflow retry replays same inbound message and delivers stale turn #76424

Closed

This was referenced May 4, 2026

[Bug]: ignored an explicit stop/no-action instruction #55044

Open

Gateway self-restart from chat turn drops in-flight Telegram/Discord replies #78380

Open

clawsweeper Bot mentioned this pull request May 14, 2026

[Bug]: Restart recovery can finish with payloads=0 while Control UI only shows tool blocks and no visible error #77883

Open

openclaw-barnacle Bot added the labels stale (Marked as stale due to inactivity) and triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) on May 30, 2026

clawsweeper Bot added the label rating: 🌊 off-meta tidepool (PR readiness rating does not apply to this item.) on May 30, 2026

barnacle-openclaw Bot removed the label stale (Marked as stale due to inactivity) on May 30, 2026

levineam wants to merge 1 commit into openclaw:main from levineam:fix/channel-restart-recovery
