Keep queued user turns after Telegram supersedes#83790
Conversation
|
Codex review: needs real behavior proof before merge. Workflow note: Future ClawSweeper reviews update this same comment in place. How this review workflow works
Summary Reproducibility: yes. at source level: current main attaches opts.abortSignal to queued user followups and the queued runner reuses that source signal when queued.abortSignal is absent. I did not run tests because this review is read-only, but the PR supplies focused before/after regression output for the same path. PR rating Rank-up moves:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. PR egg Where did the egg go?
Real behavior proof Mantis proof suggestion Risk before merge
Maintainer options:
Next step before merge Security Review detailsBest possible solution: Land the focused abort-metadata fix after redacted live Telegram proof shows queued user turns drain after supersede while room-event cancellation still skips aborted ambient work. Do we have a high-confidence way to reproduce the issue? Yes at source level: current main attaches opts.abortSignal to queued user followups and the queued runner reuses that source signal when queued.abortSignal is absent. I did not run tests because this review is read-only, but the PR supplies focused before/after regression output for the same path. Is this the best way to solve the issue? Yes, the code direction is the narrowest maintainable fix: carry abort metadata only on queued room events and stop queued execution from falling back to the captured source signal. The remaining gap is exact live Telegram proof, not a different implementation shape. Label justifications:
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 87aa31956840. |
af605ff to
56b6731
Compare
|
Rebased the draft branch onto current
|
56b6731 to
695478d
Compare
|
Refreshed the draft branch head after the proof-body update so the PR gets a clean check set.
|
|
Queue-mode and before/after proof for the queued-followup fix. Queue mode is still respected
Before proof Setup: checked out parent node scripts/run-vitest.mjs src/auto-reply/reply/get-reply-run.media-only.test.ts src/auto-reply/reply/followup-runner.test.tsResult on old code: After proof Ran the same command on PR head This proves the pre-patch behavior both attached the source abort signal to queued user requests and re-inherited an already-aborted source signal during queued execution; the PR head removes both while leaving the queue-mode decision path intact. |
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
Updated the PR body with redacted current live Telegram topic proof from the active patched gateway ( @clawsweeper re-review |
|
@clawsweeper re-review |
|
Superseded by #83827. Refiled with the RCA, before/after regression proof, queue-mode proof, live Telegram-topic log proof, blocked Mantis proof, and the later topic 22684 client-close classification consolidated in the new PR body. |
Summary
user_requestfollowups from the source-channel abort signal once they have been accepted into the reply queue.room_eventfollowups tied to their source abort signal so ambient/current-event work can still be canceled by the admission fence.opts.abortSignalwhen executing a queued item; queued execution now uses only the abort metadata carried on that queued item.Root Cause
Queued followups were built with
opts.abortSignalunconditionally. In Telegram topics, a later user turn can supersede the source reply fence while an earlier turn is still active. That made an already accepteduser_requestfollowup look aborted before it drained. The followup runner also fell back to the captured sourceopts.abortSignal, so omitting the signal on the queued object alone was not sufficient.Real behavior proof
Behavior addressed: Telegram topic user messages accepted while another turn is active should remain queued and run after the active turn, even if the source reply fence is later superseded.
Real environment tested: temp worktree
/tmp/openclaw-topic-followups.hEwtrgfor the patch and focused tests; current live local Telegram gateway for current-chat proof; real local gateway runtime logs reviewed from/tmp/openclaw/openclaw-2026-05-18.logand topic session trajectories under~/.openclaw/agents/main/sessions.Exact steps or command run after this patch:
node scripts/run-vitest.mjs src/auto-reply/reply/get-reply-run.media-only.test.ts src/auto-reply/reply/followup-runner.test.ts src/auto-reply/reply/queue.collect.test.ts extensions/telegram/src/bot-message-dispatch.test.ts;jq -r 'select((.message // "") | test("topic:8185|threadId=8185")) | select(.time >= "2026-05-18T19:40:00") | [.time, .message] | @tsv' /tmp/openclaw/openclaw-2026-05-18.log; and redacted runtime-log review of the original Telegram-topic failure window.Evidence after fix: focused regression proof shows old code failed and PR head passed. On old code with only the new regression tests applied,
node scripts/run-vitest.mjs src/auto-reply/reply/get-reply-run.media-only.test.ts src/auto-reply/reply/followup-runner.test.tsfailed withTest Files 2 failed (2)andTests 2 failed | 110 passed (112), includingexpected AbortSignal { aborted: true } to be undefinedandexpected AbortSignal { aborted: false } to be undefined. On PR head695478dc167f6cd54cfe318fe037837494bf832a, the same command passed withTest Files 2 passed (2)andTests 112 passed (112).Current live Telegram topic proof from the active patched gateway shows the current chat receiving topic messages and sending replies back into the same Telegram topic after the patch:
The active live runtime also contains the patched abort contract:
Redacted runtime log excerpt from the original real OpenClaw Telegram-topic failure window showed multiple topic inbounds while active queued work was present, followed by prompt-stage app-server closures and queued topic diagnostics:
Additional redacted trajectory evidence showed repeated Telegram topic abort endings matching the loss mode this patch removes for queued user requests:
Observed result after fix: queued
user_requestfollowups no longer carry or inherit the supersedable source abort signal; queuedroom_eventfollowups still preserve their abort signal and continue to be skipped when that signal is already aborted. Focused regression tests pass, and the current live Telegram topic chat shows post-patch inbound topic messages receiving outboundthreadId=8185replies from the active patched gateway.What was not tested: Mantis Telegram Desktop visible proof did not capture because the Telegram bot connection was conflicted and proof messages did not reach OpenClaw. No video/artifact proof from Mantis or Crabbox is attached. Discord was not retested beyond the reporter's confirmation that a Discord server channel was not reproducing the same issue. The current live Telegram log proof is from the operator's active gateway, not a controlled baseline/candidate A/B desktop recording.
Verification
pnpm install --frozen-lockfileto install dependencies in the temp worktree only after the Vitest wrapper could not resolvevitest/package.json.git diff --checknode scripts/run-vitest.mjs src/auto-reply/reply/get-reply-run.media-only.test.ts src/auto-reply/reply/followup-runner.test.ts src/auto-reply/reply/queue.collect.test.ts extensions/telegram/src/bot-message-dispatch.test.ts->Test Files 4 passed (4),Tests 220 passed (220).origin/main(d761b98adc6ff2c824b95d0b2a7a54dfa1177d18) and reran the same focused test command ->Test Files 4 passed (4),Tests 220 passed (220).8185showed inbound Telegram topic messages followed by outboundthreadId=8185replies from the active patched runtime.AUTOREVIEW_AUTO_TESTS=0 .agents/skills/autoreview/scripts/autoreview --mode localreported findings against untouched files (src/auto-reply/inbound-debounce.ts,extensions/telegram/src/sequential-key.ts). I rejected those as not applicable aftergit diff --name-onlyshowed this branch only touches reply-runner files. Manual review did find the followup-runner fallback path, which is fixed here. A second autoreview rerun was stopped after about five minutes because it again expanded into broad unrelated GitHub search instead of completing a local-diff review.What was not tested
pnpm checkwas not run from this Codex worktree.