fix(gateway): clear completed session active runs#87767
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 4:45 PM ET / 20:45 UTC. Summary PR surface: Source +25, Tests +125. Total +150 across 6 files. Reproducibility: yes. Source inspection shows current main can broadcast terminal session changes while Review metrics: 2 noteworthy metrics.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the gateway-boundary fix after required checks are green or explicitly accepted, keeping the in-memory retry guard while excluding terminal runs from Do we have a high-confidence way to reproduce the issue? Yes. Source inspection shows current main can broadcast terminal session changes while Is this the best way to solve the issue? Yes. Fixing the gateway projection boundary is narrower and more maintainable than another UI-only predicate because AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 607e6c206f4f. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +25, Tests +125. Total +150 across 6 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
@clawsweeper review I pushed a follow-up commit addressing the retry/idempotency guard concern by keeping the |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
|
@clawsweeper ask explain this risk a bit more please: The PR intentionally changes gateway session-state timing: terminal lifecycle now hides a run from sessions.list before chat.send finally settles, so maintainers should accept that completed runs are no longer user-abortable during that short retry-guard window. is it mitigable? |
|
🦞👀 I queued a lightweight read-only assist pass. It will post a separate answer comment and will not edit the durable ClawSweeper review comment or trigger close, merge, repair, label, or branch changes. Request: explain this risk a bit more please: The PR intentionally changes gateway session-state timing: terminal lifecycle now hides a run from sessions.list before chat.send finally settles, so maintainers should accept that completed runs are no longer user-abortable during that short retry-guard window. is it mitigable? |
|
ClawSweeper assist: The risk is that It looks partly mitigated already: the follow-up commit preserves the controller entry and only sets Evidence:
Suggested next action: Ask Source: #87767 (comment) |
|
@clawsweeper visualize |
|
🦞👀 I queued a read-only visual pass. It will create or update one marker-backed visual brief comment and will not trigger close, merge, repair, label, or branch changes. Lens: |
|
Source: #87767 (comment) Visual briefLens: state Advisory only. Maintainers remain the final judges. Proof snapshot: Maintainer rulingBenefit: Completed chat runs should stop leaving the Control UI stuck in a running/abortable state. Risk: Gateway session projection now intentionally diverges from raw in-memory abort-controller presence during a short retry/idempotency guard window. Proof needed: Maintainer confidence that no real abort, retry, reconnect, or message-delivery path depends on terminal runs remaining user-visible as abortable during that window; confirm current required checks. Recommended next action: Review the gateway lifecycle ordering and the Question presented: Is it acceptable for terminal runs to remain internally guarded while no longer being advertised through |
|
@clawsweeper automerge |
|
🦞🔧
Draft PRs stay fix-only until GitHub marks them ready for review. Pause with Automerge progress:
|
|
ClawSweeper 🐠 reef update Thanks for the work here. GitHub would not let ClawSweeper push to this branch with the available credentials, so the fix moved to a narrow replacement PR. nothing personal, just permission currents. Why replacement: ClawSweeper could not update the source PR branch directly; GitHub did not grant sufficient push rights to the bot for that branch.
fish notes: model gpt-5.5, reasoning high; reviewed against e4d4995. |
Summary
What problem does this PR solve?
Why does this matter now?
status: "running"withhasActiveRun: true.hasActiveRun: false; this PR fixes the gateway source of the stalehasActiveRun: trueprojection.What is the intended outcome?
sessions.changed, so a follow-upsessions.listcannot revivehasActiveRun: truefor a completed run.What is intentionally out of scope?
What does success look like?
runningrows as active when there is no explicithasActiveRunflag.What should reviewers focus on?
sessions.changedis broadcast.hasActiveRun: true.Linked context
Which issue does this close?
Closes #86939
Which issues, PRs, or discussions are related?
Related #87017
Was this requested by a maintainer or owner?
No direct maintainer request for this follow-up PR; it follows up on the user-visible issue and the merged partial fix in #87017.
Prior fix analysis
Related PR #87017 fixed the UI-side stale-row variant where the gateway returned
status: "running"withhasActiveRun: false.isSessionRunActivenow treats an explicit booleanhasActiveRunas authoritative when the row is not terminal, sostatus: "running" + hasActiveRun: falseno longer keeps the WebChat UI abortable.status: "running" + hasActiveRun: true, which is a different state. With fix(ui): ignore stale running session rows #87017 applied, the UI correctly returns active for that row because an explicit live-run flag means the session is abortable.truecan happen: terminal lifecycle handling broadcasts terminalsessions.changedbefore the chat send.finally()path clearschatAbortControllers; meanwhilesessions.listcomputeshasActiveRundirectly fromchatAbortControllers.sessions.changedbroadcasts.Real behavior proof (required for external PRs)
After-fix recording showing Control UI session correctly exits abortable state after chat completion:
2026-05-29.3.51.07.mov
Targeted tests pass, lint passes, diff whitespace check passes, and autoreview reported no accepted/actionable findings.
sessions.changedbroadcasts, preventing follow-upsessions.listfrom revivinghasActiveRun: truefor the completed run.Before-fix recording showing the stuck abortable state:
2026-05-29.3.45.26.mov
Tests and validation
Which commands did you run?
What regression coverage was added or updated?
src/gateway/server-chat.agent-events.test.tscoverage asserting terminal lifecycle handling clears tracked active runs before terminalsessions.changedbroadcasts.sessions.listcoverage was rerun to confirmhasActiveRunis still projected from live active-run tracking.What failed before this fix, if known?
chatAbortControllersstill contained the completed run, allowing a subsequentsessions.listresponse to reporthasActiveRun: true.If no test was added, why not?
N/A; this PR adds targeted regression coverage.
Risk checklist
Did user-visible behavior change? (
Yes/No)Yes.
Did config, environment, or migration behavior change? (
Yes/No)No.
Did security, auth, secrets, network, or tool execution behavior change? (
Yes/No)No.
What is the highest-risk area?
.finally()cleanup path.How is that risk mitigated?
.finally()cleanup remains idempotent.Current review state
What is the next action?
What is still waiting on author, maintainer, CI, or external proof?
Which bot or reviewer comments were addressed?
hasActiveRun: false; this follow-up handles stalehasActiveRun: trueat the gateway source.