Skip to content

test: guard websocket stale final turn lineage#78147

Closed
100yenadmin wants to merge 2 commits into
openclaw:mainfrom
electricsheephq:test/78055-stale-final-regression
Closed

test: guard websocket stale final turn lineage#78147
100yenadmin wants to merge 2 commits into
openclaw:mainfrom
electricsheephq:test/78055-stale-final-regression

Conversation

@100yenadmin

Copy link
Copy Markdown
Contributor

Summary

  • add a regression/protective test for the [Bug]: Subagent announce can deliver stale output and subagent sessions may inherit unrelated history #78055 stale-final replay shape: final answer A completes, user asks B, WS sends an incremental B suffix, then a stale response.completed carrying turn A metadata arrives before the real turn B completion
  • add lineage validation for OpenAI WS terminal events when native OpenAI turn metadata is echoed, ignoring stale response.completed / response.failed events instead of finalizing the active turn
  • type ResponseObject.metadata so tests and runtime can inspect echoed Responses metadata

Why

This ties directly to #78055 and the duplicate/stale final-answer after tool-chain report. It also overlaps the incremental / previous_response_id surfaces discussed around #76905, #76888, #77642, #78060, and the broader runtime/finalization issues in #76990 / #77445.

Verification

  • ./node_modules/.bin/tsc --noEmit --pretty false --project tsconfig.core.projects.json
  • vitest run src/agents/openai-ws-stream.test.ts -t "stale completed" --maxWorkers=1 --no-fileParallelism --reporter=dot started, the targeted test emitted a passing dot, but the local process was SIGKILLed afterward on this Mac with /System/Volumes/Data at 100% / ~121MiB free. Treat full test run as not completed locally.

Fixes #78055.
Refs #76905, #76888, #77642, #78060, #76990, #77445.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 6, 2026
@clawsweeper

clawsweeper Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Summary
Review failed before ClawSweeper could summarize the requested change.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Real behavior proof
Not applicable: Real behavior proof was not assessed because the Codex review failed.

Next step before merge
Review did not complete, so no work-lane recommendation was made.

Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

What I checked:

  • failure reason: codex execution failed.
  • codex failure detail: Command failed: /usr/bin/git status --porcelain=v1 --untracked-files=all.
  • codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

  • unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)

Remaining risk / open question:

  • No close action taken because the review did not complete.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3b626e4e36fe.

@100yenadmin

Copy link
Copy Markdown
Contributor Author

Heads up — this isn't actually test-only. The PR's runtime change introduces a buffer-loss bug that the new test doesn't surface.

The bug. In src/agents/openai-ws-stream.ts, the new "ignore stale terminal event" branches at :1200-1202 (response.completed) and :1236-1238 (response.failed) call:

outputItemPhaseById.clear()
outputTextByPart.clear()
emittedTextByPart.clear()

…then return. Those three maps (declared at :1018-1020 inside the per-request closure) are written by the active turn's deltas (:1165-1166, :1072) and read by the normal completion path (:1205-1207) to assemble the final assistant message before the done event at :1224. Every other clear site in the file is terminal (followed by cleanup() + resolve/reject): abort :1086-1088, WS close :1099-1101, real completed :1205-1207, real failed :1241-1243, error :1255-1257. Only the new stale branches clear AND return — the subscription stays live but the buffer is gone.

So if turn B has accumulated any deltas before a stale response.completed arrives carrying turn-A metadata, the stale-ignore branch destroys them. When turn B's real completion lands, the assistant message is missing every delta that landed before the stale event. State-wise this is strictly worse than no-op — "ignore stale" in the sense of "leave state alone" was the right intent, but the implementation does the opposite.

The added test (openai-ws-stream.test.ts:3075-3158) doesn't surface this because it emits zero deltas between the turn-B send and the stale response.completed.

Smallest fix — six lines deleted, no other changes:

Delete the three .clear() calls in each stale branch (:1200-1202 and :1236-1238). No leak risk: the maps are per-request closure scope (new Map() inside the inner streamFn), so they GC when the request resolves; turn B's items have fresh server-assigned IDs that won't collide with leftover turn-A keys; the phase-replay loop at :1149 filters by ${event.item.id}: prefix, so leftover entries are skipped anyway.

Add the missing interleaved-delta regression test:

emit response.created (turn B)
emit response.output_item.added { item.id: "msg_b", phase: "answer" }
emit response.output_text.delta { item_id: "msg_b", delta: "Hello " }
emit response.output_text.delta { item_id: "msg_b", delta: "world" }
emit response.completed { id: "resp_turn_a", metadata.openclaw_turn_id: "turn_a" }   // STALE
emit response.output_text.done { item_id: "msg_b", text: "Hello world" }
emit response.completed { id: "resp_turn_b", metadata.openclaw_turn_id: "turn_b" }   // REAL
assert finalAssistantMessage.text === "Hello world"

Today this test fails on this PR head; after the six-line removal it passes.

Plus: Real behavior proof needs a captured trace, and check-test-types (= pnpm tsgo:test) needs to clear.

Alternative shape if you'd rather have cleaner history: split into a runtime-fix PR (clear-removal + interleaved-delta test) and keep this one as test-only against the post-fix tree.

…inal events

Stale response.completed / response.failed events arriving mid-turn were
clearing outputItemPhaseById, outputTextByPart, and emittedTextByPart
before returning, which silently dropped subsequent text deltas for the
active turn (phase metadata gone) or duplicated already-emitted text
(emitted-text tracker gone). 'Ignore stale' now leaves the active-turn
state intact so live deltas continue streaming uninterrupted.

Adds a regression test that emits deltas for turn B, sends a stale
response.completed carrying turn-A metadata mid-stream, then asserts
all turn-B text_delta events reach the consumer in order with no drops
or duplicates. The existing stale-completed test (no interleaved
deltas) still passes.
@steipete

steipete commented May 9, 2026

Copy link
Copy Markdown
Contributor

Thanks @100yenadmin. Closing this as superseded by #79726.

This PR was plausible while OpenClaw owned the OpenAI Responses WebSocket event loop, but the better fix is to delete that owner boundary instead of hardening it. #79726 removes the custom src/agents/openai-ws-* transport, the stale terminal-event test surface, the custom WS session cleanup path, and the openaiWsWarmup config/docs surface. Explicit openai-codex/* Responses runs now go through PI native Codex Responses streaming, with OpenClaw only wrapping the stream for auth injection, abort signals, session id propagation, and prompt-cache boundary stripping.

That means the stale response.completed / response.failed metadata validation added here no longer has a runtime call site in OpenClaw. PI now owns the WebSocket/SSE transport behavior; OpenClaw should not keep parallel stale-event handling for a deleted transport.

Proof in #79726:

  • deleted src/agents/openai-ws-connection.ts, src/agents/openai-ws-stream.ts, src/agents/openai-ws-message-conversion.ts, and related tests
  • removed createOpenAIWebSocketStreamFn, releaseWsSession, shouldUseOpenAIWebSocketTransport, and openaiWsWarmup references from the OpenClaw-owned paths
  • rg over src/agents, extensions/openai, docs, src/proxy-capture, and test/helpers finds no remaining custom WS symbols/config/docs
  • targeted PI/OpenAI tests passed, and Testbox pnpm check:changed passed

#78055 is still referenced from #79726, but I am not treating this PR as the canonical issue fix because that issue also contains non-WS subagent announce/session-history contamination evidence.

@steipete steipete closed this May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: M triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Subagent announce can deliver stale output and subagent sessions may inherit unrelated history

3 participants