test: guard websocket stale final turn lineage by 100yenadmin · Pull Request #78147 · openclaw/openclaw

100yenadmin · 2026-05-06T00:10:18Z

Summary

add a regression/protective test for the [Bug]: Subagent announce can deliver stale output and subagent sessions may inherit unrelated history #78055 stale-final replay shape: final answer A completes, user asks B, WS sends an incremental B suffix, then a stale response.completed carrying turn A metadata arrives before the real turn B completion
add lineage validation for OpenAI WS terminal events when native OpenAI turn metadata is echoed, ignoring stale response.completed / response.failed events instead of finalizing the active turn
type ResponseObject.metadata so tests and runtime can inspect echoed Responses metadata

Why

This ties directly to #78055 and the duplicate/stale final-answer after tool-chain report. It also overlaps the incremental / previous_response_id surfaces discussed around #76905, #76888, #77642, #78060, and the broader runtime/finalization issues in #76990 / #77445.

Verification

./node_modules/.bin/tsc --noEmit --pretty false --project tsconfig.core.projects.json ✅
vitest run src/agents/openai-ws-stream.test.ts -t "stale completed" --maxWorkers=1 --no-fileParallelism --reporter=dot started, the targeted test emitted a passing dot, but the local process was SIGKILLed afterward on this Mac with /System/Volumes/Data at 100% / ~121MiB free. Treat full test run as not completed locally.

Fixes #78055.
Refs #76905, #76888, #77642, #78060, #76990, #77445.

clawsweeper · 2026-05-06T00:13:35Z

Codex review: needs real behavior proof before merge.

Summary
Review failed before ClawSweeper could summarize the requested change.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Real behavior proof
Not applicable: Real behavior proof was not assessed because the Codex review failed.

Next step before merge
Review did not complete, so no work-lane recommendation was made.

Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

What I checked:

failure reason: codex execution failed.
codex failure detail: Command failed: /usr/bin/git status --porcelain=v1 --untracked-files=all.
codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)

Remaining risk / open question:

No close action taken because the review did not complete.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 3b626e4e36fe.

100yenadmin · 2026-05-08T08:49:17Z

Heads up — this isn't actually test-only. The PR's runtime change introduces a buffer-loss bug that the new test doesn't surface.

The bug. In src/agents/openai-ws-stream.ts, the new "ignore stale terminal event" branches at :1200-1202 (response.completed) and :1236-1238 (response.failed) call:

outputItemPhaseById.clear()
outputTextByPart.clear()
emittedTextByPart.clear()

…then return. Those three maps (declared at :1018-1020 inside the per-request closure) are written by the active turn's deltas (:1165-1166, :1072) and read by the normal completion path (:1205-1207) to assemble the final assistant message before the done event at :1224. Every other clear site in the file is terminal (followed by cleanup() + resolve/reject): abort :1086-1088, WS close :1099-1101, real completed :1205-1207, real failed :1241-1243, error :1255-1257. Only the new stale branches clear AND return — the subscription stays live but the buffer is gone.

So if turn B has accumulated any deltas before a stale response.completed arrives carrying turn-A metadata, the stale-ignore branch destroys them. When turn B's real completion lands, the assistant message is missing every delta that landed before the stale event. State-wise this is strictly worse than no-op — "ignore stale" in the sense of "leave state alone" was the right intent, but the implementation does the opposite.

The added test (openai-ws-stream.test.ts:3075-3158) doesn't surface this because it emits zero deltas between the turn-B send and the stale response.completed.

Smallest fix — six lines deleted, no other changes:

Delete the three .clear() calls in each stale branch (:1200-1202 and :1236-1238). No leak risk: the maps are per-request closure scope (new Map() inside the inner streamFn), so they GC when the request resolves; turn B's items have fresh server-assigned IDs that won't collide with leftover turn-A keys; the phase-replay loop at :1149 filters by ${event.item.id}: prefix, so leftover entries are skipped anyway.

Add the missing interleaved-delta regression test:

emit response.created (turn B)
emit response.output_item.added { item.id: "msg_b", phase: "answer" }
emit response.output_text.delta { item_id: "msg_b", delta: "Hello " }
emit response.output_text.delta { item_id: "msg_b", delta: "world" }
emit response.completed { id: "resp_turn_a", metadata.openclaw_turn_id: "turn_a" }   // STALE
emit response.output_text.done { item_id: "msg_b", text: "Hello world" }
emit response.completed { id: "resp_turn_b", metadata.openclaw_turn_id: "turn_b" }   // REAL
assert finalAssistantMessage.text === "Hello world"

Today this test fails on this PR head; after the six-line removal it passes.

Plus: Real behavior proof needs a captured trace, and check-test-types (= pnpm tsgo:test) needs to clear.

Alternative shape if you'd rather have cleaner history: split into a runtime-fix PR (clear-removal + interleaved-delta test) and keep this one as test-only against the post-fix tree.

…inal events Stale response.completed / response.failed events arriving mid-turn were clearing outputItemPhaseById, outputTextByPart, and emittedTextByPart before returning, which silently dropped subsequent text deltas for the active turn (phase metadata gone) or duplicated already-emitted text (emitted-text tracker gone). 'Ignore stale' now leaves the active-turn state intact so live deltas continue streaming uninterrupted. Adds a regression test that emits deltas for turn B, sends a stale response.completed carrying turn-A metadata mid-stream, then asserts all turn-B text_delta events reach the consumer in order with no drops or duplicates. The existing stale-completed test (no interleaved deltas) still passes.

steipete · 2026-05-09T08:40:05Z

Thanks @100yenadmin. Closing this as superseded by #79726.

This PR was plausible while OpenClaw owned the OpenAI Responses WebSocket event loop, but the better fix is to delete that owner boundary instead of hardening it. #79726 removes the custom src/agents/openai-ws-* transport, the stale terminal-event test surface, the custom WS session cleanup path, and the openaiWsWarmup config/docs surface. Explicit openai-codex/* Responses runs now go through PI native Codex Responses streaming, with OpenClaw only wrapping the stream for auth injection, abort signals, session id propagation, and prompt-cache boundary stripping.

That means the stale response.completed / response.failed metadata validation added here no longer has a runtime call site in OpenClaw. PI now owns the WebSocket/SSE transport behavior; OpenClaw should not keep parallel stale-event handling for a deleted transport.

Proof in #79726:

deleted src/agents/openai-ws-connection.ts, src/agents/openai-ws-stream.ts, src/agents/openai-ws-message-conversion.ts, and related tests
removed createOpenAIWebSocketStreamFn, releaseWsSession, shouldUseOpenAIWebSocketTransport, and openaiWsWarmup references from the OpenClaw-owned paths
rg over src/agents, extensions/openai, docs, src/proxy-capture, and test/helpers finds no remaining custom WS symbols/config/docs
targeted PI/OpenAI tests passed, and Testbox pnpm check:changed passed

#78055 is still referenced from #79726, but I am not treating this PR as the canonical issue fix because that issue also contains non-WS subagent announce/session-history contamination evidence.

test: guard websocket stale final turn lineage

e37ce97

openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 6, 2026

openclaw-barnacle Bot added size: M and removed size: S labels May 8, 2026

steipete mentioned this pull request May 9, 2026

refactor: use PI Codex Responses transport #79726

Merged

steipete closed this May 9, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

test: guard websocket stale final turn lineage#78147

test: guard websocket stale final turn lineage#78147
100yenadmin wants to merge 2 commits into
openclaw:mainfrom
electricsheephq:test/78055-stale-final-regression

100yenadmin commented May 6, 2026

Uh oh!

clawsweeper Bot commented May 6, 2026 •

edited

Loading

Uh oh!

100yenadmin commented May 8, 2026

Uh oh!

steipete commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

100yenadmin commented May 6, 2026

Summary

Why

Verification

Uh oh!

clawsweeper Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

100yenadmin commented May 8, 2026

Uh oh!

steipete commented May 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

clawsweeper Bot commented May 6, 2026 •

edited

Loading