Skip to content

fix: reset websocket lineage after final answers#78142

Closed
100yenadmin wants to merge 1 commit into
openclaw:mainfrom
electricsheephq:fix/78055-incremental-lineage-guard
Closed

fix: reset websocket lineage after final answers#78142
100yenadmin wants to merge 1 commit into
openclaw:mainfrom
electricsheephq:fix/78055-incremental-lineage-guard

Conversation

@100yenadmin

Copy link
Copy Markdown
Contributor

Summary

Why

#78055 shows a fresh second final answer replaying an already-completed task after a newer user request and unrelated tool calls. That points at stale previous_response_id lineage being reused across completed final-answer boundaries. This surgical guard forces a full-context send for the first request after a phased final answer, then allows incremental behavior to resume inside the new turn.

Related

Tests

  • git diff --check
  • node scripts/run-vitest.mjs run --config test/vitest/vitest.unit.config.ts src/agents/openai-ws-stream.test.ts --maxWorkers=1 ⚠️ blocked: new worktree had no node_modules / vitest
  • With temporary node_modules symlink from sibling checkout: node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/openai-ws-stream.test.ts --maxWorkers=1 ⚠️ blocked by host disk full (ENOSPC: no space left on device, write; root volume had ~120MiB available)
  • Commit used --no-verify because the local hook tried to run missing oxfmt in this worktree.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels May 6, 2026
@clawsweeper

clawsweeper Bot commented May 6, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge.

Summary
The PR changes OpenAI WebSocket request planning so a new user turn after a phased final answer sends full context instead of reusing previous_response_id, and adds a planner regression test.

Reproducibility: no. high-confidence live reproduction was established here. Source inspection does show the planner-level path on current main: a strict extension after a phased final answer can still reuse previous_response_id for a new user message.

Real behavior proof
Needs real behavior proof before merge: The PR body and comments contain tests/CI status but no after-fix real runtime artifact showing the stale-final replay is fixed. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
Contributor action is needed for real behavior proof, and the PR also needs a changelog entry or maintainer exemption before normal review can proceed.

Security
Cleared: The diff only changes OpenAI WebSocket request planning and a unit test; it does not add dependencies, workflows, permissions, secrets handling, or new code-execution surfaces.

Review findings

  • [P2] Add the required changelog entry — src/agents/openai-ws-request.ts:121
Review details

Best possible solution:

Land a narrow version after the contributor adds redacted real behavior proof, a changelog entry, and targeted validation; keep the broader stale subagent/history work tracked separately.

Do we have a high-confidence way to reproduce the issue?

No high-confidence live reproduction was established here. Source inspection does show the planner-level path on current main: a strict extension after a phased final answer can still reuse previous_response_id for a new user message.

Is this the best way to solve the issue?

Yes for the narrow WebSocket lineage slice: forcing one full-context send across a final-answer-to-new-user boundary is targeted and preserves incremental sends for ordinary continuations. It is not merge-ready until proof and changelog are supplied.

Full review comments:

  • [P2] Add the required changelog entry — src/agents/openai-ws-request.ts:121
    This changes user-visible agent behavior by preventing stale OpenAI WebSocket lineage reuse after final answers, but the PR only changes source and tests. OpenClaw policy requires user-facing fixes to include an Unreleased CHANGELOG.md entry, otherwise the fix can ship without release notes.
    Confidence: 0.88

Overall correctness: patch is incorrect
Overall confidence: 0.82

What I checked:

  • Current-main planner behavior: Current main plans an incremental WebSocket payload whenever the full input strictly extends the prior request plus response chain, with no final-answer-to-new-user boundary check before attaching previous_response_id. (src/agents/openai-ws-request.ts:93, 0da9f7e88d6f)
  • PR behavior change: The PR adds final-answer and new-user-turn detection, then falls back to full context when both are true instead of returning the incremental payload. (src/agents/openai-ws-request.ts:121, db85f7f9c450)
  • Regression coverage: The PR adds a planner test asserting that a previous assistant final_answer followed by a new user message produces mode: full_context and omits previous_response_id. (src/agents/openai-ws-stream.test.ts:1835, db85f7f9c450)
  • CI status: The latest head has successful check runs including check-test-types, check-lint, check-prod-types, and the security fast checks. (db85f7f9c450)
  • Real behavior proof missing: The PR body and follow-up comments list git diff --check, blocked local test attempts, and CI status, but no after-fix OpenClaw runtime output, redacted logs, screenshot, recording, or artifact showing the stale-final replay no longer occurs. (db85f7f9c450)
  • Changelog entry missing: The PR patch modifies only src/agents/openai-ws-request.ts and src/agents/openai-ws-stream.test.ts; no CHANGELOG.md entry is added for this user-visible agent behavior fix. (CHANGELOG.md:5, db85f7f9c450)

Likely related people:

  • steipete: Peter Steinberger introduced the OpenAI Responses WebSocket-first stream path and appears throughout recent history for openai-ws-stream.ts, openai-ws-request.ts, and continuation/replay fixes. (role: introduced behavior and recent maintainer; confidence: high; commits: 7ced38b5ef2a, cabdf5bbc4c6, cf511288b84b; files: src/agents/openai-ws-stream.ts, src/agents/openai-ws-request.ts, src/agents/openai-ws-stream.test.ts)
  • vincentkoc: Vincent Koc has nearby merged work on provider request policy, native WebSocket endpoint checks, and prompt-cache boundary handling used by the affected WebSocket request path. (role: adjacent owner; confidence: medium; commits: 64f28906de09, e250ea3668f6, 5ddca5dd56e9; files: src/agents/openai-ws-request.ts, src/agents/openai-ws-stream.ts, src/agents/openai-ws-connection.ts)

Remaining risk / open question:

  • No after-fix real OpenClaw runtime artifact shows the stale final-answer replay is fixed in a real setup.
  • The user-facing fix lacks the required CHANGELOG.md entry.
  • The broader stale subagent/history cluster remains open, so this PR should stay scoped to the OpenAI WebSocket lineage slice.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0da9f7e88d6f.

Re-review progress:

@100yenadmin

Copy link
Copy Markdown
Contributor Author

Merge-tree is clean against current main and the review threads are clean. Three required checks are still red and blocking:

  1. Real behavior proof — the planner regression alone won't satisfy the gate. Needs a captured runtime artifact showing the new full-context-after-final-answer guard firing on a [Bug]: Subagent announce can deliver stale output and subagent sessions may inherit unrelated history #78055-style stream (the trace from the instrumentation in fix: trace OpenAI WebSocket response lineage #78146 would be a natural source for this — happy to coordinate if useful).
  2. check-test-types — runs pnpm tsgo:test (package.json:1336pnpm tsgo:core:test && pnpm tsgo:extensions:test). The PR body says local typecheck couldn't run; once node_modules is back, pnpm tsgo:test:src should reproduce the failing shard fastest.
  3. CHANGELOG entry, or maintainer exemption noted on the PR.

Otherwise this is the cleanest of the three #78055-cluster PRs (#78142 / #78146 / #78147) — surgical guard, narrow blast radius, planner regression in place.

@100yenadmin 100yenadmin force-pushed the fix/78055-incremental-lineage-guard branch from 8b572f2 to db85f7f Compare May 8, 2026 09:47
@steipete

steipete commented May 9, 2026

Copy link
Copy Markdown
Contributor

Thanks @100yenadmin. Closing this as superseded by #79726.

This PR works around a suspected stale replay by changing OpenClaw's custom OpenAI Responses WebSocket lineage planner. We are removing that planner instead. #79726 deletes the custom src/agents/openai-ws-* transport and hands explicit openai-codex/* Responses streaming back to PI's native WebSocket-capable transport.

That is the cleaner boundary: OpenAI/Codex Responses chaining stays inside the PI/Codex transport that owns it, and OpenClaw no longer needs to decide when to disable previous_response_id after a final answer. OpenClaw still wraps the stream for the pieces it owns: auth injection, abort signals, session id propagation, and prompt-cache boundary stripping.

Proof in #79726:

  • removes the deleted planner target (openai-ws-message-conversion.ts) and its tests instead of landing another heuristic in it
  • removes shouldUseOpenAIWebSocketTransport, custom WS session cleanup, and openaiWsWarmup from runtime/docs/tests
  • verifies no remaining custom WS symbols/config/docs with rg
  • targeted PI/OpenAI tests passed, and Testbox pnpm check:changed passed

#78055 is still referenced, but the remaining issue scope includes subagent announce/session-history contamination beyond this custom-WS lineage theory.

@steipete steipete closed this May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants