
fix: trace OpenAI WebSocket response lineage #78146

Closed
100yenadmin wants to merge 2 commits into openclaw:main from electricsheephq:eva/ws-lineage-tracing-78055

@100yenadmin

Contributor

Summary

  • Add redacted debug lineage to OpenAI WebSocket request planning: mode, previous_response_id, baseline/full/suffix lengths, and suffix item summaries without prompt/tool-result text.
  • Log per-request lineage before send, including context tail role/message id/parent id and latest user id when those ids are present.
  • Log completion lineage when a response.completed event is accepted into the transcript, tying the generated request id to the accepted response id and replay item count.
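A minimal sketch of what a redacted request-lineage record could look like, assuming a simplified item model. SuffixItem, RequestLineage, summarizeItem, and buildRequestLineage are hypothetical names, not the PR's actual types. Text-bearing fields are reduced to lengths, so the log shows the shape of the suffix that was sent without leaking prompt or tool-result content:

```typescript
// Hypothetical item model; the real types in src/agents/openai-ws-request.ts are not shown here.
type SuffixItem =
  | { type: "message"; role: "user" | "assistant" | "system"; text: string }
  | { type: "function_call"; name: string; arguments: string }
  | { type: "function_call_output"; output: string };

// Redacted view of one item: structural fields kept, text replaced by its length.
type RedactedItemSummary = {
  type: SuffixItem["type"];
  role?: string;
  name?: string;
  textLength?: number;
  argumentsLength?: number;
  outputLength?: number;
};

type RequestLineage = {
  mode: "incremental" | "full_context";
  baselineLength: number;
  fullLength: number;
  suffixLength: number;
  suffixItems: RedactedItemSummary[];
};

function summarizeItem(item: SuffixItem): RedactedItemSummary {
  if (item.type === "message") {
    return { type: item.type, role: item.role, textLength: item.text.length };
  }
  if (item.type === "function_call") {
    return { type: item.type, name: item.name, argumentsLength: item.arguments.length };
  }
  // Remaining case: function_call_output.
  return { type: item.type, outputLength: item.output.length };
}

// Assumes `baseline` is a prefix of `full`, so the suffix is whatever was appended since.
function buildRequestLineage(
  mode: RequestLineage["mode"],
  baseline: SuffixItem[],
  full: SuffixItem[],
): RequestLineage {
  const suffix = full.slice(baseline.length);
  return {
    mode,
    baselineLength: baseline.length,
    fullLength: full.length,
    suffixLength: suffix.length,
    suffixItems: suffix.map(summarizeItem),
  };
}
```

For example, a suffix tool call named search with arguments '{"q":"secret"}' is summarized as { type: "function_call", name: "search", argumentsLength: 14 }, so the argument text never reaches the log.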

Why

This provides safe trajectory evidence for duplicate/stale final-answer replay investigations where the failure appears to involve the OpenAI-Codex WebSocket incremental / previous_response_id path.

References/ties: #78055, #76905, #76888, #77642, #78060, #76990, #77445, #67777, #39032.

Tests

  • node scripts/test-projects.mjs src/agents/openai-ws-stream.test.ts
  • pnpm exec oxfmt --check src/agents/openai-ws-request.ts src/agents/openai-ws-stream.ts src/agents/openai-ws-stream.test.ts
  • git diff --check

Notes

  • pnpm tsgo:test:src was attempted, but the process was SIGKILLed in this local worktree before emitting diagnostics.

openclaw-barnacle (bot) added the labels agents (Agent runtime and tooling), size: M, and triage: needs-real-behavior-proof (Candidate: external PR needs after-fix proof from a real setup.) on May 6, 2026
@100yenadmin

Contributor Author

@steipete Serious bug in the latest update: users are seeing this in Telegram and web chat.

@clawsweeper

clawsweeper Bot commented May 6, 2026

Contributor

Codex review: needs real behavior proof before merge.

Summary
The PR adds redacted OpenAI WebSocket request/completion lineage logging and planner tests that distinguish chained previous_response_id sends from stripped full-context recovery.

Reproducibility: no, not for the original stale-replay bug. The PR supplies source-level instrumentation and tests, but no real runtime reproduction or after-fix trace. The earlier debug-field issue introduced by this PR is source-checkable and appears fixed in the current head.

Real behavior proof
Needs real behavior proof before merge: the PR body and comments do not include a redacted runtime log, terminal output, screenshot, recording, or linked artifact showing the new lineage output after the fix. Screenshots or recordings are preferred when they show the behavior, and terminal/log output is fine; redact private details such as IP addresses, API keys, phone numbers, and non-public endpoints. After the PR body is updated with proof, ClawSweeper should re-review automatically; if it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
Contributor action is needed: automation cannot supply the required real behavior proof from the contributor's WebSocket setup.

Security
Cleared: The diff adds redacted debug summaries and focused tests only, with no dependency, workflow, package, secret-handling, or code-execution surface changes.

Review details

Best possible solution:

Keep the narrow redacted lineage instrumentation, require a captured runtime trace from the affected WebSocket path, and then let maintainers review it alongside the linked stale-replay work.

Do we have a high-confidence way to reproduce the issue?

No, not for the original stale-replay bug: the PR supplies source-level instrumentation and tests, but no real runtime reproduction or after-fix trace. The earlier PR-introduced debug-field issue is source-checkable and appears fixed in the current head.

Is this the best way to solve the issue?

Yes, with a merge gate: the current code direction is a narrow, redacted diagnostic change that now distinguishes chained versus stripped previous ids. It is not ready to merge until the contributor adds real behavior proof from the WebSocket path.

Acceptance criteria:

  • Contributor should add redacted runtime lineage proof from a real OpenAI WebSocket run.
  • node scripts/test-projects.mjs src/agents/openai-ws-stream.test.ts
  • pnpm exec oxfmt --check src/agents/openai-ws-request.ts src/agents/openai-ws-stream.ts src/agents/openai-ws-stream.test.ts
  • pnpm tsgo:test:src or the appropriate changed test-type lane
  • git diff --check

What I checked:

  • PR head separates chained from stripped previous ids: At the current head, previousResponseId is emitted only for incremental sends, while full-context sends record requestedPreviousResponseIdStripped under a distinct debug field. (src/agents/openai-ws-request.ts:153, c9caa7811ca9)
  • Request and completion logs are wired to the WebSocket send path: The PR creates a per-request lineage id before manager.send and logs completion lineage when response.completed is accepted into the transcript. (src/agents/openai-ws-stream.ts:1030, c9caa7811ca9)
  • Regression tests cover redaction and full-context semantics: The added tests assert that full-context debug does not advertise a wire chain and that tool arguments/output text are represented only by lengths, not raw content. (src/agents/openai-ws-stream.test.ts:1863, c9caa7811ca9)
  • Current main has the underlying planner behavior but not this instrumentation: Current main strips previous_response_id from full-context payloads and sends it only on incremental payloads, so the PR is still a live diagnostic addition rather than already implemented on main. (src/agents/openai-ws-request.ts:100, 838565fe5997)
  • Real behavior proof is still absent: The PR body lists targeted tests, formatting, and git diff --check, and the latest discussion still says the proof gate needs a captured trace; there is no after-fix runtime log, terminal output, screenshot, recording, or linked artifact showing the new lineage output in a real setup. (c9caa7811ca9)
  • Related bug context remains open: The primary linked stale subagent/WebSocket lineage investigation is still open, so this diagnostic PR should remain paired with maintainer review rather than be closed as obsolete.
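The request-to-completion join checked above can be sketched as follows, assuming an in-memory map keyed by a per-request lineage id. LineageTracker, beginRequest, completeRequest, and the wsreq- id format are hypothetical, not the PR's code. In this sketch, a completion that arrives for an unknown or already-completed id returns undefined, which could help surface a duplicate or stale completion in the logs:

```typescript
type RequestMode = "incremental" | "full_context";

type RequestLineageLog = { lineageId: string; mode: RequestMode };

type CompletionLineageLog = {
  lineageId: string;
  mode: RequestMode;
  acceptedResponseId: string;
  replayItemCount: number;
};

// Hypothetical tracker: one entry per in-flight request, removed on accepted completion.
class LineageTracker {
  private nextSeq = 0;
  private readonly pending = new Map<string, RequestLineageLog>();

  // In the real stream this would run just before the send call.
  beginRequest(mode: RequestMode): RequestLineageLog {
    this.nextSeq += 1;
    const entry: RequestLineageLog = { lineageId: `wsreq-${this.nextSeq}`, mode };
    this.pending.set(entry.lineageId, entry);
    return entry;
  }

  // Call only when a response.completed event is accepted into the transcript.
  completeRequest(
    lineageId: string,
    acceptedResponseId: string,
    replayItemCount: number,
  ): CompletionLineageLog | undefined {
    const request = this.pending.get(lineageId);
    if (request === undefined) {
      return undefined; // Unknown or already-completed id: nothing to join against.
    }
    this.pending.delete(lineageId);
    return { lineageId, mode: request.mode, acceptedResponseId, replayItemCount };
  }

  pendingCount(): number {
    return this.pending.size;
  }
}
```

Keeping the request mode on the pending entry lets the completion log say whether the accepted response came from a chained or a full-context send without re-deriving it.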

Likely related people:

  • steipete: Current-main blame on the central WebSocket planner, stream send path, and completion handling points to Peter Steinberger, and prior review context connects the same area to recent OpenAI WebSocket replay and transport work. (role: recent maintainer and feature-history owner; confidence: high; commits: aca43b29e1b2, b666ce692fae, cabdf5bbc4c6; files: src/agents/openai-ws-request.ts, src/agents/openai-ws-stream.ts, src/agents/openai-ws-stream.test.ts)
  • vincentkoc: The prior ClawSweeper review context routes adjacent provider/WebSocket request policy work to this maintainer, which is relevant for review of request-planning diagnostics even though the direct local blame trail is strongest for steipete. (role: adjacent owner; confidence: medium; files: src/agents/openai-ws-request.ts, src/agents/openai-ws-stream.ts)

Remaining risk / open question:

  • The external PR still needs redacted after-fix proof from a real OpenAI WebSocket run before merge.
  • No live reproduction of the original stale-replay symptom is included with this PR; this review is based on source inspection, the provided discussion, and contributor-reported checks.
  • The PR body notes that pnpm tsgo:test:src was SIGKILLed locally, so full test-type validation is not proven by the contributor evidence.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 838565fe5997.

@100yenadmin

Contributor Author

Three small fixes plus the proof gate would close this out. Verified against pr78146 head:

1. Misleading debug field in the full_context branch. In src/agents/openai-ws-request.ts:205 the destructure strips previous_response_id from the wire payload, but :207-216 still records it in the debug block, which is then surfaced as requestedPreviousResponseId by logWsCompletionLineage (src/agents/openai-ws-stream.ts:134-157). The same field is emitted for both incremental (chained, line 191) and full_context (stripped, line 211), so an on-call engineer reading the log can't tell which case they're in. That is exactly the failure mode this PR's instrumentation is meant to make easier to debug.

Smallest fix: in openai-ws-request.ts:209-216, set previousResponseId: undefined and add a sibling requestedPreviousResponseIdStripped: params.previousResponseId. Or rename the consumer field to chainedPreviousResponseId and emit it only when mode === "incremental". Either makes "requested but stripped" vs "requested and chained" unambiguous.
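The first option can be sketched as a planner that returns both the wire payload and its debug record, assuming a simplified parameter shape. planRequest, PlanDebug, and WirePayload are hypothetical names; only previousResponseId and requestedPreviousResponseIdStripped come from this thread. Because the debug record is built in the same branch as the payload, it cannot claim a chain that never went on the wire:

```typescript
type PlanMode = "incremental" | "full_context";

type PlanDebug = {
  mode: PlanMode;
  // Set only when previous_response_id actually goes on the wire.
  previousResponseId?: string;
  // Set only when a chain was requested but the full_context branch dropped it.
  requestedPreviousResponseIdStripped?: string;
};

// Hypothetical wire shape; the real Responses payload has many more fields.
type WirePayload = { input: unknown[]; previous_response_id?: string };

function planRequest(params: {
  mode: PlanMode;
  input: unknown[];
  previousResponseId?: string;
}): { payload: WirePayload; debug: PlanDebug } {
  if (params.mode === "incremental") {
    // Chained send: the payload and the debug record carry the same id.
    return {
      payload: { input: params.input, previous_response_id: params.previousResponseId },
      debug: { mode: "incremental", previousResponseId: params.previousResponseId },
    };
  }
  // full_context: resend the whole transcript and never chain.
  return {
    payload: { input: params.input },
    debug: {
      mode: "full_context",
      previousResponseId: undefined,
      requestedPreviousResponseIdStripped: params.previousResponseId,
    },
  };
}
```

In this sketch, a full_context plan built with previousResponseId "resp_1" has debug.previousResponseId undefined and debug.requestedPreviousResponseIdStripped "resp_1", which is the contract the one-line test in item 3 locks in.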

2. check-lint (rule unicorn/no-array-reverse). The new [...messages].reverse().find(...) is at src/agents/openai-ws-stream.ts:95 (inside summarizeWsContextLineage). The rule fires via categories.correctness: error in .oxlintrc.json. The project targets ES2023 (engines.node >= 22.14.0), so the idiomatic fix is messages.findLast((message) => …), which needs one allocation instead of two.
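A small before/after sketch of the lint fix, assuming a minimal message shape. ContextMessage, latestUserIdViaReverse, latestUserId, and summarizeContextTail are hypothetical names. Array.prototype.findLast searches from the end of the original array, so the spread copy disappears; TypeScript needs the ES2023 lib for its typings:

```typescript
// Hypothetical transcript shape; ids may be absent, matching "when those ids are present".
type ContextMessage = {
  role: "user" | "assistant" | "tool";
  id?: string;
  parentId?: string;
};

// Before: copies the array, reverses the copy, then searches it (flagged by unicorn/no-array-reverse).
function latestUserIdViaReverse(messages: ContextMessage[]): string | undefined {
  return [...messages].reverse().find((message) => message.role === "user")?.id;
}

// After: searches from the end of the original array without copying it (ES2023).
function latestUserId(messages: ContextMessage[]): string | undefined {
  return messages.findLast((message) => message.role === "user")?.id;
}

// Context-tail summary: role and ids only, never message text.
function summarizeContextTail(messages: ContextMessage[]) {
  const tail = messages.length > 0 ? messages[messages.length - 1] : undefined;
  return {
    tailRole: tail?.role,
    tailMessageId: tail?.id,
    tailParentId: tail?.parentId,
    latestUserId: latestUserId(messages),
  };
}
```

Both versions return the same id; the findLast form also leaves the input untouched and reads as a direct statement of intent ("the last user message").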

3. One-line test to lock in the contract. Near src/agents/openai-ws-stream.test.ts:1863 (existing mode === "full_context" case): expect(plan.debug.previousResponseId).toBeUndefined();. Asserts that when the wire payload doesn't carry the chain, the debug record cannot advertise one either.

Plus: real behavior proof needs a captured trace. Once these three land, the trace this instrumentation produces is the natural source.

check-test-types runs pnpm tsgo:test (package.json:1336). check-lint will clear with (2).

The full_context branch strips previous_response_id from the wire payload
but the debug record was reporting it under the same field as the
incremental (chained) case, so on-call could not tell whether the chain
landed or was dropped intentionally. Split the field: previousResponseId
is set only when the chain went on the wire, and a new
requestedPreviousResponseIdStripped surfaces 'requested but stripped'
unambiguously. The completion-lineage log mirrors the split with
chainedPreviousResponseId vs requestedPreviousResponseIdStripped.

Replaces [...messages].reverse().find(...) in summarizeWsContextLineage
with messages.findLast(...) (one allocation, ES2023, satisfies
unicorn/no-array-reverse).

Updates the existing full_context planner test to lock in the contract
that the debug record cannot advertise a chain when the wire payload
does not carry one.
@steipete

steipete commented May 9, 2026

Contributor

Thanks @100yenadmin. Closing this as superseded by #79726.

This instrumentation made sense only if we kept investigating OpenClaw's custom OpenAI Responses WebSocket planner/session lineage. We are not keeping that stack. #79726 deletes the custom src/agents/openai-ws-* implementation, removes the custom WebSocket cleanup/warmup/config surfaces, and routes explicit openai-codex/* Responses runs through PI native Codex Responses streaming instead.

Because the OpenClaw-owned WebSocket request planner and terminal-event loop are gone, the redacted lineage logs introduced here would have no production path to instrument. Any future transport-level lineage diagnostics should live where the transport now lives: PI / Codex Responses streaming, not OpenClaw's removed compatibility layer.

Proof in #79726:

  • removes openai-ws-request.ts, openai-ws-stream.ts, openai-ws-message-conversion.ts, and all related tests
  • removes openaiWsWarmup from OpenAI params/docs/tests
  • keeps only OpenClaw-owned wrapper behavior around PI: auth injection, abort signal propagation, session id propagation, and system-prompt cache boundary stripping
  • targeted PI/OpenAI tests passed, and Testbox pnpm check:changed passed

#78055 remains referenced from #79726, but this diagnostic PR is no longer the right vehicle for that issue.

steipete closed this on May 9, 2026.