fix: trace OpenAI WebSocket response lineage #78146
|
@steipete Serious bug in the latest update. Users are seeing this in Telegram and web chat. |
|
Codex review: needs real behavior proof before merge.
Summary
Reproducibility: no. Not for the original stale-replay bug: the PR supplies source-level instrumentation and tests, but no real runtime reproduction or after-fix trace. The earlier PR-introduced debug-field issue is source-checkable and appears fixed in the current head.
Real behavior proof
Next step before merge
Security
Review details
Best possible solution: Keep the narrow redacted lineage instrumentation, require a captured runtime trace from the affected WebSocket path, and then let maintainers review it alongside the linked stale-replay work.
Do we have a high-confidence way to reproduce the issue? No, not for the original stale-replay bug: the PR supplies source-level instrumentation and tests, but no real runtime reproduction or after-fix trace. The earlier PR-introduced debug-field issue is source-checkable and appears fixed in the current head.
Is this the best way to solve the issue? Yes, with a merge gate: the current code direction is a narrow, redacted diagnostic change that now distinguishes chained versus stripped previous ids. It is not ready to merge until the contributor adds real behavior proof from the WebSocket path.
Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 838565fe5997. |
|
Three small fixes plus the proof gate would close this out. Verified against 1. Misleading debug field in the Smallest fix: in 2. 3. One-line test to lock in the contract. Near Plus:
|
The full_context branch strips previous_response_id from the wire payload, but the debug record was reporting it under the same field as the incremental (chained) case, so on-call could not tell whether the chain landed or was dropped intentionally.
Split the field: previousResponseId is set only when the chain went on the wire, and a new requestedPreviousResponseIdStripped surfaces 'requested but stripped' unambiguously. The completion-lineage log mirrors the split with chainedPreviousResponseId vs requestedPreviousResponseIdStripped.
Replaces [...messages].reverse().find(...) in summarizeWsContextLineage with messages.findLast(...) (one allocation, ES2023, satisfies unicorn/no-array-reverse).
Updates the existing full_context planner test to lock in the contract that the debug record cannot advertise a chain when the wire payload does not carry one.
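The field split described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the type names, the buildDebugRecord helper, and the assumption that requestedPreviousResponseIdStripped carries the id string itself (rather than a flag) are all hypothetical. Only the field names previousResponseId and requestedPreviousResponseIdStripped and the incremental / full_context modes come from the commit message.

```typescript
// Hypothetical shapes for illustration only; the real planner types may differ.
type PlanMode = "incremental" | "full_context";

interface WsRequestDebugRecord {
  mode: PlanMode;
  // Present only when previous_response_id actually went on the wire.
  previousResponseId?: string;
  // Present only when a previous id was requested but stripped from the payload.
  // Assumption: this carries the id string; the real field could be a flag.
  requestedPreviousResponseIdStripped?: string;
}

// Hypothetical helper: builds the debug record so the two cases never share a field.
function buildDebugRecord(
  mode: PlanMode,
  requestedPreviousResponseId: string | undefined,
): WsRequestDebugRecord {
  if (requestedPreviousResponseId === undefined) {
    return { mode };
  }
  if (mode === "incremental") {
    // Chained case: the id is sent, so it is reported as the chained id.
    return { mode, previousResponseId: requestedPreviousResponseId };
  }
  // full_context case: the id was requested but is not sent on the wire.
  return { mode, requestedPreviousResponseIdStripped: requestedPreviousResponseId };
}
```

With this shape, a record can never advertise a chain for a full_context request, which is the contract the updated planner test is described as locking in.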
|
Thanks @100yenadmin. Closing this as superseded by #79726.
This instrumentation made sense only if we kept investigating OpenClaw's custom OpenAI Responses WebSocket planner/session lineage. We are not keeping that stack. #79726 deletes the custom
Because the OpenClaw-owned WebSocket request planner and terminal-event loop are gone, the redacted lineage logs introduced here would have no production path to instrument. Any future transport-level lineage diagnostics should live where the transport now lives: PI / Codex Responses streaming, not OpenClaw's removed compatibility layer.
Proof in #79726:
#78055 remains referenced from #79726, but this diagnostic PR is no longer the right vehicle for that issue. |
Summary
previous_response_id, baseline/full/suffix lengths, and suffix item summaries without prompt/tool-result text.
response.completed is accepted into the transcript, tying the generated request id to the accepted response id and replay item count.
Why
This provides safe trajectory evidence for duplicate/stale final-answer replay investigations where the failure appears to involve the OpenAI-Codex WebSocket incremental / previous_response_id path.
References/ties: #78055, #76905, #76888, #77642, #78060, #76990, #77445, #67777, #39032.
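To illustrate the kind of redacted record the Summary describes (lengths and item summaries, with no prompt or tool-result text), here is a minimal sketch. The item shape, the summarizeLineage and summarizeItem names, and the output field names are hypothetical and are not taken from the PR; the sketch only shows one way to log structure without logging content.

```typescript
// Hypothetical input item shape; real Responses input items carry more fields.
interface WsInputItem {
  type: string;
  text?: string;
}

// Redacted view of one item: its type and text length, never the text itself.
interface ItemSummary {
  type: string;
  textLength: number;
}

function summarizeItem(item: WsInputItem): ItemSummary {
  return { type: item.type, textLength: item.text?.length ?? 0 };
}

// Hypothetical lineage summary: items before baselineLength are assumed to be
// already known to the server; the remaining suffix is what an incremental
// request would send.
function summarizeLineage(
  baselineLength: number,
  fullItems: WsInputItem[],
  previousResponseId?: string,
) {
  const suffix = fullItems.slice(baselineLength);
  return {
    previousResponseId,
    baselineLength,
    fullLength: fullItems.length,
    suffixLength: suffix.length,
    suffixItems: suffix.map(summarizeItem),
  };
}
```

Because only lengths and item types are copied into the summary, serializing it for a debug log cannot leak prompt or tool output text, whatever the items contain.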
Tests
node scripts/test-projects.mjs src/agents/openai-ws-stream.test.ts
pnpm exec oxfmt --check src/agents/openai-ws-request.ts src/agents/openai-ws-stream.ts src/agents/openai-ws-stream.test.ts
git diff --check
Notes
pnpm tsgo:test:src was attempted, but the process was SIGKILLed in this local worktree before emitting diagnostics.