fix(agents): preserve commentary/final_answer phase separation#59643
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f545716520
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const blockReplyChunk = replace ? cleanedText : deltaText; | ||
| if (!shouldDeferTextEndBlockReply && blockReplyChunk) { | ||
| if (ctx.blockChunker) { | ||
| ctx.blockChunker.append(blockReplyChunk); | ||
| } else { |
There was a problem hiding this comment.
Append full final text when commentary was deferred
When blockReplyChunking is enabled with blockReplyBreak: "text_end", phase-aware handling appends only deltaText for final_answer blocks (replace ? cleanedText : deltaText). If commentary was previously deferred and the final answer extends that commentary (for example "Hello" -> "Hello world"), the chunk buffer contains only the suffix. On providers/paths that do not emit text_end before message_end (the OpenAI WS path streams text_delta + done), handleMessageEnd drains that buffer and emits a truncated block reply (e.g. " world") instead of the full final answer.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR fixes mixed-phase assistant turns (commentary + final_answer) being flattened into user-visible output by preserving phase separation across storage/replay, OpenAI WS streaming partials, embedded subscribe handling, and visible-text extraction/delivery.
Changes:
- Added phase-aware visible text extraction that prefers
final_answerovercommentary, with legacy fallback. - Updated embedded subscribe streaming/delivery to suppress commentary block replies and replace previews when final output arrives.
- Updated OpenAI WS stream partial emission and replay conversion to carry phase via
textSignatureand split replayed assistant text on phase changes; added regression tests across seams.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/agents/pi-embedded-utils.ts | Adds phase-aware extraction (extractAssistantVisibleText) and shared sanitization helpers. |
| src/agents/pi-embedded-utils.test.ts | Adds unit coverage for visible-text phase preference and legacy fallback behavior. |
| src/agents/pi-embedded-subscribe.handlers.messages.ts | Uses phase-aware extraction during streaming and message_end delivery; suppresses commentary block replies and supports replacement updates. |
| src/agents/pi-embedded-subscribe.handlers.messages.test.ts | Adds regression coverage for preview replacement and message_end replacement emission. |
| src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-block-replies-text-end-does-not.test.ts | Extends seam tests for text_end/message_end duplication and phase-aware suppression behaviors. |
| src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-duplicate-text-end-repeats-full.test.ts | Makes tests async and flushes microtasks to validate non-duplication reliably. |
| src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-append-text-end-content-is.test.ts | Makes parametrized test async and flushes microtasks for text_end behavior. |
| src/agents/openai-ws-stream.ts | Emits accumulated, phase-attributed text partials using output item phase mapping and stable textSignature. |
| src/agents/openai-ws-stream.test.ts | Adds replay splitting + mixed-phase message construction tests and WS streaming phase mapping tests. |
| src/agents/openai-ws-message-conversion.ts | Splits replayed assistant text into separate input items when block signatures indicate phase changes; omits top-level phase for mixed-phase stored messages. |
Comments suppressed due to low confidence (1)
src/agents/pi-embedded-subscribe.handlers.messages.ts:651
- In the message_end block-reply path,
ctx.state.lastBlockReplyTextis set to the rawtextvalue (nostripDowngradedToolCallText+ notrimEnd), butflushBlockReplyBuffer()later emits viaemitBlockChunk(), which compares againstlastBlockReplyTextusing a different normalization (stripDowngradedToolCallText(stripBlockTags(...)).trimEnd()). If the bufferedblockBuffercontains trailing whitespace/newlines or downgraded tool-call markers, the equality check can fail and the buffered block can be emitted again, reintroducing duplicate replies inblockReplyBreak === "message_end"flows.
Consider normalizing lastBlockReplyText the same way emitBlockChunk() does and/or clearing/resetting blockBuffer when emitting the final message_end reply so the subsequent flushBlockReplyBuffer() cannot re-emit the same content.
if (hasBufferedBlockReply && ctx.blockChunker?.hasBuffered()) {
ctx.blockChunker.drain({ force: true, emit: ctx.emitBlockChunk });
ctx.blockChunker.reset();
} else if (text !== ctx.state.lastBlockReplyText) {
// Check for duplicates before emitting (same logic as emitBlockChunk).
const normalizedText = normalizeTextForComparison(text);
if (
isMessagingToolDuplicateNormalized(
normalizedText,
ctx.state.messagingToolSentTextsNormalized,
)
) {
ctx.log.debug(
`Skipping message_end block reply - already sent via messaging tool: ${text.slice(0, 50)}...`,
);
} else {
ctx.state.lastBlockReplyText = text;
emitSplitResultAsBlockReply(ctx.consumeReplyDirectives(text, { final: true }));
}
Greptile SummaryThis PR preserves commentary/final_answer phase separation end-to-end: stored-message conversion now splits mixed-phase text into separate input items during replay, the WebSocket stream layer tracks per-item phases and emits phase-attributed Confidence Score: 5/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/pi-embedded-utils.ts
Line: 258-260
Comment:
**Double call to `normalizeAssistantPhase`**
`normalizeAssistantPhase(parsed.phase)` is invoked twice — once to check truthiness and once to include the value. Extract it to a local variable to avoid the redundant call.
```suggestion
const phase = normalizeAssistantPhase(parsed.phase);
return {
...(typeof parsed.id === "string" ? { id: parsed.id } : {}),
...(phase ? { phase } : {}),
};
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/pi-embedded-subscribe.handlers.messages.ts
Line: 400-412
Comment:
**`lastBlockReplyText` guard scope**
The `!ctx.state.lastBlockReplyText` guard prevents the forced buffer reset when `text_end` carries the final-answer phase, but `lastBlockReplyText` is reset to `undefined` at `message_start` (in `pi-embedded-subscribe.ts`), so this is safe within a single message turn.
That said, the comment explaining why only-first-block-reply-in-this-turn gets the forced reset would help future readers understand this isn't protecting against a cross-turn leak.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/pi-embedded-utils.ts
Line: 333-345
Comment:
**Up to three content-array passes per `extractAssistantVisibleText` call**
Each `extractAssistantTextForPhase` call does two scans of `msg.content` (`hasExplicitPhasedTextBlocks` scan + filter), so `extractAssistantVisibleText` can trigger up to six iterations over the content array. This is called on every streaming delta from `handleMessageUpdate`, which compounds for long responses. A single-pass accumulator that buckets text by phase would reduce this to O(n) once per call. Not blocking, but worth tracking for high-throughput paths.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Merge branch 'openclaw:main' into fix/co..." | Re-trigger Greptile |
| ...(normalizeAssistantPhase(parsed.phase) | ||
| ? { phase: normalizeAssistantPhase(parsed.phase) } | ||
| : {}), |
There was a problem hiding this comment.
Double call to
normalizeAssistantPhase
normalizeAssistantPhase(parsed.phase) is invoked twice — once to check truthiness and once to include the value. Extract it to a local variable to avoid the redundant call.
| ...(normalizeAssistantPhase(parsed.phase) | |
| ? { phase: normalizeAssistantPhase(parsed.phase) } | |
| : {}), | |
| const phase = normalizeAssistantPhase(parsed.phase); | |
| return { | |
| ...(typeof parsed.id === "string" ? { id: parsed.id } : {}), | |
| ...(phase ? { phase } : {}), | |
| }; |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-utils.ts
Line: 258-260
Comment:
**Double call to `normalizeAssistantPhase`**
`normalizeAssistantPhase(parsed.phase)` is invoked twice — once to check truthiness and once to include the value. Extract it to a local variable to avoid the redundant call.
```suggestion
const phase = normalizeAssistantPhase(parsed.phase);
return {
...(typeof parsed.id === "string" ? { id: parsed.id } : {}),
...(phase ? { phase } : {}),
};
```
How can I resolve this? If you propose a fix, please make it concise.| if ( | ||
| !shouldDeferTextEndBlockReply && | ||
| evtType === "text_end" && | ||
| !ctx.state.lastBlockReplyText && | ||
| cleanedText | ||
| ) { | ||
| if (ctx.blockChunker) { | ||
| ctx.blockChunker.reset(); | ||
| ctx.blockChunker.append(cleanedText); | ||
| } else { | ||
| ctx.state.blockBuffer = cleanedText; | ||
| } | ||
| } |
There was a problem hiding this comment.
lastBlockReplyText guard scope
The !ctx.state.lastBlockReplyText guard prevents the forced buffer reset when text_end carries the final-answer phase, but lastBlockReplyText is reset to undefined at message_start (in pi-embedded-subscribe.ts), so this is safe within a single message turn.
That said, the comment explaining why only-first-block-reply-in-this-turn gets the forced reset would help future readers understand this isn't protecting against a cross-turn leak.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-subscribe.handlers.messages.ts
Line: 400-412
Comment:
**`lastBlockReplyText` guard scope**
The `!ctx.state.lastBlockReplyText` guard prevents the forced buffer reset when `text_end` carries the final-answer phase, but `lastBlockReplyText` is reset to `undefined` at `message_start` (in `pi-embedded-subscribe.ts`), so this is safe within a single message turn.
That said, the comment explaining why only-first-block-reply-in-this-turn gets the forced reset would help future readers understand this isn't protecting against a cross-turn leak.
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| export function extractAssistantVisibleText(msg: AssistantMessage): string { | ||
| const finalAnswerText = extractAssistantTextForPhase(msg, "final_answer"); | ||
| if (finalAnswerText.trim()) { | ||
| return finalAnswerText; | ||
| } | ||
|
|
||
| const commentaryText = extractAssistantTextForPhase(msg, "commentary"); | ||
| if (commentaryText.trim()) { | ||
| return commentaryText; | ||
| } | ||
|
|
||
| return extractAssistantTextForPhase(msg); | ||
| } |
There was a problem hiding this comment.
Up to three content-array passes per
extractAssistantVisibleText call
Each extractAssistantTextForPhase call does two scans of msg.content (hasExplicitPhasedTextBlocks scan + filter), so extractAssistantVisibleText can trigger up to six iterations over the content array. This is called on every streaming delta from handleMessageUpdate, which compounds for long responses. A single-pass accumulator that buckets text by phase would reduce this to O(n) once per call. Not blocking, but worth tracking for high-throughput paths.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-utils.ts
Line: 333-345
Comment:
**Up to three content-array passes per `extractAssistantVisibleText` call**
Each `extractAssistantTextForPhase` call does two scans of `msg.content` (`hasExplicitPhasedTextBlocks` scan + filter), so `extractAssistantVisibleText` can trigger up to six iterations over the content array. This is called on every streaming delta from `handleMessageUpdate`, which compounds for long responses. A single-pass accumulator that buckets text by phase would reduce this to O(n) once per call. Not blocking, but worth tracking for high-throughput paths.
How can I resolve this? If you propose a fix, please make it concise.|
Short blocker note from triage:
So current CI looks blocked by base/main failures unless a branch-specific regression shows up outside the touched phase-separation slice. |
|
#59643 looks directionally right for the mixed-phase assistant-turn bug family in the main WS -> embedded subscribe path. One thing we found while comparing this against a live webchat/dashboard repro: the remaining leak may be outside the exact scope of this PR, because some non-embedded consumers still appear to render assistant text generically rather than using phase-aware visible-text extraction. In other words, I think this PR fixes the core stream/subscriber path, but there may still be a follow-up needed for webchat/dashboard history/render consumers so #59150 can fully close. |
|
Maintainer note from current
So I am reviewing this PR as the active phase-separation candidate on top of a cleaner |
steipete
left a comment
There was a problem hiding this comment.
Two blockers from review against current main:
-
src/agents/pi-embedded-subscribe.handlers.messages.ts:299-330/:347-429- visible-output suppression is still computed from
evt.message, not fromassistantMessageEvent.partial - this PR then uses
partialAssistantto derivephaseAwareVisibleTextand can emit a commentary preview frompartial.phase === "commentary" - the added test
"replaces commentary preview text when a final_answer partial arrives"locks that behavior in - that is the opposite of the current fix direction from
#61282: commentary partials should stay suppressed, not become visible and later get replaced
- visible-output suppression is still computed from
-
src/agents/openai-ws-stream.ts:556-675- the late-map fallback still emits an unphased
text_deltawhenresponse.output_text.deltaarrives beforeresponse.output_item.added - the added test
"degrades safely when a text delta arrives before its item mapping"explicitly accepts that first unphased partial - for the bug family here, that is not safe: an unphased partial is treated as visible text by downstream consumers, so a commentary chunk can still leak before the phase metadata arrives
- the late-map fallback still emits an unphased
Because both cases still allow commentary-visible partial delivery on realistic fallback/event-order paths, I do not think this is landable yet as the fix for #59150.
f545716 to
501dcb1
Compare
|
Landed.
What changed on top:
Local gates:
Thanks @ringlochid. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 501dcb1f31
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if (event.type === "response.output_text.delta") { | ||
| const key = getOutputTextKey(event.item_id, event.content_index); | ||
| const nextText = `${outputTextByPart.get(key) ?? ""}${event.delta}`; | ||
| outputTextByPart.set(key, nextText); | ||
| if (outputItemPhaseById.has(event.item_id)) { |
There was a problem hiding this comment.
Allow HTTP fallback when phase mapping has not emitted output
When a response.output_text.delta arrives before response.output_item.added, this branch buffers text and emits nothing until phase metadata is known, but sawWsOutput is still marked true earlier in the same handler. If the socket then drops before the item mapping event, auto mode skips both retry/fallback (!sawWsOutput check in the catch path) even though no user-visible partial was emitted, so the turn fails instead of cleanly falling back to HTTP on transient disconnects.
Useful? React with 👍 / 👎.
|
Thanks for landing this and for the follow-up tightening. The merged phase-separation coverage looks solid for the core WS → embedded subscribe path. One narrow test gap I noticed after the merge: the late-map buffering test currently only exercises I'll send a small test-only follow-up PR for that if it's useful. Closed my earlier #61457 since it's now redundant against merged main. |
Summary
commentaryandfinal_answertext can be flattened into one visible output, which leaks commentary into user-facing replies and can produce duplicate or malformed final delivery.final_answerwhile still falling back safely when no final text exists.Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause / Regression History (if applicable)
text_end/message_enddelivery interactions where commentary previews must be suppressed/replaced by final output.git blame, prior PR, issue, or refactor if known): this bug family overlaps longstanding leakage/duplication reports in Commentary text can leak into final assistant replies, and duplicate visible replies can occur after tool sends #59150, ACP stream deliveryMode: final_only still sends duplicate block+final replies #56198, ACP block replies not marked as visible text on Discord, causing full message duplication #58892, [Bug]: Telegram/Codex still leaks commentary-phase tool-call trace text with multilingual garbage tokens in 2026.3.13 #52084, Text between tool calls leaks to messaging channels #25592, and message.send triggers duplicate delivery-mirror replies #44467. Adjacent prior art includes fix: strip raw OpenAI function-call wire format from user-facing text #30479 (stripping raw user-facing protocol leakage) and WhatsApp: send commentary live #57484 (commentary-delivery semantics on a channel path), but those did not fix mixed-phase persistence/replay/delivery end-to-end.Regression Test Plan (if applicable)
src/agents/openai-ws-stream.test.tssrc/agents/pi-embedded-utils.test.tssrc/agents/pi-embedded-subscribe.handlers.messages.test.tssrc/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-block-replies-text-end-does-not.test.tssrc/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-append-text-end-content-is.test.tssrc/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-duplicate-text-end-repeats-full.test.tsfinal_answer -> commentary -> legacy/unphased; commentarytext_endblock replies are suppressed until final delivery; final replacement atmessage_endworks; and duplicate/prefix-extension regressions remain fixed.User-visible / Behavior Changes
final_answertext when both commentary and final-answer phases exist in one turn.Diagram (if applicable)
Security Impact (required)
Yes, explain risk + mitigation: N/ARepro + Verification
Environment
Steps
commentaryandfinal_answertext blocks.Expected
final_answerwhen present.Actual
Evidence
Attach at least one:
Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)
local transcript evidence showed assistant turns with both commentary and
final_answerblocks in a single messagetargeted regression slice passed:
8 suites / 164 testsfresh independent review loop passed for storage/replay, stream-phase, visible-delivery, and holistic closure before branch submission
Human Verification (required)
What you personally verified (not just CI), and how:
8 suites / 164 tests passedtext_endblock repliesmessage_endafter commentary streamed firstpartial.phasetext_endregressionstsc --noEmiton this host (prior typecheck attempts were memory-constrained)src/agents/tools/sessions-helpers.tsReview Conversations
Compatibility / Migration
Risks and Mitigations
sessions-helpersparity as explicit follow-up watchpoint rather than silently broadening behavior.