Skip to content

fix(agents): preserve commentary/final_answer phase separation#59643

Merged
steipete merged 3 commits into
openclaw:mainfrom
ringlochid:fix/commentary-final-answer-phase-separation
Apr 5, 2026
Merged

fix(agents): preserve commentary/final_answer phase separation#59643
steipete merged 3 commits into
openclaw:mainfrom
ringlochid:fix/commentary-final-answer-phase-separation

Conversation

@ringlochid

@ringlochid ringlochid commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Problem: assistant turns that contain both commentary and final_answer text can be flattened into one visible output, which leaks commentary into user-facing replies and can produce duplicate or malformed final delivery.
  • Why it matters: this breaks the expected final-only user experience, causes duplicate replies after tool/send paths, and corrupts replay/context because mixed-phase text is persisted and replayed ambiguously.
  • What changed: preserved phase separation end-to-end across stored-message conversion, replay/input-item rebuilding, WebSocket partial phase propagation, and visible extraction/delivery so user-visible output prefers final_answer while still falling back safely when no final text exists.
  • What did NOT change (scope boundary): this does not globally redefine every text extractor in the repo, does not change tool-call semantics, and does not attempt a broader phase-aware audit outside the main OpenAI WS -> embedded subscribe -> visible delivery path.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause / Regression History (if applicable)

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • src/agents/openai-ws-stream.test.ts
    • src/agents/pi-embedded-utils.test.ts
    • src/agents/pi-embedded-subscribe.handlers.messages.test.ts
    • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-block-replies-text-end-does-not.test.ts
    • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-append-text-end-content-is.test.ts
    • src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-duplicate-text-end-repeats-full.test.ts
  • Scenario the test should lock in: mixed commentary/final stored messages stay phase-separated on replay; stream partials preserve phase; visible extraction prefers final_answer -> commentary -> legacy/unphased; commentary text_end block replies are suppressed until final delivery; final replacement at message_end works; and duplicate/prefix-extension regressions remain fixed.
  • Why this is the smallest reliable guardrail: the bug spans stored-message conversion, stream partial attribution, and delivery seams. Unit-only coverage at one layer would miss the cross-layer collapse that caused the visible duplication/leak.
  • Existing test that already covers this (if any): none before this branch for the mixed-phase end-to-end path.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • Visible assistant output now prefers final_answer text when both commentary and final-answer phases exist in one turn.
  • Commentary-only previews are no longer allowed to leak through as the final visible reply in the main embedded delivery path.
  • When commentary streamed first and final text arrives later, the final visible reply replaces the preview instead of duplicating it.
  • If no final-answer text exists, commentary/unphased fallback still works instead of producing an empty reply.

Diagram (if applicable)

Before:
[mixed commentary + final_answer blocks]
  -> [stored/replayed as flattened assistant text]
  -> [visible extractor concatenates all text]
  -> [commentary leak and/or duplicate final reply]

After:
[mixed commentary + final_answer blocks]
  -> [phase preserved in storage/replay/partials]
  -> [visible extractor prefers final_answer]
  -> [commentary preview suppressed/replaced]
  -> [single intended final visible reply]

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Ubuntu (local dev host)
  • Runtime/container: local Node/pnpm repo checkout
  • Model/provider: OpenAI WS / embedded subscribe path
  • Integration/channel (if any): embedded delivery path with Telegram/Discord-adjacent visible-output semantics
  • Relevant config (redacted): default OpenAI WS / embedded subscribe test harnesses; no special secrets required

Steps

  1. Produce or replay an assistant turn containing both commentary and final_answer text blocks.
  2. Observe stored/replayed assistant content and the user-visible delivery path.
  3. Confirm whether visible output leaks commentary, duplicates final delivery, or collapses mixed phases.

Expected

  • Stored and replayed assistant content preserves phase boundaries.
  • User-visible extraction prefers final_answer when present.
  • Commentary previews are suppressed or replaced rather than duplicated.

Actual

  • Before this fix: mixed-phase turns could flatten into one visible assistant reply, leak commentary, or produce duplicate final delivery.
  • On this branch: phase separation is preserved through replay and delivery, and the targeted duplicate/leak regressions are covered by tests.

Evidence

Attach at least one:

  • Failing test/log before + passing after

  • Trace/log snippets

  • Screenshot/recording

  • Perf numbers (if relevant)

  • local transcript evidence showed assistant turns with both commentary and final_answer blocks in a single message

  • targeted regression slice passed: 8 suites / 164 tests

  • fresh independent review loop passed for storage/replay, stream-phase, visible-delivery, and holistic closure before branch submission

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • inspected mixed-phase transcript evidence and confirmed commentary + final-answer coexistence in one assistant turn
    • verified stored-message/replay, stream partial propagation, and visible delivery changes in the touched files
    • ran the targeted regression slice and confirmed 8 suites / 164 tests passed
    • confirmed the branch diff remains scoped to the intended 10 files
  • Edge cases checked:
    • commentary leaking through text_end block replies
    • final replacement at message_end after commentary streamed first
    • legacy/unphased + phased replay collapse
    • signature-only phased partials without top-level partial.phase
    • prefix-extension and duplicate text_end regressions
  • What you did not verify:
    • full-repo tsc --noEmit on this host (prior typecheck attempts were memory-constrained)
    • a broader follow-up audit of other phase-blind helper paths such as src/agents/tools/sessions-helpers.ts

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: other assistant-text consumers outside the main embedded delivery path may still use phase-blind flattening and show adjacent inconsistencies.
    • Mitigation: this PR keeps scope tight to the verified main issue path and leaves sessions-helpers parity as explicit follow-up watchpoint rather than silently broadening behavior.
  • Risk: replay/delivery edge cases could regress around partial/final transitions.
    • Mitigation: regression coverage now locks in mixed-phase replay splitting, phase-aware partial attribution, commentary suppression, final replacement, and duplicate/text-end edge cases.

Copilot AI review requested due to automatic review settings April 2, 2026 11:34
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: XL labels Apr 2, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f545716520

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +391 to +395
const blockReplyChunk = replace ? cleanedText : deltaText;
if (!shouldDeferTextEndBlockReply && blockReplyChunk) {
if (ctx.blockChunker) {
ctx.blockChunker.append(blockReplyChunk);
} else {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Append full final text when commentary was deferred

When blockReplyChunking is enabled with blockReplyBreak: "text_end", phase-aware handling appends only deltaText for final_answer blocks (replace ? cleanedText : deltaText). If commentary was previously deferred and the final answer extends that commentary (for example "Hello" -> "Hello world"), the chunk buffer contains only the suffix. On providers/paths that do not emit text_end before message_end (the OpenAI WS path streams text_delta + done), handleMessageEnd drains that buffer and emits a truncated block reply (e.g. " world") instead of the full final answer.

Useful? React with 👍 / 👎.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes mixed-phase assistant turns (commentary + final_answer) being flattened into user-visible output by preserving phase separation across storage/replay, OpenAI WS streaming partials, embedded subscribe handling, and visible-text extraction/delivery.

Changes:

  • Added phase-aware visible text extraction that prefers final_answer over commentary, with legacy fallback.
  • Updated embedded subscribe streaming/delivery to suppress commentary block replies and replace previews when final output arrives.
  • Updated OpenAI WS stream partial emission and replay conversion to carry phase via textSignature and split replayed assistant text on phase changes; added regression tests across seams.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/agents/pi-embedded-utils.ts Adds phase-aware extraction (extractAssistantVisibleText) and shared sanitization helpers.
src/agents/pi-embedded-utils.test.ts Adds unit coverage for visible-text phase preference and legacy fallback behavior.
src/agents/pi-embedded-subscribe.handlers.messages.ts Uses phase-aware extraction during streaming and message_end delivery; suppresses commentary block replies and supports replacement updates.
src/agents/pi-embedded-subscribe.handlers.messages.test.ts Adds regression coverage for preview replacement and message_end replacement emission.
src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-block-replies-text-end-does-not.test.ts Extends seam tests for text_end/message_end duplication and phase-aware suppression behaviors.
src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-duplicate-text-end-repeats-full.test.ts Makes tests async and flushes microtasks to validate non-duplication reliably.
src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-append-text-end-content-is.test.ts Makes parametrized test async and flushes microtasks for text_end behavior.
src/agents/openai-ws-stream.ts Emits accumulated, phase-attributed text partials using output item phase mapping and stable textSignature.
src/agents/openai-ws-stream.test.ts Adds replay splitting + mixed-phase message construction tests and WS streaming phase mapping tests.
src/agents/openai-ws-message-conversion.ts Splits replayed assistant text into separate input items when block signatures indicate phase changes; omits top-level phase for mixed-phase stored messages.
Comments suppressed due to low confidence (1)

src/agents/pi-embedded-subscribe.handlers.messages.ts:651

  • In the message_end block-reply path, ctx.state.lastBlockReplyText is set to the raw text value (no stripDowngradedToolCallText + no trimEnd), but flushBlockReplyBuffer() later emits via emitBlockChunk(), which compares against lastBlockReplyText using a different normalization (stripDowngradedToolCallText(stripBlockTags(...)).trimEnd()). If the buffered blockBuffer contains trailing whitespace/newlines or downgraded tool-call markers, the equality check can fail and the buffered block can be emitted again, reintroducing duplicate replies in blockReplyBreak === "message_end" flows.

Consider normalizing lastBlockReplyText the same way emitBlockChunk() does and/or clearing/resetting blockBuffer when emitting the final message_end reply so the subsequent flushBlockReplyBuffer() cannot re-emit the same content.

    if (hasBufferedBlockReply && ctx.blockChunker?.hasBuffered()) {
      ctx.blockChunker.drain({ force: true, emit: ctx.emitBlockChunk });
      ctx.blockChunker.reset();
    } else if (text !== ctx.state.lastBlockReplyText) {
      // Check for duplicates before emitting (same logic as emitBlockChunk).
      const normalizedText = normalizeTextForComparison(text);
      if (
        isMessagingToolDuplicateNormalized(
          normalizedText,
          ctx.state.messagingToolSentTextsNormalized,
        )
      ) {
        ctx.log.debug(
          `Skipping message_end block reply - already sent via messaging tool: ${text.slice(0, 50)}...`,
        );
      } else {
        ctx.state.lastBlockReplyText = text;
        emitSplitResultAsBlockReply(ctx.consumeReplyDirectives(text, { final: true }));
      }

@greptile-apps

greptile-apps Bot commented Apr 2, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR preserves commentary/final_answer phase separation end-to-end: stored-message conversion now splits mixed-phase text into separate input items during replay, the WebSocket stream layer tracks per-item phases and emits phase-attributed textSignature-stamped partials, and the delivery layer (handleMessageUpdate/handleMessageEnd) prefers final_answer text via the new extractAssistantVisibleText helper and suppresses commentary block replies until the final phase arrives. All remaining findings are P2.

Confidence Score: 5/5

  • Safe to merge; all findings are minor style and performance suggestions with no correctness impact.
  • The core phase-separation logic is sound across all four layers (conversion, stream, utils, delivery). The lastBlockReplyText guard is safe because it is reset at message_start. Only P2 issues remain: a redundant normalizeAssistantPhase call, a missing explanatory comment on the guard, and up to six content-array passes per streaming delta in extractAssistantVisibleText.
  • No files require special attention.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-utils.ts
Line: 258-260

Comment:
**Double call to `normalizeAssistantPhase`**

`normalizeAssistantPhase(parsed.phase)` is invoked twice — once to check truthiness and once to include the value. Extract it to a local variable to avoid the redundant call.

```suggestion
    const phase = normalizeAssistantPhase(parsed.phase);
    return {
      ...(typeof parsed.id === "string" ? { id: parsed.id } : {}),
      ...(phase ? { phase } : {}),
    };
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/pi-embedded-subscribe.handlers.messages.ts
Line: 400-412

Comment:
**`lastBlockReplyText` guard scope**

The `!ctx.state.lastBlockReplyText` guard prevents the forced buffer reset when `text_end` carries the final-answer phase, but `lastBlockReplyText` is reset to `undefined` at `message_start` (in `pi-embedded-subscribe.ts`), so this is safe within a single message turn.

That said, the comment explaining why only-first-block-reply-in-this-turn gets the forced reset would help future readers understand this isn't protecting against a cross-turn leak.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/agents/pi-embedded-utils.ts
Line: 333-345

Comment:
**Up to three content-array passes per `extractAssistantVisibleText` call**

Each `extractAssistantTextForPhase` call does two scans of `msg.content` (`hasExplicitPhasedTextBlocks` scan + filter), so `extractAssistantVisibleText` can trigger up to six iterations over the content array. This is called on every streaming delta from `handleMessageUpdate`, which compounds for long responses. A single-pass accumulator that buckets text by phase would reduce this to O(n) once per call. Not blocking, but worth tracking for high-throughput paths.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "Merge branch 'openclaw:main' into fix/co..." | Re-trigger Greptile

Comment on lines +258 to +260
...(normalizeAssistantPhase(parsed.phase)
? { phase: normalizeAssistantPhase(parsed.phase) }
: {}),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Double call to normalizeAssistantPhase

normalizeAssistantPhase(parsed.phase) is invoked twice — once to check truthiness and once to include the value. Extract it to a local variable to avoid the redundant call.

Suggested change
...(normalizeAssistantPhase(parsed.phase)
? { phase: normalizeAssistantPhase(parsed.phase) }
: {}),
const phase = normalizeAssistantPhase(parsed.phase);
return {
...(typeof parsed.id === "string" ? { id: parsed.id } : {}),
...(phase ? { phase } : {}),
};
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-utils.ts
Line: 258-260

Comment:
**Double call to `normalizeAssistantPhase`**

`normalizeAssistantPhase(parsed.phase)` is invoked twice — once to check truthiness and once to include the value. Extract it to a local variable to avoid the redundant call.

```suggestion
    const phase = normalizeAssistantPhase(parsed.phase);
    return {
      ...(typeof parsed.id === "string" ? { id: parsed.id } : {}),
      ...(phase ? { phase } : {}),
    };
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +400 to +412
if (
!shouldDeferTextEndBlockReply &&
evtType === "text_end" &&
!ctx.state.lastBlockReplyText &&
cleanedText
) {
if (ctx.blockChunker) {
ctx.blockChunker.reset();
ctx.blockChunker.append(cleanedText);
} else {
ctx.state.blockBuffer = cleanedText;
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 lastBlockReplyText guard scope

The !ctx.state.lastBlockReplyText guard prevents the forced buffer reset when text_end carries the final-answer phase, but lastBlockReplyText is reset to undefined at message_start (in pi-embedded-subscribe.ts), so this is safe within a single message turn.

That said, the comment explaining why only-first-block-reply-in-this-turn gets the forced reset would help future readers understand this isn't protecting against a cross-turn leak.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-subscribe.handlers.messages.ts
Line: 400-412

Comment:
**`lastBlockReplyText` guard scope**

The `!ctx.state.lastBlockReplyText` guard prevents the forced buffer reset when `text_end` carries the final-answer phase, but `lastBlockReplyText` is reset to `undefined` at `message_start` (in `pi-embedded-subscribe.ts`), so this is safe within a single message turn.

That said, the comment explaining why only-first-block-reply-in-this-turn gets the forced reset would help future readers understand this isn't protecting against a cross-turn leak.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +333 to +345
export function extractAssistantVisibleText(msg: AssistantMessage): string {
const finalAnswerText = extractAssistantTextForPhase(msg, "final_answer");
if (finalAnswerText.trim()) {
return finalAnswerText;
}

const commentaryText = extractAssistantTextForPhase(msg, "commentary");
if (commentaryText.trim()) {
return commentaryText;
}

return extractAssistantTextForPhase(msg);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Up to three content-array passes per extractAssistantVisibleText call

Each extractAssistantTextForPhase call does two scans of msg.content (hasExplicitPhasedTextBlocks scan + filter), so extractAssistantVisibleText can trigger up to six iterations over the content array. This is called on every streaming delta from handleMessageUpdate, which compounds for long responses. A single-pass accumulator that buckets text by phase would reduce this to O(n) once per call. Not blocking, but worth tracking for high-throughput paths.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-utils.ts
Line: 333-345

Comment:
**Up to three content-array passes per `extractAssistantVisibleText` call**

Each `extractAssistantTextForPhase` call does two scans of `msg.content` (`hasExplicitPhasedTextBlocks` scan + filter), so `extractAssistantVisibleText` can trigger up to six iterations over the content array. This is called on every streaming delta from `handleMessageUpdate`, which compounds for long responses. A single-pass accumulator that buckets text by phase would reduce this to O(n) once per call. Not blocking, but worth tracking for high-throughput paths.

How can I resolve this? If you propose a fix, please make it concise.

@ringlochid

Copy link
Copy Markdown
Contributor Author

Short blocker note from triage:

  • The commentary/final_answer phase-separation slice itself is green on this PR head: the directly touched suites passed locally (6 files / 138 tests).
  • The branch head currently includes a merge from main (f545716), and main is already red in overlapping CI families.
  • Concrete base/main evidence:
    • main CI run 23898123099 is failing in unrelated families including:
      • src/secrets/exec-secret-ref-id-parity.test.ts
      • src/config/doc-baseline.integration.test.ts
      • src/config/schema.base.generated.test.ts
      • src/secrets/target-registry.test.ts
      • src/secrets/runtime.test.ts
      • Windows timeout in src/tasks/task-executor.test.ts
    • Independent local repro on both this PR head and upstream/main:
      • pnpm exec vitest run src/daemon/service-env.test.ts
      • failing assertions: 'gateway service env' does not default NODE_EXTRA_CA_CERTS on non-macOS and 'node service env' does not default NODE_EXTRA_CA_CERTS on non-macOS
      • actual mismatch: expected NODE_EXTRA_CA_CERTS to be undefined, got /etc/ssl/certs/ca-certificates.crt

So current CI looks blocked by base/main failures unless a branch-specific regression shows up outside the touched phase-separation slice.

@100yenadmin

Copy link
Copy Markdown
Contributor

#59643 looks directionally right for the mixed-phase assistant-turn bug family in the main WS -> embedded subscribe path.

One thing we found while comparing this against a live webchat/dashboard repro: the remaining leak may be outside the exact scope of this PR, because some non-embedded consumers still appear to render assistant text generically rather than using phase-aware visible-text extraction.

In other words, I think this PR fixes the core stream/subscriber path, but there may still be a follow-up needed for webchat/dashboard history/render consumers so #59150 can fully close.

@steipete

steipete commented Apr 5, 2026

Copy link
Copy Markdown
Contributor

Maintainer note from current main before review:

  • main now also includes 8f9b1ad (fix: remove assistant replay canonicalization repair)
  • that cleanup removed the old assistant replay canonicalization repair from the sanitize/replay path
  • after checking current message-shaping code, that repair layer does not look like the root cause for the mixed commentary / final_answer visible-output bug family this PR is targeting

So I am reviewing this PR as the active phase-separation candidate on top of a cleaner main, not as something blocked on the older replay canonicalization path.

@steipete steipete left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two blockers from review against current main:

  1. src/agents/pi-embedded-subscribe.handlers.messages.ts:299-330 / :347-429

    • visible-output suppression is still computed from evt.message, not from assistantMessageEvent.partial
    • this PR then uses partialAssistant to derive phaseAwareVisibleText and can emit a commentary preview from partial.phase === "commentary"
    • the added test "replaces commentary preview text when a final_answer partial arrives" locks that behavior in
    • that is the opposite of the current fix direction from #61282: commentary partials should stay suppressed, not become visible and later get replaced
  2. src/agents/openai-ws-stream.ts:556-675

    • the late-map fallback still emits an unphased text_delta when response.output_text.delta arrives before response.output_item.added
    • the added test "degrades safely when a text delta arrives before its item mapping" explicitly accepts that first unphased partial
    • for the bug family here, that is not safe: an unphased partial is treated as visible text by downstream consumers, so a commentary chunk can still leak before the phase metadata arrives

Because both cases still allow commentary-visible partial delivery on realistic fallback/event-order paths, I do not think this is landable yet as the fix for #59150.

@steipete steipete force-pushed the fix/commentary-final-answer-phase-separation branch from f545716 to 501dcb1 Compare April 5, 2026 18:03
@steipete steipete merged commit 3181112 into openclaw:main Apr 5, 2026
9 checks passed

steipete commented Apr 5, 2026

Copy link
Copy Markdown
Contributor

Landed.

  • rebased head sha: 501dcb1f315ebbe91e232e331215fbfbea3863ea
  • merge sha: 318111286fbd02526d65dd6cb741e90f746314b6

What changed on top:

  • strict visible-delivery phase resolution for mixed assistant blocks
  • commentary-only partials stay hidden until final_answer
  • OpenAI WS tests tightened: no dangling first-call stream, proxy store=false asserted at payload-builder level, full session cleanup in afterEach

Local gates:

  • pnpm check
  • pnpm build
  • pnpm test src/agents/openai-ws-stream.test.ts
  • pnpm test src/agents/pi-embedded-subscribe.handlers.messages.test.ts src/agents/pi-embedded-utils.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.emits-block-replies-text-end-does-not.test.ts

Thanks @ringlochid.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 501dcb1f31

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +1052 to +1056
if (event.type === "response.output_text.delta") {
const key = getOutputTextKey(event.item_id, event.content_index);
const nextText = `${outputTextByPart.get(key) ?? ""}${event.delta}`;
outputTextByPart.set(key, nextText);
if (outputItemPhaseById.has(event.item_id)) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Allow HTTP fallback when phase mapping has not emitted output

When a response.output_text.delta arrives before response.output_item.added, this branch buffers text and emits nothing until phase metadata is known, but sawWsOutput is still marked true earlier in the same handler. If the socket then drops before the item mapping event, auto mode skips both retry/fallback (!sawWsOutput check in the catch path) even though no user-visible partial was emitted, so the turn fails instead of cleanly falling back to HTTP on transient disconnects.

Useful? React with 👍 / 👎.

@100yenadmin

Copy link
Copy Markdown
Contributor

Thanks for landing this and for the follow-up tightening. The merged phase-separation coverage looks solid for the core WS → embedded subscribe path.

One narrow test gap I noticed after the merge: the late-map buffering test currently only exercises commentary-phase items. There's no coverage for the case where multiple pre-map deltas arrive and the eventual item resolves to final_answer — which is the more realistic streaming race under WS timing jitter.

I'll send a small test-only follow-up PR for that if it's useful. Closed my earlier #61457 since it's now redundant against merged main.

lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling size: XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Commentary text can leak into final assistant replies, and duplicate visible replies can occur after tool sends

4 participants