Skip to content

fix: canonicalize embedded reply payloads#80075

Merged
steipete merged 1 commit into
mainfrom
codex/chat-delivery-hygiene
May 10, 2026
Merged

fix: canonicalize embedded reply payloads#80075
steipete merged 1 commit into
mainfrom
codex/chat-delivery-hygiene

Conversation

@steipete

@steipete steipete commented May 10, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Fixes the embedded runner payload boundary so final chat delivery uses the canonical final assistant answer when accumulated assistant text also contains pre-tool progress, internal/status text, or raw-looking tool output.
  • Preserves raw final reply directives when they carry media/audio metadata, so canonicalization does not drop attachments.
  • Adds regressions for Telegram-visible duplicate/progress replay, raw path-like output replay, trailing context-maintenance status text, and media directive preservation.
  • Updates the incomplete-turn guard expectation now that a completed tool-use turn emits one canonical final payload.

Real behavior proof

  • Behavior or issue addressed: one assistant turn could produce multiple Telegram-visible replies because accumulated assistantTexts entries were emitted as separate final payloads; this covers Telegram direct replies can duplicate because accumulated assistantTexts are emitted as separate messages after context-engine maintenance #79621 duplicate replies and the [Bug] Subagent announce leaks raw tool output to Telegram and duplicates parent progress reply #79986 raw path/status leakage shape.
  • Real environment tested: local OpenClaw source checkout on macOS arm64, Node 22 via repo node --import tsx, branch codex/chat-delivery-hygiene at 23214e22bd1d6cdb7635ae1f4ebd818352468662.
  • Exact steps or command run after this patch: node --import tsx -e 'import { buildEmbeddedRunPayloads } from "./src/agents/pi-embedded-runner/run/payloads.ts"; const payloads = buildEmbeddedRunPayloads({ assistantTexts:["Done.", "Background task done: Context engine turn maintenance. Rewrote 0 transcript entries and freed 0 bytes."], toolMetas:[], lastAssistant:{ role:"assistant", stopReason:"stop", content:[{ type:"text", text:"Done." }] }, sessionKey:"session:telegram", inlineToolResultsAllowed:false, verboseLevel:"off", reasoningLevel:"off", toolResultFormat:"plain" }); console.log(JSON.stringify(payloads));'
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): console output: [ { "text": "Done.", "replyToTag": false, "audioAsVoice": false } ] (JSON minified by the command as [{"text":"Done.","replyToTag":false,"audioAsVoice":false}]).
  • Observed result after fix: the accumulated internal/status text is not emitted as a second payload; the same helper also returns one final payload for the raw /root/openclaw/...protocol-schemas.ts:181... repro shape.
  • What was not tested: live Telegram Bot API delivery from a real chat, because the shared payload helper now proves the channel-facing payload list before Telegram transport receives it.
  • Before evidence (optional but encouraged): before the patch, the same helper returned one payload per accumulated assistantTexts entry, which would hand Telegram two final sends for the status-text repro and two final sends for the raw path-like repro.

Root Cause

buildEmbeddedRunPayloads treated every non-empty assistantTexts entry as a separately deliverable final reply. The subscriber intentionally accumulates intermediate assistant text for streaming/block-reply bookkeeping, and Telegram then delivered each entry as its own outbound message. That made pre-tool progress, internal maintenance/status strings, or raw-looking subagent/tool fragments visible as duplicate Telegram replies.

What Went Well

  • The best fix was at the shared payload-selection boundary, not in Telegram-specific regex/dedupe logic. That makes the behavior consistent for all final delivery surfaces while keeping Telegram adapters focused on transport.
  • The existing final-answer extraction and reply-directive parser already gave us a trustworthy canonical source; the patch reuses those contracts instead of adding a downstream filter.
  • The fix(agents): surface exec failures after claimed success #80003 exec-failure warning path remains intact: failed mutating tools still get a concise visible warning when the final assistant answer claims success, but raw stderr stays hidden unless verbose details are enabled.
  • The Guard Telegram runtime progress and report delivery #80021 draft pointed at the visible symptom, but the stronger upstream fix avoids over-deduping successful media sends or recording delivery attempts before transport success.

Verification

  • pnpm docs:list
  • pnpm test src/agents/pi-embedded-runner/run/payloads.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-append-text-end-content-is.test.ts
  • pnpm test:changed
  • git diff --check
  • Testbox pnpm check:changed on tbx_01kr7yhs5tfp3xxk4cp7f73rfc
  • Manual node --import tsx repro: raw path/status accumulated text now produces one final payload

Fixes #79621.
Fixes #79986.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels May 10, 2026
@clawsweeper

clawsweeper Bot commented May 10, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs changes before merge.

Summary
The branch changes embedded runner payload selection, tests, an incomplete-turn expectation, and the changelog to prefer a canonical final assistant answer over accumulated assistant text.

Reproducibility: yes. for the source-level bug: current main selects every non-empty accumulated assistantTexts entry, and final dispatch sends each payload item. I did not live-test Telegram, but the deterministic helper path plus linked issue evidence are enough to reproduce the code path.

Real behavior proof
Sufficient (terminal): The PR body includes copied terminal output from a local OpenClaw checkout showing the changed helper returns one final payload after accumulated status text.

Next step before merge
A narrow predicate and regression-test repair remains; no product or security decision is needed for that fix.

Security
Cleared: The diff only changes TypeScript payload selection, focused tests, and changelog text, with no dependency, CI, permission, secret, or package-resolution changes.

Review findings

  • [P2] Handle media-only canonical final replies — src/agents/pi-embedded-runner/run/payloads.ts:369-372
Review details

Best possible solution:

Update the canonical-selection predicate to treat raw final answers with media or audio directives as deliverable even when cleaned text is empty, then add a regression for accumulated progress plus silent media/voice.

Do we have a high-confidence way to reproduce the issue?

Yes for the source-level bug: current main selects every non-empty accumulated assistantTexts entry, and final dispatch sends each payload item. I did not live-test Telegram, but the deterministic helper path plus linked issue evidence are enough to reproduce the code path.

Is this the best way to solve the issue?

No, not yet. The shared payload boundary is the right place to solve this, but the current predicate excludes media-only canonical final replies and should prefer raw media/audio directives even without parsed text.

Full review comments:

  • [P2] Handle media-only canonical final replies — src/agents/pi-embedded-runner/run/payloads.ts:369-372
    This only switches to the canonical final answer when the cleaned final text is non-empty. If the real final reply is an attachment-only directive, such as a silent MEDIA/voice payload after a progress block, this falls through to nonEmptyAssistantTexts and still emits the progress text plus the media payload. Treat raw final answers with media/audio directives as canonical even when their parsed text is empty.
    Confidence: 0.87

Overall correctness: patch is incorrect
Overall confidence: 0.87

Acceptance criteria:

  • pnpm test src/agents/pi-embedded-runner/run/payloads.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts src/agents/pi-embedded-subscribe.subscribe-embedded-pi-session.does-not-append-text-end-content-is.test.ts
  • pnpm test:changed
  • git diff --check

What I checked:

Likely related people:

  • Andy Ye: Local blame ties the current payload selection and media-preservation block to commit 0a4ef37. (role: introduced current behavior; confidence: high; commits: 0a4ef37f6580; files: src/agents/pi-embedded-runner/run/payloads.ts)
  • steipete: The PR author also appears in prior ClawSweeper-provided current-main payload-surface history for media/no-reply preservation and commentary fallback suppression. (role: recent area contributor; confidence: medium; commits: 14eab13ab492, 9eed092baa50, 5e2f6ce2943e; files: src/agents/pi-embedded-runner/run/payloads.ts, src/agents/pi-embedded-subscribe.handlers.messages.ts)
  • jbetala7: Provided related history shows recent work on the same buildEmbeddedRunPayloads helper for user-visible exec failure payload behavior. (role: recent payload contributor; confidence: medium; commits: 658a30b42f95; files: src/agents/pi-embedded-runner/run/payloads.ts, src/agents/pi-embedded-runner/run/payloads.errors.test.ts)
  • mcaxtr: Provided review history links media directive preservation work directly adjacent to the media-only edge in this review. (role: adjacent media payload contributor; confidence: medium; commits: 18c98316f7ac; files: src/agents/pi-embedded-runner/run/payloads.ts)

Remaining risk / open question:

  • Live Telegram Bot API delivery was not re-run in this read-only review; the proof is at the shared pre-transport payload helper.
  • The branch still needs the media-only final-answer edge fixed before merge.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c61abfab302f.

@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 10, 2026
@steipete steipete marked this pull request as ready for review May 10, 2026 03:49
@steipete steipete force-pushed the codex/chat-delivery-hygiene branch from 23214e2 to 512c314 Compare May 10, 2026 03:52
@steipete steipete merged commit 83a1080 into main May 10, 2026
111 checks passed
@steipete steipete deleted the codex/chat-delivery-hygiene branch May 10, 2026 04:00
@steipete

Copy link
Copy Markdown
Contributor Author

Landed via rebase onto main and squash merge.

  • Source SHA: 512c314
  • Landed SHA: 83a1080
  • Gates: focused payload/incomplete-turn/subscriber tests, git diff --check, Testbox pnpm check:changed, PR CI including Critical Quality agent/network boundaries.

Thanks @nonzeroclaw and @dudaefj for the reports.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling maintainer Maintainer-authored PR proof: sufficient ClawSweeper judged the real behavior proof convincing. size: S

Projects

None yet

1 participant