
fix(agents): stream phased text deltas incrementally #88771

Merged
vincentkoc merged 1 commit into main from codex/local-model-lean-delta-stream-checkpoints
Jun 6, 2026

Conversation

@vincentkoc vincentkoc commented May 31, 2026

Member

Summary

  • stop rereading full OpenAI Responses phased partial text on every same-item final_answer delta
  • keep the same user-visible sanitizer path for incremental phased deltas, including split tool-call XML payloads
  • preserve full-partial reads for item boundaries and text_end, where the stream needs authoritative final text
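The three bullets above amount to a small routing decision per stream event. A minimal sketch of that decision follows, assuming a simplified event shape; `PhasedTextEvent`, `StreamState`, `needsFullPartialRead`, and `handleEvent` are hypothetical names for illustration and are not the identifiers used in src/agents/embedded-agent-subscribe.handlers.messages.ts.

```typescript
// Hypothetical event and state shapes; the real subscriber types may differ.
type TextEventKind = "text_delta" | "text_end";

interface PhasedTextEvent {
  kind: TextEventKind;
  itemId: string; // stream item the event belongs to
  delta?: string; // incremental chunk; may be absent or empty
}

interface StreamState {
  lastItemId?: string;
}

// True when the handler should fall back to the authoritative full partial text:
// the item changed, the text ended, or there is no usable chunk.
function needsFullPartialRead(state: StreamState, event: PhasedTextEvent): boolean {
  const itemChanged = state.lastItemId !== undefined && state.lastItemId !== event.itemId;
  const missingChunk = event.delta === undefined || event.delta.length === 0;
  return itemChanged || event.kind === "text_end" || missingChunk;
}

function handleEvent(
  state: StreamState,
  event: PhasedTextEvent,
  readFullPartial: () => string,
  emit: (text: string, mode: "append" | "replace") => void,
): void {
  if (needsFullPartialRead(state, event)) {
    // Boundary path: pay for one full read to get authoritative text.
    emit(readFullPartial(), "replace");
  } else {
    // Hot path: forward only the new chunk, never touching the full text.
    emit(event.delta as string, "append");
  }
  state.lastItemId = event.itemId;
}
```

In this sketch the first event of a stream is not treated as an item change, which is an assumption; the real handler may choose differently.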

Refs #86599.

Verification

  • node scripts/run-vitest.mjs src/agents/embedded-agent-subscribe.handlers.messages.test.ts src/agents/embedded-agent-subscribe.subscribe-embedded-agent-session.emits-block-replies-text-end-does-not.test.ts src/agents/embedded-agent-subscribe.subscribe-embedded-agent-session.filters-final-suppresses-output-without-start-tag.test.ts --reporter=dot: passed 1 shard / 69 tests after rebasing onto origin/main e8f3bce9f0b
  • node_modules/.bin/oxfmt --check --threads=1 src/agents/embedded-agent-utils.ts src/agents/embedded-agent-subscribe.handlers.messages.ts src/agents/embedded-agent-subscribe.handlers.messages.test.ts: passed
  • node scripts/run-oxlint.mjs src/agents/embedded-agent-utils.ts src/agents/embedded-agent-subscribe.handlers.messages.ts src/agents/embedded-agent-subscribe.handlers.messages.test.ts: passed
  • git diff --check origin/main...HEAD: passed
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main: clean, no accepted/actionable findings
  • AWS Crabbox changed gate: node scripts/crabbox-wrapper.mjs run --provider aws --label delta-stream-refresh-check-changed-20260601 --shell -- "env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed" (provider=aws, lease=cbx_edcabced29f2, slug=jade-krill, run=run_f0ac5fd87be3, exit 0)
  • GitHub PR checks on head da2e177396f743a2e2fa66800091f1a946257562: pending after force-push when this body was refreshed

Real behavior proof

Behavior addressed: same-item phased OpenAI Responses final-answer deltas stream incrementally without rereading full partial assistant text, while sanitizer context is preserved for hidden split tool-call payloads.

Real environment tested: linked OpenClaw gwt worktree, focused local Vitest wrapper, structured branch autoreview, AWS Crabbox Linux changed gate, and GitHub PR CI on the pushed head.

Exact steps or command run after this patch: rebased on current origin/main, reran focused subscriber regression tests, formatter, lint, diff check, structured autoreview, remote pnpm check:changed, and pushed the refreshed PR head.

Evidence after fix: local subscriber tests passed 69 tests; Crabbox run run_f0ac5fd87be3 selected core and coreTests (including core/core-test typecheck, changed-file lint, import-cycle checks, and guards), then exited 0.

Observed result after fix: repeated same-item final-answer deltas emit appended user-visible text without touching the full partial text getter; split hidden tool-call XML does not leak arguments into assistant events.
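To illustrate why sanitizer state has to survive across chunks, here is a minimal sketch of a streaming filter that hides a tool-call payload even when its tags are split between deltas. The `<tool_call>` tag name and the `StreamingSanitizer` class are assumptions made for this example; the actual sanitizer in OpenClaw is not shown in this PR conversation and may work differently.

```typescript
// Hypothetical tag names, chosen only for illustration.
const OPEN = "<tool_call>";
const CLOSE = "</tool_call>";

// Length of the longest suffix of `text` that is a proper prefix of `marker`.
function longestPartialSuffix(text: string, marker: string): number {
  const max = Math.min(text.length, marker.length - 1);
  for (let len = max; len > 0; len--) {
    if (marker.startsWith(text.slice(text.length - len))) return len;
  }
  return 0;
}

class StreamingSanitizer {
  private hidden = false; // true while inside a hidden payload
  private pending = "";   // held-back text that may be the start of a tag

  // Feed one chunk; returns only the text that is safe to show.
  push(chunk: string): string {
    let input = this.pending + chunk;
    this.pending = "";
    let visible = "";
    while (input.length > 0) {
      const marker = this.hidden ? CLOSE : OPEN;
      const idx = input.indexOf(marker);
      if (idx >= 0) {
        if (!this.hidden) visible += input.slice(0, idx);
        input = input.slice(idx + marker.length);
        this.hidden = !this.hidden;
        continue;
      }
      // No complete marker: hold back a suffix that could still become one.
      const keep = longestPartialSuffix(input, marker);
      if (!this.hidden) visible += input.slice(0, input.length - keep);
      this.pending = input.slice(input.length - keep);
      break;
    }
    return visible;
  }
}
```

Feeding the chunks `Hi <tool`, `_call>{"a":1}</tool`, and `_call> done` through one instance yields `Hi `, an empty string, and ` done`, so the arguments never reach the visible stream. A stateless per-chunk filter would miss both split tags.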

What was not tested: true live Ollama UI repro; this PR covers the deterministic subscriber hot path behind the reported #86599 stream-density issue.

@vincentkoc vincentkoc self-assigned this May 31, 2026
@openclaw-barnacle openclaw-barnacle Bot added the labels agents (Agent runtime and tooling), size: S, and maintainer (Maintainer-authored PR) on May 31, 2026
clawsweeper Bot commented May 31, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed June 1, 2026, 2:44 AM ET / 06:44 UTC.

Summary
The PR changes embedded agent message-update handling so same-item phased final_answer deltas stream incrementally through sanitizer state while preserving full partial reads for item boundaries and text_end.

PR surface: Source +43, Tests +135. Total +178 across 3 files.

Reproducibility: yes, for the narrow subscriber defect from source inspection and branch regression tests; no direct live Windows/Ollama reproduction was established in this review.

Review metrics: 1 noteworthy metric.

  • Subscriber regression cases: 3 added. The new cases target same-item phased incremental deltas and split hidden tool-call XML sanitizer state, which are the main message-delivery risks in this patch.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🐚 platinum hermit
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

Mantis proof suggestion
A short visible local-provider run would materially demonstrate the user-facing streaming behavior if maintainers want proof beyond deterministic subscriber tests. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task proof: verify a local-provider final-answer stream updates incrementally without duplicate text or leaked tool-call XML.

Risk before merge

Maintainer options:

  1. Accept focused subscriber proof (recommended)
    Maintainers can land with the current focused tests and Crabbox changed gate if they accept that this PR fixes the deterministic stream accumulation path rather than proving the full Windows/Ollama repro.
  2. Request one live stream proof
    Before merge, maintainers can ask for a redacted local-provider run showing incremental final-answer updates without duplicate text or leaked hidden tool-call XML.

Next step before merge

  • No automated repair is indicated because this protected maintainer P1 hot-path PR has no concrete blocking defect and needs maintainer merge/proof judgment.

Security
Cleared: The diff only changes agent stream text handling and focused tests, with no dependency, workflow, secret, package, or code-execution surface added.

Review details

Best possible solution:

Land after maintainer signoff that the focused subscriber tests and Crabbox changed gate are enough for this narrow stream-density fix, or request one redacted live local-provider proof before merge.

Do we have a high-confidence way to reproduce the issue?

Yes for the narrow subscriber defect from source inspection and branch regression tests; no direct live Windows/Ollama reproduction was established in this review.

Is this the best way to solve the issue?

Yes for the narrow fix: the patch follows the Codex/OpenAI incremental-delta contract while preserving authoritative full reads at item boundaries and text end.
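The incremental-delta contract referenced here can be sketched from the client side: append deltas for the same item id, and let the final event replace the accumulated text. The `StreamEvent` union and `applyEvent` function below are hypothetical shapes for illustration, not the Codex or OpenAI Responses wire types.

```typescript
// Hypothetical event shapes, not the real protocol types.
type StreamEvent =
  | { type: "delta"; itemId: string; delta: string } // incremental chunk
  | { type: "done"; itemId: string; text: string };  // authoritative final text

// Applies one event to the per-item text store.
function applyEvent(texts: Map<string, string>, event: StreamEvent): void {
  if (event.type === "delta") {
    // Same item id: append, no need to re-derive the whole text.
    texts.set(event.itemId, (texts.get(event.itemId) ?? "") + event.delta);
  } else {
    // Item finished: trust the full text over whatever was accumulated.
    texts.set(event.itemId, event.text);
  }
}
```

Replacing on the final event means a dropped or duplicated delta is corrected at the boundary, which is the reason for keeping the authoritative read there.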

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 01124cfca90e.

Label changes

Label justifications:

  • P1: The PR addresses a P1 local-provider responsiveness and streaming issue affecting agent message delivery in a hot path.
  • merge-risk: 🚨 message-delivery: Changing assistant delta accumulation and sanitizer state could drop, duplicate, suppress, or leak assistant text if an edge case is missed.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🐚 platinum hermit and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body provides concrete after-fix command output and AWS Crabbox changed-gate proof for the deterministic subscriber path, while clearly noting the live Ollama UI symptom was not tested.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body provides concrete after-fix command output and AWS Crabbox changed-gate proof for the deterministic subscriber path, while clearly noting the live Ollama UI symptom was not tested.

Evidence reviewed

PR surface:

Source +43, Tests +135. Total +178 across 3 files.

| Area | Files | Added | Removed | Net |
|---|---|---|---|---|
| Source | 2 | 45 | 2 | +43 |
| Tests | 1 | 135 | 0 | +135 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 3 | 180 | 2 | +178 |

What I checked:

  • current-main behavior: Current main uses extractAssistantVisibleText(partialAssistant) whenever phase-aware block replies are active, so same-item phased updates can reread the growing full partial assistant text on each delta. (src/agents/embedded-agent-subscribe.handlers.messages.ts:667, 01124cfca90e)
  • proposed stream path: The PR limits full partial reads to stream-item changes, text_end, or missing chunks; same-item chunked phased deltas instead use the incremental chunk with existing block-tag and visible-text sanitizer state. (src/agents/embedded-agent-subscribe.handlers.messages.ts:683, da2e177396f7)
  • regression coverage: The added tests cover not reading the full partial getter for same-item phased deltas, preserving split hidden tool-call XML sanitization, and later visible text emission after hidden stream fragments. (src/agents/embedded-agent-subscribe.handlers.messages.test.ts:454, da2e177396f7)
  • OpenAI Responses stream contract in OpenClaw: The OpenAI Responses parser appends response.output_text.delta to current block text and emits text-delta events with the incremental delta, then emits authoritative full text on output item done/text end. (src/llm/providers/openai-responses-shared.ts:621, 01124cfca90e)
  • Codex protocol contract: Sibling Codex source shows AgentMessageContentDeltaEvent carries incremental delta tied to item_id, and the app-server README states clients append streamed text deltas for the same item id. (../codex/codex-rs/protocol/src/protocol.rs:1724, cf0911076f23)
  • verification supplied by PR author: The PR body reports focused subscriber regression tests, oxfmt, oxlint, diff check, structured autoreview, and AWS Crabbox changed gate run_f0ac5fd87be3 passing on the pushed head; it explicitly leaves the true live Ollama UI repro untested. (da2e177396f7)
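The regression-coverage item mentions asserting that the full partial getter is not read for same-item deltas. One way such a check can be built is a property getter that counts its own accesses, sketched below. Both `withCountedGetter` and `onSameItemDelta` are hypothetical helpers written for this example; they are not taken from the PR's test file.

```typescript
// Builds an object whose `text` property counts how often it is read.
function withCountedGetter(text: string): {
  partial: { readonly text: string };
  reads: () => number;
} {
  let count = 0;
  const partial = {} as { readonly text: string };
  Object.defineProperty(partial, "text", {
    get(): string {
      count += 1;
      return text;
    },
    enumerable: true,
  });
  return { partial, reads: () => count };
}

// Hypothetical stand-in for the handler under test: on the same-item path it
// must build the visible text from `delta` alone and leave `_partial` untouched.
function onSameItemDelta(
  _partial: { readonly text: string },
  delta: string,
  visibleSoFar: string,
): string {
  return visibleSoFar + delta;
}
```

A test would then call the handler for several deltas and assert that `reads()` is still 0, so any accidental fallback to the full text shows up as a failing count rather than as a subtle performance regression.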

Likely related people:

  • vincentkoc: Authored the current PR and previously landed adjacent work avoiding full stream replay on text deltas in the same embedded-agent subscriber area. (role: recent stream hot-path contributor; confidence: high; commits: 0f6be951e0df, da2e177396f7; files: src/agents/embedded-agent-subscribe.handlers.messages.ts, src/agents/embedded-agent-subscribe.handlers.messages.test.ts)
  • steipete: Recent history shows the broad agent runtime internalization touched the embedded-agent subscriber area that owns this message-update path. (role: recent agent-runtime refactor owner; confidence: medium; commits: bb46b79d3c14; files: src/agents/embedded-agent-subscribe.handlers.messages.ts)
  • latensified: Recent OpenAI Responses replay fixes are adjacent to the textSignature and Responses item behavior this PR relies on. (role: adjacent OpenAI Responses replay contributor; confidence: medium; commits: 6653193fdb90; files: src/llm/providers/openai-responses-shared.ts)
  • zhanghang02: Recent work on disabled Responses store replay behavior touched the provider stream/replay contract that feeds these subscriber deltas. (role: adjacent OpenAI Responses replay contributor; confidence: medium; commits: 03dec8bb3a00; files: src/llm/providers/openai-responses-shared.ts)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added the labels rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected), status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR), P1 (High-priority user-facing bug, regression, or broken workflow), and merge-risk: 🚨 message-delivery (🚨 May drop, duplicate, misroute, suppress, or wrongly target messages) on May 31, 2026
@vincentkoc vincentkoc force-pushed the codex/local-model-lean-delta-stream-checkpoints branch from 25830b2 to 0dbf38c on May 31, 2026 22:16
@clawsweeper clawsweeper Bot added the label rating: 🦞 diamond lobster (Very strong PR readiness with only minor maintainer review expected) and removed the label rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected) on May 31, 2026
@vincentkoc

Member Author

Rebased and revalidated on current origin/main.

Behavior addressed: same-item phased final_answer streams now stay on the incremental delta path for normal text deltas, while full partial reads are preserved for item changes, empty chunks, and text_end. This keeps the subscriber from re-reading the full growing partial text on every streamed token.
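As a rough illustration of what avoiding the per-token reread saves, the sketch below counts characters touched under each strategy. It assumes cost is proportional to the amount of text read per event, which is a simplification; `charsScanned` is a hypothetical helper and not a measurement from this PR.

```typescript
// Counts characters touched when each delta triggers either a full reread of
// the growing text or a read of just the new chunk.
function charsScanned(deltaLengths: number[]): { fullReread: number; incremental: number } {
  let total = 0;       // length of the accumulated text so far
  let fullReread = 0;  // sum of full-text lengths, one per delta
  let incremental = 0; // sum of chunk lengths only
  for (const len of deltaLengths) {
    total += len;
    fullReread += total;
    incremental += len;
  }
  return { fullReread, incremental };
}
```

Under that assumption, 1,000 deltas of 4 characters each touch 4,000 characters incrementally versus 2,002,000 with a full reread per delta, so the reread cost grows quadratically with stream length.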

Real environment tested: clean linked worktree plus AWS Crabbox changed gate.

Exact steps or command run after this patch:

node scripts/run-vitest.mjs src/agents/embedded-agent-subscribe.handlers.messages.test.ts --reporter=dot
node_modules/.bin/oxfmt --check --threads=1 src/agents/embedded-agent-subscribe.handlers.messages.ts src/agents/embedded-agent-subscribe.handlers.messages.test.ts src/agents/embedded-agent-utils.ts
node scripts/run-oxlint.mjs src/agents/embedded-agent-subscribe.handlers.messages.ts src/agents/embedded-agent-subscribe.handlers.messages.test.ts src/agents/embedded-agent-utils.ts
git diff --check origin/main...HEAD
node scripts/crabbox-wrapper.mjs run --provider aws --label pr88771-check-changed --shell -- "pnpm check:changed"

Evidence after fix: focused Vitest passed 1 file / 38 tests; oxfmt passed; oxlint passed; diff check passed; AWS Crabbox changed gate passed with provider aws, lease cbx_5d42deaf2939, run run_28a5b2034174, slug jade-crayfish, exit 0.

Observed result after fix: pnpm check:changed selected core, coreTests for the three touched files and completed successfully on AWS Crabbox in 3m16s command time / 5m32s total time.

What was not tested: full GitHub CI is still red on this head because current origin/main has unrelated failures in check-additional-boundaries-bcd (extensions/feishu/src/monitor.webhook.test-helpers.ts:29 raw fetch guard) and check-additional-extension-bundled (extensions/qa-lab/src/mantis/visual-task.runtime.ts:519 no-promise-executor-return). I did not fold those unrelated fixes into this PR.

@vincentkoc vincentkoc force-pushed the codex/local-model-lean-delta-stream-checkpoints branch 2 times, most recently from 0dcd8f1 to 8cc28f2 on June 1, 2026 01:01
@clawsweeper clawsweeper Bot added the labels proof: sufficient (ClawSweeper judged the real behavior proof convincing) and rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected) and removed the label rating: 🦞 diamond lobster (Very strong PR readiness with only minor maintainer review expected) on Jun 1, 2026
@vincentkoc vincentkoc force-pushed the codex/local-model-lean-delta-stream-checkpoints branch from 8cc28f2 to da2e177 on June 1, 2026 06:38
osolmaz commented Jun 1, 2026

Member

Attribution note: this phased-delta PR continues the split-out stream subscriber work from the broader #87558 / #86599 dense-stream effort. The shared production direction is to consume incremental deltas on hot paths and reserve full partial reads for boundaries or cases where authoritative final text is required.

#87558 is being closed in favor of smaller focused PRs. Please keep the #87558 context linked for attribution and reviewer history.

@vincentkoc

Member Author

Pre-merge verification for head da2e177396f743a2e2fa66800091f1a946257562:

  • GitHub PR checks: current rollup has no failed and no pending contexts; mergeStateStatus=CLEAN, mergeable=MERGEABLE.
  • Structured autoreview: /Users/m1/GIT/_Perso/clawrouter/.agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main from a detached review worktree at the PR head; result was clean with no accepted/actionable findings.
  • Existing PR proof remains valid from the PR body: focused subscriber Vitest, oxfmt, oxlint, git diff --check, and AWS Crabbox changed gate run_f0ac5fd87be3 / lease cbx_edcabced29f2.

Known proof gap: no fresh local test rerun in this turn because the main machine is at ~200 MiB free disk and cannot create a new review worktree safely. The PR already has focused local and AWS Crabbox proof recorded on the same head.

@vincentkoc vincentkoc merged commit 4ee50ce into main Jun 6, 2026
165 checks passed
@vincentkoc vincentkoc deleted the codex/local-model-lean-delta-stream-checkpoints branch June 6, 2026 13:38

Labels

  • agents (Agent runtime and tooling)
  • maintainer (Maintainer-authored PR)
  • merge-risk: 🚨 message-delivery (🚨 May drop, duplicate, misroute, suppress, or wrongly target messages)
  • P1 (High-priority user-facing bug, regression, or broken workflow)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing)
  • rating: 🐚 platinum hermit (Good normal PR readiness with ordinary maintainer review expected)
  • size: S
  • status: 👀 ready for maintainer look (ClawSweeper has no concrete contributor-facing blocker left for this PR)


2 participants