Skip to content

fix(agents): trim dense text delta snapshots#91580

Merged
vincentkoc merged 2 commits into
mainfrom
fix-86599-dense-stream-overhead
Jun 9, 2026
Merged

fix(agents): trim dense text delta snapshots#91580
vincentkoc merged 2 commits into
mainfrom
fix-86599-dense-stream-overhead

Conversation

@vincentkoc

@vincentkoc vincentkoc commented Jun 9, 2026

Copy link
Copy Markdown
Member

Summary

  • Drops full accumulated assistant-message snapshots from plain text_delta events for OpenAI-compatible, OpenAI Responses, and Ollama streams.
  • Keeps full snapshots on stream start/end/done plus tool/thinking events, so final state and structured events still have the full message.
  • Updates the agent loop to reconstruct current assistant snapshots from lightweight text deltas.
  • Adds regression coverage for OpenAI-compatible, Responses, Ollama, and agent-loop consumers.

Linked context

Fixes #86599

Related #86633

Requested by maintainer follow-up on #86599.

Maintainer contract decision

Accept the exported stream contract change: plain AssistantMessageEvent text_delta events may omit partial to avoid retaining one full assistant snapshot per token. Consumers that need the live assistant message should replay delta from the latest start/end partial checkpoint or use the reconstructed message_update.message; text_start, text_end, thinking, tool, terminal, and final events still carry full snapshots.

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: local/OpenAI-compatible dense streams no longer carry the full answer-so-far inside every plain text delta.
  • Real environment tested: Blacksmith Testbox / Crabbox on Linux, using a fake OpenAI-compatible local backend that emits 30,000 one-character SSE chunks.
  • Exact steps or command run after this patch: built the CLI through tsdown/runtime-postbuild, then ran node dist/entry.js infer model run --local --model local/dense --prompt hi --json against the fake backend.
  • Evidence after fix:
OpenClaw 2026.6.2 (aef1fad)
curl_elapsed=0.05 maxrss_kb=11264
openclaw_local_elapsed=3.51 maxrss_kb=658044
4830164 /tmp/oc86599runtime/curl.sse
  30204 /tmp/oc86599runtime/local.json
served_chunks=30000 elapsed_ms=56
served_chunks=30000 elapsed_ms=52
  • Observed result after fix: the fake backend served the dense stream in about 52-56ms, raw curl completed in 0.05s, and the built OpenClaw local model run completed in 3.51s without carrying 30k full partial snapshots.
  • What was not tested: Windows desktop with a real Ollama/llama.cpp model.
  • Proof limitations or environment constraints: Gateway infer model run --gateway was attempted in isolated Testbox state, but the client was blocked by Gateway device-pairing scope approval. Full pnpm build reached tsdown and runtime-postbuild, then the sparse GWT/Testbox omitted ui/config/control-ui-chunking.ts, so the UI build was not counted as passing proof.
  • Before evidence: prior current-main Crabbox repro for the same fake 30k-chunk backend measured raw curl at 0.03s, local OpenClaw at 4.11s, and gateway at 10.31s; the root issue was the repeated full partial payload work rather than backend generation time.

Tests and validation

  • Exact-head changed gate: node scripts/crabbox-wrapper.mjs run --provider blacksmith-testbox --timing-json -- env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed on tbx_01ktn8y7eph04qd9x71f4ez3we / violet-crab: passed in 9m21s.
  • Review-fix targeted proof: crabbox run --provider blacksmith-testbox --timing-json --shell -- 'git diff --check && node scripts/run-vitest.mjs run packages/agent-core/src/agent-loop.test.ts src/agents/openai-transport-stream.test.ts extensions/ollama/src/stream-runtime.test.ts' on tbx_01ktn8tztvfyjc9azkj8ayhpf3 / silver-shrimp: passed 347 tests in 54.3s.
  • Earlier exact-head targeted run before the review fix: same 3 Vitest shards passed on tbx_01ktn5hrk4akdp5558h8f5k8ws / swift-hermit.
  • ClawSweeper/autoreview on 9569e072e535d677a0dfa2c747a0b12a4ad8c667: ready for maintainer look, proof sufficient, no concrete contributor-facing blocker left.
  • GitHub checks on the exact PR head: pass or skip only; no pending/failing checks at merge review time.

Regression coverage added:

  • agent loop rebuilds message_update.message for immutable, partial-less text_delta snapshots.
  • OpenAI-compatible and Responses text deltas omit accumulated partial snapshots.
  • Ollama text deltas omit accumulated partial snapshots while text_end and done keep final content.

Risk checklist

Did user-visible behavior change? Yes

Did config, environment, or migration behavior change? No

Did security, auth, secrets, network, or tool execution behavior change? No

What is the highest-risk area?

  • Stream consumers that previously assumed every text delta carried partial.

How is that risk mitigated?

  • The type now marks only plain text-delta partial as optional and documents the replay contract.
  • Agent-loop reconstruction keeps message updates populated from lightweight deltas.
  • Tool calls, thinking blocks, text start/end, and final done events still carry snapshots.
  • Existing and added provider/agent tests cover the touched stream surfaces.

@vincentkoc

Copy link
Copy Markdown
Member Author

@clawsweeper review

@clawsweeper

clawsweeper Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

@vincentkoc vincentkoc self-assigned this Jun 9, 2026
@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling extensions: ollama size: S maintainer Maintainer-authored PR labels Jun 9, 2026
@clawsweeper

clawsweeper Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 9, 2026, 12:13 AM ET / 04:13 UTC.

Summary
The PR makes plain text stream deltas omit accumulated assistant-message snapshots for OpenAI-compatible, Responses, and Ollama streams, then updates agent-loop consumption and regression tests.

PR surface: Source +40, Tests +149. Total +189 across 7 files.

Reproducibility: yes. from source and PR proof: dense streams currently carry accumulated snapshots on every plain text delta, and the PR body reports a 30,000-chunk Testbox reproduction. I did not rerun it in this read-only review.

Review metrics: 1 noteworthy metric.

  • Stream contract change: 1 exported event field changed from required to optional. AssistantMessageEvent.text_delta.partial is part of the stream/event contract consumed by agent and plugin code.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • none.

Risk before merge

  • [P1] Existing plugin or internal stream consumers that read assistantMessageEvent.partial on every text_delta will now see it missing for the affected providers unless they replay deltas or use the reconstructed message.
  • [P1] The PR body reports strong Testbox proof, but maintainers should still confirm the exact current head has green required checks before merge.

Maintainer options:

  1. Accept the optional text-delta contract (recommended)
    If maintainers agree text deltas may be lightweight, this PR’s producer/agent-loop/test shape is the right owner boundary for the performance fix.
  2. Keep snapshots in the public contract
    If external stream consumers must keep seeing partial on every text delta, replace this with an internal lightweight event or a separate dense-stream mode.
  3. Pause until exact-head checks are green
    If the current waiting-on-author state reflects unresolved checks, hold merge until the exact head has green targeted and changed validation.

Next step before merge

  • [P2] The remaining action is maintainer review of the exported stream-contract compatibility risk plus exact-head checks, not an automated repair.

Security
Cleared: The diff changes TypeScript stream handling and tests only; no dependency, workflow, secret, permission, or code-download surface is introduced.

Review details

Best possible solution:

Land the lightweight-delta path after maintainers accept the exported stream-contract change and the exact PR head passes the targeted and changed checks.

Do we have a high-confidence way to reproduce the issue?

Yes from source and PR proof: dense streams currently carry accumulated snapshots on every plain text delta, and the PR body reports a 30,000-chunk Testbox reproduction. I did not rerun it in this read-only review.

Is this the best way to solve the issue?

Yes, pending maintainer acceptance of the contract change: dropping repeated full snapshots at the provider stream boundary and reconstructing in the agent loop is narrower than adding config, throttling, or per-provider fallback branches.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against f05e9873c6d2.

Label changes

Label changes:

  • add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • add status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body provides after-fix Testbox terminal output for a built CLI run against a fake dense OpenAI-compatible backend, plus before/current-main timing context.
  • remove rating: 🦪 silver shellfish: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.
  • remove status: ⏳ waiting on author: Current PR status label is status: 👀 ready for maintainer look.

Label justifications:

  • P1: The PR targets a P1 local-model responsiveness bug and changes a core stream contract used by agent/session consumers.
  • merge-risk: 🚨 compatibility: The exported AssistantMessageEvent.text_delta.partial shape changes from required to optional, which can affect existing stream consumers.
  • merge-risk: 🚨 session-state: Partial-less text deltas require consumers to reconstruct assistant message state correctly during active sessions.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body provides after-fix Testbox terminal output for a built CLI run against a fake dense OpenAI-compatible backend, plus before/current-main timing context.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body provides after-fix Testbox terminal output for a built CLI run against a fake dense OpenAI-compatible backend, plus before/current-main timing context.
Evidence reviewed

PR surface:

Source +40, Tests +149. Total +189 across 7 files.

View PR surface stats
Area Files Added Removed Net
Source 4 51 11 +40
Tests 3 157 8 +149
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 7 208 19 +189

What I checked:

  • Repository policy applied: Root plus scoped agent and extension AGENTS.md files were read fully; the review applied the OpenClaw guidance that agent/plugin stream contracts and bundled-provider behavior are compatibility-sensitive. (AGENTS.md:27, f05e9873c6d2)
  • Public SDK contract change: openclaw/plugin-sdk/llm exports AssistantMessageEvent and AssistantMessageEventStreamContract, so making text_delta.partial optional is plugin-facing, not only an internal implementation detail. (src/plugin-sdk/llm.ts:20, f05e9873c6d2)
  • Optional text-delta partial contract: The PR head documents and types plain text_delta as allowing partial?: AssistantMessage, while other start/end/thinking/tool events keep required snapshots. (packages/llm-core/src/types.ts:369, 9569e072e535)
  • Agent-loop reconstruction: The second PR commit replaces the previous event.partial ?? partialMessage fallback with delta replay, so message_update events advance from partial-less text deltas instead of staying stale. (packages/agent-core/src/agent-loop.ts:69, 9569e072e535)
  • Provider producers trimmed dense snapshots: Responses and completions still maintain the internal output object but omit partial only from plain text_delta pushes. (src/agents/openai-transport-stream.ts:1577, 9569e072e535)
  • Ollama producer trimmed dense snapshots: The Ollama stream keeps start/end snapshots and omits partial from each visible text_delta after computing the incremental delta. (extensions/ollama/src/stream.ts:1325, 9569e072e535)

Likely related people:

  • vincentkoc: Current blame and -S history show this person on the LLM stream event contract, agent-loop update path, OpenAI transport seam, and the current PR commits. (role: recent area contributor; confidence: high; commits: 1019b591d55f, 4798e125f40c, 9569e072e535; files: packages/llm-core/src/types.ts, packages/agent-core/src/agent-loop.ts, src/agents/openai-transport-stream.ts)
  • Eva H: History search ties this person to Ollama thinking/streaming support that shares the touched Ollama stream runtime. (role: adjacent owner; confidence: medium; commits: d7bf97adb330; files: extensions/ollama/src/stream.ts)
  • Jakub Rusz: History search ties this person to the Ollama streaming text event implementation changed by this PR. (role: adjacent owner; confidence: medium; commits: 8f44bd642660; files: extensions/ollama/src/stream.ts)
  • steipete: History shows provider runtime movement and repeated touches across the central agent/provider stream files, so this is a useful routing candidate for owner-boundary review. (role: adjacent owner; confidence: medium; commits: 64bf80d4d50c; files: extensions/ollama/src/stream.ts, src/agents/openai-transport-stream.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. labels Jun 9, 2026
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels Jun 9, 2026
@vincentkoc

Copy link
Copy Markdown
Member Author

Land-ready verification for 9569e072e535d677a0dfa2c747a0b12a4ad8c667:

  • ClawSweeper/autoreview: ready for maintainer look, proof sufficient, no concrete contributor-facing blocker left.
  • Crabbox/Testbox changed gate: node scripts/crabbox-wrapper.mjs run --provider blacksmith-testbox --timing-json -- env OPENCLAW_CHECK_CHANGED_REMOTE_CHILD=1 OPENCLAW_CHANGED_LANES_RAW_SYNC=1 corepack pnpm check:changed on tbx_01ktn8y7eph04qd9x71f4ez3we / violet-crab, passed in 9m21s.
  • Targeted review-fix proof: git diff --check && node scripts/run-vitest.mjs run packages/agent-core/src/agent-loop.test.ts src/agents/openai-transport-stream.test.ts extensions/ollama/src/stream-runtime.test.ts on tbx_01ktn8tztvfyjc9azkj8ayhpf3 / silver-shrimp, passed 347 tests in 54.3s.
  • GitHub checks: pass or skip only at merge review time; no pending or failing checks.

Maintainer decision recorded in the PR body: accept optional partial on plain text_delta events, with consumers replaying deltas or using reconstructed message_update.message when they need the live assistant snapshot.

@vincentkoc vincentkoc marked this pull request as ready for review June 9, 2026 04:16
@vincentkoc vincentkoc merged commit 6fcc945 into main Jun 9, 2026
231 of 239 checks passed
@vincentkoc vincentkoc deleted the fix-86599-dense-stream-overhead branch June 9, 2026 04:21
@osolmaz

osolmaz commented Jun 9, 2026

Copy link
Copy Markdown
Member

Adding attribution/context now that this landed and closed #86599:

This is directly related to the earlier root-cause analysis in #86599 (comment) and the broader prior implementation attempt in #87558.

The merged shape here is smaller and cleaner than #87558: it changes the stream contract so plain text_delta events may omit the full accumulated partial, and reconstructs live assistant state in the agent loop instead of adding the original createAssistantStreamAccumulator helper. But it addresses the same core issue diagnosed there: dense local-provider streams were repeatedly carrying/reprocessing the full answer-so-far on every text delta.

So I would treat #87558 and the issue analysis as relevant precursor work, while #91580 is the production-ready implementation shape that actually landed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling extensions: ollama maintainer Maintainer-authored PR merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: M status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Local model provider calls thread block gateway event loop on Windows beta; trivial infer run takes ~4 minutes

2 participants