fix(agents): count streamed model deltas incrementally by vincentkoc · Pull Request #87856 · openclaw/openclaw

vincentkoc · 2026-05-29T02:34:54Z

Summary

Count streamed text_delta/thinking_delta/toolcall_delta diagnostics from their incremental delta bytes instead of serializing the full event.
Drop growing partial snapshots from fallback stream byte accounting so diagnostics do not repeatedly walk the accumulated assistant message.
Add regression coverage proving text-delta accounting does not serialize large partial snapshots and keeps streams alive if byte inspection throws.
Document the bounded responseStreamBytes semantics for high-frequency streamed deltas and align OTEL docs/metric descriptions with that bounded payload contract.

Verification

Rebasing onto current origin/main completed on head c1783a4997e.
node scripts/check-docs-mdx.mjs docs/gateway/opentelemetry.md docs/logging.md: passed.
node_modules/.bin/oxfmt --check --threads=1 docs/gateway/opentelemetry.md docs/logging.md extensions/diagnostics-otel/src/service.ts src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.ts: passed.
node scripts/run-oxlint.mjs extensions/diagnostics-otel/src/service.ts src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.ts: passed.
git diff --check origin/main...HEAD: passed.
node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts extensions/diagnostics-otel/src/service.test.ts --reporter=dot: passed, 2 shards / 86 tests.
AWS Crabbox pnpm check:changed: passed, provider=aws, lease=cbx_fc87d734de03, run=run_dafa0367419e, exit 0; selected lanes core, coreTests, extensions, extensionTests, docs.

Real behavior proof

Behavior addressed: dense model streams no longer make model-call diagnostics serialize the full growing partial assistant message on every delta.

Real environment tested: focused local Vitest wrapper in a linked OpenClaw gwt worktree, plus AWS Crabbox Linux changed-gate proof.

Exact steps or command run after this patch: node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts extensions/diagnostics-otel/src/service.test.ts --reporter=dot; docs mdx, oxfmt, oxlint, diff-check; node scripts/crabbox-wrapper.mjs run --provider aws --label diagnostic-delta-accounting-check-changed --shell -- "pnpm check:changed".

Evidence after fix: focused tests passed, including assertions that text_delta accounting does not call partial.toJSON, counts incremental delta bytes, and keeps stream delivery alive when byte inspection cannot read a chunk. AWS check:changed passed the selected core, core test, extension, extension test, and docs lanes.

Observed result after fix: responseStreamBytes is computed from incremental delta bytes for high-frequency stream deltas and from snapshotless chunks for other partial-bearing stream events, while OTEL metadata and docs describe the exported histogram as bounded streamed response payload bytes.

What was not tested: a live long-running local-model session.

clawsweeper · 2026-05-29T02:41:44Z

Codex review: needs maintainer review before merge. Reviewed June 6, 2026, 10:08 AM ET / 14:08 UTC.

Summary
The PR changes embedded-agent model-call diagnostics to count text, thinking, and tool-call stream deltas from incremental delta bytes, updates focused tests, and aligns logging/OTEL docs with the bounded byte metric semantics.

PR surface: Source +18, Tests -11, Docs +2. Total +9 across 5 files.

Reproducibility: yes. from source inspection: current main serializes the snapshotless delta event shape for response byte accounting, while the PR head counts the incremental delta string. I did not run a live long-running local-model reproduction in this read-only review.

Review metrics: 1 noteworthy metric.

Exported byte metric semantics: 1 exported response-byte metric baseline changed. The runtime field feeds OTEL histograms and plugin hooks, so maintainers should accept the lower dense-delta values before merge.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🌊 off-meta tidepool
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

[P2] Get maintainer acceptance that the bounded response-byte metric semantics are the desired upgrade behavior before merge.

Risk before merge

[P1] responseStreamBytes and openclaw.model_call.response_bytes will drop for dense delta streams, so dashboards, alert thresholds, snapshots, or plugin hooks that treated the previous event-shape estimate as their baseline need maintainer acceptance.
[P1] The PR body explicitly notes that a live long-running local-model session was not tested; focused source coverage and the AWS changed gate support the accounting change, but they do not prove the full Windows local-provider starvation repro.

Maintainer options:

Accept Bounded Metric Semantics (recommended)
Land the PR if maintainers are comfortable making responseStreamBytes and openclaw.model_call.response_bytes measure incremental streamed payload bytes for dense deltas.
Preserve Historical Metric Baseline
If existing dashboards or plugin hooks need the old event-shape estimate, revise before merge by keeping that value separately or documenting a migration path for observability consumers.

Next step before merge

[P2] Protected maintainer PR with a compatibility-sensitive exported metric semantic change; the next action is maintainer acceptance rather than an automated repair.

Security
Cleared: No security or supply-chain concern found; the patch touches diagnostic byte accounting, tests, and docs without changing dependencies, workflows, secrets, permissions, or raw content export.

Review details

Best possible solution:

Land this after maintainers explicitly accept the bounded response-byte metric semantics, or preserve the old estimate under a separate metric if existing observability consumers need it.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection: current main serializes the snapshotless delta event shape for response byte accounting, while the PR head counts the incremental delta string. I did not run a live long-running local-model reproduction in this read-only review.

Is this the best way to solve the issue?

Yes, this is the best scoped fix for the diagnostic accounting path: it changes the central embedded-stream diagnostic wrapper and updates the exported docs/OTEL description using the existing stream event contract. The remaining question is compatibility acceptance for the changed metric baseline, not a better code location.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 5b84ebfc56c9.

Label changes

Label changes:

add rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
remove rating: 🦞 diamond lobster: Current PR rating is rating: 🐚 platinum hermit, so this older rating label is no longer current.

Label justifications:

P1: This PR targets the P1 dense local-provider stream responsiveness work by removing repeated diagnostic serialization from hot model stream deltas.
merge-risk: 🚨 compatibility: Merging changes the observable byte values for responseStreamBytes and openclaw.model_call.response_bytes, which can affect existing dashboards, thresholds, and plugin hook consumers.
rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🌊 off-meta tidepool and patch quality is 🦞 diamond lobster.
status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Not applicable: The external-contributor real-proof gate is not applicable to this maintainer PR; the PR body still records focused tests, docs/lint checks, and an AWS Crabbox changed gate, with a live long-running local-model session left untested.

Evidence reviewed

PR surface:

Source +18, Tests -11, Docs +2. Total +9 across 5 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	2	21	3	+18
Tests	1	2	13	-11
Docs	2	4	2	+2
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	5	27	18	+9

What I checked:

Current main still uses event-shape byte accounting: On current main, responseStreamChunkByteLengthUnchecked removes partial but then serializes the remaining chunk shape for byte accounting, so text_delta contributes JSON event bytes rather than only the text delta bytes. (src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.ts:112, 5b84ebfc56c9)
PR counts high-frequency deltas directly: On the PR head, streamDeltaByteLength handles text_delta, thinking_delta, and toolcall_delta with string delta values before the fallback partial-stripping path. (src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.ts:116, 347f5430ac5c)
Stream event contract supports the delta-only path: The shared AssistantMessageEvent contract defines text, thinking, and tool-call delta events with delta: string plus an accumulated partial, matching the PR's predicate. (packages/llm-core/src/types.ts:369, 5b84ebfc56c9)
Runtime caller is the embedded model stream wrapper: The embedded attempt runner wraps the provider stream with wrapStreamFnWithDiagnosticModelCallEvents, so the changed accounting sits at the central diagnostic observation point for model stream chunks. (src/agents/embedded-agent-runner/run/attempt.ts:2807, 5b84ebfc56c9)
Exported metric surface consumes the changed value: The OTEL plugin records responseStreamBytes into openclaw.model_call.response_bytes, making the byte baseline an externally visible diagnostics/observability surface. (extensions/diagnostics-otel/src/service.ts:390, 5b84ebfc56c9)
Regression coverage matches the intended behavior: The PR-head test asserts dense text_delta accounting does not serialize a large partial snapshot and expects responseStreamBytes to equal Buffer.byteLength("abc", "utf8"). (src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts:360, 347f5430ac5c)

Likely related people:

vincentkoc: Authored the PR commits for this diagnostic accounting change and also carried the related merged Ollama dense-stream yield fix referenced from the same root issue. (role: recent area contributor and current PR implementer; confidence: high; commits: 311f49ad5adb, 347f5430ac5c, c7b190beec73; files: src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.ts, src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts, docs/logging.md)
joshavant: Current-main blame in this checkout points the existing diagnostic accounting helper, OTEL metric description, and related docs to a broad commit by this author, so they are a useful routing signal for the current baseline. (role: current-main source provenance; confidence: medium; commits: fbaa5a6f0a57; files: src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.ts, extensions/diagnostics-otel/src/service.ts, docs/logging.md)
osolmaz: Provided maintainer context that this PR is split out from the broader dense-stream work and should preserve the reviewer and attribution history from the closed broader branch. (role: related split-stack contributor; confidence: medium; commits: aac174b329d4; files: src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.ts, src/agents/runtime/proxy.ts, src/plugin-sdk/provider-stream-shared.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

vincentkoc · 2026-05-31T22:01:32Z

@clawsweeper re-review

clawsweeper · 2026-05-31T22:01:35Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26725738022
Updated: 2026-05-31T22:08:40.146Z

osolmaz · 2026-06-01T07:18:46Z

Attribution note: this diagnostics PR is one of the split-out pieces from the broader #87558 / #86599 dense-stream work. #87558 carried the same production direction: keep hot-path stream accounting proportional to new deltas instead of repeatedly walking full growing partial snapshots.

#87558 is being closed in favor of the smaller split PR stack. Please keep the #87558 context linked for attribution and reviewer history.

vincentkoc · 2026-06-06T14:09:35Z

Land-ready verification for head 347f5430ac5c26c1de11c902b4db252660d9df92.

Local focused proof: node scripts/run-vitest.mjs run src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts -> 1 file passed, 13 tests passed.
CI proof: GitHub Actions run 27064304709 is green; CodeQL run 27064304710 is green; OpenGrep PR diff run 27064304716 is green.
Maintainer decision: accepting the intentional diagnostics baseline change noted by ClawSweeper. responseStreamBytes / openclaw.model_call.response_bytes should count incremental streamed deltas instead of the prior event-shape estimate for dense streams.
Proof gap: no live long-running local-model session was rerun in this landing pass; the regression is covered by focused diagnostics tests plus the green PR matrix.

vincentkoc self-assigned this May 29, 2026

openclaw-barnacle Bot added agents Agent runtime and tooling size: S maintainer Maintainer-authored PR labels May 29, 2026

clawsweeper Bot added the rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. label May 29, 2026

clawsweeper Bot mentioned this pull request May 29, 2026

fix(ollama): yield during dense stream processing #87818

Merged

vincentkoc force-pushed the codex/model-stream-diagnostics-delta branch from e5c4273 to 79c1a48 Compare May 29, 2026 03:37

vincentkoc marked this pull request as ready for review May 29, 2026 03:38

RomneyDa added dependency-guard-backfill and removed dependency-guard-backfill labels May 29, 2026

openclaw-barnacle Bot added the docs Improvements or additions to documentation label May 29, 2026

vincentkoc force-pushed the codex/model-stream-diagnostics-delta branch 2 times, most recently from 8e88916 to 93683b4 Compare May 29, 2026 19:23

vincentkoc force-pushed the codex/model-stream-diagnostics-delta branch 4 times, most recently from d025738 to 8e7c6b7 Compare May 29, 2026 20:27

clawsweeper Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026

openclaw-barnacle Bot added the gateway Gateway runtime label May 31, 2026

clawsweeper Bot removed the status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. label May 31, 2026

openclaw-barnacle Bot removed the size: S label May 31, 2026

clawsweeper Bot added the status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. label May 31, 2026

openclaw-barnacle Bot added the size: XS label May 31, 2026

clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. labels May 31, 2026

vincentkoc force-pushed the codex/model-stream-diagnostics-delta branch 2 times, most recently from 1888fdf to c1783a4 Compare June 1, 2026 02:05

clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels Jun 1, 2026

vincentkoc mentioned this pull request Jun 1, 2026

fix(agents): project cron tool schemas for providers #88878

Draft

This was referenced Jun 1, 2026

fix(gateway): keep dense stream updates incremental #87558

Closed

fix(gateway): keep dense stream updates incremental #87940

Closed

vincentkoc force-pushed the codex/model-stream-diagnostics-delta branch from c1783a4 to 9fd5ad3 Compare June 6, 2026 13:51

vincentkoc added 4 commits June 6, 2026 16:01

fix(agents): count streamed model deltas incrementally

311f49a

docs(logging): clarify stream byte accounting

c76ae11

fix(agents): keep diagnostic byte accounting best-effort

6d90c9c

fix(diagnostics): align response byte metric description

347f543

vincentkoc force-pushed the codex/model-stream-diagnostics-delta branch from 9fd5ad3 to 347f543 Compare June 6, 2026 14:01

clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. labels Jun 6, 2026

vincentkoc merged commit 4af444a into main Jun 6, 2026
168 of 170 checks passed

vincentkoc deleted the codex/model-stream-diagnostics-delta branch June 6, 2026 14:09

vincentkoc mentioned this pull request Jun 6, 2026

fix(diagnostics): account stream deltas incrementally #87619

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(agents): count streamed model deltas incrementally#87856

fix(agents): count streamed model deltas incrementally#87856
vincentkoc merged 4 commits into
mainfrom
codex/model-stream-diagnostics-delta

vincentkoc commented May 29, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented May 29, 2026 •

edited

Loading

Uh oh!

vincentkoc commented May 31, 2026

Uh oh!

clawsweeper Bot commented May 31, 2026 •

edited

Loading

Uh oh!

osolmaz commented Jun 1, 2026

Uh oh!

vincentkoc commented Jun 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

vincentkoc commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Real behavior proof

Uh oh!

clawsweeper Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vincentkoc commented May 31, 2026

Uh oh!

clawsweeper Bot commented May 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

osolmaz commented Jun 1, 2026

Uh oh!

vincentkoc commented Jun 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vincentkoc commented May 29, 2026 •

edited

Loading

clawsweeper Bot commented May 29, 2026 •

edited

Loading

clawsweeper Bot commented May 31, 2026 •

edited

Loading