fix(agents): count streamed model deltas incrementally#87856
Conversation
|
Codex review: needs maintainer review before merge. Reviewed June 6, 2026, 10:08 AM ET / 14:08 UTC. Summary PR surface: Source +18, Tests -11, Docs +2. Total +9 across 5 files. Reproducibility: yes. from source inspection: current main serializes the snapshotless delta event shape for response byte accounting, while the PR head counts the incremental Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land this after maintainers explicitly accept the bounded response-byte metric semantics, or preserve the old estimate under a separate metric if existing observability consumers need it. Do we have a high-confidence way to reproduce the issue? Yes from source inspection: current main serializes the snapshotless delta event shape for response byte accounting, while the PR head counts the incremental Is this the best way to solve the issue? Yes, this is the best scoped fix for the diagnostic accounting path: it changes the central embedded-stream diagnostic wrapper and updates the exported docs/OTEL description using the existing stream event contract. The remaining question is compatibility acceptance for the changed metric baseline, not a better code location. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 5b84ebfc56c9. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +18, Tests -11, Docs +2. Total +9 across 5 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
e5c4273 to
79c1a48
Compare
8e88916 to
93683b4
Compare
d025738 to
8e7c6b7
Compare
|
@clawsweeper re-review |
|
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
|
1888fdf to
c1783a4
Compare
|
Attribution note: this diagnostics PR is one of the split-out pieces from the broader #87558 / #86599 dense-stream work. #87558 carried the same production direction: keep hot-path stream accounting proportional to new deltas instead of repeatedly walking full growing partial snapshots. #87558 is being closed in favor of the smaller split PR stack. Please keep the #87558 context linked for attribution and reviewer history. |
c1783a4 to
9fd5ad3
Compare
9fd5ad3 to
347f543
Compare
|
Land-ready verification for head
|
Summary
text_delta/thinking_delta/toolcall_deltadiagnostics from their incrementaldeltabytes instead of serializing the full event.partialsnapshots from fallback stream byte accounting so diagnostics do not repeatedly walk the accumulated assistant message.responseStreamBytessemantics for high-frequency streamed deltas and align OTEL docs/metric descriptions with that bounded payload contract.Refs #86599.
Verification
origin/maincompleted on headc1783a4997e.node scripts/check-docs-mdx.mjs docs/gateway/opentelemetry.md docs/logging.md: passed.node_modules/.bin/oxfmt --check --threads=1 docs/gateway/opentelemetry.md docs/logging.md extensions/diagnostics-otel/src/service.ts src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.ts: passed.node scripts/run-oxlint.mjs extensions/diagnostics-otel/src/service.ts src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.ts: passed.git diff --check origin/main...HEAD: passed.node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts extensions/diagnostics-otel/src/service.test.ts --reporter=dot: passed, 2 shards / 86 tests.pnpm check:changed: passed,provider=aws,lease=cbx_fc87d734de03,run=run_dafa0367419e, exit 0; selected lanescore,coreTests,extensions,extensionTests,docs.Real behavior proof
Behavior addressed: dense model streams no longer make model-call diagnostics serialize the full growing
partialassistant message on every delta.Real environment tested: focused local Vitest wrapper in a linked OpenClaw gwt worktree, plus AWS Crabbox Linux changed-gate proof.
Exact steps or command run after this patch:
node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.model-diagnostic-events.test.ts extensions/diagnostics-otel/src/service.test.ts --reporter=dot; docs mdx, oxfmt, oxlint, diff-check;node scripts/crabbox-wrapper.mjs run --provider aws --label diagnostic-delta-accounting-check-changed --shell -- "pnpm check:changed".Evidence after fix: focused tests passed, including assertions that
text_deltaaccounting does not callpartial.toJSON, counts incrementaldeltabytes, and keeps stream delivery alive when byte inspection cannot read a chunk. AWScheck:changedpassed the selected core, core test, extension, extension test, and docs lanes.Observed result after fix:
responseStreamBytesis computed from incremental delta bytes for high-frequency stream deltas and from snapshotless chunks for other partial-bearing stream events, while OTEL metadata and docs describe the exported histogram as bounded streamed response payload bytes.What was not tested: a live long-running local-model session.