fix(gateway): keep dense stream updates incremental#87558
Conversation
|
Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 1:58 PM ET / 17:58 UTC. Summary PR surface: Source +408, Tests +1461, Docs +8, Generated 0, Other +602. Total +2479 across 36 files. Reproducibility: Do we have a high-confidence way to reproduce the issue? Partly: the linked issue logs and current-main source show the dense stream/full-partial hot path, and the PR adds a live dense Gateway harness, but this review did not reproduce the exact Windows/WSL2 real-Ollama failure. Review metrics: 1 noteworthy metric.
Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Security Review detailsBest possible solution: Land the delta-first stream accumulator only after a maintainer accepts the plugin stream contract change and the Windows/WSL2 proof scope, with channel draft-preview clear semantics left as an explicit follow-up contract. Do we have a high-confidence way to reproduce the issue? Do we have a high-confidence way to reproduce the issue? Partly: the linked issue logs and current-main source show the dense stream/full-partial hot path, and the PR adds a live dense Gateway harness, but this review did not reproduce the exact Windows/WSL2 real-Ollama failure. Is this the best way to solve the issue? Is this the best way to solve the issue? Likely yes, but it needs maintainer acceptance because it fixes the hot path at the shared stream contract and consumer projections instead of only yielding in one provider parser. AGENTS.md: found and applied where relevant. Codex review notes: model gpt-5.5, reasoning high; reviewed against 6d362dbe9a51. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +408, Tests +1461, Docs +8, Generated 0, Other +602. Total +2479 across 36 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg: ✨ hatched 🥚 common Sunspot Signal Puff. Rarity: 🥚 common. Trait: finds missing screenshots. DetailsShare on X: post this hatch
About:
|
aad3df3 to
aac174b
Compare
|
Closing this broad PR in favor of the smaller split PR stack. This branch proved the larger shape: dense stream work should be delta-first, diagnostics should avoid re-walking full growing partial snapshots, and full assistant snapshots should be kept for boundaries/checkpoints rather than every hot-path delta. Since then, the work has been split into narrower PRs that are easier to review and land:
The remaining accumulator/lightweight-partial contract work should come back only as a smaller successor PR after the landed split PRs settle, not by carrying this stale/conflicting branch forward. Attribution note: this PR was the original broad implementation/proof branch for the dense stream accumulator direction and should remain linked from direct derivative/successor PRs for context and credit. |
Opened on behalf of Onur Solmaz (@osolmaz). Windows, WSL2, and real Ollama-daemon validation are still open proof gaps. Maintainer AWS Crabbox proof is green on Linux for the refreshed branch.
Dense local-provider streams could make the gateway slow to answer unrelated requests. This change makes streaming delta-first for the hot path: providers emit cheap deltas, the shared accumulator owns the full assistant message, and full snapshots are reserved for boundaries. It also keeps the live gateway repro so reviewers can prove the delay behavior instead of relying only on unit tests.
Refs #86599.
Related #86633.
Summary
Dense local-provider streams could starve the event loop even though the response was asynchronous. The expensive part was CPU work repeated for every tiny chunk, especially parsing dense Ollama-compatible output and repeatedly carrying or inspecting full growing assistant messages.
The fix keeps ordinary streaming work proportional to the new delta, not the whole answer so far. Ollama still yields cooperatively while parsing dense NDJSON, but the long-term fix is the shared accumulator plus delta-first subscribers.
What Changed
scripts/repro-gateway-dense-stream-latency.tsas an opt-in live gateway harness with--assert-responsivep99 latency assertion.createAssistantStreamAccumulatorin core and re-exported it throughopenclaw/plugin-sdk/provider-stream-shared.partialpayloads ontext_deltaandthinking_delta; boundary/final events still carry full content.text_deltahandling usesdeltaand the streaming directive accumulator instead of repeatedly extracting full partial text.onPartialReplyuntil draft-preview consumers have an explicit clear contract.Testing
Latest verification for pushed head
51734f90063dad50f8c22210e822f8aa251edbaa:origin/maincde6aff62284adcdf91a375c6bd64e02c389aac4; resolved the generated Plugin SDK API checksum conflict by regenerating the baseline. GitHub base later advanced toe95fbc05aa611eacea68a1bfe78309f9639e72b6; the PR is mergeable and GitHub CI is covering that moving base.node scripts/run-vitest.mjs src/plugin-sdk/provider-stream-shared.test.ts src/agents/embedded-agent-subscribe.handlers.messages.test.ts src/gateway/server-chat.agent-events.test.ts src/tui/embedded-backend.test.ts src/agents/btw.test.ts extensions/googlechat/src/targets.test.ts extensions/matrix/src/matrix/sdk/transport.test.ts- local node wrapper, 9 files / 300 tests passed.node --max-old-space-size=8192 --import tsx scripts/generate-plugin-sdk-api-baseline.ts --check- passed after conflict resolution.node scripts/format-docs.mjs --check docs/plugins/sdk-provider-plugins.md docs/plugins/sdk-subpaths.md- passed.git diff --check- passed.aws, leasecbx_3fbfabae8a2b, runrun_5f108939a7c2,pnpm check:changedpassed, exit0, cleanupstopped=true.aws, leasecbx_4fee4ef18d6e, runrun_9fef81435b43, 19 files / 621 tests passed, exit0, cleanupstopped=true..agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main- clean before the final generated-baseline rebase; final rebase only regenerated docs/baseline conflict output.Earlier focused/live proof from the original patch included
pnpm tsgo:core,pnpm tsgo:extensions,pnpm build, andnode --import tsx scripts/repro-gateway-dense-stream-latency.ts --assert-responsive.What was not tested in this refresh: Windows, WSL2, and a real Ollama daemon producing naturally dense output.
Real behavior proof
Behavior addressed: dense local-provider stream gateway responsiveness delay related to #86599 and #86633.
Real environment tested: local macOS OpenClaw checkout on
codex/ollama-dense-gateway-repro, real OpenClaw gateway process, local Ollama-compatible dense NDJSON endpoint. This does not use a real Ollama daemon for the dense output.Exact steps or command run after this patch:
node --import tsx scripts/repro-gateway-dense-stream-latency.ts --assert-responsiveEvidence after fix:
{ "liveGateway": true, "providerProtocol": "ollama-native-compatible", "realProviderDaemon": false, "chunks": 10000, "chunkBytes": 64, "serverBatchSize": 100, "finalExpectedChars": 640000, "chatRequests": 1, "sendStatus": "started", "runStatus": "ok", "ackMs": 105, "totalMs": 36414, "baseline": { "count": 20, "min": 36, "p50": 41, "p95": 56, "p99": 56, "max": 79, "over100ms": 0, "over500ms": 0, "over1000ms": 0 }, "during": { "count": 465, "min": 30, "p50": 34, "p95": 279, "p99": 323, "max": 2436, "over100ms": 87, "over500ms": 3, "over1000ms": 1, "errors": 0 }, "chatEvents": 77, "finalChars": 500000 }Observed result after fix: the dense run completed with p99 health latency
323msand no health errors. Max latency still had outliers; a one-chunk control run also showed a >1s max outlier, so the fixed-branch assertion intentionally uses p99 rather than max.What was not tested: Windows, WSL2, and a real Ollama daemon producing naturally dense output.
Risks
The main compatibility risk is any consumer that incorrectly treated
partialon every delta as the canonical full message. The stream contract now says ordinary delta consumers should usedelta, while boundary/final events carry full content.text_deltaandthinking_deltaevents now carry lightweight partial content;text_end,thinking_end, anddonestill carry full content.onPartialReplydoes not receive empty replacement clears yet; channel draft-preview clear semantics should be added as a separate explicit contract.