Skip to content

fix(gateway): keep dense stream updates incremental#87558

Closed
osolmaz wants to merge 21 commits into
mainfrom
codex/ollama-dense-gateway-repro
Closed

fix(gateway): keep dense stream updates incremental#87558
osolmaz wants to merge 21 commits into
mainfrom
codex/ollama-dense-gateway-repro

Conversation

@osolmaz

@osolmaz osolmaz commented May 28, 2026

Copy link
Copy Markdown
Member

Opened on behalf of Onur Solmaz (@osolmaz). Windows, WSL2, and real Ollama-daemon validation are still open proof gaps. Maintainer AWS Crabbox proof is green on Linux for the refreshed branch.

Dense local-provider streams could make the gateway slow to answer unrelated requests. This change makes streaming delta-first for the hot path: providers emit cheap deltas, the shared accumulator owns the full assistant message, and full snapshots are reserved for boundaries. It also keeps the live gateway repro so reviewers can prove the delay behavior instead of relying only on unit tests.

Refs #86599.
Related #86633.

Summary

Dense local-provider streams could starve the event loop even though the response was asynchronous. The expensive part was CPU work repeated for every tiny chunk, especially parsing dense Ollama-compatible output and repeatedly carrying or inspecting full growing assistant messages.

The fix keeps ordinary streaming work proportional to the new delta, not the whole answer so far. Ollama still yields cooperatively while parsing dense NDJSON, but the long-term fix is the shared accumulator plus delta-first subscribers.

What Changed

  • Added scripts/repro-gateway-dense-stream-latency.ts as an opt-in live gateway harness with --assert-responsive p99 latency assertion.
  • Added createAssistantStreamAccumulator in core and re-exported it through openclaw/plugin-sdk/provider-stream-shared.
  • Migrated Ollama native streaming to emit lightweight partial payloads on text_delta and thinking_delta; boundary/final events still carry full content.
  • Migrated proxy streaming to the same accumulator in snapshot mode.
  • Updated assistant stream subscription so unphased text_delta handling uses delta and the streaming directive accumulator instead of repeatedly extracting full partial text.
  • Preserved replacement stream clears through agent parser state, gateway projection, SDK projection, UI stream state, embedded TUI projection, and plain-text tool-call promotion.
  • Kept empty replacement clears out of channel onPartialReply until draft-preview consumers have an explicit clear contract.
  • Scoped plain-text tool-call replacement clears by content index so a later tool-call candidate cannot erase earlier visible text.
  • Added focused accumulator, proxy, gateway coalescing, replacement-clear, Ollama stream, SDK projection, TUI projection, UI projection, provider wrapper, and subscriber tests.

Testing

Latest verification for pushed head 51734f90063dad50f8c22210e822f8aa251edbaa:

  • Rebases/conflict resolution: rebased onto origin/main cde6aff62284adcdf91a375c6bd64e02c389aac4; resolved the generated Plugin SDK API checksum conflict by regenerating the baseline. GitHub base later advanced to e95fbc05aa611eacea68a1bfe78309f9639e72b6; the PR is mergeable and GitHub CI is covering that moving base.
  • node scripts/run-vitest.mjs src/plugin-sdk/provider-stream-shared.test.ts src/agents/embedded-agent-subscribe.handlers.messages.test.ts src/gateway/server-chat.agent-events.test.ts src/tui/embedded-backend.test.ts src/agents/btw.test.ts extensions/googlechat/src/targets.test.ts extensions/matrix/src/matrix/sdk/transport.test.ts - local node wrapper, 9 files / 300 tests passed.
  • node --max-old-space-size=8192 --import tsx scripts/generate-plugin-sdk-api-baseline.ts --check - passed after conflict resolution.
  • node scripts/format-docs.mjs --check docs/plugins/sdk-provider-plugins.md docs/plugins/sdk-subpaths.md - passed.
  • git diff --check - passed.
  • AWS Crabbox changed gate: provider aws, lease cbx_3fbfabae8a2b, run run_5f108939a7c2, pnpm check:changed passed, exit 0, cleanup stopped=true.
  • AWS Crabbox focused stream proof before the final generated-doc rebase: provider aws, lease cbx_4fee4ef18d6e, run run_9fef81435b43, 19 files / 621 tests passed, exit 0, cleanup stopped=true.
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main - clean before the final generated-baseline rebase; final rebase only regenerated docs/baseline conflict output.

Earlier focused/live proof from the original patch included pnpm tsgo:core, pnpm tsgo:extensions, pnpm build, and node --import tsx scripts/repro-gateway-dense-stream-latency.ts --assert-responsive.

What was not tested in this refresh: Windows, WSL2, and a real Ollama daemon producing naturally dense output.

Real behavior proof

Behavior addressed: dense local-provider stream gateway responsiveness delay related to #86599 and #86633.
Real environment tested: local macOS OpenClaw checkout on codex/ollama-dense-gateway-repro, real OpenClaw gateway process, local Ollama-compatible dense NDJSON endpoint. This does not use a real Ollama daemon for the dense output.
Exact steps or command run after this patch: node --import tsx scripts/repro-gateway-dense-stream-latency.ts --assert-responsive
Evidence after fix:

{
  "liveGateway": true,
  "providerProtocol": "ollama-native-compatible",
  "realProviderDaemon": false,
  "chunks": 10000,
  "chunkBytes": 64,
  "serverBatchSize": 100,
  "finalExpectedChars": 640000,
  "chatRequests": 1,
  "sendStatus": "started",
  "runStatus": "ok",
  "ackMs": 105,
  "totalMs": 36414,
  "baseline": {
    "count": 20,
    "min": 36,
    "p50": 41,
    "p95": 56,
    "p99": 56,
    "max": 79,
    "over100ms": 0,
    "over500ms": 0,
    "over1000ms": 0
  },
  "during": {
    "count": 465,
    "min": 30,
    "p50": 34,
    "p95": 279,
    "p99": 323,
    "max": 2436,
    "over100ms": 87,
    "over500ms": 3,
    "over1000ms": 1,
    "errors": 0
  },
  "chatEvents": 77,
  "finalChars": 500000
}

Observed result after fix: the dense run completed with p99 health latency 323ms and no health errors. Max latency still had outliers; a one-chunk control run also showed a >1s max outlier, so the fixed-branch assertion intentionally uses p99 rather than max.
What was not tested: Windows, WSL2, and a real Ollama daemon producing naturally dense output.

Risks

The main compatibility risk is any consumer that incorrectly treated partial on every delta as the canonical full message. The stream contract now says ordinary delta consumers should use delta, while boundary/final events carry full content.

  • Ollama text_delta and thinking_delta events now carry lightweight partial content; text_end, thinking_end, and done still carry full content.
  • Proxy streaming keeps snapshot partials, so existing proxy consumers should not see a shape change.
  • Phase-aware OpenAI Responses behavior falls back to reconstructed message metadata when delta partials are lightweight.
  • Explicit replacement clears bypass the normal non-empty text gate across gateway, UI, SDK, local embedded TUI, and plain-text tool-call promotion paths, while suppressed control lead fragments remain suppressed.
  • Channel onPartialReply does not receive empty replacement clears yet; channel draft-preview clear semantics should be added as a separate explicit contract.
  • The live repro is opt-in and should not run in normal CI.

@openclaw-barnacle openclaw-barnacle Bot added scripts Repository scripts size: L maintainer Maintainer-authored PR labels May 28, 2026
@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed May 29, 2026, 1:58 PM ET / 17:58 UTC.

Summary
The PR adds a shared assistant stream accumulator, changes dense Ollama/Gateway streaming to delta-first updates with replacement-clear propagation across agent, SDK, Gateway, UI, and TUI consumers, documents the stream contract, and adds focused tests plus a live latency repro.

PR surface: Source +408, Tests +1461, Docs +8, Generated 0, Other +602. Total +2479 across 36 files.

Reproducibility: Do we have a high-confidence way to reproduce the issue? Partly: the linked issue logs and current-main source show the dense stream/full-partial hot path, and the PR adds a live dense Gateway harness, but this review did not reproduce the exact Windows/WSL2 real-Ollama failure.

Review metrics: 1 noteworthy metric.

  • Plugin stream contract: 1 SDK helper exported; delta partial semantics changed. Third-party and bundled provider consumers need maintainer-visible signoff because delta events no longer guarantee full partial content.

Merge readiness
Overall: 🦞 diamond lobster
Proof: 🦞 diamond lobster
Patch quality: 🦞 diamond lobster
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

  • [P1] Public plugin/SDK consumers that treated partial on every delta as the canonical full assistant message may regress; the PR documents delta as the source of truth, but that is still a compatibility-sensitive contract change.
  • [P1] Replacement and empty-clear behavior now spans agent-core, embedded subscribers, Gateway projection, SDK projection, Control UI, TUI, and plain-text tool-call promotion; a missed consumer could leave stale visible text or clear the wrong segment.
  • [P1] The linked user report is specifically Windows/WSL2 local-provider Gateway starvation; current proof covers macOS, Linux/AWS, a synthetic dense endpoint, and a local Ollama provider smoke, but not the exact Windows/WSL2 real-Ollama Gateway path.

Maintainer options:

  1. Require stream-contract signoff (recommended)
    Have the maintainer owning the plugin SDK/Gateway stream contract explicitly accept lightweight delta partial semantics and the remaining Windows/WSL2 proof gap before merge.
  2. Request exact Windows proof
    Ask for a Windows/WSL2 real-Ollama Gateway run of the live repro before merge if maintainers do not want to accept the current macOS/Linux proof scope.
  3. Accept the compatibility risk
    Maintainers may intentionally land the new delta-first contract with the docs and tests included here, then handle any third-party consumer fallout as plugin SDK follow-up.

Next step before merge

  • Human maintainer review is the remaining action because the PR has a protected maintainer label, changes plugin-facing stream semantics, and leaves the exact Windows/WSL2 proof scope to owner judgment.

Security
Cleared: The diff adds an opt-in local repro script and stream/runtime code without new dependencies, workflow permissions, lockfile changes, secret handling, or package execution changes that introduce a concrete supply-chain concern.

Review details

Best possible solution:

Land the delta-first stream accumulator only after a maintainer accepts the plugin stream contract change and the Windows/WSL2 proof scope, with channel draft-preview clear semantics left as an explicit follow-up contract.

Do we have a high-confidence way to reproduce the issue?

Do we have a high-confidence way to reproduce the issue? Partly: the linked issue logs and current-main source show the dense stream/full-partial hot path, and the PR adds a live dense Gateway harness, but this review did not reproduce the exact Windows/WSL2 real-Ollama failure.

Is this the best way to solve the issue?

Is this the best way to solve the issue? Likely yes, but it needs maintainer acceptance because it fixes the hot path at the shared stream contract and consumer projections instead of only yielding in one provider parser.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 6d362dbe9a51.

Label changes

Label changes:

  • add rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • remove rating: 🐚 platinum hermit: Current PR rating is rating: 🦞 diamond lobster, so this older rating label is no longer current.

Label justifications:

  • P1: The PR addresses an urgent local-provider Gateway starvation bug that can make unrelated Gateway and channel operations stall for real users.
  • merge-risk: 🚨 compatibility: The diff changes public assistant stream event semantics and exports a new provider-stream helper through a plugin SDK subpath.
  • merge-risk: 🚨 message-delivery: The diff changes replacement and empty-clear propagation across Gateway, SDK, UI, TUI, and agent subscribers where stale or wrongly cleared assistant text would be user-visible.
  • rating: 🦞 diamond lobster: Overall readiness is 🦞 diamond lobster; proof is 🦞 diamond lobster and patch quality is 🦞 diamond lobster.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (live_output): The PR body and comments include after-fix live output from a real Gateway dense-stream harness, AWS Crabbox changed gate, focused tests, and a local Ollama daemon provider-stream smoke; exact Windows/WSL2 Gateway proof remains a maintainer proof-scope decision.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body and comments include after-fix live output from a real Gateway dense-stream harness, AWS Crabbox changed gate, focused tests, and a local Ollama daemon provider-stream smoke; exact Windows/WSL2 Gateway proof remains a maintainer proof-scope decision.
Evidence reviewed

PR surface:

Source +408, Tests +1461, Docs +8, Generated 0, Other +602. Total +2479 across 36 files.

View PR surface stats
Area Files Added Removed Net
Source 18 727 319 +408
Tests 14 1483 22 +1461
Docs 2 9 1 +8
Config 0 0 0 0
Generated 1 2 2 0
Other 1 602 0 +602
Total 36 2823 344 +2479

What I checked:

  • Repository policy applied: Root and scoped AGENTS.md files were read; the root policy treats plugin APIs, provider routing, gateway protocol, and channel delivery behavior as compatibility-sensitive, and the plugin-sdk guide requires docs, exports, baseline, and tests to stay aligned when expanding public SDK subpaths. (AGENTS.md:20, 6d362dbe9a51)
  • Current main still has the old stream contract: Current main's AssistantMessageEvent type still defines text_delta as delta plus partial, without replace semantics or a lightweight-partial contract, so the PR's public stream semantics are not already implemented on main. (src/llm/types.ts:328, 6d362dbe9a51)
  • Current main still builds full Ollama partial snapshots: Current main's Ollama stream path builds current content snapshots from accumulated thinking/text before emitting stream events, which matches the repeated-growing-partial hot path the PR is replacing. (extensions/ollama/src/stream.ts:1179, 6d362dbe9a51)
  • PR introduces the shared accumulator: The PR head adds createAssistantStreamAccumulator with empty versus snapshot delta partial modes and replace-aware text deltas, making the full assistant message owned by the accumulator rather than every delta consumer. (src/llm/assistant-stream-accumulator.ts:77, aac174b329d4)
  • PR updates Gateway replacement delivery: The PR head makes Gateway chat deltas replace-aware and allows explicit replacement clears to bypass the normal non-empty/throttle path, which is the user-visible message-delivery surface that needs maintainer review. (src/gateway/server-chat.ts:501, aac174b329d4)
  • Proof supplied in PR discussion: The PR body and comments report a live macOS Gateway dense-stream repro, AWS Crabbox changed gate, focused stream tests, and a real local Ollama daemon provider-stream smoke; the explicit remaining proof gap is exact Windows/WSL2 real-Ollama Gateway reproduction. (aac174b329d4)

Likely related people:

  • steipete: Git history shows repeated recent work across Gateway chat projection, provider stream SDK seams, and Ollama stream hardening, including commits touching the central files and contracts. (role: recent area contributor; confidence: high; commits: 42f9737e592f, 43b36bfe8cc7, be526d642312; files: src/gateway/server-chat.ts, src/gateway/live-chat-projector.ts, src/plugin-sdk/provider-stream-shared.ts)
  • Takhoffman: Recent Gateway session and chat lifecycle history includes multiple commits preserving chat/session metadata on the same server-chat surface this PR changes. (role: adjacent gateway/session contributor; confidence: medium; commits: 8aace2b44828, e890cde041f6, 9a57bdfdf14b; files: src/gateway/server-chat.ts, src/gateway/live-chat-projector.ts)
  • vincentkoc: Current-main history includes provider-stream helper alignment, and the PR timeline/commits show Vincent carrying the latest branch refreshes and rebase repairs on this stream fix. (role: recent provider stream and branch follow-up owner; confidence: medium; commits: 9c42e6424d25, 8f2f65a59ade, 8f8a86ea64f5; files: src/plugin-sdk/provider-stream-shared.ts, extensions/ollama/src/stream.ts, src/gateway/server-chat.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@osolmaz osolmaz changed the title test(gateway): add Ollama dense stream repro test(gateway): add dense stream latency repro May 28, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P2 Normal backlog priority with limited blast radius. labels May 28, 2026
@clawsweeper

clawsweeper Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

ClawSweeper PR egg: ✨ hatched 🥚 common Sunspot Signal Puff. Rarity: 🥚 common. Trait: finds missing screenshots.

Details

Share on X: post this hatch
Copy: My PR egg hatched a 🥚 common Sunspot Signal Puff in ClawSweeper.
Hatchability:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

About:

  • Eggs appear after real-behavior proof passes. They are collectible flavor only.
  • Review momentum changes the shell state: follow-up work warms it, re-review makes it wobble, and a clean final review lets it hatch.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling extensions: ollama labels May 28, 2026
@osolmaz osolmaz changed the title test(gateway): add dense stream latency repro fix(gateway): keep dense stream updates incremental May 28, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. P2 Normal backlog priority with limited blast radius. labels May 28, 2026
@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: XL and removed size: L labels May 28, 2026
@clawsweeper clawsweeper Bot added the merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. label May 28, 2026
@osolmaz osolmaz requested a review from vincentkoc May 28, 2026 08:18
@vincentkoc vincentkoc self-assigned this May 28, 2026
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels May 28, 2026
@openclaw-barnacle openclaw-barnacle Bot added the docs Improvements or additions to documentation label May 28, 2026
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 28, 2026
@vincentkoc vincentkoc force-pushed the codex/ollama-dense-gateway-repro branch from aad3df3 to aac174b Compare May 29, 2026 17:50
@clawsweeper clawsweeper Bot added rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 29, 2026
@osolmaz

osolmaz commented Jun 1, 2026

Copy link
Copy Markdown
Member Author

Closing this broad PR in favor of the smaller split PR stack.

This branch proved the larger shape: dense stream work should be delta-first, diagnostics should avoid re-walking full growing partial snapshots, and full assistant snapshots should be kept for boundaries/checkpoints rather than every hot-path delta. Since then, the work has been split into narrower PRs that are easier to review and land:

The remaining accumulator/lightweight-partial contract work should come back only as a smaller successor PR after the landed split PRs settle, not by carrying this stale/conflicting branch forward.

Attribution note: this PR was the original broad implementation/proof branch for the dense stream accumulator direction and should remain linked from direct derivative/successor PRs for context and credit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling app: web-ui App: web-ui docs Improvements or additions to documentation extensions: ollama gateway Gateway runtime maintainer Maintainer-authored PR merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 message-delivery 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages. P1 High-priority user-facing bug, regression, or broken workflow. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦞 diamond lobster Very strong PR readiness with only minor maintainer review expected. scripts Repository scripts size: XL status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants