feat(providers): streaming-native chat-client stack with composable routing by Aaronontheweb · Pull Request #1313 · netclaw-dev/netclaw

Aaronontheweb · 2026-06-03T08:51:39Z

What

Redesigns the IChatClient stack so the streaming-vs-non-streaming transport distinction stops leaking into callers, and resilience/observability stop being entangled with transport. The OpenAI Codex 400 {"detail":"Stream must be set to true"} on memory formation was a symptom of this structural debt (the streaming-only conclusion had already been reached twice independently — main loop and SubAgentActor).

Supersedes #1289 and #1309.

Why streaming-everywhere

All 8 providers stream cleanly; non-streaming is the unreliable path — Codex rejects it, and some SDK paths drop reasoning content in it. Standardizing on streaming as the only transport Netclaw issues collapses the provider-capability question entirely (we only wanted capabilities to know who supports non-streaming).

The redesign

Transport — Netclaw issues only streaming requests.

RetryingChatClient now retries the streaming path, pre-first-chunk (reusing RetryPolicy + the failover iterator pattern). Once a chunk is emitted a failure propagates, so streamed output is never duplicated by a restart. Retry is now a property of the stack, not of the transport a caller happened to pick.
The five auxiliary callers (title gen, memory distillation, compaction observation, curation, memory extraction) aggregate the stream via the existing StreamingResponseReader (empty-Messages guard + diagnostics) and read response.Text, replacing the non-streaming GetResponseAsync path. This also closes the response.Text vs Messages[^1].Text inconsistency and the empty-stream throw.

Composition — ChatClientBuilder. Per-(provider, model) pipelines are composed via Microsoft.Extensions.AI.ChatClientBuilder (Logging → Retry → VendorOptions → raw) in PipelineChatClientFactory. The .Use ordering is asserted by a test (it flips silently on a bad bump).

Routing — failover is the first policy, not a special case. A router selects which composed pipeline to invoke per call. RoutingChatClient walks an ordered candidate list, so failover = a list of length > 1 and single-provider alerting = the terminal failure of a length-1 walk — this absorbs and deletes FailoverChatClient, AlertingChatClientDecorator, NetclawChatClientProvider, and ResilientChatClientProviderDecorator. The router takes a ChatRoutingContext, so future per-session / per-provider routing slots in as a new policy without a rewrite (built the seam, not the feature). Today's RoleBasedFailoverRouter reproduces the prior wiring.

Logging — stateless + session-correlated. LoggingChatClient is now stateless (the per-session-intended delta/cumulative token counters that sat on a process singleton are gone — real per-session usage already lives in LlmSessionActor). It tags log lines with the ambient session id via SessionLoggingScope, keyed SessionId to match the WithContext("SessionId", …) key actors already use, so one Seq attribute correlates actor and chat-client logs.

Testing

Full solution build: 0 warnings, 0 errors
Netclaw.Daemon.Tests: 668/668
Netclaw.Actors.Tests: 2186/2186
dotnet slopwatch analyze: clean; file headers present
New/ported tests: streaming-retry (pre/post-first-chunk, max-retries, cancel), PipelineChatClientFactory order guard, RoutingChatClient failover/unreachable walk, RoleBasedFailoverRouter candidate selection, LoggingChatClient session-scope + stateless usage.

Notes / not in scope

Per-session/per-provider routing — the seam (ChatRoutingContext) is in; the policy is a future drop-in.
Mid-stream (post-first-chunk) retry — deliberately not done (would duplicate emitted tokens).
Adds the Microsoft.Extensions.AI core package (daemon-only) for ChatClientBuilder. Note: it ships its own LoggingChatClient, which collides with Netclaw's in test files importing both namespaces — resolved with a using alias (production uses same-namespace precedence).
Not run here (need a live model): the Codex end-to-end smoke and the eval suite. Behaviorally the aggregated response text is identical to the prior non-streaming path; only the transport changed.

…outing Redesign the IChatClient stack so the streaming-vs-non-streaming transport distinction no longer leaks into callers, and resilience/observability are no longer entangled with transport. Supersedes netclaw-dev#1289 and netclaw-dev#1309. Transport: Netclaw now issues only streaming requests. RetryingChatClient retries the streaming path (pre-first-chunk; a mid-stream failure propagates so emitted output is never duplicated), reusing RetryPolicy. The auxiliary callers (title generation, memory distillation, compaction observation, memory curation, memory extraction) aggregate the stream via the existing StreamingResponseReader (empty-Messages guard + diagnostics) and read response.Text, instead of the non-streaming GetResponseAsync path that the OpenAI Codex backend rejects (400 'Stream must be set to true') and that drops reasoning content on some providers. Composition: per-(provider,model) pipelines are composed with Microsoft.Extensions.AI ChatClientBuilder (Logging -> Retry -> VendorOptions -> raw) in PipelineChatClientFactory; the .Use ordering is asserted by test. Routing: a router selects which composed pipeline to invoke per call. RoutingChatClient walks an ordered candidate list, so failover is 'more than one candidate' and single-provider alerting is the terminal failure of a one-entry walk -- absorbing and deleting FailoverChatClient, AlertingChatClientDecorator, NetclawChatClientProvider and ResilientChatClientProviderDecorator. The router takes a ChatRoutingContext so future per-session/per-provider routing slots in as a new policy; today's RoleBasedFailoverRouter reproduces the prior wiring. Logging: LoggingChatClient is now stateless (the per-session-intended delta/cumulative token counters on the process singleton are gone) and tags log lines with the ambient session id via SessionLoggingScope, keyed 'SessionId' to match the WithContext('SessionId', ...) key used across actors so one Seq attribute correlates actor and chat-client logs. Verified: full solution build (0 warnings); Netclaw.Daemon.Tests 668/668; Netclaw.Actors.Tests 2186/2186; slopwatch clean; file headers present.

…ndex, streaming-failover tests Self-review of the stack redesign found three issues: 1. RetryingChatClient.BackoffAsync logged retry warnings without the SessionId scope, so the hottest failure path reached Seq uncorrelated while LoggingChatClient/RoutingChatClient logs carried it. Open the scope around the warning. 2. ChatRoutingContext.AttemptIndex was structurally unfulfillable: the per-role RoutingChatClient context is frozen at construction and the router is called once per call (not per candidate), so it could never be non-zero. Removed it; SessionId remains the genuine per-session-routing seam. 3. The streaming terminal-unreachable branch of RoutingChatClient (last candidate fails pre-first-chunk -> provider.unreachable + rethrow) and single-candidate / streaming-cancellation behavior were untested. Added three streaming tests.

…t retry Reconcile the two pre-first-chunk retry layers (review follow-up): the actor's per-turn streaming retry and the transport's RetryingChatClient overlapped on the main session loop. Make the transport the single owner. RetryPolicy.ShouldRetry now also recognizes the curated ProviderException (StatusCode 408/429/5xx, walking inner exceptions) that provider transports throw — previously only a raw HttpRequestException was retried, so the transport retry barely fired for real provider 429/5xx. That was the broader coverage the actor's LlmFailureClassifier.IsTransientStreaming provided. The transport RetryPolicy is now configured from Session:Tuning:StreamingRetryPolicy (resolves the prior TODO; preserves the knob), making RetryingChatClient the single, configurable, uniform retry layer for every caller (main loop + sidecars). Remove the actor per-turn retry: the LlmCallFailed pre-content retry block, the RetryLlmCallAfterBackoff message+handler, _streamingRetryAttempt, the streaming-retry timer, and the now-unused IsTransientStreaming. Context-overflow rollback and terminal FailCurrentTurn handling are unchanged. Re-home coverage: the actor-level Streaming_retry_recovers/_exhaustion integration tests are removed (the actor no longer retries); a ProviderException(502) recover case is added to RetryingChatClientTests at the transport layer. Verified: full build (0 warnings); Daemon 672/672; Actors 2184/2184; slopwatch + headers clean.

…ming-stack

…aming-only The streaming-native chat-client stack converts every auxiliary caller (title generation, memory extraction, compaction) to the streaming transport, so no code path invokes the non-streaming GetResponseAsync on the Codex client anymore. The StreamingOnlyChatClient wrapper (added in 0e5ba60 to serve those calls via streaming under the hood) is now dead; the Codex 'Stream must be set to true' 400 is structurally unreachable. A comment on the OAuth path records why no special handling is needed.

Aaronontheweb

LGTM

Aaronontheweb · 2026-06-04T13:16:59Z

+    /// Netclaw issues) and aggregates with the same empty-response fallback, so the
+    /// caller can read <c>result.Response.Text</c> without an empty-Messages guard.
+    /// </summary>
+    public static Task<StreamReadResult> ReadAsync(


Aaronontheweb · 2026-06-04T13:17:24Z

-/// Timer-fired message that triggers an LLM call retry after exponential backoff.
-/// Carries the attempt number for observability logging.
-/// </summary>
-internal sealed record RetryLlmCallAfterBackoff(int Attempt) : INoSerializationVerificationNeeded;


this has been moved to the IChatClient layer

Aaronontheweb · 2026-06-04T13:17:57Z

all LLM retry logic for connectivity issues lives in the IChatClient layer now.

Aaronontheweb · 2026-06-04T13:20:39Z

simplified this to remove all mutable state

The streaming-native chat-client stack (#1313) introduced the routing seam — IChatClientRouter.Route(ChatRoutingContext) — as groundwork for concurrent multi-model / multi-provider routing, but the seam comments described that future work without linking the tracking issue. Cite #648 at the two authoritative seam locations (the ChatRoutingContext type and the IChatClientRouter interface) so whoever picks up #648 lands on the seam they extend. Comment-only.

Aaronontheweb added 7 commits June 3, 2026 08:51

Merge remote-tracking branch 'origin/dev' into feat/chat-client-strea…

2c18963

…ming-stack

Merge branch 'dev' into feat/chat-client-streaming-stack

1806907

Merge branch 'dev' into feat/chat-client-streaming-stack

7f6583b

Aaronontheweb enabled auto-merge (squash) June 4, 2026 12:43

Aaronontheweb added reliability Retries, resilience, graceful degradation providers Provider integrations and capability detection across OpenAI-compatible backends. labels Jun 4, 2026

Aaronontheweb disabled auto-merge June 4, 2026 12:43

Aaronontheweb commented Jun 4, 2026

View reviewed changes

Aaronontheweb merged commit 3bc95a9 into netclaw-dev:dev Jun 4, 2026
15 checks passed

Aaronontheweb deleted the feat/chat-client-streaming-stack branch June 4, 2026 13:22

This was referenced Jun 4, 2026

chore: release 0.23.0-beta.4 #1328

Merged

docs(providers): cite #648 at the chat-client routing seam #1335

Merged

Aaronontheweb mentioned this pull request Jun 6, 2026

feat(providers): concurrent multi-model / multi-provider architecture for per-role and per-agent routing #648

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(providers): streaming-native chat-client stack with composable routing#1313

feat(providers): streaming-native chat-client stack with composable routing#1313
Aaronontheweb merged 7 commits into
netclaw-dev:devfrom
Aaronontheweb:feat/chat-client-streaming-stack

Aaronontheweb commented Jun 3, 2026

Uh oh!

Aaronontheweb left a comment

Uh oh!

Aaronontheweb Jun 4, 2026

Uh oh!

Aaronontheweb Jun 4, 2026

Uh oh!

Aaronontheweb Jun 4, 2026

Uh oh!

Aaronontheweb Jun 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Aaronontheweb commented Jun 3, 2026

What

Why streaming-everywhere

The redesign

Testing

Notes / not in scope

Uh oh!

Aaronontheweb left a comment

Choose a reason for hiding this comment

Uh oh!

Aaronontheweb Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Aaronontheweb Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Aaronontheweb Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Aaronontheweb Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant