fix: stream everywhere for auxiliary LLM calls + session id on LLM logs#1309
fix: stream everywhere for auxiliary LLM calls + session id on LLM logs#1309Aaronontheweb wants to merge 4 commits into
Conversation
Auxiliary LLM calls (title generation, memory distillation, compaction) used the non-streaming IChatClient.GetResponseAsync path, which the OpenAI Codex backend rejects with 400 {"detail":"Stream must be set to true"}. The interactive session loop already streams, so only the sidecars broke.
Make streaming the single request mode at the provider boundary: a new StreamFirstChatClient serves GetResponseAsync by aggregating the streaming endpoint (GetStreamingResponseAsync(...).ToChatResponseAsync(...)), applied to every client in ProviderPluginFactory.Create. No per-provider capability model -- streaming is near-universal among supported providers, and any environment that breaks streaming would already break the main loop.
Separately, these LLM client logs reached Seq with no session id: the id lives in SessionDiagnosticsContext (AsyncLocal) but the OTLP exporter never projected it onto the LogRecord. Fix it at the logger -- LoggingChatClient now opens a "session.id" logging scope from the ambient context around both response paths, riding the already-enabled IncludeScopes so the id surfaces as a filterable Seq attribute with no telemetry changes.
Supersedes netclaw-dev#1289.
… doc A review found StreamFirstChatClient.GetResponseAsync could return a ChatResponse with zero messages when the underlying stream yields no updates (empty/aborted completion). Auxiliary callers index Messages[^1] (title generation, memory distillation, compaction), so an empty stream was silent data loss. Make GetResponseAsync async and append an empty assistant message when the aggregate is empty, matching StreamingResponseReader's guard; add a regression test. Also correct the BeginSessionScope doc comment: the session.id scope only adds correlation on the OTLP export path, not every sink (the rolling-file provider ignores scopes and routes session.log independently).
Extract SessionLoggingScope so the session.id logging scope is shared, and route FailoverChatClient's failover/outage log lines through it — those are emitted outside LoggingChatClient's scope and previously reached Seq without session correlation. (AlertingChatClientDecorator emits only notification-sink alerts, no MEL logs, so needs nothing.) Use a single-element KeyValuePair array instead of a per-call Dictionary for the scope state. Test cleanup: share one ScopeCapturingLogger across the chat-client tests; add a failover session-id test.
Aaronontheweb
left a comment
There was a problem hiding this comment.
Needs work
| /// </summary> | ||
| internal static class SessionLoggingScope | ||
| { | ||
| private const string SessionIdKey = "session.id"; |
There was a problem hiding this comment.
is this consistent with where we've done it elsewhere?
| if (response.Messages.Count == 0) | ||
| response.Messages.Add(new ChatMessage(ChatRole.Assistant, [])); | ||
|
|
||
| return response; |
There was a problem hiding this comment.
this is a bad design - better to have everything consume the IAsyncEnumerable than to hide it in a long-running Task like this. Inverse of what we should be doing here.
|
Converting to draft — pausing this approach. The Codex Rather than pile on, we're doing a focused design pass on the chat-client stack first. This PR (and #1289) will be superseded by that work. |
…outing (#1313) * feat(providers): streaming-native chat-client stack with composable routing Redesign the IChatClient stack so the streaming-vs-non-streaming transport distinction no longer leaks into callers, and resilience/observability are no longer entangled with transport. Supersedes #1289 and #1309. Transport: Netclaw now issues only streaming requests. RetryingChatClient retries the streaming path (pre-first-chunk; a mid-stream failure propagates so emitted output is never duplicated), reusing RetryPolicy. The auxiliary callers (title generation, memory distillation, compaction observation, memory curation, memory extraction) aggregate the stream via the existing StreamingResponseReader (empty-Messages guard + diagnostics) and read response.Text, instead of the non-streaming GetResponseAsync path that the OpenAI Codex backend rejects (400 'Stream must be set to true') and that drops reasoning content on some providers. Composition: per-(provider,model) pipelines are composed with Microsoft.Extensions.AI ChatClientBuilder (Logging -> Retry -> VendorOptions -> raw) in PipelineChatClientFactory; the .Use ordering is asserted by test. Routing: a router selects which composed pipeline to invoke per call. RoutingChatClient walks an ordered candidate list, so failover is 'more than one candidate' and single-provider alerting is the terminal failure of a one-entry walk -- absorbing and deleting FailoverChatClient, AlertingChatClientDecorator, NetclawChatClientProvider and ResilientChatClientProviderDecorator. The router takes a ChatRoutingContext so future per-session/per-provider routing slots in as a new policy; today's RoleBasedFailoverRouter reproduces the prior wiring. Logging: LoggingChatClient is now stateless (the per-session-intended delta/cumulative token counters on the process singleton are gone) and tags log lines with the ambient session id via SessionLoggingScope, keyed 'SessionId' to match the WithContext('SessionId', ...) key used across actors so one Seq attribute correlates actor and chat-client logs. Verified: full solution build (0 warnings); Netclaw.Daemon.Tests 668/668; Netclaw.Actors.Tests 2186/2186; slopwatch clean; file headers present. * fix(providers): review fixes — SessionId on retry logs, drop AttemptIndex, streaming-failover tests Self-review of the stack redesign found three issues: 1. RetryingChatClient.BackoffAsync logged retry warnings without the SessionId scope, so the hottest failure path reached Seq uncorrelated while LoggingChatClient/RoutingChatClient logs carried it. Open the scope around the warning. 2. ChatRoutingContext.AttemptIndex was structurally unfulfillable: the per-role RoutingChatClient context is frozen at construction and the router is called once per call (not per candidate), so it could never be non-zero. Removed it; SessionId remains the genuine per-session-routing seam. 3. The streaming terminal-unreachable branch of RoutingChatClient (last candidate fails pre-first-chunk -> provider.unreachable + rethrow) and single-candidate / streaming-cancellation behavior were untested. Added three streaming tests. * refactor(providers): single retry layer — transport owns LLM transient retry Reconcile the two pre-first-chunk retry layers (review follow-up): the actor's per-turn streaming retry and the transport's RetryingChatClient overlapped on the main session loop. Make the transport the single owner. RetryPolicy.ShouldRetry now also recognizes the curated ProviderException (StatusCode 408/429/5xx, walking inner exceptions) that provider transports throw — previously only a raw HttpRequestException was retried, so the transport retry barely fired for real provider 429/5xx. That was the broader coverage the actor's LlmFailureClassifier.IsTransientStreaming provided. The transport RetryPolicy is now configured from Session:Tuning:StreamingRetryPolicy (resolves the prior TODO; preserves the knob), making RetryingChatClient the single, configurable, uniform retry layer for every caller (main loop + sidecars). Remove the actor per-turn retry: the LlmCallFailed pre-content retry block, the RetryLlmCallAfterBackoff message+handler, _streamingRetryAttempt, the streaming-retry timer, and the now-unused IsTransientStreaming. Context-overflow rollback and terminal FailCurrentTurn handling are unchanged. Re-home coverage: the actor-level Streaming_retry_recovers/_exhaustion integration tests are removed (the actor no longer retries); a ProviderException(502) recover case is added to RetryingChatClientTests at the transport layer. Verified: full build (0 warnings); Daemon 672/672; Actors 2184/2184; slopwatch + headers clean. * refactor(openai): drop StreamingOnlyChatClient shim — Netclaw is streaming-only The streaming-native chat-client stack converts every auxiliary caller (title generation, memory extraction, compaction) to the streaming transport, so no code path invokes the non-streaming GetResponseAsync on the Codex client anymore. The StreamingOnlyChatClient wrapper (added in 0e5ba60 to serve those calls via streaming under the hood) is now dead; the Codex 'Stream must be set to true' 400 is structurally unreachable. A comment on the OAuth path records why no special handling is needed.
|
superseded via #1309 |
What
Two independent fixes behind the OpenAI Codex memory-formation errors (
400 {"detail":"Stream must be set to true"}).1. Stream everywhere at the provider boundary (supersedes #1289)
The Codex backend is streaming-only — it rejects non-streaming Responses requests. Netclaw's interactive session loop streams, but auxiliary calls (title generation, memory distillation, compaction, curation) use the non-streaming
IChatClient.GetResponseAsyncpath, so they 400 against Codex while the main loop works fine.Rather than a per-provider capability model, make streaming the single request mode the daemon issues at the provider boundary: a new
StreamFirstChatClient(DelegatingChatClient) servesGetResponseAsyncby aggregating the streaming endpoint (GetStreamingResponseAsync(...).ToChatResponseAsync(...)), applied to every client inProviderPluginFactory.Create. Streaming is near-universal among supported providers, and any environment that breaks streaming would already break the main session loop (which streams too).Messages[^1].StreamFirstChatClientappends an empty assistant message in that case, matchingStreamingResponseReader's existing guard, so an empty completion degrades to empty text instead of an unguarded index throw.2. Session id on LLM logs in Seq
LLM client logs (
LoggingChatClient,FailoverChatClient) reached Seq with no session correlation: the id lives inSessionDiagnosticsContext(an AsyncLocal pushed by every session-owned path) but the OTLP exporter never projected it onto theLogRecord. A sharedSessionLoggingScopeopens asession.idlogging scope (read from the ambient context) around the LLM calls and the failover/outage log lines, riding the already-enabledIncludeScopesso it surfaces as a filterable Seq attribute — no telemetry-wiring change.Testing
StreamFirstChatClient: aggregation + streaming passthrough + empty-stream guard.LoggingChatClient/FailoverChatClient:session.idscope attached (and absent when no session is in scope).Configurationsuite: 119/119 passing; build clean (0 warnings); slopwatch + file headers clean.Notes
LoggingChatClient'sdelta/cumulativetoken counters are mutable state on a singleton, so those two values in theLLM call completeddebug-log line mix sessions. They feed nothing else — the real per-response usage that driveschat -p, OTel metrics,stats, the session catalog, and compaction all comes from the per-sessionLlmSessionActorand is unaffected. Worth a separate cleanup, not this PR.