fix(openai): serve non-streaming Codex calls via streaming (stop-gap) by CumpsD · Pull Request #1289 · netclaw-dev/netclaw

CumpsD · 2026-06-01T21:08:15Z

What

Stop-gap fix so the OpenAI Codex backend works with auxiliary (non-streaming) LLM calls.

Problem

The Codex backend rejects non-streaming Responses requests:

400 {"detail":"Stream must be set to true"}

Netclaw's session loop streams, but auxiliary calls — title generation, memory extraction, compaction — use the non-streaming IChatClient.GetResponseAsync path, so they fail against Codex.

Fix

Wrap the Codex ResponsesClient in a new StreamingOnlyChatClient (a DelegatingChatClient) that serves GetResponseAsync by consuming the streaming endpoint and aggregating the updates (GetStreamingResponseAsync(...).ToChatResponseAsync(...)). Streaming calls pass straight through.

Scope / status

This is a stop-gap while the broader streaming/non-streaming provider abstraction is being refactored. Once the structural fix lands, this wrapper should be removed in favor of it.

Self-contained: the wrapper lives in OpenAiProviderPlugin.cs and uses only Microsoft.Extensions.AI types.
Netclaw.Providers builds clean (0 errors, 0 warnings).

🤖 Generated with Claude Code

…hood The OpenAI Codex backend rejects non-streaming Responses requests with 400 {"detail":"Stream must be set to true"}. Netclaw's session loop streams, but auxiliary calls (title generation, memory extraction, compaction) use the non-streaming GetResponseAsync path and therefore fail against Codex. Wrap the Codex ResponsesClient in a StreamingOnlyChatClient (a DelegatingChatClient) that serves GetResponseAsync by consuming the streaming endpoint and aggregating the updates. Streaming calls pass straight through. Stop-gap: this keeps the Codex backend working while the broader streaming/non-streaming provider abstraction is being refactored. Once that lands, the wrapper should be removed in favor of the structural fix. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Aaronontheweb · 2026-06-03T08:28:24Z

@CumpsD working on a bigger re-architecture fix for this but going to merge this for now to unblock codex users.

…outing (#1313) * feat(providers): streaming-native chat-client stack with composable routing Redesign the IChatClient stack so the streaming-vs-non-streaming transport distinction no longer leaks into callers, and resilience/observability are no longer entangled with transport. Supersedes #1289 and #1309. Transport: Netclaw now issues only streaming requests. RetryingChatClient retries the streaming path (pre-first-chunk; a mid-stream failure propagates so emitted output is never duplicated), reusing RetryPolicy. The auxiliary callers (title generation, memory distillation, compaction observation, memory curation, memory extraction) aggregate the stream via the existing StreamingResponseReader (empty-Messages guard + diagnostics) and read response.Text, instead of the non-streaming GetResponseAsync path that the OpenAI Codex backend rejects (400 'Stream must be set to true') and that drops reasoning content on some providers. Composition: per-(provider,model) pipelines are composed with Microsoft.Extensions.AI ChatClientBuilder (Logging -> Retry -> VendorOptions -> raw) in PipelineChatClientFactory; the .Use ordering is asserted by test. Routing: a router selects which composed pipeline to invoke per call. RoutingChatClient walks an ordered candidate list, so failover is 'more than one candidate' and single-provider alerting is the terminal failure of a one-entry walk -- absorbing and deleting FailoverChatClient, AlertingChatClientDecorator, NetclawChatClientProvider and ResilientChatClientProviderDecorator. The router takes a ChatRoutingContext so future per-session/per-provider routing slots in as a new policy; today's RoleBasedFailoverRouter reproduces the prior wiring. Logging: LoggingChatClient is now stateless (the per-session-intended delta/cumulative token counters on the process singleton are gone) and tags log lines with the ambient session id via SessionLoggingScope, keyed 'SessionId' to match the WithContext('SessionId', ...) key used across actors so one Seq attribute correlates actor and chat-client logs. Verified: full solution build (0 warnings); Netclaw.Daemon.Tests 668/668; Netclaw.Actors.Tests 2186/2186; slopwatch clean; file headers present. * fix(providers): review fixes — SessionId on retry logs, drop AttemptIndex, streaming-failover tests Self-review of the stack redesign found three issues: 1. RetryingChatClient.BackoffAsync logged retry warnings without the SessionId scope, so the hottest failure path reached Seq uncorrelated while LoggingChatClient/RoutingChatClient logs carried it. Open the scope around the warning. 2. ChatRoutingContext.AttemptIndex was structurally unfulfillable: the per-role RoutingChatClient context is frozen at construction and the router is called once per call (not per candidate), so it could never be non-zero. Removed it; SessionId remains the genuine per-session-routing seam. 3. The streaming terminal-unreachable branch of RoutingChatClient (last candidate fails pre-first-chunk -> provider.unreachable + rethrow) and single-candidate / streaming-cancellation behavior were untested. Added three streaming tests. * refactor(providers): single retry layer — transport owns LLM transient retry Reconcile the two pre-first-chunk retry layers (review follow-up): the actor's per-turn streaming retry and the transport's RetryingChatClient overlapped on the main session loop. Make the transport the single owner. RetryPolicy.ShouldRetry now also recognizes the curated ProviderException (StatusCode 408/429/5xx, walking inner exceptions) that provider transports throw — previously only a raw HttpRequestException was retried, so the transport retry barely fired for real provider 429/5xx. That was the broader coverage the actor's LlmFailureClassifier.IsTransientStreaming provided. The transport RetryPolicy is now configured from Session:Tuning:StreamingRetryPolicy (resolves the prior TODO; preserves the knob), making RetryingChatClient the single, configurable, uniform retry layer for every caller (main loop + sidecars). Remove the actor per-turn retry: the LlmCallFailed pre-content retry block, the RetryLlmCallAfterBackoff message+handler, _streamingRetryAttempt, the streaming-retry timer, and the now-unused IsTransientStreaming. Context-overflow rollback and terminal FailCurrentTurn handling are unchanged. Re-home coverage: the actor-level Streaming_retry_recovers/_exhaustion integration tests are removed (the actor no longer retries); a ProviderException(502) recover case is added to RetryingChatClientTests at the transport layer. Verified: full build (0 warnings); Daemon 672/672; Actors 2184/2184; slopwatch + headers clean. * refactor(openai): drop StreamingOnlyChatClient shim — Netclaw is streaming-only The streaming-native chat-client stack converts every auxiliary caller (title generation, memory extraction, compaction) to the streaming transport, so no code path invokes the non-streaming GetResponseAsync on the Codex client anymore. The StreamingOnlyChatClient wrapper (added in 0e5ba60 to serve those calls via streaming under the hood) is now dead; the Codex 'Stream must be set to true' 400 is structurally unreachable. A comment on the OAuth path records why no special handling is needed.

Aaronontheweb mentioned this pull request Jun 3, 2026

fix: stream everywhere for auxiliary LLM calls + session id on LLM logs #1309

Closed

Merge branch 'dev' into fix/openai-codex-streaming-only

aaa7c8a

Aaronontheweb merged commit 0e5ba60 into netclaw-dev:dev Jun 3, 2026

Aaronontheweb mentioned this pull request Jun 3, 2026

feat(providers): streaming-native chat-client stack with composable routing #1313

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(openai): serve non-streaming Codex calls via streaming (stop-gap)#1289

fix(openai): serve non-streaming Codex calls via streaming (stop-gap)#1289
Aaronontheweb merged 2 commits into
netclaw-dev:devfrom
CumpsD:fix/openai-codex-streaming-only

CumpsD commented Jun 1, 2026

Uh oh!

Aaronontheweb commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

CumpsD commented Jun 1, 2026

What

Problem

Fix

Scope / status

Uh oh!

Aaronontheweb commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants