fix(streaming): rebuild Anthropic client on stream cleanup instead of OpenAI client #28240
Open
EloquentBrush0x wants to merge 1 commit into
… OpenAI client

interruptible_streaming_api_call() has three connection-pool cleanup sites that called _replace_primary_openai_client() unconditionally. For api_mode=anthropic_messages this has two consequences:

1. _replace_primary_openai_client() fails (OPENAI_API_KEY unset on Anthropic-only configs), so dead connections are never purged.
2. The stale-stream detector's outer-poll site (L1977) is the only mechanism that can interrupt the worker thread while it blocks in for event in stream:. Because the Anthropic client is never closed, the thread stays blocked until the 900 s httpx read-timeout fires, producing a visible 15-minute hang for Telegram/gateway users on claude-opus-4-7.

Fix: mirror the existing interrupt-path pattern (L1989-1997) at all three cleanup sites: if api_mode == "anthropic_messages", call _anthropic_client.close() + _rebuild_anthropic_client() instead of _replace_primary_openai_client(). _rebuild_anthropic_client() handles both direct Anthropic and Bedrock-hosted Claude correctly, unlike the inline build_anthropic_client() calls in open PR NousResearch#14430.

PR NousResearch#14430 (open) covers only the outer stale-detector site (L1977). PR NousResearch#23678 (open) covers only the inner retry sites (L1774, L1833). This PR covers all three sites and uses _rebuild_anthropic_client() for Bedrock parity.

Fixes NousResearch#28161
This was referenced May 23, 2026
What does this PR do?
Root cause:
interruptible_streaming_api_call() has three connection-pool cleanup sites that called _replace_primary_openai_client() unconditionally: the mid-tool retry path (~L1774), the transient-error retry path (~L1833), and the stale-stream outer-poll loop (~L1977).

For api_mode=anthropic_messages this has two consequences:

1. _replace_primary_openai_client() silently fails (no OPENAI_API_KEY on Anthropic-only configs), so the dead connection pool is never purged before the next retry.
2. The stale-stream detector's outer-poll site is the only mechanism that can interrupt the worker thread while it blocks in for event in stream:. Because the Anthropic client is not closed and the stream's underlying transport is not dropped, the thread stays blocked until the 900 s httpx read-timeout fires, producing the 15-minute hang reported in gateways running claude-opus-4-7.

Fix: Mirror the existing interrupt-path pattern at L1989–1997 (already correct) at all three cleanup sites. For api_mode == "anthropic_messages", call _anthropic_client.close() + _rebuild_anthropic_client() instead of _replace_primary_openai_client(). _rebuild_anthropic_client() handles both direct Anthropic and Bedrock-hosted Claude correctly.

Open PR #14430 covers only the outer stale-detector site and does not use _rebuild_anthropic_client() (Bedrock not handled). Open PR #23678 covers only the two inner retry sites, leaving the stale-stream hang unaddressed. This PR covers all three sites.

Related Issue
Fixes #28161
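The Anthropic-mode branch described under "What does this PR do?" can be sketched roughly as below. This is an illustration under stated assumptions, not the code in agent/chat_completion_helpers.py: FakeClient, AgentStub, the cleanup_connection_pool helper, the rebuild counters, and the "chat_completions" mode string are all hypothetical. Only api_mode, _anthropic_client, _rebuild_anthropic_client() and _replace_primary_openai_client() are names taken from this PR.

```python
class FakeClient:
    """Hypothetical stand-in for an SDK client; records whether close() ran."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


class AgentStub:
    """Hypothetical agent exposing only the attributes named in this PR."""

    def __init__(self, api_mode):
        self.api_mode = api_mode
        self._anthropic_client = FakeClient("anthropic-0")
        self.anthropic_rebuilds = 0  # counters exist only for the illustration
        self.openai_rebuilds = 0

    def _rebuild_anthropic_client(self):
        self.anthropic_rebuilds += 1
        self._anthropic_client = FakeClient(f"anthropic-{self.anthropic_rebuilds}")

    def _replace_primary_openai_client(self):
        self.openai_rebuilds += 1


def cleanup_connection_pool(agent):
    """Hypothetical helper: choose the cleanup path that matches api_mode."""
    if agent.api_mode == "anthropic_messages":
        try:
            # Closing the old client is what releases its connections.
            agent._anthropic_client.close()
        except Exception:
            pass  # best effort: a failed close should not prevent the rebuild
        agent._rebuild_anthropic_client()
    else:
        agent._replace_primary_openai_client()
```

Routing all three cleanup sites through one branch like this keeps the Anthropic and OpenAI paths from drifting apart, though the PR itself may inline the check at each site.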
Type of Change
Changes Made
- agent/chat_completion_helpers.py: guard all three _replace_primary_openai_client() call sites with api_mode != "anthropic_messages"; add an _anthropic_client.close() + _rebuild_anthropic_client() branch for Anthropic mode (+18 lines)
- tests/run_agent/test_28161_anthropic_stream_pool_cleanup.py: two new tests, covering stream retry cleanup and stale-stream detector cleanup for Anthropic mode
How to Test
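The PR's own tests live in tests/run_agent/test_28161_anthropic_stream_pool_cleanup.py and are not reproduced here. As a self-contained illustration of the stale-stream case, the sketch below simulates a worker blocked in a for-loop over a stream and a poll loop that closes the client once no event has arrived for a while. Everything in it is hypothetical (FakeStream, FakeAnthropicClient, run_with_stale_detector, the timing values). A queue stands in for the network read, so it only models the behaviour described above and does not exercise the real SDK or httpx.

```python
import queue
import threading
import time

_CLOSED = object()  # sentinel pushed when the fake client is closed


class FakeStream:
    """Hypothetical stream whose iteration blocks like a read with no data."""

    def __init__(self):
        self._events = queue.Queue()

    def push(self, event):
        self._events.put(event)

    def close(self):
        self._events.put(_CLOSED)

    def __iter__(self):
        while True:
            item = self._events.get()  # blocks until an event or a close arrives
            if item is _CLOSED:
                raise ConnectionError("stream closed")
            yield item


class FakeAnthropicClient:
    """Hypothetical client; close() tears down its stream."""

    def __init__(self):
        self.stream = FakeStream()
        self.closed = False

    def close(self):
        self.closed = True
        self.stream.close()


def run_with_stale_detector(client, stale_after=0.2, poll=0.05):
    """Consume client.stream in a worker; close the client if it goes stale."""
    result = {"events": [], "error": None}
    last_event = [time.monotonic()]

    def worker():
        try:
            for event in client.stream:
                result["events"].append(event)
                last_event[0] = time.monotonic()
        except ConnectionError as exc:
            result["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while thread.is_alive():
        thread.join(timeout=poll)
        if thread.is_alive() and time.monotonic() - last_event[0] > stale_after:
            client.close()  # without this, the worker would stay blocked
            thread.join(timeout=1.0)
            break
    return result, thread.is_alive()
```

In this model, removing the client.close() call leaves the worker blocked indefinitely, which corresponds to the hang the PR describes for Anthropic mode.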
Checklist