fix(streaming): handle Anthropic client in stale stream detector#14430
xssssrf wants to merge 1 commit into
The streaming stale detector in _run_streaming_api_call() only handled the OpenAI/chat-completions path when killing a stale connection. When api_mode is 'anthropic_messages', request_client_holder['client'] is always None (the Anthropic SDK uses its own internal transport), so the existing code was a no-op: the stale Anthropic stream was never actually closed, and no fresh connection was ever established.

This meant that after the stale detector fired, the inner _call_anthropic() thread would keep waiting on the same dead stream indefinitely. Each subsequent stale detector tick would emit another 'Reconnecting...' status message but do nothing to reset the connection, causing the agent to appear stuck for the full duration of the stale stream's lifetime.

The fix mirrors the existing pattern already used in two other locations in the same file (the non-streaming stale detector at ~line 5235 and the interrupt handler at ~line 6199): branch on api_mode and call self._anthropic_client.close() + rebuild via build_anthropic_client() for the Anthropic path, while keeping the existing OpenAI path unchanged.

The _replace_primary_openai_client() call is also guarded to only run on the non-Anthropic path, since it operates on the OpenAI client pool and is irrelevant when using the Anthropic SDK.
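To make the before/after behavior concrete, here is a minimal sketch of the two kill paths. It is not the code in run_agent.py: FakeClient, FakeAgent, build_fake_client, kill_stale_legacy, and kill_stale_fixed are hypothetical names invented for illustration, and build_fake_client only stands in for build_anthropic_client(), whose real arguments are not shown in this PR.

```python
from dataclasses import dataclass


@dataclass
class FakeClient:
    """Hypothetical stand-in for an SDK client; it only records close()."""
    closed: bool = False

    def close(self) -> None:
        self.closed = True


def build_fake_client() -> FakeClient:
    # Stands in for build_anthropic_client(); the real signature is an unknown here.
    return FakeClient()


class FakeAgent:
    """Hypothetical agent exposing only the state the kill path touches."""

    def __init__(self, api_mode: str) -> None:
        self.api_mode = api_mode
        self._anthropic_client = build_fake_client()

    def kill_stale_legacy(self, request_client_holder: dict) -> bool:
        """Pre-fix shape: only the per-request OpenAI client is considered."""
        rc = request_client_holder.get("client")
        if rc is None:
            # Anthropic path: the holder is always None, so nothing is closed.
            return False
        rc.close()
        return True

    def kill_stale_fixed(self, request_client_holder: dict) -> bool:
        """Post-fix shape: branch on api_mode before touching any client."""
        if self.api_mode == "anthropic_messages":
            try:
                self._anthropic_client.close()
            except Exception:
                pass  # a failed close must not block the rebuild
            self._anthropic_client = build_fake_client()
            return True
        return self.kill_stale_legacy(request_client_holder)
```

With an agent in 'anthropic_messages' mode and a holder of {"client": None}, kill_stale_legacy returns False and leaves the original client open no matter how many times the detector ticks, while kill_stale_fixed closes it and swaps in a fresh one on the first tick.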
Thanks for tracking this down. I hit the same failure mode on a Linux gateway using The root cause here matches what I saw. One suggestion before merge: it would be worth adding regression coverage for the Validation from that variant:
Also minor implementation note: current |
Same bug in prod with One nit before merge: re-implementing the Anthropic rebuild inline skips two things

    try:
        if self.api_mode == "anthropic_messages":
            try:
                self._anthropic_client.close()
            except Exception:
                pass
            self._rebuild_anthropic_client()
        else:
            rc = request_client_holder.get("client")
            if rc is not None:
                self._close_request_openai_client(rc, reason="stale_stream_kill")
    except Exception:
        pass

I have a 4-case dispatch test (addresses @im-sham's coverage note); can attach if useful.
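For readers who want a starting point for regression coverage, the following is one possible shape for a dispatch test. It is not the commenter's test, and the four cases are a guess at what is worth covering. StubAgent and kill_stale_stream are hypothetical; the method body copies the snippet in the comment above, and "chat_completions" is an assumed value for the non-Anthropic api_mode.

```python
from unittest.mock import MagicMock


class StubAgent:
    """Hypothetical stand-in that mocks everything the kill path calls."""

    def __init__(self, api_mode: str) -> None:
        self.api_mode = api_mode
        self._anthropic_client = MagicMock(name="anthropic_client")
        self._rebuild_anthropic_client = MagicMock(name="rebuild")
        self._close_request_openai_client = MagicMock(name="close_openai")

    def kill_stale_stream(self, request_client_holder: dict) -> None:
        # Body copied from the snippet suggested in the review comment.
        try:
            if self.api_mode == "anthropic_messages":
                try:
                    self._anthropic_client.close()
                except Exception:
                    pass
                self._rebuild_anthropic_client()
            else:
                rc = request_client_holder.get("client")
                if rc is not None:
                    self._close_request_openai_client(rc, reason="stale_stream_kill")
        except Exception:
            pass


def test_anthropic_closes_and_rebuilds():
    agent = StubAgent("anthropic_messages")
    agent.kill_stale_stream({"client": None})
    agent._anthropic_client.close.assert_called_once_with()
    agent._rebuild_anthropic_client.assert_called_once_with()
    agent._close_request_openai_client.assert_not_called()


def test_anthropic_rebuilds_even_if_close_raises():
    agent = StubAgent("anthropic_messages")
    agent._anthropic_client.close.side_effect = RuntimeError("socket already gone")
    agent.kill_stale_stream({"client": None})
    agent._rebuild_anthropic_client.assert_called_once_with()


def test_openai_closes_request_client():
    agent = StubAgent("chat_completions")  # assumed mode name
    rc = object()
    agent.kill_stale_stream({"client": rc})
    agent._close_request_openai_client.assert_called_once_with(
        rc, reason="stale_stream_kill"
    )
    agent._rebuild_anthropic_client.assert_not_called()


def test_openai_without_request_client_is_noop():
    agent = StubAgent("chat_completions")  # assumed mode name
    agent.kill_stale_stream({"client": None})
    agent._close_request_openai_client.assert_not_called()
    agent._rebuild_anthropic_client.assert_not_called()
```

A test against the real class would patch the same three attributes on an actual agent instance and drive the stale timeout in _run_streaming_api_call() instead of calling a helper directly.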
Summary
The streaming stale detector in _run_streaming_api_call() does not handle the anthropic_messages api_mode. When the stale timeout fires, the code attempts to close request_client_holder["client"], but for Anthropic this holder is always None (the Anthropic SDK manages its own transport internally). As a result, the stale connection is never actually closed and no fresh connection is ever established.

Root Cause

Three connection-reset sites exist in the streaming path:

The streaming stale detector was copied from the OpenAI path without being extended to cover Anthropic.

Symptom

When the agent is running on the anthropic_messages api_mode and a stream stalls (e.g. due to an SSE keep-alive with no real chunks), the stale detector fires and emits "⚠️ No response from provider... Reconnecting...", but the underlying Anthropic client is never closed. The inner _call_anthropic() thread keeps waiting on the same dead stream. Each subsequent detector tick emits another status message while doing nothing, leaving the agent stuck until the stream eventually errors out on its own.

Fix

Mirror the pattern already used by the non-streaming stale detector (~line 5235) and the interrupt handler (~line 6199): branch on self.api_mode and call self._anthropic_client.close() + rebuild via build_anthropic_client() for the Anthropic path, keeping the existing OpenAI path unchanged. Guard _replace_primary_openai_client() to the non-Anthropic path only, since it operates on the OpenAI connection pool.

Testing

python -m py_compile run_agent.py passes
pytest tests/agent/ -q: 1735 passed, 1 skipped. The 1 pre-existing failure (test_minimax_provider.py::TestMinimaxSwitchModelCredentialGuard) reproduces identically on the unmodified upstream run_agent.py, confirming it is unrelated to this change.
anthropic_messages api_mode via direct Anthropic API key