Bug: stale-stream handler tries to rebuild OpenAI client when provider is Anthropic
Repo
NousResearch/hermes-agent
Summary
When the stream-stale detector fires during an Anthropic Messages API call,
the recovery path unconditionally calls
agent._replace_primary_openai_client(...). That helper requires
OPENAI_API_KEY, which is typically unset in Anthropic-only setups, so it
raises and logs a misleading WARNING that suggests the recovery itself
failed. In reality the Anthropic client is rebuilt by a separate code path
and the turn completes, but the noise pollutes errors.log and confuses
debugging.
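The failure mode can be illustrated in isolation. The sketch below is hypothetical and uses none of the hermes-agent code: build_openai_client stands in for whatever agent._replace_primary_openai_client(...) does internally, and stale_stream_cleanup stands in for the recovery path. It only shows how an unconditional rebuild turns a missing OPENAI_API_KEY into a warning even when the provider is Anthropic.

```python
import logging

logger = logging.getLogger("stale_stream_demo")


def build_openai_client(env):
    """Hypothetical stand-in: refuses to build a client without an API key."""
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set")
    return object()


def stale_stream_cleanup(provider, env):
    """Mimic the unconditional rebuild; return the warnings it emitted."""
    warnings = []
    try:
        build_openai_client(env)  # runs regardless of the active provider
    except Exception as exc:
        msg = (
            "Failed to rebuild shared OpenAI client "
            f"provider={provider} error={exc}"
        )
        logger.warning(msg)
        warnings.append(msg)
    return warnings


# Anthropic-only setup: no OpenAI key in the (simulated) environment.
demo_warnings = stale_stream_cleanup("anthropic", env={})
```

With an empty environment the call yields one warning that names provider=anthropic, which is the confusing combination seen in the observed log.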
Repro
- model.provider = anthropic, model.default = claude-opus-4-7 (or any
  Anthropic model), OPENAI_API_KEY unset.
- Send a tool-heavy turn that takes > HERMES_STREAM_STALE_TIMEOUT (180s
  default) to deliver its first stream chunk.
- Stale-stream handler at agent/chat_completion_helpers.py:2033 fires.
Observed log
WARNING agent.chat_completion_helpers: Stream stale for 180s (threshold 180s)
— no chunks received. model=claude-opus-4-7 context=~12,247 tokens.
Killing connection.
WARNING run_agent: Failed to rebuild shared OpenAI client
(stale_stream_pool_cleanup) thread=asyncio_1:6173863936
provider=anthropic base_url=https://api.anthropic.com model=claude-opus-4-7
error=The api_key client option must be set either by passing api_key to
the client or by setting the OPENAI_API_KEY environment variable
The user-facing _emit_status banner ("⚠️ No response from provider for
180s … Reconnecting…") also reaches messaging platforms (Discord/Telegram),
which is correct behaviour but worth knowing.
Expected
On the stale-stream recovery path:
- If agent.api_mode == "anthropic_messages", rebuild the Anthropic client
  (agent._anthropic_client.close() + agent._rebuild_anthropic_client())
  and skip the OpenAI pool rebuild.
- Otherwise behave as today.
This mirrors the existing branching at line ~2066 (interrupt path) and at
line ~242 in the non-stream stale handler, which already do the right
thing.
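The expected branching can be sketched against a stub, as a rough illustration and not the project's actual code. StubAgent, StubAnthropicClient, recover_from_stale_stream and the "chat_completions" mode value are hypothetical; only api_mode, "anthropic_messages", _anthropic_client, _rebuild_anthropic_client and _replace_primary_openai_client come from this report.

```python
class StubAnthropicClient:
    """Hypothetical client that records when it is closed."""

    def __init__(self, calls):
        self._calls = calls

    def close(self):
        self._calls.append("anthropic_close")


class StubAgent:
    """Hypothetical stand-in exposing only the attributes named in this issue."""

    def __init__(self, api_mode):
        self.api_mode = api_mode
        self.calls = []
        self._anthropic_client = StubAnthropicClient(self.calls)

    def _rebuild_anthropic_client(self):
        self.calls.append("anthropic_rebuild")

    def _replace_primary_openai_client(self, reason):
        self.calls.append(f"openai_rebuild:{reason}")


def recover_from_stale_stream(agent):
    """Rebuild only the client that matches the active API mode."""
    if agent.api_mode == "anthropic_messages":
        try:
            agent._anthropic_client.close()
        except Exception:
            pass  # best-effort close; the rebuild below replaces the client
        agent._rebuild_anthropic_client()
    else:
        agent._replace_primary_openai_client(reason="stale_stream_pool_cleanup")
    return agent.calls


anthropic_calls = recover_from_stale_stream(StubAgent("anthropic_messages"))
other_calls = recover_from_stale_stream(StubAgent("chat_completions"))
```

In the Anthropic case the recorded calls contain no OpenAI rebuild, so the misleading warning would have no opportunity to fire.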
Suggested patch
agent/chat_completion_helpers.py around line 2047–2056:
    try:
        if agent.api_mode == "anthropic_messages":
            try:
                agent._anthropic_client.close()
            except Exception:
                pass
            agent._rebuild_anthropic_client()
        else:
            _close_request_client_once("stale_stream_kill")
            try:
                agent._replace_primary_openai_client(
                    reason="stale_stream_pool_cleanup"
                )
            except Exception:
                pass
    except Exception:
        pass
Severity
Low (cosmetic / log noise). Recovery still works because the Anthropic
client gets rebuilt on the next outer-retry iteration via the existing
error-handling path.
Environment
- hermes-agent commit: cc93053
- macOS 26.4.1, Python 3.11.15
- provider=anthropic, model=claude-opus-4-7