Stale-stream handler tries to rebuild OpenAI client when provider is Anthropic #31128

@deonkretch

Bug: stale-stream handler tries to rebuild OpenAI client when provider is Anthropic

Repo

NousResearch/hermes-agent

Summary

When the stream-stale detector fires during an Anthropic Messages API call,
the recovery path unconditionally calls
agent._replace_primary_openai_client(...). That helper requires
OPENAI_API_KEY, which is typically unset in Anthropic-only setups, so it
raises, and the failure is logged as a misleading WARNING that suggests the
recovery itself failed. In reality the Anthropic client is rebuilt by a
separate code path and the turn completes, but the spurious warning pollutes
errors.log and confuses debugging.

Repro

  • model.provider = anthropic, model.default = claude-opus-4-7 (or any
    Anthropic model), OPENAI_API_KEY unset.
  • Send a tool-heavy turn that takes > HERMES_STREAM_STALE_TIMEOUT (180s
    default) to deliver its first stream chunk.
  • Stale-stream handler at agent/chat_completion_helpers.py:2033 fires.
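
For readers unfamiliar with the detector, the condition that triggers the
handler can be sketched as below. This is a minimal illustration, not the
hermes-agent implementation: StaleStreamWatchdog and its methods are
hypothetical names, and reading HERMES_STREAM_STALE_TIMEOUT from the
environment is an assumption. Only the variable name and the 180s default
come from this report.

```python
import os
import time


class StaleStreamWatchdog:
    """Hypothetical helper: tracks how long a stream has gone without a chunk."""

    def __init__(self, timeout_s=None, clock=time.monotonic):
        if timeout_s is None:
            # Assumption: the threshold comes from the environment, 180s default.
            timeout_s = float(os.environ.get("HERMES_STREAM_STALE_TIMEOUT", "180"))
        self.timeout_s = timeout_s
        self._clock = clock
        # The timer starts when the request is sent, so a slow first chunk counts.
        self._last_chunk_at = clock()

    def on_chunk(self):
        """Record that a chunk arrived, resetting the idle timer."""
        self._last_chunk_at = self._clock()

    def idle_seconds(self):
        """Seconds elapsed since the request started or the last chunk arrived."""
        return self._clock() - self._last_chunk_at

    def is_stale(self):
        """True once the idle time exceeds the threshold."""
        return self.idle_seconds() > self.timeout_s
```

Injecting the clock keeps the sketch testable without waiting 180 seconds; a
real handler would poll is_stale() from whatever loop consumes the stream.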

Observed log

WARNING agent.chat_completion_helpers: Stream stale for 180s (threshold 180s)
  — no chunks received. model=claude-opus-4-7 context=~12,247 tokens.
  Killing connection.
WARNING run_agent: Failed to rebuild shared OpenAI client
  (stale_stream_pool_cleanup) thread=asyncio_1:6173863936
  provider=anthropic base_url=https://api.anthropic.com model=claude-opus-4-7
  error=The api_key client option must be set either by passing api_key to
  the client or by setting the OPENAI_API_KEY environment variable

The user-facing _emit_status banner ("⚠️ No response from provider for
180s … Reconnecting…") also reaches messaging platforms (Discord/Telegram).
That is correct behaviour and not part of this bug; it is mentioned only so
the banner is not mistaken for a symptom.

Expected

On the stale-stream recovery path:

  • If agent.api_mode == "anthropic_messages", rebuild the Anthropic client
    (agent._anthropic_client.close() + agent._rebuild_anthropic_client())
    and skip the OpenAI pool rebuild.
  • Otherwise behave as today.

This mirrors the existing branching at line ~2066 (interrupt path) and at
line ~242 in the non-stream stale handler, which already do the right
thing.

Suggested patch

agent/chat_completion_helpers.py around line 2047–2056:

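try:
    if agent.api_mode == "anthropic_messages":
        # Anthropic provider: drop the stale client, then build a fresh one.
        try:
            agent._anthropic_client.close()
        except Exception:
            pass  # best-effort close; the client is replaced either way
        agent._rebuild_anthropic_client()
    else:
        # Every other api_mode: same calls as today.
        _close_request_client_once("stale_stream_kill")
        try:
            agent._replace_primary_openai_client(
                reason="stale_stream_pool_cleanup"
            )
        except Exception:
            pass
except Exception:
    # Cleanup is best-effort and must not raise out of the handler.
    pass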
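
A regression check for the patch could look roughly like the sketch below. It
is an illustration under stated assumptions, not code from the repository:
recover_stale_stream and make_agent are hypothetical names, the cleanup branch
is copied into a standalone function so it can run in isolation, and
"chat_completions" is only a placeholder for any non-Anthropic api_mode.

```python
from unittest.mock import MagicMock


def recover_stale_stream(agent, close_request_client_once):
    """Hypothetical standalone copy of the patched cleanup branch."""
    try:
        if agent.api_mode == "anthropic_messages":
            try:
                agent._anthropic_client.close()
            except Exception:
                pass
            agent._rebuild_anthropic_client()
        else:
            close_request_client_once("stale_stream_kill")
            try:
                agent._replace_primary_openai_client(
                    reason="stale_stream_pool_cleanup"
                )
            except Exception:
                pass
    except Exception:
        pass


def make_agent(api_mode):
    """Build a stub agent whose client helpers are recording mocks."""
    agent = MagicMock()
    agent.api_mode = api_mode
    return agent


# Anthropic mode: only the Anthropic client is rebuilt.
agent = make_agent("anthropic_messages")
closer = MagicMock()
recover_stale_stream(agent, closer)
agent._anthropic_client.close.assert_called_once_with()
agent._rebuild_anthropic_client.assert_called_once_with()
agent._replace_primary_openai_client.assert_not_called()
closer.assert_not_called()
```

The key assertion is that _replace_primary_openai_client is never called in
Anthropic mode, which is what removes the misleading WARNING.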

Severity

Low (cosmetic / log noise). Recovery still works because the Anthropic
client gets rebuilt on the next outer-retry iteration via the existing
error-handling path.

Environment

  • hermes-agent commit: cc93053
  • macOS 26.4.1, Python 3.11.15
  • provider=anthropic, model=claude-opus-4-7

Labels

  • P2: Medium, degraded but workaround exists
  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • duplicate: This issue or pull request already exists
  • provider/anthropic: Anthropic native Messages API
  • type/bug: Something isn't working
