Bug: Stream retry path loses api_key, all retries fail
Environment
- Hermes Agent (latest, self-hosted)
- Observed in agent.log ~13 times during normal operation
- Provider: any (observed with minimax; likely affects all providers using the shared OpenAI client)
Description
When a provider stream drops or times out during a long generation, the retry path attempts to rebuild the shared OpenAI client (function: stream_retry_pool_cleanup). The rebuilt client has no api_key, so the retry immediately fails:
stream_diag: Stream drop on attempt 2/3 — retrying. provider=minimax
error=The read operation timed out elapsed=907s
run_agent: Failed to rebuild shared OpenAI client (stream_retry_pool_cleanup)
error=The api_key client option must be set ...
The entire run then returns empty or the connection drops — the generation is lost entirely.
Root cause
The stream_retry_pool_cleanup function rebuilds the OpenAI client but does not pass the api_key from the original client configuration. This appears to be an oversight in the retry path — the initial client construction correctly includes the key, but the rebuild path omits it.
Fix
Pass the api_key when rebuilding the OpenAI client in the stream_retry_pool_cleanup path. The key should be sourced from the same provider config that was used to create the original client.
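A minimal sketch of the intended rebuild behaviour, not the actual Hermes Agent code. ProviderConfig, StubClient, and rebuild_client are hypothetical names introduced for illustration. StubClient stands in for the real OpenAI client and mimics only its "api_key must be set" check, so the example runs without network access or the openai package.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ProviderConfig:
    # Hypothetical container for the settings used to build the original client.
    name: str
    api_key: str
    base_url: str


class StubClient:
    # Stand-in for the real client: rejects construction without an api_key.
    def __init__(self, api_key: Optional[str] = None, base_url: Optional[str] = None):
        if not api_key:
            raise ValueError("The api_key client option must be set")
        self.api_key = api_key
        self.base_url = base_url


def rebuild_client(
    config: ProviderConfig,
    factory: Callable[..., StubClient] = StubClient,
) -> StubClient:
    # Forward the same credentials and endpoint that the initial construction used,
    # so the rebuilt client is equivalent to the one it replaces.
    return factory(api_key=config.api_key, base_url=config.base_url)
```

In the real retry path, the factory would presumably be the OpenAI client constructor, which accepts api_key and base_url keyword arguments. The point is that the rebuild reads from the same provider config as the initial build rather than constructing a client with no arguments.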
Impact
High: any long generation that triggers a stream timeout loses the entire response instead of being retried. The retry mechanism is meant to recover from transient failures, but the missing api_key makes every retry fail.