Skip to content

fix(agent): disable SDK retries on per-request OpenAI clients#19642

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-8c54fd4a
May 4, 2026
Merged

fix(agent): disable SDK retries on per-request OpenAI clients#19642
teknium1 merged 1 commit into
mainfrom
hermes/hermes-8c54fd4a

Conversation

@teknium1

@teknium1 teknium1 commented May 4, 2026

Copy link
Copy Markdown
Contributor

Salvage of #15811 by @QifengKuang — core improvement only.

Summary

Per-request OpenAI-wire clients (used by both non-streaming and streaming chat-completions paths in _interruptible_api_call) should NOT run the SDK's built-in retry loop. The agent's outer loop owns retries with credential rotation, provider fallback, and backoff that the SDK can't see. Leaving SDK retries on (default 2) compounds with our outer retries and lets a single hung provider request stretch to ~3x the per-call timeout before our stale detector reports it.

Shared/primary clients and Anthropic / Bedrock paths are unaffected (they don't go through this code path).

Changes

  • run_agent.py: set request_kwargs["max_retries"] = 0 in _create_request_openai_client (+11/-0)

Note

The contributor's original PR also included a timeout= push-down through the chat.completions.create call; that part required scaffolding that has since been refactored on main, so only the max_retries=0 change is preserved here.

Original PR: #15811

Per-request OpenAI-wire clients (used by both non-streaming and
streaming chat-completions paths in _interruptible_api_call) should
not run the SDK's built-in retry loop: the agent's outer loop owns
retries with credential rotation, provider fallback, and backoff that
the SDK can't see.

Leaving SDK retries on (default 2) compounds with our outer retries
and lets a single hung provider request stretch to ~3x the per-call
timeout before our stale detector reports it.

Shared/primary clients and Anthropic / Bedrock paths are unaffected
(they don't go through here).

Salvage of #15811 core improvement — the timeout push-down in the
original PR required scaffolding that has since been refactored on
main, so only the max_retries=0 change is preserved.

Co-authored-by: QifengKuang <k2767567815@gmail.com>
@teknium1 teknium1 merged commit 52c539d into main May 4, 2026
7 of 10 checks passed
@teknium1 teknium1 deleted the hermes/hermes-8c54fd4a branch May 4, 2026 09:43
@alt-glitch alt-glitch added type/bug Something isn't working comp/agent Core agent loop, run_agent.py, prompt builder provider/openai OpenAI / Codex Responses API P2 Medium — degraded but workaround exists labels May 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists provider/openai OpenAI / Codex Responses API type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants