fix(agent): disable SDK retries on per-request OpenAI clients by teknium1 · Pull Request #19642 · NousResearch/hermes-agent

teknium1 · 2026-05-04T09:43:11Z

Salvage of #15811 by @QifengKuang — core improvement only.

Summary

Per-request OpenAI-wire clients (used by both non-streaming and streaming chat-completions paths in _interruptible_api_call) should NOT run the SDK's built-in retry loop. The agent's outer loop owns retries with credential rotation, provider fallback, and backoff that the SDK can't see. Leaving SDK retries on (default 2) compounds with our outer retries and lets a single hung provider request stretch to ~3x the per-call timeout before our stale detector reports it.

Shared/primary clients and Anthropic / Bedrock paths are unaffected (they don't go through this code path).
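The "~3x" figure follows from simple arithmetic: with the SDK default of 2 retries, one logical call can make up to 3 attempts, each of which may hang for the full timeout. A minimal sketch of that bound, where the function name and the 60-second timeout are illustrative assumptions and backoff sleeps between attempts are ignored:

```python
def worst_case_seconds(per_call_timeout: float, sdk_max_retries: int) -> float:
    """Upper bound on time inside one SDK call if every attempt hangs until timeout.

    Ignores the SDK's backoff sleeps between attempts, so the real figure
    would be somewhat higher.
    """
    attempts = 1 + sdk_max_retries  # the initial attempt plus each retry
    return per_call_timeout * attempts


# Hypothetical 60 s per-call timeout.
print(worst_case_seconds(60.0, 2))  # 180.0 (SDK default of 2 retries)
print(worst_case_seconds(60.0, 0))  # 60.0 (retries disabled)
```

With retries disabled, a hung request surfaces to the outer loop after a single timeout window.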

Changes

run_agent.py: set request_kwargs["max_retries"] = 0 in _create_request_openai_client (+11/-0)
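The shape of the change can be sketched as follows. This is not the code in run_agent.py: the helper name, the placeholder key, and the example URL are hypothetical, and the sketch only assumes that the resulting dict is later passed to the OpenAI Python SDK client constructor, which accepts a max_retries argument.

```python
from typing import Any, Dict


def build_request_client_kwargs(shared_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: derive per-request client kwargs from shared config."""
    # Copy so the shared/primary client's configuration is not mutated.
    request_kwargs = dict(shared_kwargs)
    # Disable the SDK's built-in retry loop; the outer agent loop owns retries.
    request_kwargs["max_retries"] = 0
    return request_kwargs


# Placeholder values for illustration only.
shared = {
    "api_key": "sk-placeholder",
    "base_url": "https://example.invalid/v1",
    "timeout": 60.0,
}
per_request = build_request_client_kwargs(shared)
print(per_request["max_retries"])  # 0
```

Copying the kwargs keeps the override local to the per-request client, which matches the PR's statement that shared/primary clients are unaffected.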

Note

The contributor's original PR also included a timeout= push-down through the chat.completions.create call; that part required scaffolding that has since been refactored on main, so only the max_retries=0 change is preserved here.

Original PR: #15811

Per-request OpenAI-wire clients (used by both non-streaming and streaming chat-completions paths in _interruptible_api_call) should not run the SDK's built-in retry loop: the agent's outer loop owns retries with credential rotation, provider fallback, and backoff that the SDK can't see. Leaving SDK retries on (default 2) compounds with our outer retries and lets a single hung provider request stretch to ~3x the per-call timeout before our stale detector reports it.

Shared/primary clients and Anthropic / Bedrock paths are unaffected (they don't go through here).

Salvage of #15811 core improvement: the timeout push-down in the original PR required scaffolding that has since been refactored on main, so only the max_retries=0 change is preserved.

Co-authored-by: QifengKuang <k2767567815@gmail.com>

teknium1 merged commit 52c539d into main May 4, 2026
7 of 10 checks passed

teknium1 deleted the hermes/hermes-8c54fd4a branch May 4, 2026 09:43

teknium1 mentioned this pull request May 4, 2026

fix(agent): cap non-stream stale timeout via SDK + disable SDK retries #15811

Closed

6 tasks

alt-glitch added the labels type/bug (Something isn't working), comp/agent (Core agent loop, run_agent.py, prompt builder), provider/openai (OpenAI / Codex Responses API), and P2 (Medium: degraded but workaround exists) May 4, 2026
