fix(agent): disable SDK retries on per-request OpenAI clients#19642
Merged
Conversation
Per-request OpenAI-wire clients (used by both non-streaming and streaming chat-completions paths in _interruptible_api_call) should not run the SDK's built-in retry loop: the agent's outer loop owns retries with credential rotation, provider fallback, and backoff that the SDK can't see. Leaving SDK retries on (default 2) compounds with our outer retries and lets a single hung provider request stretch to ~3x the per-call timeout before our stale detector reports it. Shared/primary clients and Anthropic / Bedrock paths are unaffected (they don't go through here). Salvage of #15811 core improvement — the timeout push-down in the original PR required scaffolding that has since been refactored on main, so only the max_retries=0 change is preserved. Co-authored-by: QifengKuang <k2767567815@gmail.com>
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Salvage of #15811 by @QifengKuang — core improvement only.
Summary
Per-request OpenAI-wire clients (used by both non-streaming and streaming chat-completions paths in
_interruptible_api_call) should NOT run the SDK's built-in retry loop. The agent's outer loop owns retries with credential rotation, provider fallback, and backoff that the SDK can't see. Leaving SDK retries on (default 2) compounds with our outer retries and lets a single hung provider request stretch to ~3x the per-call timeout before our stale detector reports it.Shared/primary clients and Anthropic / Bedrock paths are unaffected (they don't go through this code path).
Changes
request_kwargs["max_retries"] = 0in_create_request_openai_client(+11/-0)Note
The contributor's original PR also included a
timeout=push-down through the chat.completions.create call; that part required scaffolding that has since been refactored on main, so only the max_retries=0 change is preserved here.Original PR: #15811