feat(providers): add per-provider and per-model request_timeout_seconds config #12652
Merged
…ds config

Adds optional providers.<id>.request_timeout_seconds and providers.<id>.models.<model>.timeout_seconds config, resolved via a new hermes_cli/timeouts.py helper and applied where client_kwargs is built in run_agent.py.

Zero default behavior change: when both keys are unset, the openai SDK default takes over.

Mirrors the existing _get_task_timeout pattern in agent/auxiliary_client.py for auxiliary tasks; the primary turn path just never got the equivalent knob.

Cross-project demand: openclaw/openclaw#43946 (17 reactions) asks for exactly this config and specifically calls out Ollama cold-start hanging the client.
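The lookup described in this commit can be sketched as follows. This is a minimal illustration, not the contents of hermes_cli/timeouts.py: the function names, the "positive number" validation rule, and the example provider and model names are assumptions. Only the two config keys and the per-model > provider > None precedence come from this PR.

```python
from typing import Any, Mapping, Optional


def resolve_request_timeout(
    config: Mapping[str, Any], provider_id: str, model: Optional[str] = None
) -> Optional[float]:
    """Return the configured timeout in seconds, or None when nothing is set.

    Hypothetical helper: the real resolver's name and signature may differ.
    """
    provider = (config.get("providers") or {}).get(provider_id) or {}

    # A per-model value, when present and usable, wins over the provider-wide one.
    if model is not None:
        model_cfg = (provider.get("models") or {}).get(model) or {}
        per_model = _positive_seconds(model_cfg.get("timeout_seconds"))
        if per_model is not None:
            return per_model

    # Otherwise use the provider-wide value; None means "let the SDK default apply".
    return _positive_seconds(provider.get("request_timeout_seconds"))


def _positive_seconds(value: Any) -> Optional[float]:
    """Coerce a config value to positive float seconds (assumed validation rule)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value) if value > 0 else None


# Illustrative config only; the provider and model names are made up for the example.
config = {
    "providers": {
        "ollama": {
            "request_timeout_seconds": 300,
            "models": {"llama3": {"timeout_seconds": 600}},
        }
    }
}

per_model = resolve_request_timeout(config, "ollama", "llama3")
provider_wide = resolve_request_timeout(config, "ollama", "some-other-model")
unset = resolve_request_timeout(config, "openrouter")
```

Returning None instead of a numeric default keeps the "zero behavior change" property: the caller only passes a timeout to the client when one was actually configured.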
Follow-up on top of mvanhorn's cherry-picked commit. The original PR only wired request_timeout_seconds into the explicit-creds OpenAI branch at run_agent.py init; router-based implicit auth, native Anthropic, and the fallback chain were still hardcoded to SDK defaults.

- agent/anthropic_adapter.py: build_anthropic_client() accepts an optional timeout kwarg (default 900s preserved when unset/invalid).
- run_agent.py: resolve per-provider/per-model timeout once at init; apply to Anthropic native init, the post-refresh rebuild, stale/interrupt rebuilds, switch_model, _restore_primary_runtime, the OpenAI implicit-auth path, and _try_activate_fallback (with immediate client rebuild so the first fallback request carries the configured timeout).
- tests: cover anthropic adapter kwarg honoring; widen mock signatures to accept the new timeout kwarg.
- docs/example: clarify that the knob now applies to every transport, the fallback chain, and rebuilds after credential rotation.
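The "default 900s preserved when unset/invalid" behaviour could look roughly like the sketch below. The function name and the definition of "invalid" (non-numeric, boolean, or non-positive) are assumptions for illustration. The 900s read default and the 10s connect value are the figures stated in this PR. The real build_anthropic_client() constructs an SDK client, which is left out here.

```python
from typing import Any, Dict

DEFAULT_READ_TIMEOUT_S = 900.0  # default stated in the commit message
CONNECT_TIMEOUT_S = 10.0  # the PR description says connect stays at 10s


def anthropic_timeouts(timeout: Any = None) -> Dict[str, float]:
    """Return connect/read seconds, keeping the 900s read default for unusable input.

    Hypothetical helper: shows only the fallback rule, not the client construction.
    """
    usable = (
        isinstance(timeout, (int, float))
        and not isinstance(timeout, bool)
        and timeout > 0
    )
    read = float(timeout) if usable else DEFAULT_READ_TIMEOUT_S
    # Only the read phase follows the configured value; connect is fixed.
    return {"connect": CONNECT_TIMEOUT_S, "read": read}
```

Callers that rebuild the client (credential refresh, model switch, fallback activation) would pass the same resolved value each time, so a rebuild never silently reverts to the default.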
…ry calls

Live test with timeout_seconds: 0.5 on claude-sonnet-4.6 proved the initial wiring was insufficient: run_agent.py was overriding the client-level timeout on every call via hardcoded per-request kwargs.

Root cause: run_agent.py had two sites that pass an explicit timeout= kwarg into chat.completions.create(): api_kwargs['timeout'] at line 7075 (HERMES_API_TIMEOUT=1800s default) and the streaming path's _httpx.Timeout(..., read=HERMES_STREAM_READ_TIMEOUT=120s, ...) at line 5760. Both override the per-provider config value the client was constructed with, so a 0.5s config timeout would silently not be enforced.

This commit:

- Adds AIAgent._resolved_api_call_timeout(): config > HERMES_API_TIMEOUT env > 1800s default.
- Uses it for the non-streaming api_kwargs['timeout'] field.
- Uses it for the streaming path's httpx.Timeout(connect, read, write, pool) so both connect and read respect the configured value when set. Local-provider auto-bump (Ollama/vLLM cold-start) only applies when no explicit config value is set.
- New test: test_resolved_api_call_timeout_priority covers all three precedence cases (config, env, default).

Live verified: 0.5s config on claude-sonnet-4.6 now triggers APITimeoutError at ~3s per retry, exhausts 3 retries in ~15s total (was: 29-47s success with timeout ignored). Positive case (60s config + gpt-4o-mini) still succeeds at 1.3s.
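The precedence this commit introduces (config > HERMES_API_TIMEOUT env > 1800s default) can be sketched as a standalone function. This is an illustration under assumptions, not the body of AIAgent._resolved_api_call_timeout(): the free-function form, the injectable env mapping, and the handling of an unparsable env value are all hypothetical.

```python
import os
from typing import Mapping, Optional

DEFAULT_API_TIMEOUT_S = 1800.0  # default stated in the commit message


def resolved_api_call_timeout(
    config_timeout: Optional[float], env: Optional[Mapping[str, str]] = None
) -> float:
    """Pick the per-call timeout: config value, then env var, then the default."""
    env = os.environ if env is None else env

    # 1. An explicit per-provider/per-model config value always wins.
    if config_timeout is not None:
        return float(config_timeout)

    # 2. Otherwise honour the environment override, if it parses as a number.
    raw = env.get("HERMES_API_TIMEOUT")
    if raw:
        try:
            return float(raw)
        except ValueError:
            pass  # assumption: an unparsable value falls through to the default

    # 3. Nothing configured anywhere.
    return DEFAULT_API_TIMEOUT_S
```

Because the per-call kwarg overrides whatever the client was constructed with, resolving the value at the call site is what makes a small configured timeout such as 0.5s take effect.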
…econds

AWS Bedrock paths (bedrock_converse + AnthropicBedrock SDK) use boto3 with its own timeout config and are not wired to the per-provider knob. Documented in cli-config.yaml.example and website configuration.md so users don't expect it to take effect there.
Summary
Users can set providers.<id>.request_timeout_seconds and providers.<id>.models.<model>.timeout_seconds in config.yaml to control LLM request timeouts per provider/model. Enables local-model cold-start tolerance (Ollama, K2/opus extended thinking) and fast-fail cloud retries. Zero behavior change when unset.

Salvage of #12415: the original PR wired the knob only into the OpenAI-wire init path; live testing with timeout_seconds: 0.5 on claude-sonnet-4.6 showed the config was being silently overridden by hardcoded per-call timeout kwargs further down the stack. This PR extends the wiring to every client-construction and per-call-override site so the knob is actually enforced.

Credit: @mvanhorn authored the original resolver + init hook (commit 1). Follow-up commits extend coverage.
Changes
- hermes_cli/timeouts.py: resolver with per-model > provider > None precedence (@mvanhorn)
- run_agent.py: wires resolved timeout into init (explicit + implicit auth), Anthropic native init + 3 credential-refresh/rebuild sites, switch_model, _restore_primary_runtime, _try_activate_fallback (OpenAI + Anthropic) with immediate client rebuild so the first fallback request carries the new timeout
- run_agent.py: new _resolved_api_call_timeout() helper (config > HERMES_API_TIMEOUT env > 1800s default) wired into the non-streaming api_kwargs["timeout"] site and the streaming httpx.Timeout(connect, read, write, pool) so the per-provider config wins over the hardcoded env default
- agent/anthropic_adapter.py: build_anthropic_client() gains optional timeout= kwarg (keeps 900s default on unset/invalid); connect stays at 10s
- cli-config.yaml.example + website/docs/user-guide/configuration.md: documented, including Bedrock-not-covered caveat
- tests/hermes_cli/test_timeouts.py: 4 resolver tests (@mvanhorn) + test_anthropic_adapter_honors_timeout_kwarg + test_resolved_api_call_timeout_priority covering all 3 precedence cases
- Mock signatures (build_anthropic_client fakes) widened to accept the new timeout= kwarg

Coverage
- chat_completions (OpenRouter, Copilot, Nous, Kimi, Ollama, Alibaba, Qwen, MiniMax OpenAI-compat, xAI chat, custom)
- anthropic_messages (native Anthropic, MiniMax Claude, OpenCode, Z.AI): messages.create + messages.stream
- codex_responses (OpenAI Codex, GPT-5 on OpenRouter/Copilot, xAI Responses)
- bedrock_converse + AnthropicBedrock SDK: not covered (boto3 uses its own timeout config)

Validation
Live E2E against real OpenRouter:

- timeout_seconds: 60 + gpt-4o-mini + "reply PONG": succeeds at 1.3s
- timeout_seconds: 0.5 + claude-sonnet-4.6 thinking essay: APITimeoutError at ~3s/retry, exhausts in 15s
- … _anthropic_client.timeout.read)

Unit tests: scripts/run_tests.sh tests/hermes_cli/test_timeouts.py → 6/6 pass.

Full run: scripts/run_tests.sh tests/run_agent/ tests/hermes_cli/ → 3105 pass, same 6 pre-existing failures as origin/main (verified by stashing; no new regressions).

Closes #12415.