Skip to content

feat(providers): add per-provider and per-model request_timeout_seconds config#12652

Merged
teknium1 merged 4 commits into
mainfrom
hermes/hermes-6f7b0cec
Apr 19, 2026
Merged

feat(providers): add per-provider and per-model request_timeout_seconds config#12652
teknium1 merged 4 commits into
mainfrom
hermes/hermes-6f7b0cec

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

Users can set providers.<id>.request_timeout_seconds and providers.<id>.models.<model>.timeout_seconds in config.yaml to control LLM request timeouts per provider/model. Enables local-model cold-start tolerance (Ollama, K2/opus extended thinking) and fast-fail cloud retries. Zero behavior change when unset.

Salvage of #12415 — the original PR wired the knob only into the OpenAI-wire init path; live testing with timeout_seconds: 0.5 on claude-sonnet-4.6 showed the config was being silently overridden by hardcoded per-call timeout kwargs further down the stack. This PR extends the wiring to every client-construction and per-call-override site so the knob actually enforces.

Credit: @mvanhorn authored the original resolver + init hook (commit 1). Follow-up commits extend coverage.

Changes

  • hermes_cli/timeouts.py — resolver with per-model > provider > None precedence (@mvanhorn)
  • run_agent.py — wires resolved timeout into init (explicit + implicit auth), Anthropic native init + 3 credential-refresh/rebuild sites, switch_model, _restore_primary_runtime, _try_activate_fallback (OpenAI + Anthropic) with immediate client rebuild so the first fallback request carries the new timeout
  • run_agent.py — new _resolved_api_call_timeout() helper (config > HERMES_API_TIMEOUT env > 1800s default) wired into the non-streaming api_kwargs["timeout"] site and the streaming httpx.Timeout(connect, read, write, pool) so the per-provider config wins over the hardcoded env default
  • agent/anthropic_adapter.pybuild_anthropic_client() gains optional timeout= kwarg (keeps 900s default on unset/invalid); connect stays at 10s
  • cli-config.yaml.example + website/docs/user-guide/configuration.md — documented, including Bedrock-not-covered caveat
  • tests/hermes_cli/test_timeouts.py — 4 resolver tests (@mvanhorn) + test_anthropic_adapter_honors_timeout_kwarg + test_resolved_api_call_timeout_priority covering all 3 precedence cases
  • Widened 4 existing test mock signatures (build_anthropic_client fakes) to accept the new timeout= kwarg

Coverage

api_mode Covered How
chat_completions (OpenRouter, Copilot, Nous, Kimi, Ollama, Alibaba, Qwen, MiniMax OpenAI-compat, xAI chat, custom) Client-level + per-call override on streaming and non-streaming
anthropic_messages (native Anthropic, MiniMax Claude, OpenCode, Z.AI) Client-level (SDK honors on messages.create + messages.stream)
codex_responses (OpenAI Codex, GPT-5 on OpenRouter/Copilot, xAI Responses) Client-level (no competing per-call override)
bedrock_converse + AnthropicBedrock SDK Documented; boto3 has its own timeout config — follow-up if needed

Validation

Live E2E against real OpenRouter:

Test Before After
timeout_seconds: 60 + gpt-4o-mini + "reply PONG" Not actually enforced Responds PONG in 1.3s
timeout_seconds: 0.5 + claude-sonnet-4.6 thinking essay Silently ignored — 29-47s success APITimeoutError at ~3s/retry, exhausts in 15s
Unconfigured provider SDK default SDK default (unchanged)
Anthropic native (_anthropic_client.timeout.read) 900s hardcoded Per-config value, else 900s
Fallback chain (primary → fallback) No timeout carried to fallback Resolved timeout applied + client rebuilt immediately

Unit tests: scripts/run_tests.sh tests/hermes_cli/test_timeouts.py → 6/6 pass.

Full run: scripts/run_tests.sh tests/run_agent/ tests/hermes_cli/ → 3105 pass, same 6 pre-existing failures as origin/main (verified by stashing; no new regressions).

Closes #12415.

mvanhorn and others added 4 commits April 19, 2026 05:32
…ds config

Adds optional providers.<id>.request_timeout_seconds and
providers.<id>.models.<model>.timeout_seconds config, resolved via a new
hermes_cli/timeouts.py helper and applied where client_kwargs is built
in run_agent.py. Zero default behavior change: when both keys are unset,
the openai SDK default takes over.

Mirrors the existing _get_task_timeout pattern in agent/auxiliary_client.py
for auxiliary tasks - the primary turn path just never got the equivalent
knob.

Cross-project demand: openclaw/openclaw#43946 (17 reactions) asks for
exactly this config - specifically calls out Ollama cold-start hanging
the client.
Follow-up on top of mvanhorn's cherry-picked commit. Original PR only
wired request_timeout_seconds into the explicit-creds OpenAI branch at
run_agent.py init; router-based implicit auth, native Anthropic, and the
fallback chain were still hardcoded to SDK defaults.

- agent/anthropic_adapter.py: build_anthropic_client() accepts an optional
  timeout kwarg (default 900s preserved when unset/invalid).
- run_agent.py: resolve per-provider/per-model timeout once at init; apply
  to Anthropic native init + post-refresh rebuild + stale/interrupt
  rebuilds + switch_model + _restore_primary_runtime + the OpenAI
  implicit-auth path + _try_activate_fallback (with immediate client
  rebuild so the first fallback request carries the configured timeout).
- tests: cover anthropic adapter kwarg honoring; widen mock signatures
  to accept the new timeout kwarg.
- docs/example: clarify that the knob now applies to every transport,
  the fallback chain, and rebuilds after credential rotation.
…ry calls

Live test with timeout_seconds: 0.5 on claude-sonnet-4.6 proved the
initial wiring was insufficient: run_agent.py was overriding the
client-level timeout on every call via hardcoded per-request kwargs.

Root cause: run_agent.py had two sites that pass an explicit timeout=
kwarg into chat.completions.create() — api_kwargs['timeout'] at line
7075 (HERMES_API_TIMEOUT=1800s default) and the streaming path's
_httpx.Timeout(..., read=HERMES_STREAM_READ_TIMEOUT=120s, ...) at line
5760. Both override the per-provider config value the client was
constructed with, so a 0.5s config timeout would silently not enforce.

This commit:
- Adds AIAgent._resolved_api_call_timeout() — config > HERMES_API_TIMEOUT env > 1800s default.
- Uses it for the non-streaming api_kwargs['timeout'] field.
- Uses it for the streaming path's httpx.Timeout(connect, read, write, pool)
  so both connect and read respect the configured value when set.
  Local-provider auto-bump (Ollama/vLLM cold-start) only applies when
  no explicit config value is set.
- New test: test_resolved_api_call_timeout_priority covers all three
  precedence cases (config, env, default).

Live verified: 0.5s config on claude-sonnet-4.6 now triggers
APITimeoutError at ~3s per retry, exhausts 3 retries in ~15s total
(was: 29-47s success with timeout ignored). Positive case (60s config
+ gpt-4o-mini) still succeeds at 1.3s.
…econds

AWS Bedrock paths (bedrock_converse + AnthropicBedrock SDK) use boto3
with its own timeout config and are not wired to the per-provider knob.
Documented in cli-config.yaml.example and website configuration.md so
users don't expect it to take effect there.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants