Skip to content

feat(providers): add per-provider and per-model request_timeout_seconds config#12415

Closed
mvanhorn wants to merge 1 commit into
NousResearch:mainfrom
mvanhorn:feat/providers-request-timeout-seconds
Closed

feat(providers): add per-provider and per-model request_timeout_seconds config#12415
mvanhorn wants to merge 1 commit into
NousResearch:mainfrom
mvanhorn:feat/providers-request-timeout-seconds

Conversation

@mvanhorn

Copy link
Copy Markdown
Contributor

Summary

Adds request_timeout_seconds config at provider and per-model level, plumbed into the primary LLM client at run_agent.py:1036 via client_kwargs["timeout"]. Zero default behavior change - if neither key is set, the client is built exactly as today and the openai SDK default (600s) takes over.

Motivation

Hermes wraps the openai Python SDK to call every LLM provider. Today no timeout is passed when building the primary turn client, so every provider inherits the SDK default. That default is wrong in two directions:

  • Too short for local models. Ollama cold-start can exceed it; Kimi K2 thinking mode and Opus extended thinking hit the ceiling.
  • Too long for cloud latency control. A cloud provider hanging for 10 minutes before erroring is bad UX when a 30s fail-fast would let a retry succeed on a healthy replica.

Hermes already has the auxiliary.{task}.timeout pattern at agent/auxiliary_client.py:2240 _get_task_timeout for summarization/vision aux tasks. This PR mirrors that shape for the primary request path.

Checked for duplicates before opening. PRs matching provider timeout, request_timeout, http_timeout are either about Hindsight memory plugin (#9882, #9985, different concern) or provider concurrency semaphores (#7479, different concern). Cross-project signal from OpenClaw #43946 (17 reactions, open).

Example config

```yaml
providers:
ollama-local:
request_timeout_seconds: 300 # cold-start tolerance
anthropic:
request_timeout_seconds: 60
models:
claude-opus-4.7:
timeout_seconds: 600 # extended thinking
```

Resolution order: per-model timeout_seconds > provider request_timeout_seconds > SDK default.

Changes

File Change
hermes_cli/timeouts.py New module, get_provider_request_timeout(provider, model) reads config with per-model precedence
run_agent.py 4 lines: import + set client_kwargs[\"timeout\"] when resolved value is not None
cli-config.yaml.example Two documented stanzas showing per-provider and per-model usage
website/docs/user-guide/configuration.md 4 lines describing the knob
tests/hermes_cli/test_timeouts.py 4 tests: unset returns None, provider-level, per-model precedence, invalid values rejected

Net diff: +141/-1 across 5 files.

Evidence

request-timeout demo

Test plan

  • `pytest tests/hermes_cli/test_timeouts.py` passes (4 tests)
  • `ruff check` clean on new/changed files
  • `ruff format --check` clean
  • Default behavior unchanged when neither key set (verified in test_timeouts.py::test_no_config_returns_none)
  • Maintainer sanity check: confirm `client_kwargs["timeout"]` is the right hook point vs. building a separate transport

…ds config

Adds optional providers.<id>.request_timeout_seconds and
providers.<id>.models.<model>.timeout_seconds config, resolved via a new
hermes_cli/timeouts.py helper and applied where client_kwargs is built
in run_agent.py. Zero default behavior change: when both keys are unset,
the openai SDK default takes over.

Mirrors the existing _get_task_timeout pattern in agent/auxiliary_client.py
for auxiliary tasks - the primary turn path just never got the equivalent
knob.

Cross-project demand: openclaw/openclaw#43946 (17 reactions) asks for
exactly this config - specifically calls out Ollama cold-start hanging
the client.
@teknium1

Copy link
Copy Markdown
Contributor

Merged via PR #12652 — your resolver + init-hook commit (3143d32) was cherry-picked onto current main with your authorship preserved in the rebase merge. Thank you for the contribution!

Follow-up commits on top of yours extended the wiring to every other client path (Anthropic native, fallback chain, switch_model, credential rotation, streaming + non-streaming per-call overrides) after live testing with a 0.5s timeout showed the original init-site hook was being overridden by hardcoded HERMES_API_TIMEOUT kwargs further down the stack.

Full details: #12652

@mvanhorn

Copy link
Copy Markdown
Contributor Author

Thanks @teknium1, really appreciate the salvage and the credit. Good catch on the 0.5s test exposing that the init-site hook was getting overridden downstream - I only wired it into the OpenAI-wire init path and missed the Anthropic native, fallback chain, and per-call override sites. Glad the resolver held up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants