feat(agent): make primary-provider API retry count configurable #12013

Closed
alexzhu0 wants to merge 1 commit into NousResearch:main from alexzhu0:feat/configurable-api-max-retries

Conversation

@alexzhu0

Contributor

What & why

The per-call API retry loop in `run_agent.py` uses a hardcoded `max_retries = 3`. Users with a configured `fallback_model` who would rather fail over sooner after an unresponsive primary have no way to shorten the wait: three attempts with exponential backoff can stretch to 15+ minutes on a flapping upstream before the fallback kicks in.

Issue #11616 logs exactly this scenario on Qwen via OpenRouter:

```
[20:12] ⚠️ No response from provider for 180s (model: qwen/qwen3-coder-480b-…). Reconnecting…
[20:13] ⏳ Retrying in 2.0s (attempt 1/3)…
[20:17] ⏳ Retrying in 4.5s (attempt 2/3)…
[20:20] ⚠️ No response from provider for 180s. Reconnecting… ← full retry budget burned
```

Change

Read the retry budget from `HERMES_API_MAX_RETRIES` (default `3`, clamped to non-negative, falls back to `3` on malformed values). `0` disables retries entirely, so one failed call routes directly to the fallback provider.

```python
# before
max_retries = 3

# after
try:
    max_retries = max(0, int(os.getenv("HERMES_API_MAX_RETRIES", "3")))
except (TypeError, ValueError):
    max_retries = 3
```

The env-var-based knob matches the existing `HERMES_API_TIMEOUT` / `HERMES_API_CALL_STALE_TIMEOUT` pattern in the same file and avoids opening a `config.yaml` schema discussion for a single integer. A nested `agent.max_api_retries` config knob (as the issue reporter proposed in ex-1) is a reasonable follow-up if self-hosters ask for it.

Also document the new variable alongside `HERMES_API_TIMEOUT` in `website/docs/reference/environment-variables.md`.
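As a rough illustration of why a budget of `0` sends one failed call straight to the fallback, here is a minimal sketch of a retry-then-fallback loop. It is not the loop in `run_agent.py`: `ProviderError`, `call_with_fallback`, and the two stub providers are hypothetical names, backoff is omitted, and the sketch assumes the budget counts retries made after one initial attempt.

```python
from typing import Callable, List, Optional


class ProviderError(Exception):
    """Hypothetical stand-in for a failed or stalled provider call."""


def call_with_fallback(
    primary: Callable[[], str],
    fallback: Optional[Callable[[], str]],
    max_retries: int,
) -> str:
    """Call primary once, retry up to max_retries times, then use fallback."""
    # Assumption: one initial attempt plus max_retries retries.
    attempts = 1 + max(0, max_retries)
    last_error: Optional[ProviderError] = None
    for _ in range(attempts):
        try:
            return primary()
        except ProviderError as exc:
            # A real loop would sleep with backoff here before retrying.
            last_error = exc
    if fallback is not None:
        return fallback()
    # No fallback configured: surface the last primary failure.
    raise last_error


calls: List[str] = []


def flaky_primary() -> str:
    """Stub primary provider that always fails."""
    calls.append("primary")
    raise ProviderError("no response from provider")


def fallback_model() -> str:
    """Stub fallback provider that always succeeds."""
    calls.append("fallback")
    return "fallback reply"


# With a budget of 0, the primary is tried exactly once before failing over.
result = call_with_fallback(flaky_primary, fallback_model, max_retries=0)
print(result, calls)
```

Under these assumptions, raising the budget only adds primary attempts ahead of the same fallback call, which is the wait this PR lets users shorten.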

How to test

```bash
# Default behaviour unchanged
unset HERMES_API_MAX_RETRIES
python -c "import os; print(int(os.getenv('HERMES_API_MAX_RETRIES', '3')))" # 3

# Lower for fast failover
export HERMES_API_MAX_RETRIES=1
python -c "import os; print(int(os.getenv('HERMES_API_MAX_RETRIES', '3')))" # 1

# Zero disables retries entirely
export HERMES_API_MAX_RETRIES=0
python -c "import os; print(int(os.getenv('HERMES_API_MAX_RETRIES', '3')))" # 0

# Malformed values fall back to 3
export HERMES_API_MAX_RETRIES=not-a-number
```
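The one-liners above only call `int()` on the variable, so they do not exercise the clamp or the malformed-value fallback. A small sketch that does is shown below; it wraps the same expression from the diff in a helper so the cases can be checked without touching the real environment. `read_max_retries`, `DEFAULT_MAX_RETRIES`, and the `env` parameter are hypothetical names introduced for this check and are not part of the PR.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_RETRIES = 3  # hypothetical constant mirroring the PR's default


def read_max_retries(env: Optional[Mapping[str, str]] = None) -> int:
    """Parse HERMES_API_MAX_RETRIES the way the diff does.

    Non-negative integers pass through, negative values clamp to 0,
    and malformed values fall back to the default.
    """
    env = os.environ if env is None else env
    try:
        return max(0, int(env.get("HERMES_API_MAX_RETRIES", "3")))
    except (TypeError, ValueError):
        return DEFAULT_MAX_RETRIES


# Each case pairs an environment mapping with the budget it should yield.
cases = [
    ({}, 3),                                          # unset: default
    ({"HERMES_API_MAX_RETRIES": "1"}, 1),             # fast failover
    ({"HERMES_API_MAX_RETRIES": "0"}, 0),             # retries disabled
    ({"HERMES_API_MAX_RETRIES": "-5"}, 0),            # clamped to non-negative
    ({"HERMES_API_MAX_RETRIES": "not-a-number"}, 3),  # malformed: default
    ({"HERMES_API_MAX_RETRIES": ""}, 3),              # empty string: default
]

for env, expected in cases:
    assert read_max_retries(env) == expected, (env, expected)
print("all cases pass")
```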

No existing tests pin the `max_retries = 3` literal; the default preserves today's behaviour byte-for-byte.

Platforms tested

  • macOS (Darwin 25.3.0), Python 3.11.13. Change is platform-agnostic.

Related

Closes #11616.

The per-call API retry loop used a hardcoded ``max_retries = 3``.
Users with a configured ``fallback_model`` who would rather fail over
sooner after an unresponsive primary had no way to shorten the wait —
three attempts with backoff can stretch to 15+ minutes on a flapping
upstream before the fallback kicks in (see issue #11616).

Read the retry budget from ``HERMES_API_MAX_RETRIES`` (env var, default
3, clamped to non-negative). ``0`` disables retries entirely, so one
failed call routes directly to the fallback provider.

The env-var-based knob matches the existing ``HERMES_API_TIMEOUT`` /
``HERMES_API_CALL_STALE_TIMEOUT`` pattern and avoids opening a
config.yaml schema discussion for a single integer. A config.yaml
knob under ``agent:`` (as the issue reporter proposed) is a
reasonable follow-up if self-hosters ask for it.

Also document the new var alongside ``HERMES_API_TIMEOUT`` in the
environment-variables reference.

Closes #11616
@alt-glitch added the type/feature (New feature or request), P3 (Low: cosmetic, nice to have), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels Apr 24, 2026
@alt-glitch

Collaborator

Superseded by #14730 (merged) — config key already implements this. This env-var approach is no longer needed.

@alexzhu0

Contributor Author

Superseded by #14730 (merged 2026-04-23), which delivers the same capability through a cleaner agent.api_max_retries config key (vs. the env-var approach here). The config-key shape is a better fit for hermes; closing in favor of the merged version. Thanks @teknium1!

@alexzhu0 alexzhu0 closed this Apr 25, 2026
Labels

comp/agent Core agent loop, run_agent.py, prompt builder P3 Low — cosmetic, nice to have type/feature New feature or request

Development

Successfully merging this pull request may close these issues.

[Feature]: adjustable provider reconnection attempt count

2 participants