feat: configurable max API retries + stream retries with smarter backoff #8486

Open
iRonin wants to merge 3 commits into NousResearch:main from iRonin:ironin/configurable-api-retries

Conversation

@iRonin iRonin commented Apr 12, 2026

Contributor

Configurable Two-Layer Retry Strategy

Configuration

agent:
  max_api_retries: 10
  max_stream_retries: 10

Retry behavior

  • Outer loop (max_api_retries, default 3): retries the full API call. A Retry-After header on the response is respected (capped at 5 min). Otherwise, rate-limit errors back off as 5s * 2^n with ±20% jitter (cap 5 min) and other errors as 2^n seconds (cap 60s).
  • Inner loop (max_stream_retries, default 2, i.e. 3 attempts): retries streaming requests on ReadTimeout or connection drops and rebuilds the primary client to purge dead connections.
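
The outer-loop delay rules above can be sketched roughly as follows. This is an illustration only, not the code in this PR: the function name `compute_backoff`, the constants, the 0-based attempt index `n`, and applying the cap after jitter are all assumptions.

```python
import random

# Assumed constants, taken from the numbers quoted in the description.
RATE_LIMIT_BASE_S = 5.0    # rate-limit backoff starts at 5s
RATE_LIMIT_CAP_S = 300.0   # 5 min cap for rate limits and Retry-After
OTHER_CAP_S = 60.0         # 60s cap for all other errors
JITTER = 0.20              # +/-20% jitter on rate-limit delays


def compute_backoff(attempt, *, rate_limited=False, retry_after=None, rng=random):
    """Return the seconds to wait before the next retry.

    `attempt` is assumed to be 0-based; `retry_after` is the already-parsed
    Retry-After value in seconds, or None if the header was absent.
    """
    if retry_after is not None:
        # A server-provided delay wins, but is clamped to [0, 5 min].
        return min(max(float(retry_after), 0.0), RATE_LIMIT_CAP_S)
    if rate_limited:
        base = min(RATE_LIMIT_BASE_S * 2 ** attempt, RATE_LIMIT_CAP_S)
        jittered = base * (1.0 + rng.uniform(-JITTER, JITTER))
        # Assumption: the cap is re-applied after jitter.
        return min(jittered, RATE_LIMIT_CAP_S)
    return float(min(2 ** attempt, OTHER_CAP_S))
```

With these assumptions, the second rate-limited retry (`attempt=1`) waits somewhere between 8s and 12s, while a non-rate-limit error at the same attempt waits 2s.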

Closes #5570

iRonin added 3 commits April 12, 2026 11:11
agent.max_api_retries in config.yaml (default 3, user set to 10).

Backoff improvements:
- Respects Retry-After header from API response (capped at 5 min)
- Rate limits: exponential 5s*2^n with ±20% jitter, cap 5 min
- Other errors: exponential 2^n, cap 60s
- Was: fixed min(2**n, 60) for all cases, ignored Retry-After

Usage:
  agent:
    max_api_retries: 10  # in ~/.hermes/config.yaml
…ection errors

agent.max_stream_retries in config.yaml (default 2, means 3 attempts).
Controls inner stream retry loop for ReadTimeout/connection drops.
Works alongside max_api_retries (outer loop) for two-layer retry strategy.

Usage:
  agent:
    max_api_retries: 10     # outer: full API call retries
    max_stream_retries: 5   # inner: stream/connection retries
@amindadgar

I'm willing to work on this but first wanted to know if anyone worked on a PR like this before?

iRonin commented Apr 21, 2026

Contributor Author

@amindadgar I am a little bit busy this week but will try my best to help.
I have had all my PRs deployed locally, so everything was working (considering they were patches on hermes core).

@amindadgar

Let me see if I can fix the CI errors we're facing here :)

Labels

area/config: Config system, migrations, profiles
comp/agent: Core agent loop, run_agent.py, prompt builder
P3: Low; cosmetic, nice to have
type/feature: New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: configurable max API retries + stream retries with smarter backoff

3 participants