Amindadgar PR#8486 retry ci fix by amindadgar · Pull Request #13519 · NousResearch/hermes-agent

amindadgar · 2026-04-21T12:57:05Z

What does this PR do?

Finishes the retry/backoff work intended for PR #8486 by wiring configurable retry settings all the way from config into AIAgent, and by fixing the actual retry behavior in run_agent.py.

This change adds:

agent.max_api_retries for full API-call retries
agent.max_stream_retries for transient streaming reconnect retries

It also fixes the outer retry loop: it now uses the configured retry count instead of a hardcoded one, honors Retry-After headers (with a cap), and applies separate backoff schedules for rate-limit errors and other retryable errors. Streaming reconnect recovery stays separate from full-request retries. The change stays small by extending the existing retry paths instead of introducing a new abstraction.
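As a rough illustration of what honoring a capped Retry-After header can look like, here is a minimal sketch. It is not the code in run_agent.py: the function name `parse_retry_after` and the constant name are hypothetical, and the 5-minute cap is taken from the commit notes further down. It accepts both header forms allowed by HTTP, delay seconds or an HTTP date.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Assumption: 5-minute cap, as stated in the commit notes of this PR.
RETRY_AFTER_CAP_SECONDS = 300.0


def parse_retry_after(value: Optional[str],
                      now: Optional[datetime] = None) -> Optional[float]:
    """Return a capped delay in seconds, or None if the header is missing or unparseable.

    Hypothetical helper for illustration only.
    """
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        # Form 1: non-negative integer number of seconds.
        seconds = float(value)
    else:
        # Form 2: an HTTP date such as "Tue, 21 Apr 2026 12:00:30 GMT".
        try:
            when = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            return None
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        now = now or datetime.now(timezone.utc)
        seconds = (when - now).total_seconds()
    # Clamp: never negative, never above the cap.
    return min(max(seconds, 0.0), RETRY_AFTER_CAP_SECONDS)
```

Returning None for a missing or malformed header lets the caller fall back to its own computed backoff instead of failing the retry.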

Related Issue

Fixes #5570

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
📝 Documentation update
✅ Tests (adding or improving test coverage)

Changes Made

Added agent.max_api_retries and agent.max_stream_retries defaults and normalization in hermes_cli/config.py
Added CLI-side defaults and runtime plumbing in cli.py
Bridged gateway config into env vars in gateway/run.py
Updated AIAgent retry config handling and retry behavior in run_agent.py
Fixed outer retry loop to honor configured retries instead of a hardcoded value
Added capped Retry-After handling and smarter backoff logic for rate limits vs other retryable errors
Kept non-streaming requests on the outer retry loop only, and streaming reconnects on the inner loop
Ensured stream retry recovery rebuilds the primary OpenAI client when needed
Updated retry/config regression tests in:
- tests/test_api_retry_config.py
- tests/hermes_cli/test_config.py
- tests/cli/test_cli_init.py
- tests/run_agent/test_run_agent.py
Updated config/docs examples in:
- cli-config.yaml.example
- docs/acp-setup.md
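The defaults-and-normalization step listed above could look roughly like the sketch below. This is an assumption-laden illustration, not the contents of hermes_cli/config.py: the helper names `normalize_retry_count` and `retry_settings` are hypothetical, the defaults (3 and 2) come from the commit notes in this PR, and falling back to the default on invalid or negative values is one possible policy among several.

```python
from typing import Any, Dict, Mapping

# Defaults as stated in the commit notes of this PR.
DEFAULT_MAX_API_RETRIES = 3
DEFAULT_MAX_STREAM_RETRIES = 2


def normalize_retry_count(value: Any, default: int) -> int:
    """Coerce a config value to a non-negative int, else return the default.

    Hypothetical helper; the real normalization rules may differ.
    """
    # bool is a subclass of int, so reject it explicitly.
    if value is None or isinstance(value, bool):
        return default
    try:
        count = int(value)
    except (TypeError, ValueError, OverflowError):
        return default
    return count if count >= 0 else default


def retry_settings(config: Mapping[str, Any]) -> Dict[str, int]:
    """Read the two retry keys from the `agent` section of a loaded config."""
    agent = config.get("agent") or {}
    if not isinstance(agent, Mapping):
        agent = {}
    return {
        "max_api_retries": normalize_retry_count(
            agent.get("max_api_retries"), DEFAULT_MAX_API_RETRIES),
        "max_stream_retries": normalize_retry_count(
            agent.get("max_stream_retries"), DEFAULT_MAX_STREAM_RETRIES),
    }
```

Normalizing once at load time means the CLI, the gateway, and AIAgent can all consume plain integers without repeating validation.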

How to Test

Set retry config in ~/.hermes/config.yaml, for example:

agent:
  max_api_retries: 5
  max_stream_retries: 5

Run the focused regression suite:

HERMES_HOME=/tmp/hermes-ci-home python -m pytest \
  tests/run_agent/test_run_agent.py \
  tests/run_agent/test_provider_fallback.py \
  tests/run_agent/test_streaming.py \
  tests/run_agent/test_openai_client_lifecycle.py \
  tests/test_api_retry_config.py \
  tests/hermes_cli/test_config.py \
  tests/cli/test_cli_init.py \
  -q -o addopts=''

Verify the suite passes and confirm retry behavior:
- outer retries respect max_api_retries
- stream reconnects respect max_stream_retries
- Retry-After is honored and capped
- rate-limit and generic retryable errors use different backoff behavior

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Focused verification passed locally:

361 passed in 17.93s

Command used:

HERMES_HOME=/tmp/hermes-ci-home python -m pytest \
  tests/run_agent/test_run_agent.py \
  tests/run_agent/test_provider_fallback.py \
  tests/run_agent/test_streaming.py \
  tests/run_agent/test_openai_client_lifecycle.py \
  tests/test_api_retry_config.py \
  tests/hermes_cli/test_config.py \
  tests/cli/test_cli_init.py \
  -q -o addopts=''

agent.max_api_retries in config.yaml (default 3, user set to 10).

Backoff improvements:
- Respects Retry-After header from API response (capped at 5 min)
- Rate limits: exponential 5s*2^n with ±20% jitter, cap 5 min
- Other errors: exponential 2^n, cap 60s
- Was: fixed min(2**n, 60) for all cases, ignored Retry-After

Usage:

agent:
  max_api_retries: 10  # in ~/.hermes/config.yaml
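The backoff rules in this commit note can be sketched as a single delay function. This is an illustration under stated assumptions, not the PR's implementation: the name `compute_backoff` is hypothetical, `attempt` is assumed to be zero-based, and applying the jitter after the base cap (then capping again) is a guess at the ordering.

```python
import random
from typing import Callable, Optional

RATE_LIMIT_CAP = 300.0   # 5 min, from the commit note
GENERIC_CAP = 60.0       # 60 s, from the commit note


def compute_backoff(attempt: int,
                    *,
                    rate_limited: bool,
                    retry_after: Optional[float] = None,
                    rng: Callable[[], float] = random.random) -> float:
    """Return the delay in seconds before the next retry.

    Hypothetical helper; `attempt` is assumed to start at 0.
    """
    if retry_after is not None:
        # A server-provided Retry-After wins, capped at 5 minutes.
        return min(max(retry_after, 0.0), RATE_LIMIT_CAP)
    if rate_limited:
        # 5s * 2^n, then +/-20% jitter, never above the cap.
        base = min(5.0 * (2 ** attempt), RATE_LIMIT_CAP)
        jitter = 1.0 + (rng() * 2.0 - 1.0) * 0.2
        return min(base * jitter, RATE_LIMIT_CAP)
    # Other retryable errors: 2^n seconds, capped at 60 s.
    return min(2.0 ** attempt, GENERIC_CAP)
```

Passing `rng` as a parameter keeps the jitter deterministic in tests; in normal use the default `random.random` applies.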

…ection errors

agent.max_stream_retries in config.yaml (default 2, means 3 attempts). Controls inner stream retry loop for ReadTimeout/connection drops. Works alongside max_api_retries (outer loop) for two-layer retry strategy.

Usage:

agent:
  max_api_retries: 10     # outer: full API call retries
  max_stream_retries: 5   # inner: stream/connection retries
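The two-layer strategy described in this commit note might be structured like the sketch below. Everything here is hypothetical scaffolding: the exception classes, the function name `call_with_two_layer_retry`, and the `open_stream` callable are stand-ins, not names from run_agent.py. Two further assumptions: max_api_retries counts retries on top of the first attempt (the note only confirms this for max_stream_retries), and an exhausted stream retry budget propagates to the caller rather than consuming an outer retry.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class StreamDropped(Exception):
    """Stand-in for a ReadTimeout or connection drop mid-stream (hypothetical)."""


class RetryableAPIError(Exception):
    """Stand-in for a retryable full-request failure (hypothetical)."""


def call_with_two_layer_retry(open_stream: Callable[[], T],
                              *,
                              max_api_retries: int = 3,
                              max_stream_retries: int = 2,
                              sleep: Callable[[float], None] = time.sleep) -> T:
    # Outer loop: full API call retries (max_api_retries + 1 attempts).
    for api_attempt in range(max_api_retries + 1):
        try:
            # Inner loop: stream reconnects (max_stream_retries + 1 attempts).
            for stream_attempt in range(max_stream_retries + 1):
                try:
                    return open_stream()
                except StreamDropped:
                    if stream_attempt == max_stream_retries:
                        raise  # stream budget exhausted; not an outer retry
        except RetryableAPIError:
            if api_attempt == max_api_retries:
                raise
            # Generic schedule from the commit notes: 2^n seconds, cap 60 s.
            sleep(min(2.0 ** api_attempt, 60.0))
    raise AssertionError("unreachable: every path returns or raises")
```

With the defaults shown, a single dropped stream is retried up to two more times without sleeping, while a failed request waits 1 s, 2 s, then 4 s before giving up.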

iRonin and others added 4 commits April 12, 2026 11:11

test: add tests for PR NousResearch#5571

b0d69b5

feat(retries): fix CI!

d7c5234

alt-glitch added the type/bug, comp/agent, comp/cli, comp/gateway, and area/config labels Apr 22, 2026

alt-glitch mentioned this pull request May 1, 2026

feat(retries): configurable max_api_retries + max_stream_retries with smarter backoff #5571

Open

amindadgar wants to merge 4 commits into NousResearch:main from amindadgar:amindadgar-pr-8486-retry-ci-fix
