Skip to content

fix: correct off-by-one in retry exhaustion checks #223

Merged
teknium1 merged 1 commit into
NousResearch:mainfrom
Farukest:fix/retry-off-by-one
Mar 2, 2026
Merged

fix: correct off-by-one in retry exhaustion checks #223
teknium1 merged 1 commit into
NousResearch:mainfrom
Farukest:fix/retry-off-by-one

Conversation

@Farukest

Copy link
Copy Markdown
Contributor

When API retries are exhausted due to invalid responses (empty/null choices) or repeated errors, the agent crashes with IndexError instead of returning a proper error.

Changes

Changed > to >= in two retry exhaustion checks in run_agent.py:

  • Line 2095 - invalid response retries (empty/null choices)
  • Line 2326 - API error retries

The while loop condition is retry_count < max_retries, so retry_count maxes out at max_retries after increment. The old check retry_count > max_retries could never be true inside the loop, making it dead code. The loop would exit and fall through to response.choices[0] on an invalid response, crashing with IndexError.

Tests

Added TestRetryExhaustion to tests/test_run_agent.py with 2 regression tests:

  • Invalid response (empty choices) returns error dict instead of crashing
  • API errors raise after retries instead of falling through

All 74 tests pass.

Closes #222

The retry exhaustion checks used > instead of >= to compare
retry_count against max_retries. Since the while loop condition is
retry_count < max_retries, the check retry_count > max_retries can
never be true inside the loop. When retries are exhausted, the loop
exits and falls through to response.choices[0] on an invalid response,
crashing with IndexError instead of returning a proper error.
@teknium1 teknium1 merged commit e27e3a4 into NousResearch:main Mar 2, 2026
teknium1 added a commit that referenced this pull request Mar 2, 2026
The TestRetryExhaustion tests from PR #223 didn't mock time.sleep/time.time,
causing the retry backoff loops (275s+ total) to run in real time. Tests would
time out instead of running quickly.

Added _make_fast_time_mock() helper that creates a mock time module where
time.time() advances 500s per call (so sleep_end is always in the past) and
time.sleep() is a no-op. Both tests now complete in <1s.
@teknium1

teknium1 commented Mar 2, 2026

Copy link
Copy Markdown
Contributor

Merged in e27e3a4 — thanks for the fix and the clear write-up!

One follow-up in 234b67f: added run_agent.time mocking to both tests. Without it the retry backoff loops sleep for ~275s+ real time, causing the tests to time out. The mock advances time.time() by 500s per call so the sleep loops exit instantly.

@Farukest

Farukest commented Mar 3, 2026

Copy link
Copy Markdown
Contributor Author

Thanks! Good catch on the time mocking for the retry tests

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
The TestRetryExhaustion tests from PR NousResearch#223 didn't mock time.sleep/time.time,
causing the retry backoff loops (275s+ total) to run in real time. Tests would
time out instead of running quickly.

Added _make_fast_time_mock() helper that creates a mock time module where
time.time() advances 500s per call (so sleep_end is always in the past) and
time.sleep() is a no-op. Both tests now complete in <1s.
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
The TestRetryExhaustion tests from PR NousResearch#223 didn't mock time.sleep/time.time,
causing the retry backoff loops (275s+ total) to run in real time. Tests would
time out instead of running quickly.

Added _make_fast_time_mock() helper that creates a mock time module where
time.time() advances 500s per call (so sleep_end is always in the past) and
time.sleep() is a no-op. Both tests now complete in <1s.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
The TestRetryExhaustion tests from PR NousResearch#223 didn't mock time.sleep/time.time,
causing the retry backoff loops (275s+ total) to run in real time. Tests would
time out instead of running quickly.

Added _make_fast_time_mock() helper that creates a mock time module where
time.time() advances 500s per call (so sleep_end is always in the past) and
time.sleep() is a no-op. Both tests now complete in <1s.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Retry exhaustion falls through to IndexError instead of returning error

2 participants