fix(agent): abort retry loop on 402 billing instead of burning api_max_retries#31303
fix(agent): abort retry loop on 402 billing instead of burning api_max_retries#31303briandevans wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a regression guard to ensure HTTP 402 “billing” errors are treated as non-retryable client errors (abort) rather than being retried, preventing additional charges when credits are depleted.
Changes:
- Introduces
_NON_CLIENT_ERROR_REASONSinconversation_loop.pyand removesFailoverReason.billingfrom the exclusion set. - Refactors the
is_client_errorpredicate to reference_NON_CLIENT_ERROR_REASONS. - Adds tests asserting billing is not excluded and that 402 billing aborts while 402 rate-limit remains non-aborting.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/run_agent/test_402_billing_not_retried.py | Adds regression tests verifying billing is not excluded from client-error abort behavior. |
| agent/conversation_loop.py | Centralizes non-client-error exclusions into _NON_CLIENT_ERROR_REASONS and excludes billing to avoid retrying 402s. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """ | ||
| from __future__ import annotations | ||
|
|
||
| from agent.conversation_loop import _NON_CLIENT_ERROR_REASONS |
| def _is_client_error(classified: ClassifiedError) -> bool: | ||
| """Mirror of conversation_loop.py's is_client_error predicate. |
| # rate_limit is in the exclusion set, so is_client_error must stay False. | ||
| err = _APIError( | ||
| "Usage limit exceeded, try again in 5 minutes", | ||
| status_code=402, |
| synthetic = ClassifiedError( | ||
| reason=FailoverReason.billing, | ||
| status_code=400, | ||
| retryable=False, |
95a1a52 to
6228da8
Compare
CI Flake FixThe Root cause: Race condition under CI load. Fix (2 changes):
Cherry-pick: git fetch https://github.com/talwayh1/hermes-agent.git fix/agent-402-billing-retry-loop-31273
git cherry-pick 659b4d7abc15c04e3775e097c8d4aafa29ac6120Verification: Test passes locally in ~2.06s (12/12 in class). |
1d5a4be to
d60ec99
Compare
c9be261 to
34bdc65
Compare
…x_retries The conversation loop's is_client_error predicate previously included FailoverReason.billing in its exclusion set, alongside genuinely retryable reasons (rate_limit, overloaded) and compression-triggering reasons (context_overflow, payload_too_large). Billing is the only non-retryable reason in the set, so any 402 (or 400 mapped to billing) that survived upstream pool rotation and eager fallback fell through to the generic retry loop and was retried api_max_retries times — three more 402 charges against the user's already-depleted credit balance by default. This is the path NousResearch#31273 hit: an OpenRouter account with depleted credits and a single-credential pool, no fallback chain configured. Pool rotation returned None, eager fallback skipped because the chain was empty, is_client_error then returned False because billing was in the exclusion set, and the loop burned three retries before reaching the max_retries fallback branch. Lift the exclusion set to a module-level frozenset and remove FailoverReason.billing. The remaining members are either retryable or have their own recovery paths; the comment documents why billing is intentionally excluded. Pool rotation and eager fallback still run before this check unchanged — only the abort-after-both-recovery-paths-fail case is affected. Audited siblings: confirmed credential-pool rotation (`agent/agent_runtime_helpers.py::_recover_with_credential_pool`), eager fallback for billing+rate_limit, and auxiliary-client 402 caching all handle billing correctly. No widening needed.
The belt-and-suspenders test was source-text scanning a window starting
at "is_client_error = (" for the FailoverReason.rate_limit literal. The
fix in this PR refactored the exclusion set into a module-level
_NON_CLIENT_ERROR_REASONS frozenset, so the window no longer contains
that literal and the canary asserted false against its own refactor.
Import the constant directly and assert membership. This is more robust
than text-scanning and survives future moves of the predicate.
34bdc65 to
cf9c732
Compare
|
Closing to focus the queue on security/file-safety work where civilian merges are landing. Happy to reopen if maintainers want this picked up. |
What does this PR do?
The conversation loop's
is_client_errorpredicate previously includedFailoverReason.billingin its exclusion set, alongside genuinely retryable reasons (rate_limit,overloaded) and compression-triggering reasons (context_overflow,payload_too_large). Billing is the only non-retryable reason in that set, so any 402 (or 400 mapped to billing) that survived upstream pool rotation and eager fallback fell through to the generic retry loop and was retriedagent.api_max_retriestimes — three more 402 charges against the user's already-depleted credit balance by default. This is the path #31273 hit: an OpenRouter account with depleted credits, single-credential pool, no fallback chain.The fix lifts the exclusion set to a module-level
frozenset(_NON_CLIENT_ERROR_REASONS) and removesFailoverReason.billing. Pool rotation and eager fallback still run earlier in the loop unchanged — only the abort-after-both-recovery-paths-fail case is affected. After this change, a 402 with no rotation/fallback recovery falls into the existing client-error abort path, which already has billing-specific actionable guidance (link to OpenRouter credits page, etc.).Related Issue
Fixes #31273
Type of Change
Changes Made
agent/conversation_loop.py— extracted the inline exclusion set into a module-level_NON_CLIENT_ERROR_REASONSfrozenset and removedFailoverReason.billing. The accompanying comment documents why billing is intentionally excluded.tests/run_agent/test_402_billing_not_retried.py— new regression-guard test, modelled ontest_jsondecodeerror_retryable.py. Mirrors theis_client_errorpredicate and asserts that a 402 / billing-classified 400 evaluates to client-error, while transient 402 (usage-limit "try again") still classifies asrate_limitand is excluded.How to Test
uv run --with pytest --with pytest-xdist --with pytest-asyncio --with pytest-timeout python3 -m pytest tests/run_agent/test_402_billing_not_retried.py -v— 7/7 pass.uv run --with pytest --with pytest-xdist --with pytest-asyncio --with pytest-timeout python3 -m pytest tests/agent/test_error_classifier.py tests/run_agent/test_api_max_retries_config.py tests/run_agent/test_1630_context_overflow_loop.py tests/run_agent/test_jsondecodeerror_retryable.py tests/run_agent/test_run_agent.py::TestCredentialPoolRecovery -v— 161 + 10 pass.Checklist
Code
fix(scope):,feat(scope):, etc.)Documentation & Housekeeping
docs/, docstrings) — N/Acli-config.yaml.exampleif I added/changed config keys — N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — N/ASibling audit