Skip to content

fix(agent): abort retry loop on 402 billing instead of burning api_max_retries#31303

Closed
briandevans wants to merge 2 commits into
NousResearch:mainfrom
briandevans:fix/agent-402-billing-retry-loop-31273
Closed

fix(agent): abort retry loop on 402 billing instead of burning api_max_retries#31303
briandevans wants to merge 2 commits into
NousResearch:mainfrom
briandevans:fix/agent-402-billing-retry-loop-31273

Conversation

@briandevans

Copy link
Copy Markdown
Contributor

What does this PR do?

The conversation loop's is_client_error predicate previously included FailoverReason.billing in its exclusion set, alongside genuinely retryable reasons (rate_limit, overloaded) and compression-triggering reasons (context_overflow, payload_too_large). Billing is the only non-retryable reason in that set, so any 402 (or 400 mapped to billing) that survived upstream pool rotation and eager fallback fell through to the generic retry loop and was retried agent.api_max_retries times — three more 402 charges against the user's already-depleted credit balance by default. This is the path #31273 hit: an OpenRouter account with depleted credits, single-credential pool, no fallback chain.

The fix lifts the exclusion set to a module-level frozenset (_NON_CLIENT_ERROR_REASONS) and removes FailoverReason.billing. Pool rotation and eager fallback still run earlier in the loop unchanged — only the abort-after-both-recovery-paths-fail case is affected. After this change, a 402 with no rotation/fallback recovery falls into the existing client-error abort path, which already has billing-specific actionable guidance (link to OpenRouter credits page, etc.).

Related Issue

Fixes #31273

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • agent/conversation_loop.py — extracted the inline exclusion set into a module-level _NON_CLIENT_ERROR_REASONS frozenset and removed FailoverReason.billing. The accompanying comment documents why billing is intentionally excluded.
  • tests/run_agent/test_402_billing_not_retried.py — new regression-guard test, modelled on test_jsondecodeerror_retryable.py. Mirrors the is_client_error predicate and asserts that a 402 / billing-classified 400 evaluates to client-error, while transient 402 (usage-limit "try again") still classifies as rate_limit and is excluded.

How to Test

  1. uv run --with pytest --with pytest-xdist --with pytest-asyncio --with pytest-timeout python3 -m pytest tests/run_agent/test_402_billing_not_retried.py -v — 7/7 pass.
  2. Adjacent regression sweep: uv run --with pytest --with pytest-xdist --with pytest-asyncio --with pytest-timeout python3 -m pytest tests/agent/test_error_classifier.py tests/run_agent/test_api_max_retries_config.py tests/run_agent/test_1630_context_overflow_loop.py tests/run_agent/test_jsondecodeerror_retryable.py tests/run_agent/test_run_agent.py::TestCredentialPoolRecovery -v — 161 + 10 pass.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run focused tests for the touched code and all pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 15.x

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — N/A (pure logic change in the error-classification path)
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Sibling audit

Audited siblings: confirmed agent/agent_runtime_helpers.py::_recover_with_credential_pool (already maps 402 + FailoverReason.billing → pool rotation), the eager-fallback block in agent/conversation_loop.py (already lists FailoverReason.billing alongside rate_limit), and agent/auxiliary_client.py (already caches 402'd providers as unhealthy with TTL). All upstream billing-recovery paths run before the fixed predicate. No widening needed.

Copilot AI review requested due to automatic review settings May 24, 2026 04:15

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a regression guard to ensure HTTP 402 “billing” errors are treated as non-retryable client errors (abort) rather than being retried, preventing additional charges when credits are depleted.

Changes:

  • Introduces _NON_CLIENT_ERROR_REASONS in conversation_loop.py and removes FailoverReason.billing from the exclusion set.
  • Refactors the is_client_error predicate to reference _NON_CLIENT_ERROR_REASONS.
  • Adds tests asserting billing is not excluded and that 402 billing aborts while 402 rate-limit remains non-aborting.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
tests/run_agent/test_402_billing_not_retried.py Adds regression tests verifying billing is not excluded from client-error abort behavior.
agent/conversation_loop.py Centralizes non-client-error exclusions into _NON_CLIENT_ERROR_REASONS and excludes billing to avoid retrying 402s.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

"""
from __future__ import annotations

from agent.conversation_loop import _NON_CLIENT_ERROR_REASONS
Comment on lines +37 to +38
def _is_client_error(classified: ClassifiedError) -> bool:
"""Mirror of conversation_loop.py's is_client_error predicate.
# rate_limit is in the exclusion set, so is_client_error must stay False.
err = _APIError(
"Usage limit exceeded, try again in 5 minutes",
status_code=402,
synthetic = ClassifiedError(
reason=FailoverReason.billing,
status_code=400,
retryable=False,
@alt-glitch alt-glitch added type/bug Something isn't working P1 High — major feature broken, no workaround comp/agent Core agent loop, run_agent.py, prompt builder labels May 24, 2026
@briandevans briandevans force-pushed the fix/agent-402-billing-retry-loop-31273 branch from 95a1a52 to 6228da8 Compare May 25, 2026 06:16
@talwayh1

Copy link
Copy Markdown

CI Flake Fix

The test_pub_broadcasts_to_events_subscribers test in TestPtyWebSocket flaked on run #26386399780 (slice 3/6) with:

AssertionError: broadcast not received within 10s — server likely dropped the frame silently

Root cause: Race condition under CI load. TestClient runs the ASGI app in a background thread. When the test calls pub.send_text(), the ASGI thread needs CPU time to process the message and call _broadcast_event(). Under heavy CI load, the ASGI thread was starved, and the 10s receive timeout expired before the broadcast arrived.

Fix (2 changes):

  1. Added time.sleep(1.0) after pub.send_text() — the code comment already described this yield but the actual sleep was missing
  2. Increased recv_q.get() timeout from 10s → 30s for CI load tolerance

Cherry-pick:

git fetch https://github.com/talwayh1/hermes-agent.git fix/agent-402-billing-retry-loop-31273
git cherry-pick 659b4d7abc15c04e3775e097c8d4aafa29ac6120

Verification: Test passes locally in ~2.06s (12/12 in class).

@briandevans briandevans force-pushed the fix/agent-402-billing-retry-loop-31273 branch from 1d5a4be to d60ec99 Compare May 25, 2026 09:08
@briandevans briandevans force-pushed the fix/agent-402-billing-retry-loop-31273 branch 2 times, most recently from c9be261 to 34bdc65 Compare May 29, 2026 20:11
…x_retries

The conversation loop's is_client_error predicate previously included
FailoverReason.billing in its exclusion set, alongside genuinely retryable
reasons (rate_limit, overloaded) and compression-triggering reasons
(context_overflow, payload_too_large). Billing is the only non-retryable
reason in the set, so any 402 (or 400 mapped to billing) that survived
upstream pool rotation and eager fallback fell through to the generic
retry loop and was retried api_max_retries times — three more 402 charges
against the user's already-depleted credit balance by default.

This is the path NousResearch#31273 hit: an OpenRouter account with depleted credits
and a single-credential pool, no fallback chain configured. Pool rotation
returned None, eager fallback skipped because the chain was empty,
is_client_error then returned False because billing was in the exclusion
set, and the loop burned three retries before reaching the max_retries
fallback branch.

Lift the exclusion set to a module-level frozenset and remove
FailoverReason.billing. The remaining members are either retryable or
have their own recovery paths; the comment documents why billing is
intentionally excluded. Pool rotation and eager fallback still run before
this check unchanged — only the abort-after-both-recovery-paths-fail
case is affected.

Audited siblings: confirmed credential-pool rotation
(`agent/agent_runtime_helpers.py::_recover_with_credential_pool`), eager
fallback for billing+rate_limit, and auxiliary-client 402 caching all
handle billing correctly. No widening needed.
The belt-and-suspenders test was source-text scanning a window starting
at "is_client_error = (" for the FailoverReason.rate_limit literal. The
fix in this PR refactored the exclusion set into a module-level
_NON_CLIENT_ERROR_REASONS frozenset, so the window no longer contains
that literal and the canary asserted false against its own refactor.

Import the constant directly and assert membership. This is more robust
than text-scanning and survives future moves of the predicate.
@briandevans briandevans force-pushed the fix/agent-402-billing-retry-loop-31273 branch from 34bdc65 to cf9c732 Compare June 1, 2026 05:16
@briandevans

Copy link
Copy Markdown
Contributor Author

Closing to focus the queue on security/file-safety work where civilian merges are landing. Happy to reopen if maintainers want this picked up.

@briandevans briandevans closed this Jun 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P1 High — major feature broken, no workaround type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTTP 402 (payment required) incorrectly retried as transient error — causes runaway token spend

4 participants