Summary
When an LLM provider returns HTTP 402 (Payment Required, meaning the account is out of credits), Hermes retries the request up to agent.api_max_retries times (default: 3), as if it were a transient rate-limit or overload error. This is incorrect: a 402 is a non-retriable condition that persists until the balance is topped up. Retrying it does not resolve the underlying problem and burns additional tokens against a depleted balance.
Reproduction
- Configure Hermes to use OpenRouter with a low or exhausted credit balance
- Send any message that triggers an LLM call
- Observe: Hermes retries the request 3x before surfacing an error
- Each retry consumes credits (or, if the account recovers mid-retry, charges the user multiple times)
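To reproduce the retry count without spending real credits, one option is a local stub that answers every request with 402 and counts how many arrive. The sketch below uses only the Python standard library. `Always402Handler` and `start_stub_provider` are hypothetical names, the JSON error body is an illustrative shape rather than OpenRouter's documented format, and it assumes Hermes can be pointed at a custom provider base URL.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Always402Handler(BaseHTTPRequestHandler):
    """Answers every POST with 402 and counts how many requests arrived."""

    request_count = 0  # class-level counter; adequate for a single-client test

    def do_POST(self) -> None:
        type(self).request_count += 1
        # Drain the request body so the client is not left mid-send.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        # Assumed error shape, for illustration only.
        body = b'{"error": {"code": 402, "message": "Insufficient credits"}}'
        self.send_response(402)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep console output quiet


def start_stub_provider() -> HTTPServer:
    """Start the stub on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), Always402Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After starting the stub and sending one message through Hermes at `http://127.0.0.1:<server.server_port>`, `Always402Handler.request_count` shows how many attempts were made for that single message. A value of 1 would indicate the 402 was treated as non-retriable.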
Impact
Real-world cost: ~$40 burned in ~48 hours (May 2026) due to this behavior compounded by a 24/7 gateway deployment routing Telegram + Discord traffic. The retry loop amplified every failed request into 3 charges before the user was notified.
Expected Behavior
HTTP 402 should be treated as non-retriable. The retry guard in the API call path should check for 402 explicitly and surface a clear user-facing error immediately:
'Provider returned 402: insufficient credits. Please top up your balance and try again.'
Suggested Fix
In the retry logic (likely run_agent.py or the model routing layer), add 402 to the non-retriable status code list alongside any other permanent errors:
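# Status codes that indicate a permanent failure; retrying cannot succeed.
NON_RETRIABLE_STATUS_CODES = {400, 401, 402, 403, 404, 422}
if response.status_code in NON_RETRIABLE_STATUS_CODES:
    # PermanentProviderError is a proposed exception type, not an existing Hermes class.
    raise PermanentProviderError(response.status_code, response.text)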
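To show where such a check could sit, here is a minimal sketch of a retry loop that fails fast on permanent errors. It is not Hermes's actual code: `Response`, `PermanentProviderError`, `call_with_retries`, and the `send` callable are hypothetical names, and it assumes agent.api_max_retries counts total attempts, which may not match the real semantics.

```python
import time
from dataclasses import dataclass
from typing import Callable

# Assumption: these codes are permanent; only 402 is confirmed by this report.
NON_RETRIABLE_STATUS_CODES = {400, 401, 402, 403, 404, 422}


@dataclass
class Response:
    """Hypothetical stand-in for the provider's HTTP response object."""
    status_code: int
    text: str = ""


class PermanentProviderError(Exception):
    """Hypothetical exception for responses that must not be retried."""

    def __init__(self, status_code: int, body: str) -> None:
        super().__init__(f"Provider returned {status_code}: {body}")
        self.status_code = status_code
        self.body = body


def call_with_retries(
    send: Callable[[], Response],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> Response:
    """Call send() until it succeeds, fails permanently, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code < 400:
            return response
        if response.status_code in NON_RETRIABLE_STATUS_CODES:
            # Fail fast: a retry cannot fix a permanent error such as 402.
            raise PermanentProviderError(response.status_code, response.text)
        if attempt < max_attempts - 1:
            # Transient error (e.g. 429 or 5xx): back off exponentially, then retry.
            time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError(
        f"Giving up after {max_attempts} attempts; last status {response.status_code}"
    )
```

With a `send` that always returns 402, this raises `PermanentProviderError` after a single call, while a 429 is still attempted `max_attempts` times before giving up.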
Environment
- Hermes version: latest (May 2026)
- Provider: OpenRouter
- Platform: Windows 10, gateway mode (Telegram + Discord)
- Config: agent.api_max_retries: 3 (default)
Notes
This is distinct from the UX issue of no cost disclosure before recommending OpenRouter. That is a model knowledge problem. This is a code defect in Hermes's retry logic that applies to any pay-per-token provider that returns 402.