HTTP 402 (payment required) incorrectly retried as transient error — causes runaway token spend #31273

@Chase-Key

Summary

When an LLM provider returns HTTP 402 (Payment Required, meaning the account is out of credits), Hermes retries the request up to agent.api_max_retries times (default: 3) as if it were a transient rate-limit or overload error. This is incorrect: a 402 is not transient. It persists until the user tops up the account, so retrying cannot resolve the underlying problem and burns additional tokens against a depleted balance.

Reproduction

  1. Configure Hermes to use OpenRouter with a low or exhausted credit balance
  2. Send any message that triggers an LLM call
  3. Observe: Hermes retries the request 3x before surfacing an error
  4. Each retry consumes credits (or, if the account recovers mid-retry, charges the user multiple times)
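
The steps above can also be reproduced offline. The following is a minimal sketch, not Hermes's actual code: `FakeResponse`, `ExhaustedProvider`, and `naive_call_with_retries` are hypothetical names, and the loop assumes that `api_max_retries: 3` means three attempts in total, which matches the "3 charges" described in this report.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP response object."""
    status_code: int
    text: str = ""


class ExhaustedProvider:
    """Hypothetical provider whose credit balance is depleted."""

    def __init__(self) -> None:
        self.calls = 0

    def complete(self, prompt: str) -> FakeResponse:
        # Every request is counted, then rejected with 402.
        self.calls += 1
        return FakeResponse(402, "insufficient credits")


def naive_call_with_retries(provider, prompt: str, max_retries: int = 3) -> FakeResponse:
    """Assumed shape of the reported behavior: any non-200 status is retried."""
    response = provider.complete(prompt)
    attempts = 1
    while response.status_code != 200 and attempts < max_retries:
        response = provider.complete(prompt)
        attempts += 1
    return response


provider = ExhaustedProvider()
final = naive_call_with_retries(provider, "hello")
print(final.status_code, provider.calls)
```

Under these assumptions, one user message results in three provider requests before the 402 is surfaced.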

Impact

Real-world cost: ~$40 burned in ~48 hours (May 2026) due to this behavior compounded by a 24/7 gateway deployment routing Telegram + Discord traffic. The retry loop amplified every failed request into 3 charges before the user was notified.

Expected Behavior

HTTP 402 should be treated as non-retriable. The retry guard in the API call path should check for 402 explicitly and surface a clear user-facing error immediately:

'Provider returned 402: insufficient credits. Please top up your balance and try again.'
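
One way to produce that message is a small lookup keyed by status code. This is only a sketch: `PERMANENT_ERROR_MESSAGES` and `user_facing_error` are hypothetical names, and only the 402 wording comes from this report.

```python
from typing import Optional

# Hypothetical table; only the 402 entry is taken from this issue.
PERMANENT_ERROR_MESSAGES = {
    402: "insufficient credits. Please top up your balance and try again.",
}


def user_facing_error(status_code: int) -> Optional[str]:
    """Return a message for known permanent errors, or None for anything else."""
    detail = PERMANENT_ERROR_MESSAGES.get(status_code)
    if detail is None:
        return None
    return f"Provider returned {status_code}: {detail}"


print(user_facing_error(402))
```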

Suggested Fix

In the retry logic (likely run_agent.py or the model routing layer), add 402 to the non-retriable status code list alongside any other permanent errors:

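# Status codes that indicate a permanent failure; retrying them cannot succeed.
NON_RETRIABLE_STATUS_CODES = {400, 401, 402, 403, 404, 422}

# Fail fast instead of entering the retry loop.
if response.status_code in NON_RETRIABLE_STATUS_CODES:
    raise PermanentProviderError(response.status_code, response.text)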
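
For context, here is a self-contained sketch of how that guard might sit inside a retry loop. It is not taken from the Hermes codebase: `ProviderResponse`, `call_with_retries`, the `send` callable, and the backoff parameter are all assumptions made for illustration, and `PermanentProviderError` is defined here only so the example runs.

```python
import time
from dataclasses import dataclass
from typing import Callable

NON_RETRIABLE_STATUS_CODES = {400, 401, 402, 403, 404, 422}


@dataclass
class ProviderResponse:
    """Hypothetical response type with the two fields the guard reads."""
    status_code: int
    text: str = ""


class PermanentProviderError(Exception):
    """Raised for failures that retrying cannot fix (hypothetical definition)."""

    def __init__(self, status_code: int, body: str) -> None:
        super().__init__(f"Provider returned {status_code}: {body}")
        self.status_code = status_code
        self.body = body


def call_with_retries(
    send: Callable[[], ProviderResponse],
    max_retries: int = 3,
    backoff_seconds: float = 0.0,
) -> ProviderResponse:
    """Call send() up to max_retries times, failing fast on permanent errors."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(max_retries):
        response = send()
        if 200 <= response.status_code < 300:
            return response
        # Permanent errors (including 402) are never retried.
        if response.status_code in NON_RETRIABLE_STATUS_CODES:
            raise PermanentProviderError(response.status_code, response.text)
        # Anything else is treated as transient: back off, then try again.
        if attempt < max_retries - 1:
            time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError(
        f"gave up after {max_retries} attempts (last status {response.status_code})"
    )
```

With this shape, a 402 costs exactly one request, while statuses such as 429 or 503 still get the full retry budget.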

Environment

  • Hermes version: latest (May 2026)
  • Provider: OpenRouter
  • Platform: Windows 10, gateway mode (Telegram + Discord)
  • Config: agent.api_max_retries: 3 (default)

Notes

This is distinct from the UX issue of no cost disclosure before recommending OpenRouter, which is a model knowledge problem. The issue reported here is a code defect in Hermes's retry logic, and it applies to any pay-per-token provider that returns 402.

    Labels

    P1: High (major feature broken, no workaround)
    comp/agent: Core agent loop, run_agent.py, prompt builder
    type/bug: Something isn't working
