Problem
When a user has only one API key configured (no credential pool with rotation), an auth error (HTTP 401/403) triggers the full retry loop instead of failing fast.
Current behavior
- API call fails with 401
- error_classifier.py classifies it as auth (retryable=True)
- _recover_with_credential_pool() has no backup keys to rotate to → returns False
- Falls back to generic retry logic: jittered_backoff(5s → 120s) × max_retries
- Each retry hits the same single expired/deprecated key → same 401
- All retries are wasted (API calls plus backoff delay) before the loop eventually gives up
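To put a rough number on the wasted time, here is a minimal sketch. It assumes jittered_backoff doubles the delay per attempt from 5s up to a 120s cap; the real growth curve and jitter scheme may differ. The function names and the max_retries values are hypothetical, chosen only for illustration.

```python
import random


def backoff_delays(max_retries, base=5.0, cap=120.0, jitter=0.0, rng=None):
    """Return the sleep (seconds) before each retry under capped exponential backoff.

    Assumption: the delay doubles per attempt from `base` up to `cap`. The real
    jittered_backoff may use a different curve or jitter scheme.
    """
    rng = rng or random.Random(0)
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        # Optional symmetric jitter, expressed as a fraction of the delay.
        delay += delay * rng.uniform(-jitter, jitter)
        delays.append(delay)
    return delays


def wasted_seconds(max_retries, **kwargs):
    """Total time spent sleeping when every retry hits the same dead key."""
    return sum(backoff_delays(max_retries, **kwargs))


if __name__ == "__main__":
    for retries in (3, 5, 6):
        print(retries, "retries ->", wasted_seconds(retries), "seconds of backoff")
```

Under these assumptions and with jitter disabled, five retries already add up to 155 seconds of sleeping (5 + 10 + 20 + 40 + 80), all spent before the user sees any error.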
Expected behavior
When there is only one credential in the pool (or no credential pool at all), an auth error should:
- Try a single fresh connection (in case it was a transient auth hiccup)
- If it fails again → immediately classify as auth_permanent (retryable=False)
- Surface a clear error message: "Your API key appears to be invalid/expired. Try hermes config set provider.api_key ... or add a backup key via hermes auth add."
Why it matters
- Wastes API calls on dead keys (up to max_retries invocations × backoff)
- Delays user getting actionable error feedback
- In gateway mode, the 5-120s backoff per retry makes the agent appear hung
Affected users
Anyone using a single API key who runs into an expired/revoked key. This is the most common deployment pattern (single .env key, no credential pool).
Proposed solution
In conversation_loop.py, after _recover_with_credential_pool() returns False:
    # If credential pool had nothing to rotate to, treat auth as permanent
    if not recovered_with_pool and classified.is_auth:
        if not transient_auth_retry_attempted:
            transient_auth_retry_attempted = True
            continue  # One retry with a fresh connection to rule out transient
        # Second failure → permanent
        classified = ClassifiedError(
            reason=FailoverReason.auth_permanent,
            retryable=False,
            message="API key appears invalid after retry; check your .env or run `hermes auth add`",
        )
        # Fall through to the retryable=False abort path below
This adds at most one extra API call (fresh connection), then immediately surfaces the error.
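The "at most one extra API call" claim can be checked with a small self-contained simulation. This is only a sketch: ClassifiedError and FailoverReason are minimal stand-ins for the real classes, and classify(), run_with_fail_fast(), call_api, and recover_with_pool are hypothetical names that do not mirror the actual conversation_loop.py. Backoff sleeping is omitted.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    auth = "auth"
    auth_permanent = "auth_permanent"
    other = "other"


@dataclass
class ClassifiedError:
    reason: FailoverReason
    retryable: bool
    message: str = ""

    @property
    def is_auth(self) -> bool:
        return self.reason is FailoverReason.auth


def classify(status):
    # Stand-in for error_classifier.py: 401/403 are auth, anything else is generic.
    if status in (401, 403):
        return ClassifiedError(FailoverReason.auth, retryable=True)
    return ClassifiedError(FailoverReason.other, retryable=True)


def run_with_fail_fast(call_api, recover_with_pool, max_retries=5):
    """Return (result, error, api_calls) for a simplified retry loop."""
    transient_auth_retry_attempted = False
    classified = None
    api_calls = 0
    for _ in range(max_retries + 1):
        api_calls += 1
        status, body = call_api()
        if status == 200:
            return body, None, api_calls
        classified = classify(status)
        recovered_with_pool = classified.is_auth and recover_with_pool()
        if not recovered_with_pool and classified.is_auth:
            if not transient_auth_retry_attempted:
                transient_auth_retry_attempted = True
                continue  # one retry to rule out a transient auth failure
            classified = ClassifiedError(
                reason=FailoverReason.auth_permanent,
                retryable=False,
                message="API key appears invalid after retry",
            )
        if not classified.retryable:
            return None, classified, api_calls
        # A real loop would sleep for the jittered backoff here.
    return None, classified, api_calls


if __name__ == "__main__":
    # Dead single key, nothing to rotate to: the loop stops after two calls.
    _, err, calls = run_with_fail_fast(lambda: (401, None), lambda: False)
    print(err.reason.name, calls)
```

With a permanently failing key and no backup credential, the simulated loop makes two API calls in total (the original plus one retry) regardless of max_retries. A 401 that clears on the second attempt still succeeds.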
Alternative considered
- Check pool size upfront: Also valid, but requires knowing the pool cardinality at the error-site, which is one more coupling point.
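For comparison, the upfront pool-size check might look like the sketch below. It assumes the pool can be represented as a sequence of keys (or None when no pool is configured); the real credential pool is probably a richer object, and can_rotate() and auth_error_is_retryable() are hypothetical helpers.

```python
from typing import Optional, Sequence


def can_rotate(pool: Optional[Sequence[str]]) -> bool:
    """True only when there is at least one backup credential besides the active one."""
    return pool is not None and len(pool) > 1


def auth_error_is_retryable(pool: Optional[Sequence[str]], transient_retry_used: bool) -> bool:
    """Decide retryability at the error site from the pool size alone."""
    if can_rotate(pool):
        return True  # rotation to a backup key is still possible
    # Single key or no pool: allow exactly one fresh-connection retry.
    return not transient_retry_used


if __name__ == "__main__":
    print(auth_error_is_retryable(["only-key"], transient_retry_used=True))
```

The coupling cost shows up in the signature: the error site has to be handed the pool (or its size), whereas the proposed solution only needs the boolean that _recover_with_credential_pool() already returns.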
Reported by a user who hit this with an expired DeepSeek API key. The system correctly identified the 401 but burned multiple retries before giving up.