Bug Description
Symptom: When the primary inference provider returns a 401 (bad/expired API key), compression fails silently and messages are discarded. The conversation continues, but the compression summary logs:
Compression summary failed: 401
Impact: Message history is lost after compaction events when the primary provider has an auth failure. The fallback chain is never attempted.
Root Cause
In agent/auxiliary_client.py, the should_fallback trigger condition in both call_llm() (sync) and async_call_llm() only checks for three error types:
- _is_payment_error (402 / 429 credits exceeded)
- _is_connection_error (network failures)
- _is_rate_limit_error (429 rate limiting)
401 authentication errors are not included. The _is_auth_error() helper function already exists and is used in the refresh logic (it correctly triggers the auth refresh flow before reaching fallback), but it is missing from the should_fallback boolean that gates whether to try an alternative provider.
Reproduction Path
- Configure auto as the provider (uses fallback chain)
- Primary provider (e.g. MiniMax) has a bad/expired key → returns 401
- Auth refresh is attempted → fails (key really is invalid)
- Code reaches should_fallback check → 401 is not in the condition → fallback is skipped
- Compression fails, messages are discarded
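The gap in the path above can be reproduced in isolation. The following is a minimal sketch, not the code from agent/auxiliary_client.py: FakeProviderError and should_fallback_current are hypothetical names, and the three helpers are simplified stand-ins that only look at the status code or exception type, which the real helpers may not do.

```python
class FakeProviderError(Exception):
    """Hypothetical stand-in for a provider error carrying an HTTP status."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


# Simplified stand-ins (assumption): the real helpers in
# agent/auxiliary_client.py may inspect more than the status code.
def _is_payment_error(err):
    return getattr(err, "status_code", None) == 402


def _is_connection_error(err):
    return isinstance(err, ConnectionError)


def _is_rate_limit_error(err):
    return getattr(err, "status_code", None) == 429


def should_fallback_current(first_err):
    """The gate as described in Root Cause: no auth check."""
    return (
        _is_payment_error(first_err)
        or _is_connection_error(first_err)
        or _is_rate_limit_error(first_err)
    )


for status in (401, 402, 429):
    # Prints: 401 False, 402 True, 429 True
    print(status, should_fallback_current(FakeProviderError(status)))
```

With these stand-ins, a 401 is the only one of the three statuses that does not open the fallback gate, which matches the skipped-fallback step in the reproduction path.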
Fix
Add _is_auth_error(first_err) to the should_fallback condition in both sync and async versions.
Sync version (call_llm, ~line 3728):
should_fallback = (
    _is_auth_error(first_err)  # ← add this
    or _is_payment_error(first_err)
    or _is_connection_error(first_err)
    or _is_rate_limit_error(first_err)
)
if _is_auth_error(first_err):
    reason = "auth error"
elif _is_payment_error(first_err):
    reason = "payment error"
elif _is_rate_limit_error(first_err):
    reason = "rate limit"
else:
    reason = "connection error"
Async version (async_call_llm, ~line 4016): identical patch.
Execution order is correct: The auth refresh (Nous-specific or provider credential refresh) runs first (~lines 3649–3700). Fallback is only attempted after refresh fails. Adding _is_auth_error to should_fallback preserves this ordering — refresh is always tried first, and if it fails, fallback kicks in.
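To make the ordering concrete, here is a rough sketch of the refresh-then-fallback flow with the patched gate. Every name in it (AuthError, call_primary, refresh_credentials, call_fallback, call_llm_sketch) is hypothetical and only illustrates the sequence; the real call_llm() has more branches and its refresh step may be structured differently.

```python
calls = []  # records the order in which steps run


class AuthError(Exception):
    """Hypothetical stand-in for a 401 from the provider."""


def _is_auth_error(err):
    # Simplified stand-in for the real helper (assumption).
    return isinstance(err, AuthError)


def call_primary():
    calls.append("primary")
    raise AuthError("401")


def refresh_credentials():
    calls.append("refresh")
    return False  # the key really is invalid, so refresh cannot help


def call_fallback():
    calls.append("fallback")
    return "summary from fallback provider"


def call_llm_sketch():
    try:
        return call_primary()
    except Exception as first_err:
        # Step 1: auth refresh runs first; retry only if it succeeded.
        if _is_auth_error(first_err) and refresh_credentials():
            return call_primary()
        # Step 2: fallback gate, now including auth errors.
        should_fallback = _is_auth_error(first_err)  # other checks omitted
        if should_fallback:
            return call_fallback()
        raise


result = call_llm_sketch()
print(calls)   # ['primary', 'refresh', 'fallback']
print(result)  # summary from fallback provider
```

In this sketch the fallback provider is reached only after the refresh attempt has reported failure, so adding the auth check to the gate does not bypass the refresh step.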
Verification
After applying the fix, trigger a compression event (long conversation) and check logs for:
trying fallback
auth error
To confirm the patch is present in the source (grep -c counts matching lines):
grep -c "_is_auth_error(first_err)" agent/auxiliary_client.py
# Expected: >= 2 (sync + async versions)
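Because grep counts matches anywhere in the file, it cannot tell whether a match sits in the should_fallback condition or elsewhere. A stricter check could inspect each function separately. The sketch below uses Python's ast module and assumes that should_fallback is set by a plain assignment inside call_llm and async_call_llm, as in the snippet above; functions_with_auth_fallback is a hypothetical helper, and SAMPLE is an inline stand-in for the real file.

```python
import ast

TARGETS = {"call_llm", "async_call_llm"}


def functions_with_auth_fallback(source):
    """Return names of target functions whose should_fallback assignment
    contains a call to _is_auth_error."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if node.name not in TARGETS:
            continue
        for stmt in ast.walk(node):
            if not isinstance(stmt, ast.Assign):
                continue
            names = {t.id for t in stmt.targets if isinstance(t, ast.Name)}
            if "should_fallback" not in names:
                continue
            # Look for _is_auth_error(...) anywhere in the assigned expression.
            for call in ast.walk(stmt.value):
                if (
                    isinstance(call, ast.Call)
                    and isinstance(call.func, ast.Name)
                    and call.func.id == "_is_auth_error"
                ):
                    found.add(node.name)
    return found


# Tiny inline sample instead of the real file, so the sketch runs anywhere.
# Here only the sync version has been patched.
SAMPLE = '''
def call_llm():
    should_fallback = (_is_auth_error(first_err) or _is_payment_error(first_err))

async def async_call_llm():
    should_fallback = _is_payment_error(first_err)
'''

print(sorted(functions_with_auth_fallback(SAMPLE)))  # ['call_llm']
```

To run it against the repository, pass the contents of agent/auxiliary_client.py instead of SAMPLE; with the fix applied to both versions, the result should contain both call_llm and async_call_llm.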
Additional Context
- _is_auth_error() function exists at line 1774 and correctly identifies 401 errors
- The function is already used in the refresh logic path (correctly)
- The gap is only in the fallback trigger condition
- Fix is a minimal 2-line addition to should_fallback + corresponding reason branch
- No changes to refresh logic required