feat(auxiliary): layered fallback (chain → main agent) + capacity-error gate fix #27625
…ity-error fallback for explicit providers

Closes #26803

Root causes:

1. _is_payment_error() checked for billing keywords (credits, insufficient funds, billing, payment required) but missed daily token quota exhaustion phrases used by Bedrock, Vertex AI, and LiteLLM proxies, e.g. 'Too many tokens per day', 'quota exceeded', 'resource exhausted', 'daily limit'. These are functionally identical to credit exhaustion (provider cannot serve the request) but don't trigger fallback.
2. The call_llm() fallback chain was gated on resolved_provider == 'auto'. When a task resolves to a specific provider (e.g. 'custom' for a LiteLLM proxy, or 'openrouter'), capacity failures (payment/quota/connection) silently raise instead of trying alternatives. This is overly conservative: capacity errors mean the provider *cannot* serve the request regardless of user intent, so alternatives should always be tried.

Fixes:

- Add quota-related keywords to _is_payment_error(): quota_exceeded, too many tokens per day, daily limit, tokens per day, daily quota, resource exhausted (Vertex AI gRPC code).
- Allow fallback for capacity errors (payment + connection) even when resolved_provider is not 'auto'. Rate-limit fallback stays gated on is_auto to honour explicit provider constraints for transient limits.
- Apply both fixes to sync call_llm() and async acall_llm() paths.
- Add 6 targeted tests for the new quota-error detection cases.
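The keyword check described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in agent/auxiliary_client.py: the phrase lists are copied from the message above, while the function name `looks_like_payment_error` and the plain substring-matching strategy are assumptions.

```python
# Illustrative sketch only. Phrases come from the commit message;
# the function name and substring matching are assumptions.
BILLING_PHRASES = (
    "credits",
    "insufficient funds",
    "billing",
    "payment required",
)
QUOTA_PHRASES = (
    "quota_exceeded",
    "quota exceeded",
    "too many tokens per day",
    "tokens per day",
    "daily limit",
    "daily quota",
    "resource exhausted",
)


def looks_like_payment_error(exc: Exception) -> bool:
    """Return True if the error text suggests credit or quota exhaustion."""
    message = str(exc).lower()  # case-insensitive comparison
    return any(phrase in message for phrase in BILLING_PHRASES + QUOTA_PHRASES)
```

With this sketch, an error whose text is 'Too many tokens per day' is classified the same way as a billing failure, while an unrelated error such as a connection reset is not.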
The two TestAuxiliaryClientPoisonedCacheEviction tests were written when explicit-provider users got no fallback at all on connection errors — they asserted ConnectionError propagated after eviction because the fallback gate blocked the auto chain. After the #26803 fix in the previous commit, capacity errors (payment/quota/connection) now DO trigger fallback even on explicit providers. The tests still verify cache eviction (their actual contract) but now stub _try_payment_fallback so the fallback machinery does not attempt a real network call.
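The stubbing pattern described here might look like the sketch below. `FakeAuxClient` and its methods are hypothetical stand-ins for the module under test; only `unittest.mock.patch.object` is a real API. The point is that the eviction contract is still checked while the fallback step is replaced by a stub.

```python
from unittest import mock


class FakeAuxClient:
    """Hypothetical stand-in for the module under test."""

    def __init__(self):
        self.cache = {"custom": object()}

    def try_payment_fallback(self):
        # In a real run this would reach out to another provider.
        raise RuntimeError("would hit the network")

    def call(self):
        try:
            raise ConnectionError("poisoned client")
        except ConnectionError:
            self.cache.pop("custom", None)      # evict the poisoned entry
            return self.try_payment_fallback()  # fallback now runs on explicit providers


def run_eviction_check():
    client = FakeAuxClient()
    # Stub the fallback so the test never attempts a network call.
    with mock.patch.object(client, "try_payment_fallback", return_value="stubbed") as stub:
        result = client.call()
    return result, "custom" in client.cache, stub.call_count
```

The check returns the stubbed result, confirms the cache entry is gone, and records that the fallback was invoked exactly once.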
🔎 Lint report:

| Rule | Count |
|---|---|
| unknown-argument | 2 |
| no-matching-overload | 1 |

First entries
agent/auxiliary_client.py:2719: [unknown-argument] unknown-argument: Argument `base_url` does not match any known parameter of function `resolve_provider_client`
agent/auxiliary_client.py:2679: [no-matching-overload] no-matching-overload: No overload of bound method `dict.get` matches arguments
agent/auxiliary_client.py:2720: [unknown-argument] unknown-argument: Argument `api_key` does not match any known parameter of function `resolve_provider_client`
✅ Fixed issues: none
Unchanged: 4603 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
… net

Layered fallback for auxiliary tasks (compression, vision, tts, web_extract, session_search, etc.):

1. Primary aux provider (existing)
2. User-configured auxiliary.<task>.fallback_chain (new)
3. Main agent provider + model (new, last-resort safety net)
4. Warn user + re-raise original error (new)

For users on 'auto' (no explicit aux provider), the existing _try_payment_fallback auto-detection chain runs instead. Its Step 1 already IS the main agent model, so they get the same behaviour without configuration.

The fallback_chain config schema comes from #26882 / @zccyman; the main-agent safety net and exhaustion warning were added on top.

Closes #26882. Builds on the capacity-error gate fix in the previous commit (#26803 / @Bartok9).
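The four-step ladder above can be sketched as a small control-flow example. This is an assumption-laden illustration rather than the PR's implementation: `call_with_layered_fallback`, `_safe`, and the callable-based interface are hypothetical, and for brevity the sketch treats every primary failure as a capacity error, whereas the PR only falls back on payment, quota, and connection failures.

```python
import logging
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)

# Hypothetical shape: an attempt returns a response, or None if it cannot serve.
Attempt = Callable[[], Optional[str]]


def _safe(attempt: Attempt) -> Optional[str]:
    """Run one fallback attempt, treating any failure as 'no result'."""
    try:
        return attempt()
    except Exception:
        return None


def call_with_layered_fallback(
    primary: Callable[[], str],
    chain: Iterable[Attempt],
    main_agent: Attempt,
) -> str:
    try:
        return primary()                      # step 1: primary aux provider
    except Exception:
        for attempt in chain:                 # step 2: configured chain, in order
            result = _safe(attempt)
            if result is not None:
                return result
        result = _safe(main_agent)            # step 3: main agent provider + model
        if result is not None:
            return result
        logger.warning("all fallbacks exhausted; re-raising original error")
        raise                                 # step 4: original error propagates
```

Re-raising the original exception rather than the last fallback failure keeps the root cause visible to the caller.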
7 new tests:
TestAuxiliaryFallbackLayering (3):
- configured_chain succeeds → main agent fallback NOT consulted
- chain returns nothing → main agent fallback runs and succeeds
- both exhausted → user-visible 'all fallbacks exhausted' warning
fires before the original error is re-raised
TestTryMainAgentModelFallback (4):
- returns (None, None, "") when main provider is 'auto'
- returns (None, None, "") when failed provider == main provider
(no point retrying the same backend)
- resolves the main provider's client when configured correctly
- skips when main provider is marked unhealthy
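The four TestTryMainAgentModelFallback cases suggest skip logic along the lines of the sketch below. The function name, parameters, and the meaning of the three tuple slots (client, model, provider label) are assumptions made for illustration; the source only confirms that the skipped cases return `(None, None, "")`.

```python
from typing import Callable, Optional, Set, Tuple

# Assumed tuple layout: (client, model, provider label).
Resolved = Tuple[Optional[object], Optional[str], str]
NOTHING: Resolved = (None, None, "")


def try_main_agent_fallback(
    main_provider: str,
    main_model: str,
    failed_provider: str,
    unhealthy: Set[str],
    make_client: Callable[[str], object],
) -> Resolved:
    if main_provider == "auto":
        return NOTHING                 # no concrete main provider to fall back to
    if main_provider == failed_provider:
        return NOTHING                 # no point retrying the same backend
    if main_provider in unhealthy:
        return NOTHING                 # provider is marked unhealthy
    return make_client(main_provider), main_model, main_provider
```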
CI triage note from rock-turning: I made the narrow local fix on top of this PR as commit …

HOME=/Users/spencer scripts/run_tests.sh \
tests/agent/test_auxiliary_client.py::TestIsPaymentError \
tests/agent/test_auxiliary_client.py::TestAuxiliaryFallbackLayering \
tests/agent/test_auxiliary_client.py::TestTryMainAgentModelFallback -q
# 20 passed

I also ran the full …
Follow-up status update: the previously pending …
Adds a new 'Auxiliary Capacity-Error Fallback' section to website/docs/user-guide/features/fallback-providers.md covering:

- The 4-step ladder (primary → fallback_chain → main agent → warn)
- Which errors trigger fallback (402, 429 quota, connection) vs which respect explicit provider choice (transient 429 rate limits)
- Optional fallback_chain config schema with vision + compression examples
- Recognized quota-error phrases (Bedrock, Vertex AI, generic)

Updates the bottom summary table: every auxiliary task now shows 'Layered (see above)' instead of 'Auto-detection chain', since explicit-provider users also get the main-agent safety net.
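The split between errors that always trigger fallback and errors that respect an explicit provider choice can be sketched as a single gate function. The name `should_try_fallback` and the string labels for error kinds are hypothetical; the assumption that any other error kind never falls back is also mine and is not stated in the PR.

```python
def should_try_fallback(error_kind: str, resolved_provider: str) -> bool:
    """Decide whether alternatives are tried after a failed auxiliary call.

    error_kind is a hypothetical label: "payment" (402 or quota exhaustion),
    "connection", "rate_limit" (transient 429), or anything else.
    """
    is_auto = resolved_provider == "auto"
    if error_kind in ("payment", "connection"):
        # Capacity errors: the provider cannot serve the request at all,
        # so fallback runs even for an explicitly chosen provider.
        return True
    if error_kind == "rate_limit":
        # Transient limits: honour an explicit provider choice.
        return is_auto
    # Assumption: other errors are not fallback candidates in this sketch.
    return False
```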
Salvages #26811 (@Bartok9) AND #26998 (@zccyman) into a layered auxiliary fallback system. Closes #26803, closes #26882.
What this makes true
Auxiliary tasks (compression, vision, tts, web_extract, session_search, etc.) now follow a 4-step fallback ladder when the primary aux provider fails on a capacity error (402 payment, 429 quota, connection failure):

1. Primary aux provider
2. `auxiliary.<task>.fallback_chain` entries, in order
3. Main agent provider + model
4. `logger.warning` + re-raise the original error

For users on `auto` (no explicit aux provider), the existing auto-detection chain runs instead. Its Step 1 already IS the main agent model, so they get the same outcome with zero config.

Config schema

If `fallback_chain` is omitted, the user still gets main-agent fallback for free. The chain is an optional ordering preference, not required setup.

Underlying bug fixes (from #26811)
`_is_payment_error()` now recognizes daily/monthly quota exhaustion phrases used by Bedrock, Vertex AI, and LiteLLM proxies (`quota exceeded`, `too many tokens per day`, `daily limit`, `resource exhausted`). Previously these were misclassified as transient rate limits and silently raised on explicit providers.

Changes
- `agent/auxiliary_client.py`: `_try_main_agent_model_fallback()` helper, layered fallback in `call_llm` / `async_call_llm`, exhaustion warning, quota-keyword detection
- `tests/agent/test_auxiliary_client.py`
- `scripts/release.py`

Validation
`scripts/run_tests.sh tests/agent/test_auxiliary_client.py` → 171 passed, 1 pre-existing failure on main (`test_custom_endpoint_uses_codex_wrapper_when_runtime_requests_responses_api`, unrelated to this PR).

E2E verified with real imports: layered fallback resolves the configured chain entries in order, falls back to the user's actual main agent provider+model when the chain is exhausted, and emits the warning when both layers fail.
Credit
- @Bartok9: `_is_payment_error` quota-keyword fix and capacity-error gate relaxation (fix(auxiliary): detect quota exhaustion as payment error; allow capacity-error fallback for explicit providers #26811)
- @zccyman: `fallback_chain` config schema, `_try_configured_fallback_chain`, `_resolve_single_provider` (feat(auxiliary): add configurable fallback chains for auxiliary tasks (#26882) #26998)

Closes #26803, closes #26882. Closes #26998 and #26809 as superseded.