fix(oneshot): honor fallback_providers chain during worker startup, not just runtime #9
Merged
…ot just runtime

Closes #6.

Problem

Worker startup calls `resolve_runtime_provider` to acquire credentials for the primary provider. If that raises AuthError (xAI OAuth token expired, Anthropic logged out, Codex revoked), the worker crashes before AIAgent's runtime fallback loop ever gets a chance, even though the user has explicitly configured a fallback chain for exactly this case.

Observed in the v6.6 incident 2026-06-07: xAI OAuth token went missing mid-session and every subsequent worker crashed at startup despite having `fallback_providers: [openai-codex/gpt-5.5, xai-oauth/grok-4.3]` configured.

Solution

New helper `_resolve_runtime_with_fallback` wraps the primary-resolution call. On AuthError, it iterates the configured fallback chain (read once from `get_fallback_chain(cfg)`) until one entry succeeds. If all fail, it re-raises the LAST AuthError so cli.py's exit handling can surface it.

Three safety bounds preserved (informed by code-review):

1. **Explicit CLI pin**: `hermes -z --model X --provider Y ...` should NOT silently downgrade. When `model` OR `provider` was a non-empty CLI arg, the helper re-raises the primary AuthError verbatim, with no fallback attempt.
2. **Rate-limit AuthError on primary**: falling through to other providers would burn their quota in milliseconds (the "quota amplification" footgun). Detected via the existing `is_rate_limited_auth_error()`; the helper re-raises immediately, and existing rate-limit handling (cli.py exit 75) gets the task requeued.
3. **Remaining-chain handoff to AIAgent**: when fallback lands on chain entry [N], AIAgent's runtime fallback loop should only see entries AFTER N (not the dead primary, not the entry we just used). The helper now returns `(runtime, effective_model, landed_at_index, remaining_chain)` and the caller passes `remaining` to AIAgent's `fallback_model`.

Implementation

- `hermes_cli/oneshot.py:33-110`: new helper (testable at module level).
- `hermes_cli/oneshot.py:439-460`: call site updated; reads chain once, detects explicit_pin from CLI args, passes remaining_chain to AIAgent.
- AIAgent receives the correctly sliced chain via `fallback_model=_fb`, preserving existing runtime-fallback semantics for mid-conversation failures.

Tests (9/9 passing): tests/cli/test_oneshot_runtime_fallback.py

- primary succeeds → no fallback attempted, full chain preserved for AIAgent
- primary fails, first fallback succeeds → effective_model advances, remaining_chain sliced correctly
- two failures → third succeeds, slicing correct
- all fail → LAST AuthError propagates (not primary's)
- empty chain → primary error verbatim
- fallback without model → effective_model preserved
- explicit_pin=True → no fallback, primary error verbatim
- rate-limit AuthError → no fallback, primary error verbatim
- same provider in chain → no infinite loop, advances to next entry

Code-review pre-merge: reviewer caught silent-downgrade regression, stale chain handoff, and quota-amplification footgun. All three addressed.

Follow-up (separate issues, not blocking)

- Consider applying the same pattern to `gateway/run.py:_resolve_runtime_agent_kwargs` and `cli.py:4881-4914` for a consistent worker-startup contract across surfaces.
- Optional: emit a metric/heartbeat counter when fallback fires so that a constantly failing primary does not go unnoticed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
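For readers without the diff at hand, the control flow described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code in `hermes_cli/oneshot.py`: the function name `resolve_with_fallback_sketch`, the injected `resolve` callable, the `rate_limited` attribute, the use of `None` as the landed index when the primary succeeds, and the parsing of chain entries as `provider/model` strings are all assumptions made for this example.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


class AuthError(Exception):
    """Stand-in for the project's AuthError (shape assumed for this sketch)."""

    def __init__(self, message: str, rate_limited: bool = False) -> None:
        super().__init__(message)
        self.rate_limited = rate_limited


def is_rate_limited_auth_error(exc: AuthError) -> bool:
    """Stand-in for the existing check; the real detection logic is not shown in the PR."""
    return bool(getattr(exc, "rate_limited", False))


def resolve_with_fallback_sketch(
    resolve: Callable[[str], Any],  # hypothetical: provider name -> runtime credentials
    provider: str,
    model: str,
    chain: Sequence[str],  # assumed entry format: "provider/model" or just "provider"
    explicit_pin: bool = False,
) -> Tuple[Any, str, Optional[int], List[str]]:
    """Return (runtime, effective_model, landed_at_index, remaining_chain)."""
    try:
        # Primary succeeded: no fallback used, the full chain stays available.
        return resolve(provider), model, None, list(chain)
    except AuthError as exc:
        # Bounds 1 and 2: never downgrade an explicit CLI pin, and never fan a
        # rate-limit error out to other providers. Re-raise the primary error.
        if explicit_pin or is_rate_limited_auth_error(exc):
            raise
        last_error = exc

    for idx, entry in enumerate(chain):
        fb_provider, _, fb_model = entry.partition("/")
        try:
            runtime = resolve(fb_provider)
        except AuthError as exc:
            last_error = exc  # remember the most recent failure
            continue
        # Bound 3: hand over only the entries AFTER the one we landed on.
        # An entry without a model keeps the originally requested model.
        return runtime, fb_model or model, idx, list(chain[idx + 1:])

    # Empty chain re-raises the primary error; otherwise the LAST fallback error.
    raise last_error
```

Because the loop is a single pass over the chain, an entry that repeats the failed primary provider simply fails once more and the loop moves on, which matches the "no infinite loop" test case listed above.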
Summary
On an AuthError from `resolve_runtime_provider`, worker startup now iterates the configured `fallback_providers` chain instead of crashing the worker.

Why this matters

v6.6 incident (2026-06-07): xAI OAuth token went missing mid-session and every subsequent worker crashed at startup, despite each profile having `fallback_providers: [openai-codex/gpt-5.5, xai-oauth/grok-4.3]` configured. The fallback chain was only honored AFTER successful credential resolution (AIAgent's runtime loop), not DURING initial worker startup. This fixes that.

Combined with #7 (now merged: dispatcher heartbeat) and #5 (still open: 429 exit-code mapping), the worker-startup → dispatcher-detect → next-retry loop is now operationally robust.
Test plan (9/9 passing)
Code-review focus
- `hermes -z --model X --provider Y` should NOT silently downgrade. New `explicit_pin: bool` parameter, derived from CLI args (not env vars, not config).
- Rate-limit detection uses the existing `is_rate_limited_auth_error()` from auth.py.
- The helper returns `(runtime, model, landed_idx, remaining_chain)`; the caller passes `remaining` to AIAgent so its runtime loop doesn't re-try the dead primary.
- `get_fallback_chain(cfg)` is now read into one local; it was being called twice (TOCTOU on mutable cfg).

Follow-up (separate issues, not blocking)

- Consider applying the same pattern to `gateway/run.py:_resolve_runtime_agent_kwargs` and `cli.py:4881-4914` for a consistent worker-startup contract across all surfaces (CLI / gateway-spawned / oneshot).

🤖 Generated with Claude Code
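The two call-site points from the code-review focus (deriving `explicit_pin` from CLI args only, and reading the chain once) might look something like the sketch below. The helper names `derive_explicit_pin` and `snapshot_fallback_chain` are hypothetical, and treating `cfg` as a plain mapping with a `fallback_providers` key is an assumption; the real call site uses `get_fallback_chain(cfg)`, whose implementation is not shown here.

```python
from typing import Any, Mapping, Optional, Tuple


def derive_explicit_pin(cli_model: Optional[str], cli_provider: Optional[str]) -> bool:
    """True when either --model or --provider was passed as a non-empty CLI arg.

    Only CLI arguments count; values coming from env vars or config are not pins.
    """
    return bool(cli_model) or bool(cli_provider)


def snapshot_fallback_chain(cfg: Mapping[str, Any]) -> Tuple[str, ...]:
    """Read the chain once into an immutable local.

    Copying into a tuple means later mutation of cfg cannot change what the
    startup helper and AIAgent see, which avoids the double-read TOCTOU issue.
    """
    return tuple(cfg.get("fallback_providers") or ())


# Usage: take one snapshot, then reuse it for both startup and the agent handoff.
cfg = {"fallback_providers": ["openai-codex/gpt-5.5", "xai-oauth/grok-4.3"]}
chain = snapshot_fallback_chain(cfg)
explicit_pin = derive_explicit_pin(cli_model="", cli_provider=None)
```

Taking the snapshot before calling the startup helper keeps the index returned by the helper meaningful, since it refers to the same sequence that is later sliced for AIAgent.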