feat: Vision fallback chain for auxiliary models (Gemini → configured fallback → local) #25878

Closed
saved-j wants to merge 1 commit into NousResearch:main from saved-j:feat/vision-fallback-chain

Conversation

@saved-j saved-j commented May 14, 2026

Summary

When the primary vision provider fails (payment errors, quota exhaustion, server errors), automatically fall back through a configurable chain instead of raising immediately.

Problem

The Gemini free tier (250 req/day) is exhausted quickly under Hermes agent usage. Without a fallback chain, all vision requests fail with HTTP 429 until the quota resets. The existing `fallback_provider` and `local_fallback_provider` config keys in `auxiliary.vision` were dead keys: the code never read them.

Changes

1. `_resolve_vision_fallback_client()` (new function)

Resolves the configured fallback chain from `auxiliary.vision.{fallback,local_fallback}_{provider,model}`. Tries `fallback_provider` first, then `local_fallback_provider`.
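
As a rough illustration of the resolution order described above, the following sketch reads the four config keys and returns the candidates in the order they would be tried. The helper name `vision_fallback_candidates` and the plain-dict config are assumptions made for this example; the PR's `_resolve_vision_fallback_client()` presumably goes further and builds an actual client for the first candidate that resolves.

```python
from typing import Any, Dict, List, Optional, Tuple


def vision_fallback_candidates(config: Dict[str, Any]) -> List[Tuple[str, Optional[str]]]:
    """Return (provider, model) pairs in the order the chain would try them.

    Hypothetical helper: it only reads the keys named in this PR and does not
    create any client.
    """
    vision = (config.get("auxiliary") or {}).get("vision") or {}
    candidates: List[Tuple[str, Optional[str]]] = []
    # fallback_provider is tried first, local_fallback_provider second.
    for prefix in ("fallback", "local_fallback"):
        provider = vision.get(f"{prefix}_provider")
        if provider:  # skip entries that are missing or empty
            candidates.append((provider, vision.get(f"{prefix}_model")))
    return candidates


config = {
    "auxiliary": {
        "vision": {
            "provider": "gemini",
            "model": "gemini-2.5-flash",
            "fallback_provider": "Xiaomi-TP",
            "fallback_model": "mimo-v2-omni",
            "local_fallback_provider": "ollama-launch",
            "local_fallback_model": "llama3.2-vision:11b",
        }
    }
}
print(vision_fallback_candidates(config))
# → [('Xiaomi-TP', 'mimo-v2-omni'), ('ollama-launch', 'llama3.2-vision:11b')]
```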

2. `_is_server_error()` (new function)

Detects HTTP 5xx errors (500, 502, 503, 504), which indicate provider-side failures and should trigger fallback rather than a retry.
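
A minimal sketch of such a check, assuming the exception exposes the HTTP status either as `status_code` or as `response.status_code` (a common shape for HTTP client errors, but an assumption here). `ProviderError` and `is_server_error` are hypothetical names for illustration, not the PR's code.

```python
from typing import Optional

# The four statuses listed in this PR description.
SERVER_ERROR_STATUSES = frozenset({500, 502, 503, 504})


class ProviderError(Exception):
    """Hypothetical stand-in for an SDK error that carries an HTTP status."""

    def __init__(self, message: str, status_code: Optional[int] = None) -> None:
        super().__init__(message)
        self.status_code = status_code


def is_server_error(exc: BaseException) -> bool:
    """True when the exception carries one of the 5xx statuses above."""
    status = getattr(exc, "status_code", None)
    if status is None:
        # Some clients attach the status to a nested response object instead.
        status = getattr(getattr(exc, "response", None), "status_code", None)
    return status in SERVER_ERROR_STATUSES


print(is_server_error(ProviderError("bad gateway", 502)))        # → True
print(is_server_error(ProviderError("too many requests", 429)))  # → False
```

A 429 is deliberately not treated as a server error in this sketch; in the PR, quota exhaustion is handled by `_is_payment_error()` instead.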

3. `_is_payment_error()` (extended)

Now catches `"quota exceeded"` and `"exceeded your current quota"` messages from Gemini API.
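
The added matching can be pictured as a case-insensitive substring test on the error text. This sketch covers only the two phrases quoted above; the real `_is_payment_error()` presumably checks other payment indicators as well, and the name `is_quota_message` is an assumption for this example.

```python
# The two phrases this PR adds, lowercased for case-insensitive matching.
QUOTA_MARKERS = ("quota exceeded", "exceeded your current quota")


def is_quota_message(exc: BaseException) -> bool:
    """True when the error text contains one of the quota phrases."""
    text = str(exc).lower()
    return any(marker in text for marker in QUOTA_MARKERS)


err = RuntimeError("429 RESOURCE_EXHAUSTED: You exceeded your current quota")
print(is_quota_message(err))  # → True
```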

4. `call_llm()` and `async_call_llm()` (modified)

Before raising, check if `task == "vision"` and either:

  • Client could not be created (`client is None`)
  • Client was created but API call failed (`first_err is not None`)

If so, resolve and invoke the fallback chain immediately.
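
The control flow above can be sketched roughly as follows. Clients are modeled as zero-argument callables and `resolve_fallback` stands in for `_resolve_vision_fallback_client()`; both are simplifications assumed for this example, and the actual `call_llm()` takes different arguments.

```python
from typing import Callable, Optional

Client = Callable[[], str]  # hypothetical: a client is "something you can call"


def call_with_vision_fallback(
    task: str,
    primary: Optional[Client],
    resolve_fallback: Callable[[], Optional[Client]],
) -> str:
    """Try the primary client; for vision tasks, try the fallback before raising."""
    first_err: Optional[Exception] = None
    if primary is not None:
        try:
            return primary()
        except Exception as exc:  # API call failed on a created client
            first_err = exc
    # Reached when the client is None or the primary call failed.
    if task == "vision":
        fallback = resolve_fallback()
        if fallback is not None:
            return fallback()
    if first_err is not None:
        raise first_err
    raise RuntimeError(f"no client available for task {task!r}")


def exhausted() -> str:
    raise RuntimeError("429 RESOURCE_EXHAUSTED: quota exceeded")


print(call_with_vision_fallback("vision", exhausted, lambda: (lambda: "fallback ok")))
# → fallback ok
```

Non-vision tasks skip the fallback in this sketch and re-raise the original error, which matches the `task == "vision"` guard in the description.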

Configuration

```yaml
auxiliary:
  vision:
    provider: gemini
    model: gemini-2.5-flash
    fallback_provider: Xiaomi-TP
    fallback_model: mimo-v2-omni
    local_fallback_provider: ollama-launch
    local_fallback_model: llama3.2-vision:11b
```

Testing

Verified on production:

  • Gemini returned HTTP 429 (RESOURCE_EXHAUSTED, quota exceeded)
  • Fallback resolved Xiaomi-TP (mimo-v2-omni) successfully
  • Image analysis completed in ~10s via fallback
  • Tested with multiple images (bicycle cargo trailer with child, bird in hand with EXIF data)

Logs

```
INFO agent.auxiliary_client: Vision fallback: using Xiaomi-TP (mimo-v2-omni)
INFO agent.auxiliary_client: Vision fallback chain: using Xiaomi-TP (mimo-v2-omni)
INFO tools.vision_tools: Image analysis completed (2158 characters)
```

Related

When the primary vision provider (e.g. Gemini) fails with payment errors,
server errors (5xx), or quota exhaustion, automatically fall back through
the configured fallback_provider → local_fallback_provider chain.

Changes:
- _resolve_vision_fallback_client(): resolves configured fallback chain
  from auxiliary.vision.{fallback,local_fallback}_{provider,model}
- _is_server_error(): detects 5xx HTTP errors that should trigger fallback
- _is_payment_error(): extended to catch 'quota exceeded' messages
- call_llm() and async_call_llm(): invoke fallback chain before raising

Config example:
  auxiliary:
    vision:
      provider: gemini
      model: gemini-2.5-flash
      fallback_provider: Xiaomi-TP
      fallback_model: mimo-v2-omni
      local_fallback_provider: ollama-launch
      local_fallback_model: llama3.2-vision:11b

Verified: Gemini 429 → Xiaomi-TP (mimo-v2-omni) fallback chain working.
Tested with image analysis: bird in hand (vivo X200 Ultra, 35mm f/1.69).
@teknium1

Contributor

This has been implemented on current main by the generalized auxiliary fallback ladder. Thanks for pushing this direction — the merged version uses a slightly different documented config shape (auxiliary.<task>.fallback_chain) but covers the vision fallback behavior and extends it to other auxiliary tasks too.

Automated hermes-sweeper review evidence:

  • agent/auxiliary_client.py:2352 detects quota/payment exhaustion, including quota exceeded, quota_exceeded, and resource exhausted.
  • agent/auxiliary_client.py:5434 and agent/auxiliary_client.py:5880 route sync and async auxiliary failures through fallback on payment/quota, connection, and rate-limit capacity errors.
  • agent/auxiliary_client.py:3049 implements _try_configured_fallback_chain(), reading auxiliary.<task>.fallback_chain in order.
  • Commit a57424683759617040dd82082d85128deb236de4 added the configurable auxiliary fallback chains plus the main-agent safety net.
  • website/docs/user-guide/features/fallback-providers.md:297 documents the auxiliary fallback ladder, including a vision example.

One caveat: the merged schema is fallback_chain, not the PR's proposed fallback_provider / local_fallback_provider keys.

@teknium1 closed this Jun 12, 2026
@teknium1 added the sweeper:implemented-on-main label (Sweeper: behavior already present on current main) Jun 12, 2026

Labels

  • comp/agent: Core agent loop, run_agent.py, prompt builder
  • P2: Medium (degraded but workaround exists)
  • sweeper:implemented-on-main: Sweeper: behavior already present on current main
  • tool/vision: Vision analysis and image generation
  • type/feature: New feature or request

3 participants