Problem
Users who set their base model to a local LLM (e.g., Qwen 3.5 35B) are surprised to see OpenAI/Codex API calls burning tokens. The cause: the auxiliary model system for background tasks (compression, vision, smart approval, memory flush, session search) auto-resolves independently of the base model.
The auto-detection chain in `_resolve_auto` (`agent/auxiliary_client.py:759`) tries providers in this order:
```
OpenRouter → Nous Portal → Custom endpoint → Codex → API key provider → None
```
If credentials for any of these providers are left over from a previous setup (`OPENROUTER_API_KEY`, `OPENAI_API_KEY`, a Codex OAuth token), the auxiliary system silently uses them, even though the user explicitly configured a local model as their base.
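To make the failure mode concrete, here is a minimal Python model of the fallback order above. It is a sketch, not the code in `_resolve_auto`: the provider labels and every environment variable name other than `OPENROUTER_API_KEY` and `OPENAI_API_KEY` are hypothetical stand-ins for whatever checks the real function performs. The property it reproduces is that the base model's provider is never an input, so a stale key earlier in the chain wins.

```python
from typing import Callable, Mapping, Optional

# Hypothetical model of the auto-detection order; the real checks in
# agent/auxiliary_client.py (_resolve_auto) may use different sources.
AUTO_CHAIN: list[tuple[str, Callable[[Mapping[str, str]], bool]]] = [
    ("openrouter", lambda env: bool(env.get("OPENROUTER_API_KEY"))),
    ("nous", lambda env: bool(env.get("NOUS_API_KEY"))),  # hypothetical variable
    ("custom", lambda env: bool(env.get("CUSTOM_BASE_URL"))),  # hypothetical variable
    ("codex", lambda env: bool(env.get("CODEX_OAUTH_TOKEN"))),  # hypothetical variable
    ("api_key_provider", lambda env: bool(env.get("OPENAI_API_KEY"))),
]


def resolve_auto(env: Mapping[str, str]) -> Optional[str]:
    """Return the first provider in AUTO_CHAIN that has credentials, else None.

    The base model's provider is not a parameter, so it cannot influence
    which provider the auxiliary tasks end up using.
    """
    for provider, has_credentials in AUTO_CHAIN:
        if has_credentials(env):
            return provider
    return None


# A local endpoint is configured, but a leftover OpenRouter key comes first.
env = {
    "CUSTOM_BASE_URL": "http://localhost:1234/v1",
    "OPENROUTER_API_KEY": "sk-or-leftover",
}
print(resolve_auto(env))  # openrouter
```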
User experience
- User runs `hermes setup`, selects local Qwen 3.5 35B as their model
- User also has `OPENROUTER_API_KEY` set in `.env` from a previous configuration
- User starts chatting — base model uses local Qwen correctly
- Context compression fires → silently routes to OpenRouter → burns tokens
- Vision preprocessing fires → silently routes to OpenRouter → burns more tokens
- User has no idea this is happening until they check their API billing
Suggested fixes
Minimal: Surface auxiliary routing in `/usage`
Add a line to `/usage` output showing which provider is being used for auxiliary tasks:
```
Auxiliary model: openrouter/google/gemini-3-flash (auto-detected)
```
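One possible shape for that `/usage` line, sketched in Python. `AuxiliaryRoute` and `format_auxiliary_usage` are hypothetical names, not existing Hermes code. Because routing is configured per task, the sketch prints a single line when every task shares a route and falls back to one line per task when they differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuxiliaryRoute:
    """Hypothetical record of where one auxiliary task is routed."""
    provider: str
    model: str
    auto_detected: bool


def _describe(route: AuxiliaryRoute) -> str:
    origin = "auto-detected" if route.auto_detected else "configured"
    return f"{route.provider}/{route.model} ({origin})"


def format_auxiliary_usage(routes: dict[str, AuxiliaryRoute]) -> list[str]:
    """Build /usage lines: one line if all tasks share a route, else one per task."""
    distinct = set(routes.values())  # frozen dataclasses are hashable
    if len(distinct) == 1:
        return [f"Auxiliary model: {_describe(distinct.pop())}"]
    return [
        f"Auxiliary model ({task}): {_describe(route)}"
        for task, route in sorted(routes.items())
    ]


route = AuxiliaryRoute("openrouter", "google/gemini-3-flash", auto_detected=True)
print(format_auxiliary_usage({"compression": route, "vision": route}))
# ['Auxiliary model: openrouter/google/gemini-3-flash (auto-detected)']
```

Showing "(auto-detected)" versus "(configured)" is the part that matters: it tells the user the route was not their choice.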
Better: Warn on mismatch
If the base model is local (custom endpoint) but the auxiliary provider auto-resolved to a paid one, emit a one-time warning:
```
⚠️ Auxiliary tasks (compression, vision) are using OpenRouter.
To route everything through your local model, add to config.yaml:
auxiliary:
  compression:
    provider: custom
    base_url: http://localhost:1234/v1
```
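A minimal sketch of the one-time check, assuming the caller already knows the base provider, the base URL, and whether the auxiliary provider came from auto-detection. The class name and logger name are hypothetical, and treating any non-`custom` auxiliary provider as "paid" is a simplifying assumption.

```python
import logging
from typing import Optional

logger = logging.getLogger("hermes.auxiliary")  # hypothetical logger name


class AuxiliaryMismatchWarner:
    """Emit the local-vs-remote routing warning at most once per instance."""

    def __init__(self) -> None:
        self._warned = False

    def check(
        self,
        base_provider: str,
        base_url: str,
        aux_provider: Optional[str],
        aux_auto_detected: bool,
    ) -> Optional[str]:
        # Warn only when the base model is local and auto-detection (not the
        # user) picked a different provider. Assumption: any provider other
        # than "custom" is treated as a remote, paid provider.
        if (
            self._warned
            or base_provider != "custom"
            or not aux_auto_detected
            or aux_provider in (None, "custom")
        ):
            return None
        self._warned = True
        message = (
            f"Auxiliary tasks (compression, vision) are using {aux_provider}.\n"
            "To route everything through your local model, add to config.yaml:\n"
            "auxiliary:\n"
            "  compression:\n"
            "    provider: custom\n"
            f"    base_url: {base_url}\n"
        )
        logger.warning(message)
        return message
```

Keeping the "already warned" state on an instance, rather than in a global, makes the once-only behavior easy to scope to a session and to test.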
Best: `hermes setup` should ask
When the user selects a local model, setup should ask: "Route auxiliary tasks (compression, vision) through this model too? [y/N]" and configure `auxiliary.*` accordingly.
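A sketch of what that setup step could produce, in Python. The function names are hypothetical; the task list comes from the "Technical details" below, and the per-task keys (`provider`, `model`, `base_url`) mirror the ones mentioned in this issue. Whether every task honors `base_url` is an assumption to verify against `agent/auxiliary_client.py`.

```python
from typing import Callable, Dict

# Task names taken from the "Technical details" list in this issue.
AUXILIARY_TASKS = (
    "compression", "vision", "web_extract", "session_search",
    "skills_hub", "mcp", "flush_memories",
)


def ask_route_auxiliary(prompt: Callable[[str], str] = input) -> bool:
    """Ask the setup question; anything but an explicit yes means No."""
    answer = prompt(
        "Route auxiliary tasks (compression, vision) through this model too? [y/N] "
    )
    return answer.strip().lower() in ("y", "yes")


def local_auxiliary_config(base_url: str, model: str) -> Dict[str, dict]:
    """Build an `auxiliary` section that pins every task to the local endpoint."""
    return {
        "auxiliary": {
            task: {"provider": "custom", "model": model, "base_url": base_url}
            for task in AUXILIARY_TASKS
        }
    }
```

In a setup flow, a "yes" would merge `local_auxiliary_config(...)` into the config written to `config.yaml`. Passing `prompt` as a parameter keeps the question testable without a terminal. A real flow might leave `vision` out when the local model has no image support.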
Technical details
- Auxiliary routing: `agent/auxiliary_client.py:754-776` (`_resolve_auto`)
- Task-specific config: `auxiliary.{task}.provider` and `auxiliary.{task}.model` in config.yaml
- Tasks that use the auxiliary model: compression, vision, web_extract, session_search, skills_hub, mcp, flush_memories
- The user CAN configure this manually, but setup doesn't guide them to it
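The interaction between the per-task keys and auto-detection can be sketched as follows. This assumes an explicit `auxiliary.{task}.provider` takes precedence and that unset tasks fall back to the auto-detected provider. `resolve_task_provider` is a hypothetical name, not the real function.

```python
from typing import Any, Mapping, Optional, Tuple


def resolve_task_provider(
    task: str, config: Mapping[str, Any], auto_provider: Optional[str]
) -> Tuple[Optional[str], bool]:
    """Return (provider, auto_detected) for one auxiliary task.

    Assumed precedence: an explicit auxiliary.{task}.provider wins;
    otherwise fall back to whatever auto-detection picked.
    """
    task_cfg = (config.get("auxiliary") or {}).get(task) or {}
    provider = task_cfg.get("provider")
    if provider:
        return provider, False
    return auto_provider, True


config = {"auxiliary": {"compression": {"provider": "custom"}}}
for task in ("compression", "vision"):
    print(task, resolve_task_provider(task, config, auto_provider="openrouter"))
# compression ('custom', False)
# vision ('openrouter', True)
```

Under this model, the config snippet in the suggested warning, which sets only `compression`, would still leave `vision` and the other tasks on the auto-detected provider. The warning may need to cover every task.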