[Bug] Provider auto-detection ignores config.yaml — stale API keys override explicit custom endpoint #4171

## Problem

Users who set their base model to a local LLM (e.g., Qwen 3.5 35B) are surprised to see OpenAI/Codex API calls burning tokens. The cause: the **auxiliary model system** for background tasks (compression, vision, smart approval, memory flush, session search) auto-resolves independently of the base model.

The auto-detection chain at `auxiliary_client.py:759`:
```
OpenRouter → Nous Portal → Custom endpoint → Codex → API key provider → None
```

If the user has any of these API keys set from a previous setup (`OPENROUTER_API_KEY`, `OPENAI_API_KEY`, Codex OAuth token), the auxiliary system silently uses them, even though the user explicitly configured a local model as their base.
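
To make the ordering problem concrete, here is a minimal sketch of a first-match chain in the order listed above. It is not the actual `_resolve_auto` code: the function name `resolve_auto`, the provider labels, and the `available` set are assumptions for illustration only.

```python
# Order taken from the chain described in this issue. The labels are
# illustrative and may not match the identifiers used in auxiliary_client.py.
CHAIN = ["openrouter", "nous", "custom", "codex", "api_key"]


def resolve_auto(available):
    """Return the first provider in CHAIN that has credentials, else None.

    `available` is a set of provider labels for which some credential or
    endpoint is currently present (hypothetical input shape).
    """
    for provider in CHAIN:
        if provider in available:
            return provider
    return None


# A stale OpenRouter key sits earlier in the chain than the custom endpoint,
# so it wins even though the user explicitly configured a local model.
print(resolve_auto({"custom", "openrouter"}))
```

Under this ordering, the custom endpoint is only chosen when no OpenRouter or Nous Portal credentials are present.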

## User experience

1. User runs `hermes setup`, selects local Qwen 3.5 35B as their model
2. User also has `OPENROUTER_API_KEY` set in `.env` from a previous configuration
3. User starts chatting — base model uses local Qwen correctly
4. Context compression fires → silently routes to OpenRouter → burns tokens
5. Vision preprocessing fires → silently routes to OpenRouter → burns more tokens
6. User has no idea this is happening until they check their API billing

## Suggested fixes

### Minimal: Surface auxiliary routing in /usage
Add a line to `/usage` output showing which provider is being used for auxiliary tasks:
```
Auxiliary model: openrouter/google/gemini-3-flash (auto-detected)
```
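
A rough sketch of how that line could be rendered. The helper `format_auxiliary_line` and its parameters are hypothetical, and the "(configured)" and "none" variants are assumptions about what the non-auto cases might show.

```python
def format_auxiliary_line(provider, model, auto_detected):
    """Build the proposed /usage line for the auxiliary provider.

    `provider` is None when nothing resolved; `auto_detected` says whether
    the provider came from auto-detection rather than explicit config.
    """
    if provider is None:
        return "Auxiliary model: none"
    source = "auto-detected" if auto_detected else "configured"
    return f"Auxiliary model: {provider}/{model} ({source})"


print(format_auxiliary_line("openrouter", "google/gemini-3-flash", True))
```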

### Better: Warn on mismatch
If the base model is local (custom endpoint) but auxiliary auto-resolved to a paid provider, emit a one-time warning:
```
⚠️ Auxiliary tasks (compression, vision) are using OpenRouter.
To route everything through your local model, add to config.yaml:
  auxiliary:
    compression:
      provider: custom
      base_url: http://localhost:1234/v1
```
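
The mismatch check itself could look roughly like the following. This is only a sketch: `mismatch_warning`, the `PAID_PROVIDERS` set, and the module-level "already warned" state are all assumptions, not existing code.

```python
# Providers assumed to bill per token. This set is an illustration, not a
# list taken from the project.
PAID_PROVIDERS = {"openrouter", "nous", "codex", "api_key"}

# Remembers which providers have already triggered a warning in this process.
_already_warned = set()


def mismatch_warning(base_provider, aux_provider, aux_auto_detected):
    """Return a warning the first time a local base model is paired with an
    auto-detected paid auxiliary provider; return None otherwise."""
    if base_provider != "custom" or not aux_auto_detected:
        return None
    if aux_provider not in PAID_PROVIDERS or aux_provider in _already_warned:
        return None
    _already_warned.add(aux_provider)
    return (
        f"Auxiliary tasks are using {aux_provider}, "
        "but the base model is a custom endpoint."
    )
```

Keeping the "already warned" state per provider means a later switch to a different paid provider would still produce one warning.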

### Best: `hermes setup` should ask
When the user selects a local model, setup should ask: "Route auxiliary tasks (compression, vision) through this model too? [y/N]" and configure `auxiliary.*` accordingly.
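
As a sketch of what "configure accordingly" might mean, the helper below turns the prompt answer into an `auxiliary` config fragment shaped like the YAML in the warning example. `build_auxiliary_config` and its default task list are hypothetical; how setup actually writes config.yaml is not shown here.

```python
def build_auxiliary_config(answer, base_url, tasks=("compression", "vision")):
    """Translate the [y/N] answer into an `auxiliary` config fragment.

    Anything other than y/yes (including an empty answer) keeps the default
    of "N" and returns an empty dict, leaving auto-detection untouched.
    """
    if answer.strip().lower() not in ("y", "yes"):
        return {}
    return {
        "auxiliary": {
            task: {"provider": "custom", "base_url": base_url}
            for task in tasks
        }
    }


print(build_auxiliary_config("y", "http://localhost:1234/v1"))
```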

## Technical details

- Auxiliary routing: `agent/auxiliary_client.py:754-776` (`_resolve_auto`)
- Task-specific config: `auxiliary.{task}.provider` and `auxiliary.{task}.model` in config.yaml
- Tasks that use auxiliary: compression, vision, web_extract, session_search, skills_hub, mcp, flush_memories
- The user CAN configure this manually, but setup doesn't guide them to it
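
Until setup handles this, a user could check which tasks are still left to auto-detection. The sketch below assumes the config has already been loaded into a plain dict and that a task counts as pinned when `auxiliary.{task}.provider` is set. `unpinned_tasks` is a hypothetical helper, and the task names are copied from the list above.

```python
AUX_TASKS = [
    "compression", "vision", "web_extract", "session_search",
    "skills_hub", "mcp", "flush_memories",
]


def unpinned_tasks(config):
    """Return the auxiliary tasks that have no explicit provider set."""
    aux = config.get("auxiliary") or {}
    return [
        task for task in AUX_TASKS
        if not (aux.get(task) or {}).get("provider")
    ]


config = {"auxiliary": {"compression": {"provider": "custom"}}}
print(unpinned_tasks(config))
```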
