[Feature]: Local model routing for auxiliary tasks (Ollama/custom endpoint support) #879

### Problem or Use Case

**Summary**

Add support for routing auxiliary tasks (compression, tool decisions, heartbeats) to a local model endpoint (e.g. Ollama) independently of the main model provider.

**Motivation**

The current auxiliary client resolution chain prioritizes cloud providers (OpenRouter → Nous Portal → custom endpoint). Users running a hybrid setup — cloud API for primary reasoning, local inference for lightweight tasks — have no way to direct auxiliary calls to a local model without sacrificing their primary provider configuration.
This matters most for:

- Heartbeat/routing tasks that don't require frontier model quality
- Cost reduction on high-frequency, low-complexity auxiliary calls
- Users who want a local-first architecture for side tasks while keeping a cloud primary model

**Related Issue**

This is also the architectural prerequisite for the multi-model hybrid setup raised as an open question in #523 ('Should the skill cover multi-model setups — local model for fast tasks, cloud model for complex reasoning?'). That skill cannot deliver hybrid routing without a config layer that can direct auxiliary tasks to a local endpoint independently of the main provider.

This proposal is also related to #157, which addresses capability-based routing for the main agent loop. It differs in that it targets the auxiliary client (`auxiliary_client.py`) and focuses on local endpoint support rather than capability categorization.

### Proposed Solution

Add a dedicated `auxiliary.local` configuration block in `config.yaml`:

```yaml
auxiliary:
  local:
    base_url: http://localhost:11434/v1
    model: qwen3:8b
    tasks:
      - compression
      - web_extract
      - vision
```

Tasks listed under `local` would bypass the existing resolution chain and route directly to the specified local endpoint. Tasks not listed would continue to use the existing chain.

Alternatively, expose per-task provider override env vars that accept `ollama` as a valid provider alongside the existing `openrouter`, `nous`, `codex`, and `main` options.
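
To make the config-based option concrete, here is a minimal sketch of how the routing decision could look. It is not taken from `auxiliary_client.py`: the names `LocalAuxConfig`, `AuxTarget`, `parse_local_config`, and `resolve_aux_target` are hypothetical, and `existing_chain` stands in for whatever the current resolution chain does today.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class LocalAuxConfig:
    """Mirror of the proposed `auxiliary.local` block (hypothetical type)."""
    base_url: str
    model: str
    tasks: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class AuxTarget:
    """Endpoint and model an auxiliary call is sent to (hypothetical type)."""
    base_url: str
    model: str


def parse_local_config(config: dict) -> Optional[LocalAuxConfig]:
    """Extract `auxiliary.local` from an already-parsed config.yaml dict."""
    local = (config.get("auxiliary") or {}).get("local")
    if not local:
        return None  # block absent: behaviour stays exactly as it is today
    return LocalAuxConfig(
        base_url=local["base_url"],
        model=local["model"],
        tasks=frozenset(local.get("tasks") or ()),
    )


def resolve_aux_target(
    task: str,
    local: Optional[LocalAuxConfig],
    existing_chain: Callable[[str], AuxTarget],
) -> AuxTarget:
    """Route listed tasks to the local endpoint, everything else to the chain."""
    if local is not None and task in local.tasks:
        return AuxTarget(base_url=local.base_url, model=local.model)
    # Assumption: `existing_chain` wraps the current resolution logic unchanged.
    return existing_chain(task)
```

With the example config above, `compression`, `web_extract`, and `vision` would resolve to the Ollama endpoint, while any other task falls through to the existing chain. Keeping the check in front of the chain, rather than adding a new link inside it, means users without an `auxiliary.local` block see no behaviour change.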

**Current Workaround**

None, short of removing `OPENROUTER_API_KEY` from the environment, which breaks the primary provider configuration.

**Environment**

- Hermes Agent v1.0.0
- WSL2 (Ubuntu 22.04) on Windows 10
- Ollama 0.17.7 with qwen3:8b running locally
- Primary model: minimax-m2.5 via Nous Portal

### Alternatives Considered

**Setting OPENAI_BASE_URL to a local Ollama endpoint** — works but requires removing OPENROUTER_API_KEY, which breaks the primary provider and the entire resolution chain for non-auxiliary tasks.

**Using AUXILIARY_*_PROVIDER env var overrides** — the existing override mechanism only accepts openrouter, nous, codex, and main as valid values. Ollama/local endpoints are not supported options.

**Running a separate Hermes instance pointed at Ollama** — technically possible but operationally awkward, defeats the purpose of a unified agent, and doubles session/memory overhead.

**Modifying source directly** — viable for technical users but not sustainable across updates and puts the burden on individual users to re-patch after every Hermes update.
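
For the `AUXILIARY_*_PROVIDER` alternative above, the change needed might be as small as widening the set of accepted values. The sketch below is an assumption about how such validation could look, not the actual Hermes code: `parse_provider_override` and both constants are hypothetical names, and the caller is assumed to pass in the raw value of whichever override variable applies to the task.

```python
from typing import Optional

# Values the issue reports as accepted today by the override mechanism.
EXISTING_PROVIDERS = frozenset({"openrouter", "nous", "codex", "main"})
# Proposed extension: also accept a local Ollama provider.
PROPOSED_PROVIDERS = EXISTING_PROVIDERS | {"ollama"}


def parse_provider_override(raw: Optional[str]) -> Optional[str]:
    """Normalize a per-task provider override value.

    Returns None when the variable is unset or blank (no override), the
    lowercased provider name when it is recognized, and raises ValueError
    otherwise so that typos fail loudly instead of being silently ignored.
    """
    if raw is None or not raw.strip():
        return None
    value = raw.strip().lower()
    if value not in PROPOSED_PROVIDERS:
        allowed = ", ".join(sorted(PROPOSED_PROVIDERS))
        raise ValueError(f"unknown auxiliary provider {raw!r}; expected one of: {allowed}")
    return value
```

An `ollama` value would still need a base URL and model from somewhere, which is one reason the `auxiliary.local` block may be the cleaner of the two options.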

### Feature Type

Configuration option

### Scope

Medium (few files, < 300 lines)

### Contribution

- [ ] I'd like to implement this myself and submit a PR
