[Feature]: Local model routing for auxiliary tasks (Ollama/custom endpoint support) #879

@eternal4430

Problem or Use Case

Summary

Add support for routing auxiliary tasks (compression, tool decisions, heartbeats) to a local model endpoint (e.g. Ollama) independently of the main model provider.

Motivation

The current auxiliary client resolution chain prioritizes cloud providers (OpenRouter → Nous Portal → custom endpoint). Users running a hybrid setup — cloud API for primary reasoning, local inference for lightweight tasks — have no way to direct auxiliary calls to a local model without sacrificing their primary provider configuration.
This matters most for:

  • Heartbeat/routing tasks that don't require frontier model quality
  • Cost reduction on high-frequency, low-complexity auxiliary calls
  • Users who want a local-first architecture for side tasks while keeping a cloud primary model

Related Issue

This feature is an architectural prerequisite for the multi-model hybrid setup raised as an open question in #523 ('Should the skill cover multi-model setups — local model for fast tasks, cloud model for complex reasoning?'). That skill cannot deliver hybrid routing without a config layer that can direct auxiliary tasks to a local endpoint independently of the main provider.

Also related: #157, which addresses capability-based routing for the main agent loop. This proposal instead targets the auxiliary client (auxiliary_client.py) and focuses on local endpoint support rather than capability categorization.

Proposed Solution

Add a dedicated auxiliary.local configuration block in config.yaml:

    auxiliary:
      local:
        base_url: http://localhost:11434/v1
        model: qwen3:8b
        tasks:
          - compression
          - web_extract
          - vision

Tasks listed under local would bypass the existing resolution chain and route directly to the specified local endpoint. Tasks not listed would continue using the existing chain.
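
To make the intended behavior concrete, here is a minimal sketch of the lookup described above. It is not taken from auxiliary_client.py: the names `LocalAuxConfig`, `load_local_aux_config`, and `resolve_auxiliary_target` are hypothetical, and the config is assumed to be already parsed from config.yaml into a plain dict.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass(frozen=True)
class LocalAuxConfig:
    """Hypothetical in-memory mirror of the proposed auxiliary.local block."""
    base_url: str
    model: str
    tasks: frozenset = field(default_factory=frozenset)


def load_local_aux_config(config: dict) -> Optional[LocalAuxConfig]:
    """Return the auxiliary.local settings, or None if the block is absent."""
    local = (config.get("auxiliary") or {}).get("local")
    if not local:
        return None
    return LocalAuxConfig(
        base_url=local["base_url"],
        model=local["model"],
        tasks=frozenset(local.get("tasks") or []),
    )


def resolve_auxiliary_target(task: str, config: dict) -> Optional[Tuple[str, str]]:
    """Return (base_url, model) for locally routed tasks.

    None means "not listed under local", so the caller should fall
    through to the existing resolution chain unchanged.
    """
    local = load_local_aux_config(config)
    if local is not None and task in local.tasks:
        return (local.base_url, local.model)
    return None


# The dict below stands in for the parsed config.yaml from this proposal.
config = {
    "auxiliary": {
        "local": {
            "base_url": "http://localhost:11434/v1",
            "model": "qwen3:8b",
            "tasks": ["compression", "web_extract", "vision"],
        }
    }
}

local_target = resolve_auxiliary_target("compression", config)
fallback_target = resolve_auxiliary_target("heartbeat", config)
```

Returning None for unlisted tasks keeps the change additive: a config without an auxiliary.local block behaves exactly as it does today.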
Alternatively, extend the per-task AUXILIARY_*_PROVIDER env var overrides to accept ollama as a valid provider alongside the existing openrouter, nous, codex, and main options.
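
A rough sketch of the env-var alternative, for comparison. The AUXILIARY_*_PROVIDER pattern and the existing accepted values are the ones listed under Alternatives Considered. The function name `provider_override` and the way the task name is substituted into the variable name (e.g. AUXILIARY_COMPRESSION_PROVIDER) are assumptions for illustration, not the current implementation.

```python
import os
from typing import Mapping, Optional

# Values the existing override mechanism accepts, per this issue.
EXISTING_PROVIDERS = {"openrouter", "nous", "codex", "main"}
# Proposed: additionally accept "ollama".
VALID_PROVIDERS = EXISTING_PROVIDERS | {"ollama"}


def provider_override(task: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the provider override for a task, or None if unset.

    Assumes the variable is named AUXILIARY_<TASK>_PROVIDER (hypothetical
    expansion of the AUXILIARY_*_PROVIDER pattern).
    """
    var = f"AUXILIARY_{task.upper()}_PROVIDER"
    value = env.get(var, "").strip().lower()
    if not value:
        return None
    if value not in VALID_PROVIDERS:
        allowed = ", ".join(sorted(VALID_PROVIDERS))
        raise ValueError(f"{var}={value!r} is not supported; expected one of: {allowed}")
    return value


example_env = {"AUXILIARY_COMPRESSION_PROVIDER": "ollama"}
chosen = provider_override("compression", example_env)
```

With this approach the endpoint URL and model name for ollama would still need to come from somewhere (further env vars or config.yaml), which is one reason the auxiliary.local block may be the simpler option.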

Current Workaround

None, short of removing OPENROUTER_API_KEY from the environment, which breaks the primary provider configuration.

Environment

  • Hermes Agent v1.0.0
  • WSL2 (Ubuntu 22.04) on Windows 10
  • Ollama 0.17.7 with qwen3:8b running locally
  • Primary model: minimax-m2.5 via Nous Portal

Alternatives Considered

Setting OPENAI_BASE_URL to a local Ollama endpoint — works but requires removing OPENROUTER_API_KEY, which breaks the primary provider and the entire resolution chain for non-auxiliary tasks.

Using AUXILIARY_*_PROVIDER env var overrides — the existing override mechanism only accepts openrouter, nous, codex, and main as valid values. Ollama/local endpoints are not supported options.

Running a separate Hermes instance pointed at Ollama — technically possible but operationally awkward, defeats the purpose of a unified agent, and doubles session/memory overhead.

Modifying source directly — viable for technical users but not sustainable across updates and puts the burden on individual users to re-patch after every Hermes update.

Feature Type

Configuration option

Scope

Medium (few files, < 300 lines)

Contribution

  • I'd like to implement this myself and submit a PR
