Problem or Use Case
Summary
Add support for routing auxiliary tasks (compression, tool decisions, heartbeats) to a local model endpoint (e.g. Ollama) independently of the main model provider.
Motivation
The current auxiliary client resolution chain prioritizes cloud providers (OpenRouter → Nous Portal → custom endpoint). Users running a hybrid setup — cloud API for primary reasoning, local inference for lightweight tasks — have no way to direct auxiliary calls to a local model without sacrificing their primary provider configuration.
This matters most for:
Heartbeat/routing tasks that don't require frontier model quality
Cost reduction on high-frequency low-complexity auxiliary calls
Users who want local-first architecture for side tasks while keeping a cloud primary model
Related Issue
This is also the architectural prerequisite for the multi-model hybrid setup scenario raised as an open question in #523 ('Should the skill cover multi-model setups — local model for fast tasks, cloud model for complex reasoning?'). That skill cannot deliver hybrid routing without a config layer that supports directing auxiliary tasks to a local endpoint independently of the main provider. Related to #157, which addresses capability-based routing for the main agent loop — this proposal specifically targets the auxiliary client (auxiliary_client.py) and focuses on local endpoint support rather than capability categorization.
Proposed Solution
Add a dedicated auxiliary.local configuration block in config.yaml:
auxiliary:
  local:
    base_url: http://localhost:11434/v1
    model: qwen3:8b
    tasks:
      - compression
      - web_extract
      - vision
Tasks listed under local would bypass the existing resolution chain and route directly to the specified local endpoint. Tasks not listed would continue using the existing chain.
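The routing rule described here could look roughly like the sketch below. It is only an illustration of the proposed behavior, not existing Hermes code: `LocalEndpoint` and `resolve_local_endpoint` are hypothetical names, and the config is shown as an already-parsed dict so the example does not depend on a YAML library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalEndpoint:
    """Hypothetical container for the proposed auxiliary.local settings."""
    base_url: str
    model: str


def resolve_local_endpoint(config: dict, task: str) -> Optional[LocalEndpoint]:
    """Return the local endpoint for `task`, or None to use the existing chain.

    Hypothetical helper: a task is routed locally only if it is listed under
    auxiliary.local.tasks and both base_url and model are set.
    """
    local = (config.get("auxiliary") or {}).get("local") or {}
    if task not in (local.get("tasks") or []):
        return None  # not listed: fall through to the existing resolution chain
    base_url = local.get("base_url")
    model = local.get("model")
    if not base_url or not model:
        return None  # incomplete block: treat as unconfigured
    return LocalEndpoint(base_url=base_url, model=model)


# Parsed form of the config.yaml block proposed above.
config = {
    "auxiliary": {
        "local": {
            "base_url": "http://localhost:11434/v1",
            "model": "qwen3:8b",
            "tasks": ["compression", "web_extract", "vision"],
        }
    }
}

print(resolve_local_endpoint(config, "compression"))
print(resolve_local_endpoint(config, "heartbeat"))
```

Returning None for unlisted tasks keeps the change additive: the caller would only short-circuit when a local endpoint is returned and otherwise run the current resolution chain unchanged.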
Alternatively, extend the per-task provider override env vars (AUXILIARY_*_PROVIDER) to accept ollama as a valid provider alongside the existing openrouter, nous, codex, and main options.
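A minimal sketch of the env var alternative, assuming the override variables follow the AUXILIARY_*_PROVIDER pattern with the task name in upper case (for example AUXILIARY_COMPRESSION_PROVIDER). The `provider_override` function and the exact variable naming are assumptions for illustration; the set of existing values is the one listed under Alternatives Considered.

```python
import os
from typing import Mapping, Optional

# Values the existing override mechanism accepts, per this issue.
EXISTING_PROVIDERS = {"openrouter", "nous", "codex", "main"}
# Proposed addition: allow a local Ollama endpoint as a provider.
VALID_PROVIDERS = EXISTING_PROVIDERS | {"ollama"}


def provider_override(task: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the provider override for `task`, or None if no override is set.

    Hypothetical helper; the variable name is assumed to be
    AUXILIARY_<TASK>_PROVIDER.
    """
    name = f"AUXILIARY_{task.upper()}_PROVIDER"
    raw = environ.get(name, "").strip().lower()
    if not raw:
        return None
    if raw not in VALID_PROVIDERS:
        raise ValueError(f"{name}={raw!r} is not one of {sorted(VALID_PROVIDERS)}")
    return raw


# Example with an explicit mapping instead of the real environment.
env = {"AUXILIARY_COMPRESSION_PROVIDER": "ollama"}
print(provider_override("compression", env))
print(provider_override("vision", env))
```

This variant needs no new config block, but the base URL and model for the ollama provider would still have to come from somewhere, which is why the auxiliary.local block is the primary proposal.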
Current Workaround
None. The only way to reach a local endpoint today is to remove OPENROUTER_API_KEY from the environment, which breaks the primary provider configuration.
Environment
Hermes Agent v1.0.0
WSL2 (Ubuntu 22.04) on Windows 10
Ollama 0.17.7 with qwen3:8b running locally
Primary model: minimax-m2.5 via Nous Portal
Alternatives Considered
Setting OPENAI_BASE_URL to a local Ollama endpoint — works but requires removing OPENROUTER_API_KEY, which breaks the primary provider and the entire resolution chain for non-auxiliary tasks.
Using AUXILIARY_*_PROVIDER env var overrides — the existing override mechanism only accepts openrouter, nous, codex, and main as valid values. Ollama/local endpoints are not supported options.
Running a separate Hermes instance pointed at Ollama — technically possible but operationally awkward, defeats the purpose of a unified agent, and doubles session/memory overhead.
Modifying source directly — viable for technical users but not sustainable across updates and puts the burden on individual users to re-patch after every Hermes update.
Feature Type
Configuration option
Scope
Medium (few files, < 300 lines)
Contribution