Feature Request: User-Configurable Multi-Model Routing with Capability Categories and Evaluation Feedback #157

@rolyataylor2

Summary

Enable end users to configure multiple LLMs across defined capability categories (e.g., speed, intelligence, uncensored, low-cost, reasoning-heavy), and allow tools to request models based on declared requirements rather than relying on a single developer-defined model.

This would introduce a flexible model-routing layer where:

  • Users assign models to capability categories.
  • Tools specify their needs (e.g., “fast + cheap” vs “high reasoning”).
  • The runtime resolves the appropriate model dynamically.
  • Optional evaluation metrics help refine model selection over time.

Motivation

Currently, tool developers implicitly choose which model is used. However, different users have different priorities:

  • Some prioritize cost efficiency.
  • Some prioritize maximum reasoning depth.
  • Some need uncensored models.
  • Some want ultra-low latency.
  • Some may run local models for privacy.

Allowing users to define model assignments per capability category increases:

  • Flexibility
  • Transparency
  • Performance tuning
  • Cost control
  • Adaptability to new models

It also decouples tool design from specific model vendors.


Proposed Architecture

1. Model Capability Categories

Allow users to define models per category, for example:

models:
  fast:
    - gpt-4o-mini
    - mistral-small
  reasoning:
    - gpt-4o
    - claude-opus
  uncensored:
    - local-llama
  cheap:
    - gpt-4o-mini

These categories are user-configurable.


2. Tool-Level Model Requirements

When a tool calls the LLM, it declares its needs:

call_llm(
    task="parse structured JSON",
    requirements={
        "speed": "high",
        "reasoning": "low"
    }
)

The runtime then selects an appropriate model based on user configuration.

This prevents overusing large models when smaller ones are sufficient.
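The resolution step could be sketched roughly as follows. This is only an illustration under stated assumptions: `resolve_model`, `REQUIREMENT_TO_CATEGORY`, and `DEFAULT_MODEL` are hypothetical names, not existing Hermes Agent APIs, and the mapping from requirement levels to categories is one possible rule-based choice.

```python
from typing import Dict, List, Tuple

# Hypothetical user configuration, mirroring the category example above.
USER_MODELS: Dict[str, List[str]] = {
    "fast": ["gpt-4o-mini", "mistral-small"],
    "reasoning": ["gpt-4o", "claude-opus"],
    "uncensored": ["local-llama"],
    "cheap": ["gpt-4o-mini"],
}

# Assumed mapping from a (requirement, level) pair to a user category.
REQUIREMENT_TO_CATEGORY: Dict[Tuple[str, str], str] = {
    ("speed", "high"): "fast",
    ("reasoning", "high"): "reasoning",
    ("cost", "low"): "cheap",
}

DEFAULT_MODEL = "gpt-4o"  # assumed default; the real default would be user-defined


def resolve_model(
    requirements: Dict[str, str],
    models: Dict[str, List[str]],
    default: str = DEFAULT_MODEL,
) -> str:
    """Return the first model of the first category a requirement maps to."""
    for name, level in requirements.items():
        category = REQUIREMENT_TO_CATEGORY.get((name, level))
        if category and models.get(category):
            return models[category][0]
    # No requirement matched a configured category: use the default model.
    return default
```

With the requirements from the `call_llm` example, `resolve_model({"speed": "high", "reasoning": "low"}, USER_MODELS)` would pick `gpt-4o-mini`, since `("speed", "high")` maps to the `fast` category and `("reasoning", "low")` has no mapping in this sketch.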


3. Dynamic Category Resolution and User-Driven Assignment

Tools should be able to request capability categories or reasoning levels dynamically, without requiring the framework to predefine every possible category. If a tool requests a capability that the user has not yet mapped (e.g., "deep_reasoning_level_3" or "creative_uncensored_longform"), the system should fall back gracefully to the default model and list the unresolved request in the user configuration as an “unassigned capability.” The user can then link that capability to an existing category, assign a specific model, or define a new routing rule.

This creates a feedback loop in which the routing layer evolves from actual tool demands rather than exhaustive upfront configuration. Capability taxonomies emerge from real-world usage instead of being hardcoded by developers, which keeps the routing layer extensible as new tools and models appear.
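A minimal sketch of the fallback-and-record behavior described here, assuming a hypothetical `CapabilityRouter` class (none of these names exist in the project today):

```python
from typing import Dict, List, Set


class CapabilityRouter:
    """Hypothetical router that records capabilities the user has not mapped."""

    def __init__(self, models: Dict[str, List[str]], default_model: str) -> None:
        self.models = models
        self.default_model = default_model
        self.unassigned: Set[str] = set()  # surfaced to the user for later mapping

    def resolve(self, capability: str) -> str:
        candidates = self.models.get(capability)
        if candidates:
            return candidates[0]
        # Unknown capability: remember it and fall back to the default model.
        self.unassigned.add(capability)
        return self.default_model

    def link(self, capability: str, category: str) -> None:
        """Point a capability at an existing category's model list."""
        self.models[capability] = self.models[category]
        self.unassigned.discard(capability)

    def assign(self, capability: str, models: List[str]) -> None:
        """Map a capability directly to specific models."""
        self.models[capability] = list(models)
        self.unassigned.discard(capability)
```

For example, resolving "deep_reasoning_level_3" before any mapping exists returns the default model and adds the name to `unassigned`; after `link("deep_reasoning_level_3", "reasoning")`, the same request resolves to the first model in the user's `reasoning` category.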


4. Evaluation Layer (Optional but Powerful)

Add an optional evaluation mode where:

  • The LLM (or a secondary model) evaluates output correctness.
  • Success/failure stats are logged per tool-model pairing.
  • Developers can analyze performance tradeoffs.

Example stored metrics:

{
  "tool": "json_parser",
  "model": "gpt-4o-mini",
  "success_rate": 0.94,
  "avg_latency_ms": 120,
  "avg_cost": 0.0003
}

This would allow:

  • Data-driven model routing
  • Automatic optimization
  • Tool-specific model recommendations
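One way the per-pairing stats and a simple recommendation could fit together is sketched below. `PairStats`, `EvaluationLog`, and the "cheapest model above a success threshold" rule are assumptions made for illustration, not a proposed final design.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PairStats:
    """Running totals for one tool-model pairing (hypothetical structure)."""
    calls: int = 0
    successes: int = 0
    total_latency_ms: float = 0.0
    total_cost: float = 0.0

    @property
    def success_rate(self) -> float:
        return self.successes / self.calls if self.calls else 0.0

    @property
    def avg_latency_ms(self) -> float:
        return self.total_latency_ms / self.calls if self.calls else 0.0

    @property
    def avg_cost(self) -> float:
        return self.total_cost / self.calls if self.calls else 0.0


class EvaluationLog:
    def __init__(self) -> None:
        self._stats: Dict[Tuple[str, str], PairStats] = {}

    def record(self, tool: str, model: str, success: bool,
               latency_ms: float, cost: float) -> None:
        stats = self._stats.setdefault((tool, model), PairStats())
        stats.calls += 1
        stats.successes += int(success)
        stats.total_latency_ms += latency_ms
        stats.total_cost += cost

    def stats_for(self, tool: str, model: str) -> PairStats:
        return self._stats.get((tool, model), PairStats())

    def recommend(self, tool: str, min_success_rate: float = 0.9) -> Optional[str]:
        """Cheapest model for this tool that meets the success threshold."""
        eligible = [
            (stats.avg_cost, model)
            for (logged_tool, model), stats in self._stats.items()
            if logged_tool == tool and stats.success_rate >= min_success_rate
        ]
        return min(eligible)[1] if eligible else None
```

A weighted score combining latency, cost, and success rate would be an equally plausible rule; the threshold approach is used here only because it is the simplest to reason about.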

Benefits

  • Decouples tool logic from fixed model assumptions
  • Empowers users to control cost, performance, and censorship level
  • Enables adaptive routing strategies
  • Future-proofs the agent against rapid model evolution
  • Creates a foundation for self-optimizing agents

Open Questions

  • Should routing be rule-based, weighted, or hybrid?
  • Should evaluation be user opt-in?
  • Should tools declare “minimum viable intelligence” levels?
  • Should there be fallback chains if a preferred model fails?
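To make the fallback-chain question concrete, here is a rough sketch of one possible behavior: try each model in a category in order and move on when a call fails. `call_with_fallback` and the `call` callback are hypothetical; a real implementation would likely catch only provider-specific errors rather than every exception.

```python
from typing import Callable, List, Sequence


def call_with_fallback(models: Sequence[str], call: Callable[[str], str]) -> str:
    """Try each model in order and return the first successful result."""
    errors: List[str] = []
    for model in models:
        try:
            return call(model)
        except Exception as exc:  # broad on purpose in this sketch
            errors.append(f"{model}: {exc}")
    # Every model in the chain failed (or the chain was empty).
    raise RuntimeError("all models failed: " + "; ".join(errors))
```

Here the chain would simply be the ordered list the user configured for a category, e.g. `["gpt-4o", "claude-opus"]` for `reasoning`.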

Why This Matters

As model ecosystems diversify (open weights, closed APIs, local models, etc.), a single-model architecture becomes limiting.

A user-configurable routing layer positions Hermes Agent as:

  • Vendor-neutral
  • Cost-aware
  • Performance-tunable
  • Adaptable to future model ecosystems

This also aligns with the philosophy of modular, agentic systems rather than monolithic LLM binding.

Metadata

    Labels

    P3 (Low: cosmetic, nice to have)
    comp/agent (Core agent loop, run_agent.py, prompt builder)
    type/feature (New feature or request)