Summary
Enable end users to configure multiple LLMs across defined capability categories (e.g., speed, intelligence, uncensored, low-cost, reasoning-heavy), and allow tools to request models based on declared requirements rather than relying on a single developer-defined model.
This would introduce a flexible model-routing layer where:
- Users assign models to capability categories.
- Tools specify their needs (e.g., “fast + cheap” vs “high reasoning”).
- The runtime resolves the appropriate model dynamically.
- Optional evaluation metrics help refine model selection over time.
Motivation
Currently, tool developers implicitly choose which model is used. However, different users have different priorities:
- Some prioritize cost efficiency.
- Some prioritize maximum reasoning depth.
- Some need uncensored models.
- Some want ultra-low latency.
- Some may run local models for privacy.
Allowing users to define model assignments per capability category increases:
- Flexibility
- Transparency
- Performance tuning
- Cost control
- Adaptability to new models
It also decouples tool design from specific model vendors.
Proposed Architecture
1. Model Capability Categories
Allow users to define models per category, for example:
models:
  fast:
    - gpt-4o-mini
    - mistral-small
  reasoning:
    - gpt-4o
    - claude-opus
  uncensored:
    - local-llama
  cheap:
    - gpt-4o-mini
These categories are user-configurable.
2. Tool-Level Model Requirements
When a tool calls the LLM, it declares its needs:
call_llm(
    task="parse structured JSON",
    requirements={
        "speed": "high",
        "reasoning": "low",
    },
)
The runtime then selects an appropriate model based on user configuration.
This prevents overusing large models when smaller ones are sufficient.
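One way the resolution step could work is sketched below. This is a minimal illustration, not a proposed implementation: the `resolve_model` function, the `REQUIREMENT_TO_CATEGORY` table, and the choice of `DEFAULT_MODEL` are all assumptions, since the proposal does not yet define how requirement levels such as "speed: high" map onto user categories such as "fast".

```python
from typing import Dict, List, Tuple

# Hypothetical user configuration, mirroring the example categories above.
USER_MODELS: Dict[str, List[str]] = {
    "fast": ["gpt-4o-mini", "mistral-small"],
    "reasoning": ["gpt-4o", "claude-opus"],
    "uncensored": ["local-llama"],
    "cheap": ["gpt-4o-mini"],
}

# Assumed mapping from (requirement, level) pairs to user categories.
# The proposal does not specify this mapping; it is illustrative only.
REQUIREMENT_TO_CATEGORY: Dict[Tuple[str, str], str] = {
    ("speed", "high"): "fast",
    ("reasoning", "high"): "reasoning",
    ("cost", "low"): "cheap",
}

DEFAULT_MODEL = "gpt-4o"  # assumed default, used when nothing matches


def resolve_model(
    requirements: Dict[str, str],
    models: Dict[str, List[str]] = USER_MODELS,
    default: str = DEFAULT_MODEL,
) -> str:
    """Pick the model that appears in the most requested categories."""
    # Translate declared requirements into the categories they imply.
    categories = [
        REQUIREMENT_TO_CATEGORY[item]
        for item in requirements.items()
        if item in REQUIREMENT_TO_CATEGORY
    ]
    # Score each candidate by how many requested categories list it.
    scores: Dict[str, int] = {}
    for category in categories:
        for model in models.get(category, []):
            scores[model] = scores.get(model, 0) + 1
    if not scores:
        return default
    # Ties go to the model seen first, i.e. the user's listed order.
    return max(scores, key=lambda name: scores[name])


print(resolve_model({"speed": "high", "reasoning": "low"}))
```

With this scoring rule, a requirement that maps to no category (here, "reasoning: low") simply does not constrain the choice, so the small model listed first under "fast" is selected.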
3. Dynamic Category Resolution and User-Driven Assignment
Tools should be able to request capability categories or reasoning levels dynamically, without the framework having to predefine every possible category. If a tool requests a capability that the user has not yet mapped (e.g., "deep_reasoning_level_3" or "creative_uncensored_longform"), the system should fall back gracefully to the default model.
At the same time, the unresolved request should appear in the user configuration as an “unassigned capability.” The user can then link that capability to an existing category, assign a specific model, or define a new routing rule.
This creates a feedback loop in which the routing layer evolves based on actual tool demands instead of exhaustive upfront configuration. Capability taxonomies emerge from real-world tool usage rather than being hardcoded by developers, which keeps the routing layer extensible as new tools and models appear.
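The fallback-and-surface behavior described here could look roughly like the following sketch. The `CapabilityRouter` class and its `resolve`, `assign`, and `link` methods are hypothetical names chosen for illustration; how unassigned capabilities would actually be persisted or shown in the user configuration is left open.

```python
from typing import Dict, List, Set


class CapabilityRouter:
    """Hypothetical router: falls back to a default model and
    remembers which requested capabilities the user has not mapped."""

    def __init__(self, models: Dict[str, List[str]], default_model: str) -> None:
        # Copy the mapping so later assignments do not mutate the caller's dict.
        self.models = {name: list(items) for name, items in models.items()}
        self.default_model = default_model
        self.unassigned: Set[str] = set()

    def resolve(self, capability: str) -> str:
        candidates = self.models.get(capability)
        if candidates:
            return candidates[0]
        # Unknown capability: record it for the user, then fall back.
        self.unassigned.add(capability)
        return self.default_model

    def assign(self, capability: str, models: List[str]) -> None:
        """User assigns specific models to a capability."""
        self.models[capability] = list(models)
        self.unassigned.discard(capability)

    def link(self, capability: str, existing_category: str) -> None:
        """User links a capability to an already configured category."""
        self.assign(capability, self.models[existing_category])


router = CapabilityRouter(
    {"reasoning": ["gpt-4o", "claude-opus"]},
    default_model="gpt-4o-mini",  # assumed default for this example
)
first = router.resolve("deep_reasoning_level_3")   # unmapped, uses the default
pending = set(router.unassigned)                   # what the user would be shown
router.link("deep_reasoning_level_3", "reasoning")
second = router.resolve("deep_reasoning_level_3")  # now routed via "reasoning"
```

The tool never sees an error for an unknown capability; the only visible effect is an entry in `unassigned` that the user can act on later.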
4. Evaluation Layer (Optional but Powerful)
Add an optional evaluation mode where:
- The LLM (or a secondary model) evaluates the correctness of each output.
- Success/failure stats are logged per tool-model pairing.
- Developers can analyze performance tradeoffs.
Example stored metrics:
{
  "tool": "json_parser",
  "model": "gpt-4o-mini",
  "success_rate": 0.94,
  "avg_latency_ms": 120,
  "avg_cost": 0.0003
}
This would allow:
- Data-driven model routing
- Automatic optimization
- Tool-specific model recommendations
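As a rough sketch of how per-pairing statistics might be accumulated and then used for a recommendation, consider the following. The `EvalLog` and `PairStats` names, the `best_model` selection rule (cheapest model above a success threshold), and storing latency as a plain number of milliseconds are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class PairStats:
    """Running totals for one tool-model pairing."""
    calls: int = 0
    successes: int = 0
    total_latency_ms: float = 0.0
    total_cost: float = 0.0

    def summary(self) -> Dict[str, float]:
        return {
            "success_rate": self.successes / self.calls,
            "avg_latency_ms": self.total_latency_ms / self.calls,
            "avg_cost": self.total_cost / self.calls,
        }


class EvalLog:
    """Hypothetical in-memory log of evaluation results."""

    def __init__(self) -> None:
        self._stats: Dict[Tuple[str, str], PairStats] = {}

    def record(self, tool: str, model: str, success: bool,
               latency_ms: float, cost: float) -> None:
        stats = self._stats.setdefault((tool, model), PairStats())
        stats.calls += 1
        stats.successes += int(success)
        stats.total_latency_ms += latency_ms
        stats.total_cost += cost

    def summary(self, tool: str, model: str) -> Optional[Dict[str, float]]:
        stats = self._stats.get((tool, model))
        return stats.summary() if stats else None

    def best_model(self, tool: str, min_success_rate: float) -> Optional[str]:
        """Cheapest model for this tool that meets the success threshold."""
        eligible = [
            (stats.summary()["avg_cost"], model)
            for (logged_tool, model), stats in self._stats.items()
            if logged_tool == tool
            and stats.summary()["success_rate"] >= min_success_rate
        ]
        return min(eligible)[1] if eligible else None


log = EvalLog()
log.record("json_parser", "gpt-4o-mini", True, 100.0, 0.0002)
log.record("json_parser", "gpt-4o-mini", True, 140.0, 0.0004)
log.record("json_parser", "gpt-4o", True, 400.0, 0.005)
recommended = log.best_model("json_parser", min_success_rate=0.9)
```

A real deployment would likely need persistence and a minimum sample size before trusting a success rate; both are omitted here for brevity.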
Benefits
- Decouples tool logic from fixed model assumptions
- Empowers users to control cost, performance, and censorship level
- Enables adaptive routing strategies
- Future-proofs the agent against rapid model evolution
- Creates a foundation for self-optimizing agents
Open Questions
- Should routing be rule-based, weighted, or hybrid?
- Should evaluation be user opt-in?
- Should tools declare “minimum viable intelligence” levels?
- Should there be fallback chains if a preferred model fails?
Why This Matters
As model ecosystems diversify (open weights, closed APIs, local models, etc.), a single-model architecture becomes limiting.
A user-configurable routing layer positions Hermes Agent as:
- Vendor-neutral
- Cost-aware
- Performance-tunable
- Adaptable to future model ecosystems
This also aligns with the philosophy of modular, agentic systems rather than monolithic LLM binding.