Description
resolve_runtime_provider does not accept an api_mode parameter. When smart model routing routes a message to the cheap_model, the configured api_mode (e.g. anthropic_messages) is silently dropped, and the runtime defaults to chat_completions.
This breaks local inference servers that expect a specific API format, particularly when using Anthropic-format endpoints with SSD KV caching where the API mode determines cache hit behavior.
Steps to Reproduce
- Configure smart routing with a cheap model that requires anthropic_messages API mode:
  smart_model_routing:
    enabled: true
    cheap_model:
      provider: custom
      model: local-model
      base_url: http://127.0.0.1:8000
      api_mode: anthropic_messages
- Send a simple message that routes to the cheap model
- Observe that the API call uses chat_completions format instead of anthropic_messages
Expected Behavior
api_mode from cheap_model config should take priority, falling back to the runtime-resolved value.
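The intended precedence could be sketched roughly as follows. This is only an illustration under assumptions: `pick_api_mode` and `DEFAULT_API_MODE` are hypothetical names, not functions or constants from the Hermes codebase, and the two arguments stand in for the cheap_model config block and the dict returned by resolve_runtime_provider.

```python
from typing import Any, Mapping

# Assumed fallback, matching the default described in this report.
DEFAULT_API_MODE = "chat_completions"


def pick_api_mode(
    cheap_model_cfg: Mapping[str, Any],
    runtime: Mapping[str, Any],
) -> str:
    """Hypothetical helper: prefer the explicit config value.

    Order: cheap_model.api_mode, then the runtime-resolved api_mode,
    then the default. Empty or missing values fall through.
    """
    return (
        cheap_model_cfg.get("api_mode")
        or runtime.get("api_mode")
        or DEFAULT_API_MODE
    )


cheap_model = {
    "provider": "custom",
    "model": "local-model",
    "base_url": "http://127.0.0.1:8000",
    "api_mode": "anthropic_messages",
}
# The runtime resolved chat_completions, but the explicit config wins.
print(pick_api_mode(cheap_model, {"api_mode": "chat_completions"}))
```

With this ordering, a cheap_model entry that omits api_mode would still get whatever the runtime resolved, so existing configs would be unaffected.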
Actual Behavior
api_mode is always taken from runtime.get("api_mode"), ignoring the explicit config value.
Affected Component
Agent - Smart model routing (agent/smart_model_routing.py)
Platform
All platforms
Hermes Version
v0.8.0