[Bug]: Smart routing drops api_mode from cheap_model config #8515

@BukeLy

Description

resolve_runtime_provider does not accept an api_mode parameter. As a result, when smart model routing sends a message to the cheap_model, the configured api_mode (e.g. anthropic_messages) is silently dropped and the runtime defaults to chat_completions.

This breaks local inference servers that expect a specific API format. The problem is most visible with Anthropic-format endpoints that use SSD KV caching, where the API mode determines cache hit behavior.

Steps to Reproduce

  1. Configure smart routing with a cheap model that requires anthropic_messages API mode:
    smart_model_routing:
      enabled: true
      cheap_model:
        provider: custom
        model: local-model
        base_url: http://127.0.0.1:8000
        api_mode: anthropic_messages
  2. Send a simple message that routes to the cheap model
  3. Observe that the API call uses chat_completions format instead of anthropic_messages

Expected Behavior

api_mode from cheap_model config should take priority, falling back to the runtime-resolved value.

Actual Behavior

api_mode is always taken from runtime.get("api_mode"), ignoring the explicit config value.
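
The gap between the actual and expected behavior can be sketched as below. This is a minimal illustration, not the code in agent/smart_model_routing.py: the function names `current_api_mode` and `expected_api_mode`, and the assumption that both the cheap_model config and the resolved runtime are plain dicts, are hypothetical and exist only for this example.

```python
from typing import Any, Mapping, Optional


def current_api_mode(cheap_model: Mapping[str, Any],
                     runtime: Mapping[str, Any]) -> Optional[str]:
    """Hypothetical stand-in for the reported behavior: only the
    runtime-resolved value is consulted, the config value is ignored."""
    return runtime.get("api_mode")


def expected_api_mode(cheap_model: Mapping[str, Any],
                      runtime: Mapping[str, Any]) -> Optional[str]:
    """Hypothetical stand-in for the requested behavior: the explicit
    cheap_model api_mode wins, with the runtime value as the fallback."""
    return cheap_model.get("api_mode") or runtime.get("api_mode")


# Config values taken from the reproduction steps in this issue.
cheap_model = {
    "provider": "custom",
    "model": "local-model",
    "base_url": "http://127.0.0.1:8000",
    "api_mode": "anthropic_messages",
}

# Assumed shape of the resolved runtime; the issue states it defaults
# to chat_completions.
runtime = {"api_mode": "chat_completions"}

print(current_api_mode(cheap_model, runtime))   # chat_completions
print(expected_api_mode(cheap_model, runtime))  # anthropic_messages
print(expected_api_mode({}, runtime))           # chat_completions
```

With the fallback form, a cheap_model entry that omits api_mode keeps whatever the runtime resolved, so configurations that never set the key should be unaffected.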

Affected Component

Agent - Smart model routing (agent/smart_model_routing.py)

Platform

All platforms

Hermes Version

v0.8.0

Metadata

Assignees

No one assigned

    Labels

    P2 (Medium: degraded but workaround exists)
    comp/agent (Core agent loop, run_agent.py, prompt builder)
    sweeper:implemented-on-main (Sweeper: behavior already present on current main)
    type/bug (Something isn't working)
