[Feature]: LLM model switch by skill #5997

@kentimsit

Problem or Use Case

Skills currently inherit the main agent's LLM model, which is set globally via config.yaml or the hermes model CLI command. All skills — regardless of complexity — run on the same model. This means users pay top-tier model costs even for simple, well-structured skills (e.g., gif-search, find-nearby, arxiv) that could run reliably on a cheaper, faster model.

The current workarounds are:

  • Manual model switching via hermes model, which requires relaunching the CLI session — disruptive and impractical mid-workflow.
  • Delegated tasks via delegate_task(), which support a separate delegation.model/delegation.provider in config — but delegated agents run in an isolated environment and cannot access the parent agent's sandbox files, making them unsuitable for skills that need to read/write files in-place.
  • Smart model routing (smart_model_routing), which routes simple user messages to a cheap model — but this is turn-level, not skill-level, and has no awareness of which skill is active.

None of these allow a skill author to declare "this skill works fine on Gemini Flash" or a user to override the model for a specific skill.

Proposed Solution

Allow skills to declare a preferred model and provider in their SKILL.md frontmatter, and apply a temporary model switch when the skill is invoked via /skill-name. The agent switches back to the primary model when the skill's turn completes.

This reuses the existing AIAgent.switch_model() infrastructure (already used by /model and fallback logic) and the existing metadata.hermes.config pattern (already used by skills like llm-wiki).

SKILL.md frontmatter addition

Skill authors add an optional model block under metadata.hermes:

---
name: gif-search
description: "Search and display GIFs"
version: 1.0.0
metadata:
  hermes:
    tags: [media, fun]
    model:
      provider: openrouter
      model: google/gemini-2.5-flash
---

Both fields are optional. If only model is set, the current provider is kept. If neither is set, the skill inherits the main agent's model (current behavior).
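
The fallback rules above can be sketched as a small function. This is an illustration only, not existing Hermes code: `effective_model` and its parameters are hypothetical names, and the frontmatter is assumed to be already parsed into a plain dict.

```python
from typing import Tuple


def effective_model(
    frontmatter: dict,
    current_model: str,
    current_provider: str,
) -> Tuple[str, str]:
    """Return the (model, provider) pair a skill would run on.

    Hypothetical helper: `frontmatter` is the SKILL.md frontmatter,
    already parsed into a plain dict.
    """
    hermes = (frontmatter.get("metadata") or {}).get("hermes") or {}
    block = hermes.get("model") or {}
    # A missing field falls back to whatever the main agent is using.
    model = block.get("model") or current_model
    provider = block.get("provider") or current_provider
    return model, provider


# Only `model` declared: the current provider is kept.
declared = {"metadata": {"hermes": {"model": {"model": "google/gemini-2.5-flash"}}}}
model_only = effective_model(declared, "main-model", "main-provider")

# No `model` block at all: the skill inherits the main agent's settings.
inherited = effective_model({"name": "plan"}, "main-model", "main-provider")
```

The sketch does not validate a block that sets only `provider`; the proposal does not say how that case should behave.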

Users can also set per-skill model overrides in config.yaml. These take precedence over the SKILL.md declaration:

skills:
  model_overrides:
    gif-search:
      provider: openrouter
      model: google/gemini-2.5-flash
    arxiv:
      model: google/gemini-2.5-flash

Resolution order

  1. config.yaml: skills.model_overrides.<skill-name> (user override, highest priority)
  2. SKILL.md: metadata.hermes.model (skill author's recommendation)
  3. Main agent model (current behavior, fallback)

Files changed

1. agent/skill_commands.py — extract and return model override metadata

In _load_skill_payload() (line 45): after loading the skill, also extract metadata.hermes.model from the frontmatter and include it in the returned tuple.

Add a new function resolve_skill_model_override(skill_name, frontmatter) that:

  • Reads skills.model_overrides.<skill_name> from config.yaml (via skill_utils._resolve_dotpath)
  • Falls back to metadata.hermes.model from the skill's frontmatter
  • Returns {"model": ..., "provider": ...} or None if no override
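
A minimal sketch of that resolution logic follows. It makes several assumptions: the config is passed in as an already-loaded dict (the proposal reads it via skill_utils._resolve_dotpath, whose signature is not shown here, so a local `_dig` helper stands in for it), the first source that names a model wins as a whole entry, and `some/other-model` is a placeholder model name.

```python
from typing import Any, Optional


def _dig(data: Any, dotted: str) -> Any:
    """Walk nested dicts along a dotted path; return None if any key is missing."""
    for key in dotted.split("."):
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def resolve_skill_model_override(
    skill_name: str, frontmatter: dict, config: dict
) -> Optional[dict]:
    """Return {"model": ..., "provider": ...} or None if nothing is declared.

    The `config` parameter is an assumption made to keep the sketch
    self-contained; the proposal's signature is (skill_name, frontmatter).
    """
    overrides = _dig(config, "skills.model_overrides")
    user_entry = overrides.get(skill_name) if isinstance(overrides, dict) else None
    author_entry = _dig(frontmatter, "metadata.hermes.model")
    # User override first, then the skill author's recommendation.
    for entry in (user_entry, author_entry):
        if isinstance(entry, dict) and entry.get("model"):
            return {"model": entry["model"], "provider": entry.get("provider")}
    return None


config = {"skills": {"model_overrides": {"arxiv": {"model": "google/gemini-2.5-flash"}}}}
frontmatter = {
    "metadata": {"hermes": {"model": {"provider": "openrouter", "model": "some/other-model"}}}
}
resolved = resolve_skill_model_override("arxiv", frontmatter, config)
```

Here `resolved` carries the model from config.yaml with `provider` set to None, which the caller would treat as "keep the current provider". A field-by-field merge of the two sources would be another reasonable design; the proposal does not specify which is intended.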

Modify build_skill_invocation_message() (line 291) to return a (message, model_override) tuple instead of just the message string. The model_override is the resolved dict or None.

2. cli.py — apply model switch on skill invocation

At line ~4564, where /skill-name is handled: after calling build_skill_invocation_message(), if a model_override is returned, call self.agent.switch_model() before injecting the skill message, and schedule a restore to the primary model after the agent's response completes.

Before (lines 4566-4574):

msg = build_skill_invocation_message(
    base_cmd, user_instruction, task_id=self.session_id
)
if msg:
    skill_name = _skill_commands[base_cmd]["name"]
    print(f"\n⚡ Loading skill: {skill_name}")
    if hasattr(self, '_pending_input'):
        self._pending_input.put(msg)

After:

result = build_skill_invocation_message(
    base_cmd, user_instruction, task_id=self.session_id
)
if result:
    msg, model_override = result
    skill_name = _skill_commands[base_cmd]["name"]
    if model_override:
        # Stash current model for restore after skill turn
        self._skill_model_stash = {
            "model": self.agent.model,
            "provider": self.agent.provider,
            "api_key": self.agent.api_key,
            "base_url": self.agent.base_url,
            "api_mode": self.agent.api_mode,
        }
        self.agent.switch_model(
            model_override["model"],
            # Keep the current provider when the override only names a model
            model_override.get("provider") or self.agent.provider,
            api_key=model_override.get("api_key", ""),
            base_url=model_override.get("base_url", ""),
        )
        print(f"\n⚡ Loading skill: {skill_name} (using {model_override['model']})")
    else:
        self._skill_model_stash = None
        print(f"\n⚡ Loading skill: {skill_name}")
    if hasattr(self, '_pending_input'):
        self._pending_input.put(msg)

3. cli.py — restore primary model after skill turn

After the agent produces its response (in the main loop where the agent's reply is processed), check if self._skill_model_stash is set and restore:

if getattr(self, '_skill_model_stash', None):
    stash = self._skill_model_stash
    self.agent.switch_model(
        stash["model"], stash["provider"],
        api_key=stash["api_key"],
        base_url=stash["base_url"],
        api_mode=stash["api_mode"],
    )
    self._skill_model_stash = None
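
One way to guarantee the restore even when the skill turn raises is to wrap the stash/switch/restore sequence in a context manager. The sketch below is an alternative illustration, not the code proposed in this issue: `FakeAgent` is a stand-in for AIAgent with a simplified two-argument `switch_model()` (the real call above also takes api_key, base_url and api_mode), and `temporary_model` is a hypothetical helper name.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class FakeAgent:
    """Stand-in for AIAgent; only the attributes the sketch touches."""

    def __init__(self, model: str, provider: str) -> None:
        self.model = model
        self.provider = provider

    def switch_model(self, model: str, provider: str) -> None:
        self.model = model
        self.provider = provider


@contextmanager
def temporary_model(agent: FakeAgent, override: Optional[dict]) -> Iterator[None]:
    """Switch to `override` for the duration of the block, then restore."""
    if not override:
        # No override declared: run the block on the primary model.
        yield
        return
    stash = (agent.model, agent.provider)
    agent.switch_model(override["model"], override.get("provider") or agent.provider)
    try:
        yield
    finally:
        # Runs even if the skill turn raises, so the primary model always returns.
        agent.switch_model(*stash)


agent = FakeAgent("main-model", "main-provider")
seen = []
try:
    with temporary_model(agent, {"model": "google/gemini-2.5-flash"}):
        seen.append(agent.model)
        raise RuntimeError("simulated failure during the skill turn")
except RuntimeError:
    pass
seen.append(agent.model)
```

Because the CLI queues the skill message through `_pending_input` and processes it later, a `with` block may not fit the real control flow directly. The same guarantee could be had by placing the restore in a `finally` clause around the code that runs the turn.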

4. gateway/run.py — same skill model switch for gateway surface

Apply the same pattern as cli.py for the gateway's skill invocation path, so Telegram/Discord/Slack users also benefit.

5. cli-config.yaml.example — document the new config option

Add a commented-out skills.model_overrides section near the existing skills: block:

skills:
  # Per-skill model overrides — use a cheaper/faster model for specific skills.
  # Skill authors can also declare a default in SKILL.md (metadata.hermes.model).
  # Config overrides take precedence over SKILL.md declarations.
  # model_overrides:
  #   gif-search:
  #     provider: openrouter
  #     model: google/gemini-2.5-flash
  #   arxiv:
  #     model: google/gemini-2.5-flash

What this does NOT change

  • No changes to run_agent.py: switch_model() already exists and handles all client rebuilding.
  • No changes to tools/skills_tool.py — skill discovery and loading stay the same.
  • No changes to the delegation system — this is orthogonal to delegate_task().
  • No changes to smart_model_routing — that feature remains turn-level; this is skill-level.
  • No new dependencies — reuses switch_model(), _resolve_dotpath(), resolve_runtime_provider().

Alternatives Considered

  1. Extend smart_model_routing to be skill-aware. Rejected because smart routing is designed for message complexity heuristics, not skill identity. Mixing the two concerns would make both harder to configure and debug.

  2. Use delegate_task() for skill execution. Already possible but fundamentally limited: delegated agents run in an isolated environment, cannot access the parent sandbox files, and have a restricted toolset. Many skills (e.g., plan, llm-wiki, obsidian) need to read/write files in the main workspace.

  3. Add a /model-for-next command that temporarily overrides the model for the next turn only. Simpler but requires the user to remember to type it before every skill invocation — poor UX compared to a declarative per-skill default.

  4. Use metadata.hermes.config with a model key (like llm-wiki uses for wiki.path). This would inject the model as a config value visible to the agent, but wouldn't actually switch the runtime model — the agent would still use the primary model. The agent can't switch its own model mid-conversation.

Feature Type

Configuration option

Scope

Medium (few files, < 300 lines)

Contribution

  • I'd like to implement this myself and submit a PR
