[Feature]: LLM model switch by skill

### Problem or Use Case

Skills currently inherit the main agent's LLM model, which is set globally via `config.yaml` or the `hermes model` CLI command. All skills — regardless of complexity — run on the same model. This means users pay top-tier model costs even for simple, well-structured skills (e.g., `gif-search`, `find-nearby`, `arxiv`) that could run reliably on a cheaper, faster model.

The current workarounds are:

- **Manual model switching** via `hermes model`, which requires relaunching the CLI session — disruptive and impractical mid-workflow.
- **Delegated tasks** via `delegate_task()`, which support a separate `delegation.model`/`delegation.provider` in config — but delegated agents run in an isolated environment and cannot access the parent agent's sandbox files, making them unsuitable for skills that need to read/write files in-place.
- **Smart model routing** (`smart_model_routing`), which routes *simple user messages* to a cheap model — but this is turn-level, not skill-level, and has no awareness of which skill is active.

None of these allow a skill author to declare "this skill works fine on Gemini Flash" or a user to override the model for a specific skill.

### Proposed Solution

Allow skills to declare a preferred `model` and `provider` in their SKILL.md frontmatter, and apply a temporary model switch when the skill is invoked via `/skill-name`. The agent switches back to the primary model when the skill's turn completes.

This reuses the existing `AIAgent.switch_model()` infrastructure (already used by `/model` and fallback logic) and the existing `metadata.hermes.config` pattern (already used by skills like `llm-wiki`).

#### SKILL.md frontmatter addition

Skill authors add an optional `model` block under `metadata.hermes`:

```yaml
---
name: gif-search
description: "Search and display GIFs"
version: 1.0.0
metadata:
  hermes:
    tags: [media, fun]
    model:
      provider: openrouter
      model: google/gemini-2.5-flash
---
```

Both fields are optional. If only `model` is set, the current provider is kept. If neither is set, the skill inherits the main agent's model (current behavior).
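The optional-field rules above can be sketched as a small extraction helper. This is a minimal illustration, not the project's code: `extract_skill_model` is a hypothetical name, the frontmatter is assumed to be already parsed into a dict, and treating a block that sets only `provider` as "no override" is an assumption the issue does not settle.

```python
from typing import Any, Optional


def extract_skill_model(frontmatter: dict[str, Any]) -> Optional[dict[str, str]]:
    """Return the `metadata.hermes.model` block, or None if the skill declares none.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    metadata = frontmatter.get("metadata") or {}
    hermes = metadata.get("hermes") if isinstance(metadata, dict) else None
    block = hermes.get("model") if isinstance(hermes, dict) else None
    if not isinstance(block, dict):
        return None

    # Keep only non-empty string fields; a missing provider means
    # "keep whatever provider the agent is currently using".
    declared = {
        key: block[key].strip()
        for key in ("model", "provider")
        if isinstance(block.get(key), str) and block[key].strip()
    }
    # Assumption: a block without `model` is treated as no override at all.
    return declared if "model" in declared else None
```

A skill that sets only `model` yields a dict without a `provider` key, which the caller can read as "keep the current provider".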

Users can also set per-skill models in `config.yaml`. These overrides take precedence over the SKILL.md declaration:

```yaml
skills:
  model_overrides:
    gif-search:
      provider: openrouter
      model: google/gemini-2.5-flash
    arxiv:
      model: google/gemini-2.5-flash
```

#### Resolution order

1. `config.yaml` → `skills.model_overrides.<skill-name>` (user override, highest priority)
2. `SKILL.md` → `metadata.hermes.model` (skill author's recommendation)
3. Main agent model (current behavior, fallback)

#### Files changed

**1. `agent/skill_commands.py`** — extract and return model override metadata

In `_load_skill_payload()` (line 45): after loading the skill, also extract `metadata.hermes.model` from the frontmatter and include it in the returned tuple.

Add a new function `resolve_skill_model_override(skill_name, frontmatter)` that:
- Reads `skills.model_overrides.<skill_name>` from `config.yaml` (via `skill_utils._resolve_dotpath`)
- Falls back to `metadata.hermes.model` from the skill's frontmatter
- Returns `{"model": ..., "provider": ...}` or `None` if no override
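A rough sketch of how that resolution could look, under stated assumptions: the config is passed in as an already-loaded dict (the real function would presumably read it through `skill_utils._resolve_dotpath`, whose signature is not shown here), `_get_nested` is a hypothetical local helper, and the first source that names a `model` wins as a whole block rather than being merged field by field.

```python
from typing import Any, Optional


def _get_nested(data: Any, *keys: str) -> Any:
    """Walk nested dicts by key; return None as soon as a level is missing."""
    for key in keys:
        if not isinstance(data, dict):
            return None
        data = data.get(key)
    return data


def resolve_skill_model_override(
    skill_name: str,
    frontmatter: dict[str, Any],
    config: dict[str, Any],  # assumption: parsed config.yaml passed in by the caller
) -> Optional[dict[str, Optional[str]]]:
    """Pick the override for a skill: config.yaml first, then SKILL.md, else None."""
    candidates = (
        _get_nested(config, "skills", "model_overrides", skill_name),  # user override
        _get_nested(frontmatter, "metadata", "hermes", "model"),       # author default
    )
    for block in candidates:
        if isinstance(block, dict) and block.get("model"):
            # provider stays None when unset so the caller can keep the current one
            return {"model": block["model"], "provider": block.get("provider")}
    return None  # caller falls back to the main agent model
```

Passing the skill name as its own key, rather than joining it into a dotted path, avoids surprises if a skill name ever contains a dot.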

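Modify `build_skill_invocation_message()` (line 291) to return a `(message, model_override)` tuple instead of just the message string. The `model_override` is the resolved dict or `None`. When the skill cannot be loaded, the function should keep returning `None` rather than `(None, None)`, so that the `if result:` check in callers still works. Every existing caller must be updated to unpack the tuple.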

**2. `cli.py`** — apply model switch on skill invocation

At line ~4564, where `/skill-name` is handled: after calling `build_skill_invocation_message()`, if a `model_override` is returned, call `self.agent.switch_model()` before injecting the skill message, and schedule a restore to the primary model after the agent's response completes.

Before (lines 4566-4574):
```python
msg = build_skill_invocation_message(
    base_cmd, user_instruction, task_id=self.session_id
)
if msg:
    skill_name = _skill_commands[base_cmd]["name"]
    print(f"\n⚡ Loading skill: {skill_name}")
    if hasattr(self, '_pending_input'):
        self._pending_input.put(msg)
```

After:
```python
result = build_skill_invocation_message(
    base_cmd, user_instruction, task_id=self.session_id
)
if result:
    msg, model_override = result
    skill_name = _skill_commands[base_cmd]["name"]
    if model_override:
        # Stash the primary model only once, so a second skill invoked before
        # the restore runs cannot overwrite the stash with a skill model.
        if getattr(self, "_skill_model_stash", None) is None:
            self._skill_model_stash = {
                "model": self.agent.model,
                "provider": self.agent.provider,
                "api_key": self.agent.api_key,
                "base_url": self.agent.base_url,
                "api_mode": self.agent.api_mode,
            }
        self.agent.switch_model(
            model_override["model"],
            # provider is optional: keep the current provider when it is unset
            model_override.get("provider") or self.agent.provider,
            api_key=model_override.get("api_key", ""),
            base_url=model_override.get("base_url", ""),
        )
        print(f"\n⚡ Loading skill: {skill_name} (using {model_override['model']})")
    else:
        # Leave any existing stash alone so a pending restore is not lost.
        print(f"\n⚡ Loading skill: {skill_name}")
    if hasattr(self, '_pending_input'):
        self._pending_input.put(msg)
```

**3. `cli.py`** — restore primary model after skill turn

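After the agent's turn ends (in the main loop where the agent's reply is processed), check whether `self._skill_model_stash` is set and restore the primary model. The restore should also run when the turn fails or is interrupted; otherwise the session silently stays on the skill's model: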

```python
if getattr(self, '_skill_model_stash', None):
    stash = self._skill_model_stash
    self.agent.switch_model(
        stash["model"], stash["provider"],
        api_key=stash["api_key"],
        base_url=stash["base_url"],
        api_mode=stash["api_mode"],
    )
    self._skill_model_stash = None
```
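One way to make the switch and restore hard to get wrong is to pair them in a context manager. The sketch below is illustrative only: `FakeAgent` is a stand-in for `AIAgent`, its `switch_model()` signature is inferred from the calls shown in this issue rather than taken from the real class, and `skill_model` is a hypothetical name.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class FakeAgent:
    """Stand-in for AIAgent; only the attributes the stash touches are modelled."""

    def __init__(self, model: str, provider: str) -> None:
        self.model, self.provider = model, provider
        self.api_key, self.base_url, self.api_mode = "", "", ""

    def switch_model(self, model, provider, api_key="", base_url="", api_mode=""):
        # Assumed signature, inferred from the calls shown in this issue.
        self.model, self.provider = model, provider
        self.api_key, self.base_url, self.api_mode = api_key, base_url, api_mode


_STASH_KEYS = ("model", "provider", "api_key", "base_url", "api_mode")


@contextmanager
def skill_model(agent, override: Optional[dict]) -> Iterator[None]:
    """Temporarily switch `agent` to `override`, restoring the original on exit."""
    if not override:
        yield  # no override: the skill runs on the current model
        return
    stash = {key: getattr(agent, key) for key in _STASH_KEYS}
    agent.switch_model(override["model"], override.get("provider") or agent.provider)
    try:
        yield
    finally:
        # Runs on normal completion, exceptions, and KeyboardInterrupt alike.
        agent.switch_model(
            stash["model"], stash["provider"],
            api_key=stash["api_key"],
            base_url=stash["base_url"],
            api_mode=stash["api_mode"],
        )
```

This only fits where the switch and the agent turn happen in the same call path. The CLI code above queues the message through `_pending_input`, so there the stash-and-restore approach in the main loop remains the natural shape, ideally with the restore in a `finally` block.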

**4. `gateway/run.py`** — same skill model switch for gateway surface

Apply the same pattern as `cli.py` for the gateway's skill invocation path, so Telegram/Discord/Slack users also benefit.

**5. `cli-config.yaml.example`** — document the new config option

Add a commented-out `skills.model_overrides` section near the existing `skills:` block:

```yaml
skills:
  # Per-skill model overrides — use a cheaper/faster model for specific skills.
  # Skill authors can also declare a default in SKILL.md (metadata.hermes.model).
  # Config overrides take precedence over SKILL.md declarations.
  # model_overrides:
  #   gif-search:
  #     provider: openrouter
  #     model: google/gemini-2.5-flash
  #   arxiv:
  #     model: google/gemini-2.5-flash
```

#### What this does NOT change

- **No changes to `run_agent.py`** — `switch_model()` already exists and handles all client rebuilding.
- **No changes to `tools/skills_tool.py`** — skill discovery and loading stay the same.
- **No changes to the delegation system** — this is orthogonal to `delegate_task()`.
- **No changes to `smart_model_routing`** — that feature remains turn-level; this is skill-level.
- **No new dependencies** — reuses `switch_model()`, `_resolve_dotpath()`, `resolve_runtime_provider()`.

### Alternatives Considered

1. **Extend `smart_model_routing` to be skill-aware.** Rejected because smart routing is designed for message complexity heuristics, not skill identity. Mixing the two concerns would make both harder to configure and debug.

2. **Use `delegate_task()` for skill execution.** Already possible but fundamentally limited: delegated agents run in an isolated environment, cannot access the parent sandbox files, and have a restricted toolset. Many skills (e.g., `plan`, `llm-wiki`, `obsidian`) need to read/write files in the main workspace.

3. **Add a `/model-for-next` command** that temporarily overrides the model for the next turn only. Simpler but requires the user to remember to type it before every skill invocation — poor UX compared to a declarative per-skill default.

4. **Use `metadata.hermes.config` with a `model` key** (like `llm-wiki` uses for `wiki.path`). This would inject the model as a config value visible to the agent, but wouldn't actually switch the runtime model — the agent would still use the primary model. The agent can't switch its own model mid-conversation.

### Feature Type

Configuration option

### Scope

Medium (few files, < 300 lines)

### Contribution

- [x] I'd like to implement this myself and submit a PR
