Problem or Use Case
Skills currently inherit the main agent's LLM model, which is set globally via config.yaml or the hermes model CLI command. All skills — regardless of complexity — run on the same model. This means users pay top-tier model costs even for simple, well-structured skills (e.g., gif-search, find-nearby, arxiv) that could run reliably on a cheaper, faster model.
The current workarounds are:
- Manual model switching via hermes model, which requires relaunching the CLI session; this is disruptive and impractical mid-workflow.
- Delegated tasks via delegate_task(), which support a separate delegation.model/delegation.provider in config. However, delegated agents run in an isolated environment and cannot access the parent agent's sandbox files, making them unsuitable for skills that need to read/write files in-place.
- Smart model routing (smart_model_routing), which routes simple user messages to a cheap model. This is turn-level, not skill-level, and has no awareness of which skill is active.
None of these allow a skill author to declare "this skill works fine on Gemini Flash" or a user to override the model for a specific skill.
Proposed Solution
Allow skills to declare a preferred model and provider in their SKILL.md frontmatter, and apply a temporary model switch when the skill is invoked via /skill-name. The agent switches back to the primary model when the skill's turn completes.
This reuses the existing AIAgent.switch_model() infrastructure (already used by /model and fallback logic) and the existing metadata.hermes.config pattern (already used by skills like llm-wiki).
SKILL.md frontmatter addition
Skill authors add an optional model block under metadata.hermes:
---
name: gif-search
description: "Search and display GIFs"
version: 1.0.0
metadata:
  hermes:
    tags: [media, fun]
    model:
      provider: openrouter
      model: google/gemini-2.5-flash
---
Both fields are optional. If only model is set, the current provider is kept. If neither is set, the skill inherits the main agent's model (current behavior).
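A minimal sketch of how the optional block could be read from already-parsed frontmatter. The helper name extract_model_block is hypothetical, and the sketch assumes the frontmatter has been parsed into nested dicts. A missing block, a non-mapping value, and empty strings are all treated as "not set".

```python
from typing import Any, Dict, Optional


def extract_model_block(frontmatter: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return metadata.hermes.model with only the fields that are set, else None.

    Hypothetical helper: the name and the normalization rules are assumptions.
    """
    node: Any = frontmatter
    for key in ("metadata", "hermes", "model"):
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    if not isinstance(node, dict):
        return None
    # Both "model" and "provider" are optional; keep only non-empty strings.
    block = {
        field: node[field].strip()
        for field in ("model", "provider")
        if isinstance(node.get(field), str) and node[field].strip()
    }
    return block or None


if __name__ == "__main__":
    only_model = {"metadata": {"hermes": {"model": {"model": "google/gemini-2.5-flash"}}}}
    print(extract_model_block(only_model))  # provider absent: caller keeps the current one
    print(extract_model_block({"metadata": {"hermes": {"tags": ["media"]}}}))  # no block
```

Returning None when nothing is set lets the caller fall straight through to the main agent's model without any special casing.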
Users can also override per-skill models globally in config.yaml, taking precedence over the SKILL.md declaration:
skills:
  model_overrides:
    gif-search:
      provider: openrouter
      model: google/gemini-2.5-flash
    arxiv:
      model: google/gemini-2.5-flash
Resolution order
1. config.yaml → skills.model_overrides.<skill-name> (user override, highest priority)
2. SKILL.md → metadata.hermes.model (skill author's recommendation)
3. Main agent model (current behavior, fallback)
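The precedence can be sketched as a small pure function. This is an illustration under assumptions, not the final implementation: config and frontmatter are passed in as plain dicts, _lookup is a local stand-in for a dot-path helper, and an entry only counts if it names a model. The sketch also assumes whole-entry precedence, so a config entry without a provider does not borrow the provider from SKILL.md.

```python
from typing import Any, Dict, Optional


def _lookup(tree: Any, dotted_path: str) -> Any:
    """Walk nested dicts along a dot-separated path; None if any hop is missing."""
    node = tree
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def resolve_override(
    skill_name: str, config: Dict[str, Any], frontmatter: Dict[str, Any]
) -> Optional[Dict[str, Optional[str]]]:
    """Return {"model", "provider"} from the highest-priority source, or None."""
    overrides = _lookup(config, "skills.model_overrides")
    # Index the skill name directly so names containing dots cannot split the path.
    user_entry = overrides.get(skill_name) if isinstance(overrides, dict) else None
    author_entry = _lookup(frontmatter, "metadata.hermes.model")

    for entry in (user_entry, author_entry):  # 1. config.yaml, 2. SKILL.md
        if isinstance(entry, dict) and entry.get("model"):
            # provider=None means "keep the provider currently in use".
            return {"model": entry["model"], "provider": entry.get("provider")}
    return None  # 3. fall back to the main agent model


if __name__ == "__main__":
    config = {"skills": {"model_overrides": {"arxiv": {"model": "google/gemini-2.5-flash"}}}}
    skill_md = {"metadata": {"hermes": {"model": {"provider": "openrouter", "model": "example/other-model"}}}}
    print(resolve_override("arxiv", config, skill_md))       # config.yaml entry wins
    print(resolve_override("gif-search", config, skill_md))  # SKILL.md recommendation
    print(resolve_override("gif-search", config, {}))        # no override
```

Whether fields should instead merge across sources (config supplies the model, SKILL.md supplies the provider) is an open design question that the proposal does not settle.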
Files changed
1. agent/skill_commands.py — extract and return model override metadata
In _load_skill_payload() (line 45): after loading the skill, also extract metadata.hermes.model from the frontmatter and include it in the returned tuple.
Add a new function resolve_skill_model_override(skill_name, frontmatter) that:
- Reads skills.model_overrides.<skill_name> from config.yaml (via skill_utils._resolve_dotpath)
- Falls back to metadata.hermes.model from the skill's frontmatter
- Returns {"model": ..., "provider": ...} or None if no override
Modify build_skill_invocation_message() (line 291) to return a (message, model_override) tuple instead of just the message string. The model_override is the resolved dict or None.
2. cli.py — apply model switch on skill invocation
At line ~4564, where /skill-name is handled: after calling build_skill_invocation_message(), if a model_override is returned, call self.agent.switch_model() before injecting the skill message, and schedule a restore to the primary model after the agent's response completes.
Before (line 4566-4574):
msg = build_skill_invocation_message(
    base_cmd, user_instruction, task_id=self.session_id
)
if msg:
    skill_name = _skill_commands[base_cmd]["name"]
    print(f"\n⚡ Loading skill: {skill_name}")
    if hasattr(self, '_pending_input'):
        self._pending_input.put(msg)
After:
result = build_skill_invocation_message(
    base_cmd, user_instruction, task_id=self.session_id
)
if result:
    msg, model_override = result
    skill_name = _skill_commands[base_cmd]["name"]
    if model_override:
        # Stash current model for restore after skill turn
        self._skill_model_stash = {
            "model": self.agent.model,
            "provider": self.agent.provider,
            "api_key": self.agent.api_key,
            "base_url": self.agent.base_url,
            "api_mode": self.agent.api_mode,
        }
        self.agent.switch_model(
            model_override["model"],
            # provider is optional: keep the current provider when unset
            model_override.get("provider") or self.agent.provider,
            api_key=model_override.get("api_key", ""),
            base_url=model_override.get("base_url", ""),
        )
        print(f"\n⚡ Loading skill: {skill_name} (using {model_override['model']})")
    else:
        self._skill_model_stash = None
        print(f"\n⚡ Loading skill: {skill_name}")
    if hasattr(self, '_pending_input'):
        self._pending_input.put(msg)
3. cli.py — restore primary model after skill turn
After the agent produces its response (in the main loop where the agent's reply is processed), check if self._skill_model_stash is set and restore:
if getattr(self, '_skill_model_stash', None):
    stash = self._skill_model_stash
    self.agent.switch_model(
        stash["model"], stash["provider"],
        api_key=stash["api_key"],
        base_url=stash["base_url"],
        api_mode=stash["api_mode"],
    )
    self._skill_model_stash = None
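One thing to watch with the restore step: if the skill turn raises or is interrupted before the reply is processed, the cheaper model would stay active. A possible hardening is to wrap the turn in a context manager that restores in a finally block. The sketch below is only an illustration. FakeAgent is a stand-in with a simplified two-argument switch_model(), whereas the real call in this proposal also carries api_key, base_url and api_mode, and temporary_model is a hypothetical name.

```python
from contextlib import contextmanager


class FakeAgent:
    """Stand-in for AIAgent; models only the attributes needed for the demo."""

    def __init__(self, model, provider):
        self.model = model
        self.provider = provider

    def switch_model(self, model, provider):
        # Simplified signature (assumption); the real method takes more arguments.
        self.model, self.provider = model, provider


@contextmanager
def temporary_model(agent, override):
    """Switch to the skill's model for one turn and always switch back."""
    if not override:
        yield agent  # no override: the turn runs on the primary model
        return
    stash = (agent.model, agent.provider)
    agent.switch_model(override["model"], override.get("provider") or agent.provider)
    try:
        yield agent
    finally:
        agent.switch_model(*stash)  # runs on success, error, or interrupt


if __name__ == "__main__":
    agent = FakeAgent("primary-model", "primary-provider")
    try:
        with temporary_model(agent, {"model": "cheap-model"}):
            print("during skill turn:", agent.model, agent.provider)
            raise RuntimeError("simulated failure mid-turn")
    except RuntimeError:
        pass
    print("after skill turn:", agent.model, agent.provider)
```

Because the CLI enqueues the skill message and the turn runs later in the main loop, a wrapper like this would sit around the turn processing rather than around the /skill-name handler.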
4. gateway/run.py — same skill model switch for gateway surface
Apply the same pattern as cli.py for the gateway's skill invocation path, so Telegram/Discord/Slack users also benefit.
5. cli-config.yaml.example — document the new config option
Add a commented-out skills.model_overrides section near the existing skills: block:
skills:
  # Per-skill model overrides: use a cheaper/faster model for specific skills.
  # Skill authors can also declare a default in SKILL.md (metadata.hermes.model).
  # Config overrides take precedence over SKILL.md declarations.
  # model_overrides:
  #   gif-search:
  #     provider: openrouter
  #     model: google/gemini-2.5-flash
  #   arxiv:
  #     model: google/gemini-2.5-flash
What this does NOT change
- No changes to run_agent.py: switch_model() already exists and handles all client rebuilding.
- No changes to tools/skills_tool.py: skill discovery and loading stay the same.
- No changes to the delegation system: this is orthogonal to delegate_task().
- No changes to smart_model_routing: that feature remains turn-level; this is skill-level.
- No new dependencies: reuses switch_model(), _resolve_dotpath(), resolve_runtime_provider().
Alternatives Considered
- Extend smart_model_routing to be skill-aware. Rejected because smart routing is designed for message complexity heuristics, not skill identity. Mixing the two concerns would make both harder to configure and debug.
- Use delegate_task() for skill execution. Already possible but fundamentally limited: delegated agents run in an isolated environment, cannot access the parent sandbox files, and have a restricted toolset. Many skills (e.g., plan, llm-wiki, obsidian) need to read/write files in the main workspace.
- Add a /model-for-next command that temporarily overrides the model for the next turn only. Simpler, but it requires the user to remember to type it before every skill invocation, which is poor UX compared to a declarative per-skill default.
- Use metadata.hermes.config with a model key (like llm-wiki uses for wiki.path). This would inject the model as a config value visible to the agent, but wouldn't actually switch the runtime model; the agent would still use the primary model. The agent can't switch its own model mid-conversation.
Feature Type
Configuration option
Scope
Medium (few files, < 300 lines)
Contribution