Feat/delegate model parameter#7586

Open
Labhund wants to merge 3 commits into NousResearch:main from Labhund:feat/delegate-model-parameter

@Labhund Labhund commented Apr 11, 2026

Per-Task Model Parameter for delegate_task

Summary

delegate_task now accepts an optional model parameter at both the top level and per-task inside the tasks array. It routes each subagent through the same resolution pipeline as the /model slash command — aliases, direct mappings, catalog search, credential resolution, and cross-provider routing.

Key benefit: this enables subagent-driven development in which the main agent picks model capability per task (haiku for lookups, sonnet for moderate reasoning, opus for the hard stuff) and runs the tasks in parallel. That lowers cost, improves speed, and keeps subagent conversation history out of the parent's context.


What Changed

1. Core Implementation (tools/delegate_tool.py)

New Helper: _resolve_model_override()

  • Wraps hermes_cli.model_switch.switch_model to convert user-friendly model strings into full credential bundles
  • Handles three syntax forms:
    • Bare model ID: "haiku", "sonnet", "glm-4.7"
    • Short alias: "opus", "grok", "gemini"
    • Provider switch: "stepfun/step-3.5-flash --provider openrouter"
  • Reuses the exact /model pipeline including alias resolution and provider credential lookup
  • Surfaces resolution errors as JSON tool_error messages so the LLM sees them and can retry
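
As a rough illustration of the three syntax forms, the sketch below splits a model string into a model name and an optional provider slug. It is not the code in this PR: the real helper delegates to hermes_cli.model_switch.switch_model, whereas `parse_model_spec` and `ModelSpec` are hypothetical names used only here. The `--global` handling mirrors the commit message, which says the flag is stripped so per-task overrides never persist.

```python
import shlex
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    model: str
    provider: Optional[str]  # None means "let the resolver choose the provider"


def parse_model_spec(spec: str) -> ModelSpec:
    """Split 'MODEL [--provider SLUG]' into parts (hypothetical helper)."""
    # Drop --global so a per-task override can never be persisted.
    tokens = [t for t in shlex.split(spec) if t != "--global"]
    if not tokens:
        raise ValueError("empty model string")
    model, rest = tokens[0], tokens[1:]
    provider = None
    if rest:
        # The only extra argument this sketch accepts is '--provider <slug>'.
        if len(rest) != 2 or rest[0] != "--provider":
            raise ValueError(f"unsupported arguments: {rest!r}")
        provider = rest[1]
    return ModelSpec(model, provider)


if __name__ == "__main__":
    for text in ("haiku", "opus", "stepfun/step-3.5-flash --provider openrouter"):
        print(text, "->", parse_model_spec(text))
```

Bare IDs and aliases both come back with `provider=None`; telling them apart (alias lookup, catalog search) is left to the resolution pipeline.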

Updated delegate_task() Signature

def delegate_task(
    goal: Optional[str] = None,
    context: Optional[str] = None,
    model: Optional[str] = None,  # NEW: per-task model override
    tasks: Optional[list] = None,
    # ... other params
) -> dict

Model Precedence (per-task)

  1. Per-task model in the task dict (highest priority)
  2. Top-level model argument to delegate_task(...)
  3. delegation.model config from ~/.hermes/config.yaml
  4. Parent's model (fallback — inherit)
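
The precedence chain can be sketched as a small pure function. `pick_model` and its argument names are hypothetical and exist only to illustrate the ordering listed above; the PR's own code may structure this differently.

```python
from typing import Optional


def pick_model(
    task: dict,
    top_level: Optional[str],
    config_model: Optional[str],
    parent_model: str,
) -> str:
    """Return the first non-empty model in precedence order (illustrative)."""
    # 1. per-task, 2. top-level argument, 3. delegation.model from config
    for candidate in (task.get("model"), top_level, config_model):
        if candidate:
            return candidate
    # 4. nothing was set, so inherit the parent's model
    return parent_model


if __name__ == "__main__":
    print(pick_model({"goal": "x", "model": "haiku"}, "sonnet", "glm-4.7", "opus"))
    print(pick_model({"goal": "x"}, None, None, "opus"))
```

Treating an empty string the same as a missing value is an assumption of this sketch.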

Task Normalization Loop

Each task in the batch now resolves its own credentials:

for task in tasks:
    task_model = task.get("model") or model or delegation_model or parent_model
    task_credentials = _resolve_model_override(task_model, parent_agent)
    # build child with task-specific model

DELEGATE_TASK_SCHEMA Updates

  • Added model field at top level with examples
  • Added model field inside tasks[].items schema
  • Explicit anti-pattern documentation: "DO NOT use colon-prefix syntax (e.g. 'openrouter:stepfun/...')"
  • Concrete examples of the --provider flag syntax to guide the LLM
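
To make the shape of the schema change concrete, here is a hedged sketch of where the two model fields could sit. The dict below is not copied from DELEGATE_TASK_SCHEMA; `MODEL_FIELD` and `SCHEMA_SKETCH` are hypothetical names, and the description text only paraphrases the bullets above.

```python
# Hypothetical description shared by the top-level and per-task fields.
MODEL_FIELD = {
    "type": "string",
    "description": (
        "Optional model override. Accepts a bare ID ('glm-4.7'), a short "
        "alias ('opus'), or a provider switch "
        "('stepfun/step-3.5-flash --provider openrouter'). "
        "DO NOT use colon-prefix syntax (e.g. 'openrouter:stepfun/...')."
    ),
}

# Illustrative layout only; the real schema has more parameters.
SCHEMA_SKETCH = {
    "name": "delegate_task",
    "parameters": {
        "type": "object",
        "properties": {
            "goal": {"type": "string"},
            "context": {"type": "string"},
            "model": MODEL_FIELD,  # top-level override
            "tasks": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "goal": {"type": "string"},
                        "model": MODEL_FIELD,  # per-task override
                    },
                },
            },
        },
    },
}
```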

2. Documentation (website/docs/user-guide/features/delegation.md)

New "Per-Task Model Selection" section covering:

  • Model parameter syntax (bare, alias, --provider switch)
  • Valid provider slugs list
  • Batch with mixed models example
  • Model precedence rules
  • Configuration fallback
  • Common patterns (cost optimization, rate-limit relief, OpenRouter :free variants)

Why This Matters

Subagent-Driven Development

Traditionally, all subagents inherit the parent's model. With per-task routing:

# Before: all subagents run on parent's model (expensive if parent is opus)
delegate_task(tasks=[
    {"goal": "Find TODOs in src/"},           # Could use cheap haiku
    {"goal": "Redesign auth flow"},           # Needs sonnet
    {"goal": "Analyze algorithm complexity"}  # Needs opus
])

# After: route each to the right capability level
delegate_task(tasks=[
    {"goal": "Find TODOs in src/", "model": "haiku"},
    {"goal": "Redesign auth flow", "model": "sonnet"},
    {"goal": "Analyze algorithm", "model": "opus"}
])

Benefits

  1. Cost efficiency — cheap fast models for simple lookups, expensive models only where needed
  2. Speed — haiku tasks complete in seconds, not waiting for opus
  3. Context preservation — parent doesn't bloat with subagent conversation history; isolated contexts per model
  4. Rate-limit relief — spread load across multiple providers in parallel
  5. Provider agility — main agent can route to best-in-class models (e.g., grok for reasoning, glm-4.7 for coding)

How to Test

Unit Tests

All 83 tests pass (67 existing + 10 model override + 5 toolset scope + 1 schema regression guard):

pytest tests/tools/test_delegate.py -v

New test classes:

  • TestResolveModelOverride — model name parsing, provider flag, empty input, switch_model failures
  • TestDelegateTaskModelOverride — precedence rules, batch with mixed models, per-task beats top-level, config fallback, bad model returns JSON error, registry dispatch

Manual Smoke Test

This is the actual end-to-end test run with three models across providers:

./hermes chat

Then ask:

please in parallel delegate a haiku on three seperate animals to glm-4.7 via zai, 
glm-5.1 via not providing a provider and stepfun/step-3.5-flash via openrouter

Live Smoke Test Output

Initializing agent...
────────────────────────────────────────

┌─ Reasoning ───────────────────────────────────────────────────────────────────────────────────┐
The user wants me to delegate three parallel tasks, each requesting a haiku on a different
animal, to three different models:

glm-4.7 via zai provider
glm-5.1 with no provider specified (will use default)
stepfun/step-3.5-flash via openrouter provider

I need to use the delegate_task function with tasks parameter to run these in parallel.
Each task will be a delegation to a different model as specified.

Let me construct the tasks array with the proper model specifications.
└───────────────────────────────────────────────────────────────────────────────────────────────┘

▣ preparing delegate_task…

✓ [1/3] Write a haiku about a lion. (4.7s)
✓ [2/3] Write a haiku about a dolphin. (9.43s)
✓ [3/3] Write a haiku about a falcon. (12.78s)

┌─ Reasoning ───────────────────────────────────────────────────────────────────────────────────┐
All three tasks completed successfully. Let me present the results clearly.
└───────────────────────────────────────────────────────────────────────────────────────────────┘

╭─ ✿ Sakura ────────────────────────────────────────────────────────────────────────────────────╮

Done. All three haikus completed in parallel (14.08s total):

[GLM-4.7 via ZAI] - Lion:
Golden mane so bright
Roaring loud across the plains
King of all the land

[GLM-5.1] - Dolphin:
Silver leap through waves,
Joyful dance in ocean spray,
Playful sea-born friend.

[StepFun/Step-3.5-Flash via OpenRouter] - Falcon:
A falcon takes flight,
Wings cut the morning's chill air—
Sky's arrow returns.

Verification checklist from the smoke test:

  • ✅ GLM-4.7 via Z.AI — correct model + provider routed
  • ✅ GLM-5.1 (no provider) — correct model routed to default provider (Z.AI)
  • ✅ StepFun via OpenRouter — correct model + provider routed
  • ✅ All 3 subagents ran in parallel (14.08s total, versus about 26.9s if the three runs of 4.7s, 9.43s, and 12.78s had been sequential)
  • ✅ Each subagent got correct credentials and produced output
  • ✅ Schema examples were clear enough that the LLM used correct --provider syntax (not the invalid colon-prefix)
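
The parallelism claim can be checked with simple arithmetic on the durations printed in the log. The numbers below are taken from the smoke test output; only the variable names are introduced here.

```python
# Per-task durations (seconds) and total wall-clock time from the smoke test log.
durations = {"lion": 4.7, "dolphin": 9.43, "falcon": 12.78}
wall_clock = 14.08

sequential = sum(durations.values())  # time if the tasks had run back to back
slowest = max(durations.values())     # lower bound for a fully parallel run
overhead = wall_clock - slowest       # dispatch and result-gathering time
speedup = sequential / wall_clock

print(f"sequential estimate: {sequential:.2f}s")  # 26.91s
print(f"slowest task:        {slowest:.2f}s")     # 12.78s
print(f"overhead:            {overhead:.2f}s")    # 1.30s
print(f"speedup:             {speedup:.2f}x")     # 1.91x
```

The wall-clock time sits close to the slowest task rather than to the sum, which is what a parallel run should look like.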

Design Decisions

Why --provider <slug> Instead of Colon-Prefix?

The colon is semantically reserved in hermes for OpenRouter's variant suffixes:

  • anthropic/claude-sonnet-4:thinking — variant tag for extended thinking
  • meta-llama/llama-3.3-70b:free — variant tag for free tier
  • google/gemini-2.5-pro:fast — variant tag for fast inference

A colon-prefix provider syntax (openrouter:stepfun/step-3.5-flash) would create ambiguous parses:

  • Is the first colon for provider routing or a variant tag?
  • If a model ID itself contains a colon (custom provider), how is it disambiguated?

The --provider <slug> approach is explicit and matches the /model command behavior already familiar to hermes users.
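
A short sketch of the ambiguity: a naive "split on the first colon" parser reads a variant tag as if the model ID were a provider, while a flag-based split leaves the variant tag attached to the model. Both functions are hypothetical illustrations, not code from hermes.

```python
def naive_colon_split(spec: str):
    """The rejected design: everything before the first colon is the provider."""
    provider, sep, model = spec.partition(":")
    return (provider, model) if sep else (None, spec)


def flag_split(spec: str):
    """The chosen design: the provider only comes from an explicit flag."""
    model, sep, provider = spec.partition(" --provider ")
    return (provider.strip() or None, model.strip()) if sep else (None, spec.strip())


if __name__ == "__main__":
    # Intended use of the colon-prefix form parses as hoped...
    print(naive_colon_split("openrouter:stepfun/step-3.5-flash"))
    # ...but a variant tag is misread: the model ID becomes the "provider".
    print(naive_colon_split("meta-llama/llama-3.3-70b:free"))
    # With the flag, the variant tag stays part of the model ID.
    print(flag_split("meta-llama/llama-3.3-70b:free --provider openrouter"))
```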

Anti-Pattern Documentation

The schema explicitly warns against colon-prefix syntax with concrete examples, so the LLM is not left to invent that syntax on its own. The test test_schema_documents_provider_switch_syntax ensures this anti-pattern stays documented if the schema is refactored.
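
A regression guard of this kind can be quite small. The sketch below is an assumption about how such a check might look, not the test in this PR: it walks a schema dict, collects every description string, and asserts that both the --provider example and the colon-prefix warning are present. `iter_descriptions`, `documents_provider_switch_syntax`, and `toy_schema` are hypothetical.

```python
def iter_descriptions(node):
    """Yield every string stored under a 'description' key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "description" and isinstance(value, str):
                yield value
            else:
                yield from iter_descriptions(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_descriptions(item)


def documents_provider_switch_syntax(schema) -> bool:
    """True if the schema shows the flag form and warns against colon-prefix."""
    text = "\n".join(iter_descriptions(schema))
    return "--provider" in text and "DO NOT" in text and "colon" in text.lower()


# Toy schema standing in for the real one.
toy_schema = {
    "parameters": {
        "properties": {
            "model": {
                "description": "e.g. 'deepseek-chat --provider deepseek'. "
                               "DO NOT use colon-prefix syntax."
            }
        }
    }
}

if __name__ == "__main__":
    print(documents_provider_switch_syntax(toy_schema))
```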

Error Handling

If model resolution fails (e.g., model not found, provider not authenticated):

  • _resolve_model_override() raises ValueError with a clear message
  • The error bubbles up as a JSON tool_error message
  • The LLM sees the error and can retry with a different model or provider

This is by design — it forces explicit errors instead of silent fallbacks that hide bugs.
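
The "explicit error, no silent fallback" behavior might look roughly like the following. The payload shape (`{"error": ...}`) and the names `resolve_or_error` and `fake_resolver` are assumptions for illustration; the PR description does not show the actual tool_error format.

```python
import json


def resolve_or_error(model: str, resolver):
    """Return (credentials, None) on success or (None, json_error) on failure."""
    try:
        return resolver(model), None
    except ValueError as exc:
        # Surface the failure to the LLM instead of falling back to the parent.
        payload = {"error": f"model resolution failed for {model!r}: {exc}"}
        return None, json.dumps(payload)


def fake_resolver(model: str) -> dict:
    """Stand-in for the real resolver; knows only one model."""
    if model != "glm-4.7":
        raise ValueError("model not found")
    return {"model": model, "provider": "zai"}


if __name__ == "__main__":
    print(resolve_or_error("glm-4.7", fake_resolver))
    print(resolve_or_error("openrouter:stepfun/step-3.5-flash", fake_resolver))
```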


Files Changed

tools/delegate_tool.py
  + _resolve_model_override() helper
  + delegate_task() signature: model param
  + _build_child_agent loop: per-task credential resolution
  + DELEGATE_TASK_SCHEMA: model field + examples + anti-pattern warning
  + Registry handler: forward args.get("model")

tests/tools/test_delegate.py
  + TestResolveModelOverride (5 tests)
  + TestDelegateTaskModelOverride (5 tests)
  + test_schema_documents_provider_switch_syntax (regression guard)

website/docs/user-guide/features/delegation.md
  + "Per-Task Model Selection" section (102 lines)
  + Model syntax, precedence, batch examples, patterns
  + Updated existing "Model Override" section

(No breaking changes, no existing tests regressed)

Commits

  1. 121c8b7 feat(delegate): add per-task model parameter for on-the-fly subagent routing

    • Core plumbing: _resolve_model_override(), delegate_task signature, per-task credentials
    • Schema updates with provider syntax examples
    • 10 new tests, 67 existing tests all pass
  2. 75ddbeb fix(tools): clarify delegate_task model parameter syntax in schema

    • Schema anti-pattern documentation (no colon-prefix)
    • Concrete --provider flag examples
    • Regression-guard test to keep examples stable
  3. 1bafd42 docs(delegation): document per-task model selection feature

    • Comprehensive user documentation
    • Model syntax, precedence, batch patterns, common use cases

Backwards Compatibility

  • ✅ Existing delegation code works unchanged (model param is optional)
  • ✅ Config fallback still works (delegation.model in config.yaml)
  • ✅ No changes to toolsets, max_iterations, depth limit, or interruption behavior
  • ✅ 67 existing tests still pass — no regressions

Platforms Tested

  • Linux (full end-to-end with GLM 4.7, GLM 5.1 on Z.AI, and Stepfun via OpenRouter)
  • Three parallel subagents, 14.08s total runtime
  • Cross-provider credential routing (Z.AI + OpenRouter)
  • Bash terminal, Python environment

Future Work (Out of Scope)

  • Automatic model selection based on task complexity (could layer on top of this)
  • Per-task provider config beyond just model string (low priority — model string is flexible enough)
  • Caching of model resolution to avoid repeated lookups (could be a micro-optimization)

Reviewer Notes

  1. The anti-pattern warning in the schema is load-bearing — it prevents the LLM from inventing invalid syntax. If schema is refactored, keep the concrete examples and colon-prefix warning in place.

  2. _resolve_model_override() reuses hermes_cli.model_switch.switch_model, so any future changes to model routing automatically flow through delegation. No duplication.

  3. Error handling is deliberate: if model resolution fails, it surfaces as a JSON error so the LLM sees it and can retry. We don't silently fall back to the parent's model.

  4. The smoke test demonstrates the real-world use case: three tasks on three models across two providers in parallel. If any provider routing fails, the error is clear and non-blocking for the other subagents.
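
The "non-blocking for the other subagents" property in note 4 can be sketched with a thread pool in which each task catches its own failure. This is an illustration under assumptions: the PR description does not say how delegate_task parallelizes work, and `run_batch` and `fake_run` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(tasks, run_one):
    """Run tasks concurrently; one task's failure does not affect the rest."""

    def guarded(task):
        try:
            return {"goal": task["goal"], "status": "ok", "result": run_one(task)}
        except Exception as exc:  # keep the error local to this task
            return {"goal": task["goal"], "status": "error", "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        # pool.map returns results in the same order as the input tasks.
        return list(pool.map(guarded, tasks))


def fake_run(task):
    """Stand-in for building and running a child agent."""
    if task.get("model") == "bad-model":
        raise ValueError("Unknown model: bad-model")
    return f"done with {task.get('model', 'parent model')}"


if __name__ == "__main__":
    batch = [
        {"goal": "Find TODOs in src/", "model": "haiku"},
        {"goal": "Redesign auth flow", "model": "bad-model"},
        {"goal": "Analyze algorithm", "model": "opus"},
    ]
    for row in run_batch(batch, fake_run):
        print(row)
```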

claude added 3 commits April 11, 2026 06:19
…routing

Lets the parent agent (or the LLM via tool-calling) pick a model per
subagent invocation, using the same resolution pipeline as the /model
slash command: aliases, direct mappings, catalog search, and credential
resolution. Per-task model beats top-level model, which beats
delegation.model config, which falls back to inheriting the parent.

This unlocks cost/speed/capability routing for subagent-driven
development — e.g. dispatch a haiku for a trivial lookup, a sonnet for
a moderate refactor, and glm-4.7 for a bulk research task, all inside
a single delegate_task batch call.

Changes:
- tools/delegate_tool.py
  - New _resolve_model_override() helper that wraps switch_model() and
    returns a credential bundle compatible with _build_child_agent's
    override_* params. Strips --global to ensure per-task overrides
    never persist to config.yaml.
  - delegate_task() gains an optional model= kwarg, threaded through
    task normalization and the child-build loop so each subagent can
    resolve credentials independently.
  - DELEGATE_TASK_SCHEMA advertises the new model field at the top
    level and inside each task object, with descriptions the LLM can
    use to decide when to route to which model.
  - Registry handler forwards args['model'] to delegate_task().

- tests/tools/test_delegate.py
  - TestResolveModelOverride covers bare name, --provider flag, the
    --global strip-but-ignore guarantee, switch_model failures, and
    empty input.
  - TestDelegateTaskModelOverride covers top-level override, per-task
    > top-level > delegation config precedence, no-override falls
    through to delegation config, bad model names surface as JSON
    errors, and the full registry dispatch path.

All 82 delegate tests pass (67 existing + 10 new + 5 toolset scope).

The initial schema said only "supports optional --provider flag" without
showing a concrete example. When asked to route a subagent through a
different provider, the LLM reached for the intuitively natural
'provider:model' colon-prefix syntax (e.g. 'openrouter:stepfun/step-3.5-flash')
— but colons in hermes are reserved for OpenRouter variant suffixes
(:free, :extended, :thinking, :fast), so the colon-prefix form was passed
raw to the parent's provider and rejected as an Unknown Model.

Fix:
- Top-level and per-task 'model' field descriptions now show three
  concrete syntax forms: bare ID, short alias, and '--provider <slug>'
  with worked examples (stepfun/step-3.5-flash --provider openrouter,
  claude-opus-4-6 --provider anthropic, deepseek-chat --provider deepseek).
- Valid provider slugs are enumerated so the LLM doesn't have to guess.
- The colon-prefix anti-pattern is explicitly called out as DO NOT with
  an example, since LLMs gravitate toward it. This keeps delegate_task
  consistent with the existing /model slash command, which also uses
  --provider exclusively (see hermes_cli/model_switch.py:16-18).
- Main description MODEL SELECTION bullet updated with the same examples.
- New TestDelegateRequirements.test_schema_documents_provider_switch_syntax
  regression guard asserts the concrete --provider example and
  colon-prefix anti-pattern stay in the schema across future refactors.

Behaviour unchanged; this is a schema-description-only fix. All 83
delegate tests pass.
@malaiwah

Contributor

We independently built the same feature on our fork (oikos homelab deployment) and can confirm the per-task model parameter works well in practice.

A few additions from our experience that might be worth considering:

1. Model tiers (small/medium/large)

We added delegation.model_tiers config so the agent can say model="small" instead of needing to know exact model names:

delegation:
  model_tiers:
    small: gemma4-nothink    # fast/cheap — file exploration, summarization
    # medium: inherits parent model
    large: claude-sonnet-4-6  # complex reasoning, peer review, escalation

Tier names resolve to configured model names. The agent doesn't need to know deployment-specific model identifiers.
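
A minimal sketch of how such tier resolution could work, using the values from the config above. `resolve_tier` is a hypothetical name, and passing unknown names through unchanged is an assumption; the fork's actual code is not shown in this thread.

```python
TIER_NAMES = ("small", "medium", "large")


def resolve_tier(name: str, tiers: dict, parent_model: str) -> str:
    """Map a tier name to its configured model; pass other names through."""
    if name in TIER_NAMES:
        # An unconfigured tier (here: medium) inherits the parent's model.
        return tiers.get(name) or parent_model
    return name


if __name__ == "__main__":
    tiers = {"small": "gemma4-nothink", "large": "claude-sonnet-4-6"}
    for requested in ("small", "medium", "large", "glm-4.7"):
        print(requested, "->", resolve_tier(requested, tiers, "parent-model"))
```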

2. list_models tool

A lightweight tool that returns available models with tier assignments, context lengths, and providers. Lets the agent make informed decisions about which model to use for each delegation.

3. Why this matters more than smart_model_routing

We tried upstream's smart_model_routing (message-length heuristic for cheap model routing) and disabled it after production testing. Short messages like "yes" or "go ahead" often trigger the most complex operations. Message length is a terrible proxy for task complexity.

The model-directed approach (this PR) is fundamentally better — the model has full context and knows when a task is simple enough for a smaller model. We've seen it correctly route file exploration to Gemma 4 27B and keep complex debugging on Qwen 3.5 397B.

+1 for merging this. The per-task model override in batch mode is especially valuable for mixed workloads.

@xlionjuan


I don't think it should be called model_tiers. It's just an alias, and you could call it whatever you want.

Labels

  • P3: Low (cosmetic, nice to have)
  • tool/delegate: Subagent delegation
  • type/feature: New feature or request
