feat(delegate_task): per-subagent model/provider overrides + model observability plugin by thesunofdog · Pull Request #12794 · NousResearch/hermes-agent

thesunofdog · 2026-04-20T03:13:04Z

feat(delegate_task): per-subagent model/provider overrides + model observability plugin

Summary

Two related additions:

Per-subagent model/provider overrides — adds model and provider parameters to delegate_task, allowing the calling agent to route individual subagents (or each task in a batch) to a specific model, independent of delegation.model in config.
model_observability plugin v2 — verifies that requested models actually reach the subagent. Surfaces ground-truth routing data (requested vs actual model, mismatch warnings, auto-router resolutions, Pareto-router resolutions) directly in delegate_task results via transform_tool_result.

These two components are designed together: the override exposes the parameter, the plugin verifies it was honored.

Components

Three pieces that form a complete system:

Patch (tools/delegate_tool.py, run_agent.py) — exposes the capability. Adds model and provider to the delegate_task schema and threads them through _dispatch_delegate_task(). Without this, any model arg passed by the LLM is silently discarded.

Skill (skills/autonomous-ai-agents/subagent-model-routing/SKILL.md) — tells the agent how to use it. Defines the routing decision matrix: which models belong in which tier, which tasks warrant which tier, and when to specify a model pin vs. let the auto-router decide. The skill is what turns the new parameter from "available" to "used correctly."

Plugin (plugins/model_observability/) — verifies it worked. Intercepts every delegate_task call, compares requested vs. actual model for each subagent, and injects the result back into the tool output before the LLM sees it. If a pin was silently dropped, the agent knows immediately rather than discovering it through degraded output quality.

Motivation

_build_child_agent() already accepted model=... (used by cron/scheduler.py) but the parameter was never plumbed through the tool schema or dispatch path. The result: any model arg passed by the LLM was silently discarded, and every subagent inherited delegation.model from config regardless of what the caller specified.

The observability plugin was built specifically to detect this class of silent failure. On main (without this patch), the plugin reports a mismatch on every delegation — confirmed by live instrumentation. On this branch, it reports clean matches for explicit pins and correctly distinguishes router resolutions from override mismatches.

Rebase note (2026-04-26)

Rebased onto main after upstream merged:

48ecb98f — _dispatch_delegate_task() helper consolidating the two hardcoded dispatch sites this PR originally patched. We accepted that refactor entirely and add model=/provider= to the single dispatch method instead.
9c9d9b7d — FileStateRegistry for concurrent subagent write safety.

TestRunAgentDispatchForwarding updated to verify _dispatch_delegate_task() directly.

API

Single-agent override

{
  "model": "anthropic/claude-haiku-4.5",
  "goal": "..."
}

Batch — per-task overrides

{
  "model": "anthropic/claude-opus-4.7",  // default for tasks that do not specify
  "tasks": [
    {"goal": "...", "model": "anthropic/claude-haiku-4.5"},  // overrides default
    {"goal": "..."}                                           // uses default above
  ]
}

Routing through OpenRouter (multi-provider batch)

provider is resolved once per delegate_task call, so per-task provider routing is not supported. To route tasks to different providers in a single batch, use provider="openrouter" at the top level — OpenRouter accepts provider-prefixed model strings and handles the downstream routing:

{
  "provider": "openrouter",
  "tasks": [
    {"goal": "...", "model": "anthropic/claude-haiku-4.5"},
    {"goal": "...", "model": "x-ai/grok-4.1-fast"},
    {"goal": "...", "model": "google/gemini-2.5-flash"}
  ]
}

Router requests

Router-style model IDs are requests for a routing policy, not concrete model pins:

openrouter/auto → report as auto_router_resolutions
openrouter/pareto-code → report as pareto_router_resolutions

These are expected to return a different concrete backend model. That is not an override mismatch.
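A minimal sketch of the distinction above, assuming the two router IDs named in this PR are the only ones that need special handling. The names `ROUTER_BUCKETS` and `router_bucket` are hypothetical and are not taken from the plugin source.

```python
from typing import Optional

# Router-style IDs named in this PR, mapped to the report bucket each one uses.
# Hypothetical table: the plugin's real data structure is not shown in the PR.
ROUTER_BUCKETS = {
    "openrouter/auto": "auto_router_resolutions",
    "openrouter/pareto-code": "pareto_router_resolutions",
}


def router_bucket(requested_model: Optional[str]) -> Optional[str]:
    """Return the report bucket for a router request, or None for anything else.

    None covers both concrete model pins and calls with no model specified.
    """
    if not requested_model:
        return None
    return ROUTER_BUCKETS.get(requested_model.strip().lower())


if __name__ == "__main__":
    for model in ("openrouter/auto", "openrouter/pareto-code", "anthropic/claude-haiku-4.5"):
        print(model, "->", router_bucket(model))
```

A caller would check the bucket first and only run the requested-vs-actual comparison when the result is `None`, so a router that resolves to a different backend is never counted as a mismatch.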

Precedence

Highest priority first:

1. Per-task model (batch only)
2. Top-level model argument
3. delegation.model from config.yaml
4. Model inherited from the parent agent
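The precedence order can be sketched as a first-non-empty lookup. This is an illustration only: `resolve_model` and its parameter names are hypothetical, and the actual resolution in tools/delegate_tool.py may be structured differently.

```python
from typing import Optional


def resolve_model(
    task_model: Optional[str],
    top_level_model: Optional[str],
    config_model: Optional[str],
    parent_model: str,
) -> str:
    """Pick the first non-empty candidate, highest precedence first."""
    for candidate in (task_model, top_level_model, config_model):
        if candidate:
            return candidate
    # Nothing was specified anywhere, so the subagent inherits from the parent.
    return parent_model


if __name__ == "__main__":
    # Mirrors the batch example: one task pins a model, the other uses the default.
    default = "anthropic/claude-opus-4.7"
    tasks = [{"goal": "...", "model": "anthropic/claude-haiku-4.5"}, {"goal": "..."}]
    for task in tasks:
        print(resolve_model(task.get("model"), default, None, "parent-model"))
```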

Error behavior

Model slugs are not validated locally — an unrecognized slug passes through and the provider returns an opaque invalid-model error. The model_observability plugin will log the attempt regardless. The _warn_model_provider_mismatch() guard catches the most common misconfiguration (provider-prefixed model string used with a non-aggregator provider) and logs a warning before the call fires.
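A rough sketch of the kind of check the guard performs, assuming "provider-prefixed" means the model string contains a `/` and that `openrouter` is the only aggregator. Both are assumptions; the real `_warn_model_provider_mismatch()` body and its aggregator list are not shown in this PR description.

```python
import logging
from typing import Optional

logger = logging.getLogger("delegate_sketch")

# Assumption: providers that accept "vendor/model" strings. Only "openrouter"
# is confirmed by this PR; the real set may be larger.
AGGREGATOR_PROVIDERS = {"openrouter"}


def warn_model_provider_mismatch(model: Optional[str], provider: Optional[str]) -> bool:
    """Log a warning and return True for a prefixed model on a non-aggregator provider."""
    if not model or not provider:
        return False
    if "/" in model and provider.lower() not in AGGREGATOR_PROVIDERS:
        logger.warning(
            "model %r looks provider-prefixed but provider %r is not an aggregator",
            model,
            provider,
        )
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    warn_model_provider_mismatch("anthropic/claude-haiku-4.5", "anthropic")
```

The check only warns and never blocks, which matches the description: the call still fires and the provider reports the invalid-model error if there is one.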

Model Observability Plugin v2

Design

Four hooks, one enforcement chain:

pre_tool_call — scoping anchor only (always returns None).
Before delegate_task executes, the hook captures the current JSONL byte offset and whether a model pin was specified, and stores both in a stash keyed by tool_call_id. This is not a warning hook: the only action the framework enforces from pre_tool_call is a block. Its sole purpose is to give transform_tool_result a precise log read boundary, which prevents bleed from prior delegations in the same session. Stash entries are evicted after a 120s TTL.

transform_tool_result — enrichment and soft enforcement.
After delegate_task returns, reads JSONL entries past the saved offset (scoped to this call only), computes requested-vs-actual per subagent, and injects an observability block before the result reaches the LLM:

✓ sa-0: anthropic/claude-haiku-4.5 → anthropic/claude-4.5-haiku-20251001 [MATCH]
⚠ sa-1: requested anthropic/claude-opus-4.7 → actual google/gemini-2.5-flash [MISMATCH — override silently dropped]
→ sa-2: no model specified → auto-router resolved to google/gemini-2.5-flash-lite
→ sa-3: pareto-router resolved to deepseek/deepseek-v4-pro-20260423

post_api_request — JSONL logging backbone.
Logs every LLM API call to ~/.hermes/logs/model_usage.jsonl. Read by the other hooks.

on_session_start — session boundary marker.
Writes a boundary record so log readers can distinguish gateway incarnations and session-scoped runs.

All failure modes (missing log, failed delegation, stale stash, bad args) degrade silently — the plugin never interrupts the agent loop.
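The tool_call_id-keyed stash with a 120s TTL could look roughly like the sketch below. `CallStash` and `StashEntry` are hypothetical names, and the injectable clock exists only to make the eviction testable; the plugin's own stash may be a plain dict with timestamps.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

STASH_TTL_SECONDS = 120.0  # TTL stated in the PR description


@dataclass
class StashEntry:
    log_offset: int      # JSONL byte offset captured in pre_tool_call
    model_pinned: bool   # whether the call specified a model
    created_at: float    # clock reading when the entry was stored


class CallStash:
    """Stash keyed by tool_call_id; stale entries are dropped on access."""

    def __init__(self, ttl: float = STASH_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, StashEntry] = {}

    def put(self, tool_call_id: str, log_offset: int, model_pinned: bool) -> None:
        self._evict_expired()
        self._entries[tool_call_id] = StashEntry(log_offset, model_pinned, self._clock())

    def pop(self, tool_call_id: str) -> Optional[StashEntry]:
        """Return and remove the entry, or None if it is missing or stale."""
        self._evict_expired()
        return self._entries.pop(tool_call_id, None)

    def _evict_expired(self) -> None:
        now = self._clock()
        stale = [key for key, entry in self._entries.items()
                 if now - entry.created_at > self._ttl]
        for key in stale:
            del self._entries[key]
```

Returning `None` for a stale or unknown id lets the result hook skip enrichment quietly, in line with the degrade-silently behavior described above.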

Why `pre_tool_call` as a scoping anchor rather than filtering by `session_id`

Filtering by session + subagent type still returns all delegations from the session. A long-running session with multiple delegate_task calls would aggregate unrelated records. The byte-offset approach costs one stat() call and gives exact per-invocation scoping with no false positives.
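The byte-offset approach can be illustrated with two small helpers: one `stat()` before the call, then a seek-and-read after it. The function names are hypothetical and the record fields are not specified here; this is a sketch of the technique, not the plugin's code.

```python
import json
import os
from typing import Any, Dict, List


def current_offset(log_path: str) -> int:
    """Size of the log in bytes, or 0 if the file does not exist yet."""
    try:
        return os.stat(log_path).st_size
    except FileNotFoundError:
        return 0


def read_records_since(log_path: str, offset: int) -> List[Dict[str, Any]]:
    """Parse JSONL records appended after `offset`, skipping malformed lines."""
    records: List[Dict[str, Any]] = []
    try:
        with open(log_path, "rb") as handle:
            handle.seek(offset)
            for raw_line in handle:
                try:
                    record = json.loads(raw_line)
                except ValueError:
                    continue  # blank, partial, or non-JSON line
                if isinstance(record, dict):
                    records.append(record)
    except FileNotFoundError:
        return []  # a missing log degrades to "no data"
    return records
```

Because only bytes written after the saved offset are read, records from earlier delegations in the same session never enter the comparison, regardless of their session_id.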

Live verification (2026-05-03)

Tested end-to-end with the gateway running from this branch:

| Task | Requested | Actual | Result |
|------|-----------|--------|--------|
| 0 | `anthropic/claude-haiku-4.5` | `anthropic/claude-4.5-haiku-20251001` | ✅ match |
| 1 | `anthropic/claude-opus-4.7` | `anthropic/claude-4.7-opus-20260416` | ✅ match |
| 2 | `openrouter/auto` | `google/gemini-2.5-flash-lite` | ✅ auto-router noted, no warning |

Same test on main (without this patch): all three tasks report MISMATCH — confirming the patch is the differentiator.

Pareto router verification (2026-05-13)

After upstream added the OpenRouter Pareto Code router, the plugin needed to treat openrouter/pareto-code as a router request rather than a concrete model pin.

Live smoke after gateway restart:

Request: model="openrouter/pareto-code", provider="openrouter"
Task id: sa-0-7780e3fa
Actual backend: deepseek/deepseek-v4-pro-20260423
Result: inline observability contains pareto_router_resolutions
Result: no override_mismatches

Reader-script output agreed:

Pareto-router: yes → deepseek/deepseek-v4-pro-20260423 ×1

Tests

Original PR coverage:

tests/plugins/test_model_observability_v2.py — plugin lifecycle, hook registration, scoping, enrichment, mismatch/auto-router behavior, log isolation, edge cases
tests/tools/test_delegate.py — delegate coverage including credential resolution, model/provider overrides, mismatch guard, OpenRouter auto smoke, and dispatch forwarding

Latest verification after Pareto-router compatibility update:

python -m pytest -o addopts='' \
  tests/plugins/test_model_observability_v2.py \
  tests/tools/test_delegate_tool_observability.py -q
# 41 passed

python -m pytest -o addopts='' tests/plugins -q --tb=short
# 587 passed

Files changed

tools/delegate_tool.py — schema fields, handler, _resolve_delegation_credentials() overrides, delegate_task() signature, observability enrichment helpers, router-aware inline metadata
run_agent.py — model=/provider= forwarded through _dispatch_delegate_task()
plugins/model_observability/__init__.py + plugin.yaml — v2 plugin, 4 hooks, auto/Pareto-router classification
tests/plugins/test_model_observability_v2.py — plugin regression tests, including Pareto-router resolution behavior
tests/tools/test_delegate.py — delegate model/provider coverage
tests/tools/test_delegate_tool_observability.py — inline observability metadata regression tests
scripts/refresh_openrouter_models.py — weekly model catalog maintenance script (price delta tracking, tier exclusivity validation)
skills/autonomous-ai-agents/subagent-model-routing/SKILL.md — agent-optimized routing skill

Backwards compatibility

All new parameters are optional, and behavior is unchanged when they are omitted. Observability enrichment degrades gracefully when the log is absent.

Router requests are explicitly treated as router requests: openrouter/auto and openrouter/pareto-code may return concrete backend model names without being classified as override failures.

Prior art

#3172 (ReqX), #6771 (GusBot69), and #12715 (ViFigueiredo) all attempt per-call model overrides and remain open. None have been rebased onto the _dispatch_delegate_task() dispatch refactor (48ecb98f) that landed in main after they were submitted — all three would have merge conflicts against current main on their current branches. This PR is the only open implementation rebased onto that refactor.

The skill's escalation patterns are adapted from #3172 — credit to ReqX for that framing.

thesunofdog · 2026-04-21T03:56:30Z

Hey! CI hasn't triggered on this one (fork workflow restriction) — could a maintainer approve the workflow run? Happy to address any feedback. Thanks! 🙌

alt-glitch · 2026-04-22T03:44:29Z

Multiple prior PRs for the same feature (#3172, #6771, #12715) — this one is rebased onto the latest _dispatch_delegate_task() refactor.

alt-glitch · 2026-04-22T03:45:33Z

Multiple prior PRs for the same feature.

thesunofdog · 2026-04-22T08:22:26Z

Yep, noted — and covered in the PR description under Prior Art. #3172 (ReqX), #6771 (GusBot69), and #12715 (ViFigueiredo) all attempted this. This branch is the only one rebased onto the _dispatch_delegate_task() refactor that landed in 48ecb98f — the prior branches all patch the two hardcoded dispatch sites directly, which upstream has since consolidated. The escalation ladder and cost patterns in the skill are adapted from ReqX's model-routing-template skill in #3172 with credit noted. Happy to add anything else that would help the review.

thesunofdog · 2026-04-25T08:14:34Z

Rebased onto main after today's 65-commit upstream sync. One conflict in test_delegate.py — both test classes (TestSubagentApprovalCallback from upstream, ours from this branch) preserved. All 22 commits applied cleanly, all declared dependencies present. CI shows UNSTABLE — suspect missing run registration, not a test failure. Currently working on a ground-up observability plugin redesign using pre_tool_call + transform_tool_result that will build on this PR's infrastructure.

ReqX · 2026-05-24T08:38:13Z

+1 on this, happy to test/help - closed superseded PR #3172

Davidsoff · 2026-06-04T09:00:34Z

What is needed to get this over the line? I have a couple of workflows that would really benefit from this!

jarodtaylor · 2026-06-12T00:05:48Z

Please make this available! This is the ONLY thing that I miss about OpenClaw. It was pretty easy to route tasks, skills, crons, discord channels, etc. to specific providers/models.


thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from 046205d to da7eded Compare April 20, 2026 04:22

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from da7eded to 6afc4f8 Compare April 22, 2026 03:39

alt-glitch added the type/feature, P2, comp/agent, and tool/delegate labels Apr 22, 2026

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from 8610d3c to 8fc2b5e Compare April 22, 2026 08:22

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from ad69d31 to 6c79e0f Compare April 22, 2026 09:01

alt-glitch mentioned this pull request Apr 23, 2026

feat: allow delegate_task model overrides #12715

Open

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 2 times, most recently from 6fd7c05 to 62daa7a Compare April 24, 2026 07:09

alt-glitch mentioned this pull request Apr 24, 2026

[Feature]: delegation-per-task-endpoint-overrides #14974

Closed

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 2 times, most recently from 8762ee1 to e296eeb Compare April 25, 2026 06:11

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 2 times, most recently from 6c92b62 to f8a9f03 Compare April 28, 2026 23:50

alt-glitch mentioned this pull request Apr 30, 2026

Feature: per-call model/provider override in delegate_task #17732

Open

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 2 times, most recently from 59eb485 to ea056b2 Compare May 1, 2026 17:25

alt-glitch mentioned this pull request May 2, 2026

feat(delegate): per-call model/provider override for subagents #3172

Closed

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from ea056b2 to 5077248 Compare May 3, 2026 16:41

thesunofdog changed the title ~~feat(delegate_task): expose model/provider overrides; fix dispatch-bypass bug~~ feat(delegate_task): per-subagent model/provider overrides + model observability plugin May 3, 2026

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 3 times, most recently from 2b3293c to 48bbd6a Compare May 4, 2026 23:04

This was referenced May 10, 2026

feat(delegate): per-task model and provider overrides #23266

Open

delegate_task model parameter silently discarded — subagents always inherit parent model #23467

Open

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from be2475f to 2825155 Compare May 12, 2026 23:19

ozdalva mentioned this pull request May 13, 2026

feat(delegate): per-task model and provider override for delegate_task subagents #25026

Closed

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 5 times, most recently from 086fc8a to baa5fd8 Compare May 20, 2026 04:27

joshcheng820222 mentioned this pull request May 21, 2026

feat: add model and provider parameters to delegate_task for dynamic model routing #29899

Closed

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from 7795ac7 to c1c674d Compare May 22, 2026 00:44

This was referenced May 24, 2026

fix(delegate): wire model/provider params through to _build_child_agent #31284

Closed

feat: allow per-call delegation model overrides #31345

Open

jmelchiori mentioned this pull request May 24, 2026

feat(tools): add per-task routing to delegate_task #31537

Open

4 tasks

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 3 times, most recently from c66ef59 to 852192b Compare May 29, 2026 04:21

alt-glitch mentioned this pull request May 29, 2026

Add model parameter to delegate tool for per-call provider overrides #34681

Open

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 3 times, most recently from c79c3b5 to 64b2c16 Compare June 2, 2026 18:42

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from 64b2c16 to 39c5fd8 Compare June 8, 2026 19:25

thesunofdog added 4 commits June 12, 2026 12:27

feat(tools): add delegate model/provider overrides

6859d64

Adds per-call model/provider override support for delegate_task, model observability verification, OpenRouter model refresh support, and subagent routing documentation/tests.

chore(skill): sync PR skill from local — 2026-05-20 03:05:47 PDT

34c9f87

chore(skill): sync PR skill from local — 2026-05-26 03:05:10 PDT

f335269

docs(routing): refresh OpenRouter model whitelist

c15950f

Removes missing Grok fast budget slug, adds Gemini 3.1 Flash Lite as a budget option, and adds Grok Build to the coding whitelist.

thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from 39c5fd8 to c15950f Compare June 12, 2026 19:29

thesunofdog wants to merge 4 commits into NousResearch:main from thesunofdog:feat/delegate-task-model-provider-override


5 participants
