feat(delegate_task): per-subagent model/provider overrides + model observability plugin #12794

Open. thesunofdog wants to merge 4 commits into NousResearch:main from thesunofdog:feat/delegate-task-model-provider-override
@thesunofdog commented Apr 20, 2026

Summary

Two related additions:

  1. Per-subagent model/provider overrides — adds model and provider parameters to delegate_task, allowing the calling agent to route individual subagents (or each task in a batch) to a specific model, independent of delegation.model in config.

  2. model_observability plugin v2 — verifies that requested models actually reach the subagent. Surfaces ground-truth routing data (requested vs actual model, mismatch warnings, auto-router resolutions, Pareto-router resolutions) directly in delegate_task results via transform_tool_result.

These two components are designed together: the override exposes the parameter, the plugin verifies it was honored.

Components

Three pieces that form a complete system:

Patch (tools/delegate_tool.py, run_agent.py) — exposes the capability. Adds model and provider to the delegate_task schema and threads them through _dispatch_delegate_task(). Without this, any model arg passed by the LLM is silently discarded.

Skill (skills/autonomous-ai-agents/subagent-model-routing/SKILL.md) — tells the agent how to use it. Defines the routing decision matrix: which models belong in which tier, which tasks warrant which tier, and when to specify a model pin vs. let the auto-router decide. The skill is what turns the new parameter from "available" to "used correctly."

Plugin (plugins/model_observability/) — verifies it worked. Intercepts every delegate_task call, compares requested vs. actual model for each subagent, and injects the result back into the tool output before the LLM sees it. If a pin was silently dropped, the agent knows immediately rather than discovering it through degraded output quality.

Motivation

_build_child_agent() already accepted model=... (used by cron/scheduler.py) but the parameter was never plumbed through the tool schema or dispatch path. The result: any model arg passed by the LLM was silently discarded, and every subagent inherited delegation.model from config regardless of what the caller specified.

The observability plugin was built specifically to detect this class of silent failure. On main (without this patch), the plugin reports a mismatch on every delegation — confirmed by live instrumentation. On this branch, it reports clean matches for explicit pins and correctly distinguishes router resolutions from override mismatches.

Rebase note (2026-04-26)

Rebased onto main after upstream merged:

  • 48ecb98f: _dispatch_delegate_task() helper consolidating the two hardcoded dispatch sites this PR originally patched. We accepted that refactor entirely and now add model=/provider= to the single dispatch method instead.
  • 9c9d9b7d: FileStateRegistry for concurrent subagent write safety.

TestRunAgentDispatchForwarding updated to verify _dispatch_delegate_task() directly.

API

Single-agent override

{
  "model": "anthropic/claude-haiku-4.5",
  "goal": "..."
}

Batch — per-task overrides

{
  "model": "anthropic/claude-opus-4.7",  // default for tasks that do not specify
  "tasks": [
    {"goal": "...", "model": "anthropic/claude-haiku-4.5"},  // overrides default
    {"goal": "..."}                                           // uses default above
  ]
}

Routing through OpenRouter (multi-provider batch)

provider is resolved once per delegate_task call, so per-task provider routing is not supported. To route tasks to different providers in a single batch, use provider="openrouter" at the top level — OpenRouter accepts provider-prefixed model strings and handles the downstream routing:

{
  "provider": "openrouter",
  "tasks": [
    {"goal": "...", "model": "anthropic/claude-haiku-4.5"},
    {"goal": "...", "model": "x-ai/grok-4.1-fast"},
    {"goal": "...", "model": "google/gemini-2.5-flash"}
  ]
}

Router requests

Router-style model IDs are requests for a routing policy, not concrete model pins:

  • openrouter/auto → report as auto_router_resolutions
  • openrouter/pareto-code → report as pareto_router_resolutions

These are expected to return a different concrete backend model. That is not an override mismatch.
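The distinction between a router request and a concrete pin can be sketched as a small classifier. This is a minimal illustration, not the plugin's actual code: the function name `classify`, the `match` label, and the normalization rule (ignore token order and 8-digit date stamps) are assumptions made here so that `anthropic/claude-haiku-4.5` and `anthropic/claude-4.5-haiku-20251001` compare equal, as in the sample output in this PR.

```python
import re

# Router-style IDs named in this PR, mapped to the bucket they are reported under.
ROUTER_BUCKETS = {
    "openrouter/auto": "auto_router_resolutions",
    "openrouter/pareto-code": "pareto_router_resolutions",
}

_DATE_STAMP = re.compile(r"^\d{8}$")


def _tokens(model_id: str) -> frozenset:
    """Split 'vendor/name-parts' into order-insensitive tokens, dropping date stamps.

    Assumption: this normalization is illustrative only; the real plugin may
    compare model names differently.
    """
    parts = re.split(r"[/\-]", model_id.lower())
    return frozenset(p for p in parts if p and not _DATE_STAMP.match(p))


def classify(requested: str, actual: str) -> str:
    """Return the bucket a (requested, actual) pair falls into."""
    if requested in ROUTER_BUCKETS:
        # A router request is expected to come back as a different concrete model.
        return ROUTER_BUCKETS[requested]
    if _tokens(requested) == _tokens(actual):
        return "match"
    return "override_mismatches"
```

Under these assumptions, a Pareto-router request that resolves to `deepseek/deepseek-v4-pro-20260423` lands in `pareto_router_resolutions` and never in `override_mismatches`.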

Precedence

  1. Per-task model (batch only)
  2. Top-level model argument
  3. delegation.model from config.yaml
  4. Model inherited from the parent agent
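The precedence order above amounts to a first-non-empty lookup. The sketch below is illustrative only; `resolve_model` and its parameter names are hypothetical and are not taken from tools/delegate_tool.py.

```python
from typing import Optional


def resolve_model(
    task_model: Optional[str],
    top_level_model: Optional[str],
    config_model: Optional[str],
    parent_model: Optional[str],
) -> Optional[str]:
    """Return the first model that is set, in the documented precedence order."""
    for candidate in (task_model, top_level_model, config_model, parent_model):
        if candidate:
            return candidate
    return None


# Batch example: the first task pins its own model, the second falls back
# to the top-level default.
tasks = [{"goal": "a", "model": "anthropic/claude-haiku-4.5"}, {"goal": "b"}]
resolved = [
    resolve_model(t.get("model"), "anthropic/claude-opus-4.7", None, "parent/model")
    for t in tasks
]
# resolved == ["anthropic/claude-haiku-4.5", "anthropic/claude-opus-4.7"]
```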

Error behavior

Model slugs are not validated locally — an unrecognized slug passes through and the provider returns an opaque invalid-model error. The model_observability plugin will log the attempt regardless. The _warn_model_provider_mismatch() guard catches the most common misconfiguration (provider-prefixed model string used with a non-aggregator provider) and logs a warning before the call fires.
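A guard of this kind could look roughly like the following. This is a sketch under stated assumptions, not the body of _warn_model_provider_mismatch(): the function name `warn_if_prefixed_model`, the logger name, and the contents of `AGGREGATOR_PROVIDERS` are hypothetical, and a `/` in the model string is used here as the test for "provider-prefixed".

```python
import logging
from typing import Optional

logger = logging.getLogger("delegate_sketch")

# Assumption: OpenRouter is the only aggregator considered in this sketch.
AGGREGATOR_PROVIDERS = frozenset({"openrouter"})


def warn_if_prefixed_model(model: Optional[str], provider: Optional[str]) -> bool:
    """Log a warning and return True when a 'vendor/model' slug is paired
    with a provider that is not an aggregator. Never raises, never blocks."""
    if not model or not provider:
        return False
    if "/" in model and provider.lower() not in AGGREGATOR_PROVIDERS:
        logger.warning(
            "model %r looks provider-prefixed but provider %r is not an aggregator",
            model,
            provider,
        )
        return True
    return False
```

The guard only warns; the call still goes out and the provider remains the authority on whether the slug is valid.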

Model Observability Plugin v2

Design

Four hooks, one enforcement chain:

pre_tool_call — scoping anchor only (always returns None).
Before delegate_task executes, the hook captures the current byte offset of the JSONL log and records whether a model pin was specified, storing both in a stash keyed by tool_call_id. This is not a warning hook: the only action the framework enforces from pre_tool_call is block. Its sole purpose is to give transform_tool_result a precise log read boundary, preventing bleed from prior delegations in the same session. Stash entries are evicted after a 120 s TTL.
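A stash with these properties can be sketched as a small dictionary wrapper. The class name `OffsetStash`, its methods, and the injectable clock are assumptions made for illustration; only the tool_call_id key, the stored offset and pin flag, and the 120-second TTL come from the description above.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class OffsetStash:
    """Maps tool_call_id -> (log byte offset, pin flag), with TTL eviction."""

    def __init__(self, ttl_seconds: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so eviction can be tested without sleeping
        self._entries: Dict[str, Tuple[float, int, bool]] = {}

    def put(self, tool_call_id: str, offset: int, has_pin: bool) -> None:
        """Record the log offset seen just before delegate_task runs."""
        self._evict_expired()
        self._entries[tool_call_id] = (self._clock(), offset, has_pin)

    def pop(self, tool_call_id: str) -> Optional[Tuple[int, bool]]:
        """Return and remove (offset, has_pin), or None if missing or stale."""
        self._evict_expired()
        entry = self._entries.pop(tool_call_id, None)
        return None if entry is None else (entry[1], entry[2])

    def _evict_expired(self) -> None:
        cutoff = self._clock() - self._ttl
        stale = [key for key, value in self._entries.items() if value[0] < cutoff]
        for key in stale:
            del self._entries[key]
```

Evicting on every `put` and `pop` keeps the stash bounded without a background thread, which suits a hook that must stay out of the agent loop's way.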

transform_tool_result — enrichment and soft enforcement.
After delegate_task returns, reads JSONL entries past the saved offset (scoped to this call only), computes requested-vs-actual per subagent, and injects an observability block before the result reaches the LLM:

✓ sa-0: anthropic/claude-haiku-4.5 → anthropic/claude-4.5-haiku-20251001 [MATCH]
⚠ sa-1: requested anthropic/claude-opus-4.7 → actual google/gemini-2.5-flash [MISMATCH — override silently dropped]
→ sa-2: no model specified → auto-router resolved to google/gemini-2.5-flash-lite
→ sa-3: pareto-router resolved to deepseek/deepseek-v4-pro-20260423

post_api_request — JSONL logging backbone.
Logs every LLM API call to ~/.hermes/logs/model_usage.jsonl. Read by the other hooks.
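Append-only JSONL logging of this kind is typically a few lines. The sketch below is an assumption-laden illustration: `append_usage_record` and the record fields (`ts`, `requested_model`, `actual_model`) are hypothetical names, not the plugin's schema.

```python
import json
import time
from pathlib import Path
from typing import Any, Dict, Union


def append_usage_record(log_path: Union[str, Path], record: Dict[str, Any]) -> None:
    """Append one JSON object per line, creating the log directory if needed."""
    path = Path(log_path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    # A timestamp is added first; keys in `record` override it if they collide.
    line = json.dumps({"ts": time.time(), **record}, ensure_ascii=False)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

A call such as `append_usage_record("~/.hermes/logs/model_usage.jsonl", {"requested_model": "openrouter/auto", "actual_model": "google/gemini-2.5-flash-lite"})` would add one line to the log. Because the file only ever grows, a byte offset recorded earlier remains a valid read boundary later.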

on_session_start — session boundary marker.
Writes a boundary record so log readers can distinguish gateway incarnations and session-scoped runs.

All failure modes (missing log, failed delegation, stale stash, bad args) degrade silently — the plugin never interrupts the agent loop.
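One common way to get this "never interrupt the loop" behavior is to wrap each hook in a catch-all decorator. This is a sketch of the general pattern, not the plugin's code: `fail_silent`, the logger name, and the convention that returning `None` means "leave the tool result unchanged" are assumptions.

```python
import functools
import logging

logger = logging.getLogger("model_observability_sketch")


def fail_silent(default=None):
    """Decorator factory: swallow any exception from a hook and return `default`."""
    def decorator(hook):
        @functools.wraps(hook)
        def wrapper(*args, **kwargs):
            try:
                return hook(*args, **kwargs)
            except Exception:
                # Logged at debug level so failures are traceable but never surface.
                logger.debug("hook %s failed", hook.__name__, exc_info=True)
                return default
        return wrapper
    return decorator


@fail_silent(default=None)
def enrich(result: str, log_path: str) -> str:
    """Hypothetical enrichment hook: raises if the log file is missing."""
    with open(log_path, encoding="utf-8") as fh:
        return result + "\n" + fh.readline().strip()
```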

Why pre_tool_call as a scoping anchor rather than filtering by session_id

Filtering by session + subagent type still returns all delegations from the session. A long-running session with multiple delegate_task calls would aggregate unrelated records. The byte-offset approach costs one stat() call and gives exact per-invocation scoping with no false positives.
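The byte-offset approach can be sketched in two helpers: one `stat()` before the call, one seek-and-read after it. The function names are hypothetical, and skipping malformed lines is an assumption about how a tolerant reader might behave.

```python
import json
import os
from typing import Any, Dict, List


def current_offset(log_path: str) -> int:
    """Size of the log in bytes right now; 0 if the log does not exist yet."""
    try:
        return os.stat(log_path).st_size
    except OSError:
        return 0


def read_records_since(log_path: str, offset: int) -> List[Dict[str, Any]]:
    """Parse only the JSONL lines written at or after `offset`."""
    records: List[Dict[str, Any]] = []
    try:
        with open(log_path, "rb") as fh:
            fh.seek(offset)
            for raw in fh:
                try:
                    records.append(json.loads(raw))
                except ValueError:
                    continue  # skip blank or malformed lines
    except OSError:
        return []
    return records
```

Records written by earlier delegations sit before the saved offset and are never read, so no session or subagent filtering is needed to exclude them.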

Live verification (2026-05-03)

Tested end-to-end with the gateway running from this branch:

| Task | Requested | Actual | Result |
|------|-----------|--------|--------|
| 0 | anthropic/claude-haiku-4.5 | anthropic/claude-4.5-haiku-20251001 | ✅ match |
| 1 | anthropic/claude-opus-4.7 | anthropic/claude-4.7-opus-20260416 | ✅ match |
| 2 | openrouter/auto | google/gemini-2.5-flash-lite | ✅ auto-router noted, no warning |

Same test on main (without this patch): all three tasks report MISMATCH — confirming the patch is the differentiator.

Pareto router verification (2026-05-13)

After upstream added the OpenRouter Pareto Code router, the plugin needed to treat openrouter/pareto-code as a router request rather than a concrete model pin.

Live smoke test after gateway restart:

  • Request: model="openrouter/pareto-code", provider="openrouter"
  • Task id: sa-0-7780e3fa
  • Actual backend: deepseek/deepseek-v4-pro-20260423
  • Result: inline observability contains pareto_router_resolutions
  • Result: no override_mismatches

Reader-script output agreed:

Pareto-router: yes → deepseek/deepseek-v4-pro-20260423 ×1

Tests

Original PR coverage:

  • tests/plugins/test_model_observability_v2.py — plugin lifecycle, hook registration, scoping, enrichment, mismatch/auto-router behavior, log isolation, edge cases
  • tests/tools/test_delegate.py — delegate coverage including credential resolution, model/provider overrides, mismatch guard, OpenRouter auto smoke, and dispatch forwarding

Latest verification after Pareto-router compatibility update:

python -m pytest -o addopts='' \
  tests/plugins/test_model_observability_v2.py \
  tests/tools/test_delegate_tool_observability.py -q
# 41 passed

python -m pytest -o addopts='' tests/plugins -q --tb=short
# 587 passed

Files changed

  • tools/delegate_tool.py — schema fields, handler, _resolve_delegation_credentials() overrides, delegate_task() signature, observability enrichment helpers, router-aware inline metadata
  • run_agent.py: model=/provider= forwarded through _dispatch_delegate_task()
  • plugins/model_observability/__init__.py + plugin.yaml — v2 plugin, 4 hooks, auto/Pareto-router classification
  • tests/plugins/test_model_observability_v2.py — plugin regression tests, including Pareto-router resolution behavior
  • tests/tools/test_delegate.py — delegate model/provider coverage
  • tests/tools/test_delegate_tool_observability.py — inline observability metadata regression tests
  • scripts/refresh_openrouter_models.py — weekly model catalog maintenance script (price delta tracking, tier exclusivity validation)
  • skills/autonomous-ai-agents/subagent-model-routing/SKILL.md — agent-optimized routing skill

Backwards compatibility

All new parameters are optional, and behavior is unchanged when they are omitted. Observability enrichment degrades gracefully when the log is absent.

Router requests are explicitly treated as router requests: openrouter/auto and openrouter/pareto-code may return concrete backend model names without being classified as override failures.

Prior art

#3172 (ReqX), #6771 (GusBot69), and #12715 (ViFigueiredo) all attempt per-call model overrides and remain open. None have been rebased onto the _dispatch_delegate_task() dispatch refactor (48ecb98f) that landed in main after they were submitted — all three would have merge conflicts against current main on their current branches. This PR is the only open implementation rebased onto that refactor.

The skill's escalation patterns are adapted from #3172 — credit to ReqX for that framing.

@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from 046205d to da7eded Compare April 20, 2026 04:22
@thesunofdog (Author)

Hey! CI hasn't triggered on this one (fork workflow restriction) — could a maintainer approve the workflow run? Happy to address any feedback. Thanks! 🙌

@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from da7eded to 6afc4f8 Compare April 22, 2026 03:39
@alt-glitch added the type/feature (New feature or request), P2 (Medium: degraded but workaround exists), comp/agent (Core agent loop, run_agent.py, prompt builder), and tool/delegate (Subagent delegation) labels Apr 22, 2026

@alt-glitch (Collaborator)

Multiple prior PRs for the same feature (#3172, #6771, #12715) — this one is rebased onto the latest _dispatch_delegate_task() refactor.

@alt-glitch (Collaborator)

Multiple prior PRs for the same feature.

@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from 8610d3c to 8fc2b5e Compare April 22, 2026 08:22
@thesunofdog (Author)

Yep, noted — and covered in the PR description under Prior Art. #3172 (ReqX), #6771 (GusBot69), and #12715 (ViFigueiredo) all attempted this. This branch is the only one rebased onto the _dispatch_delegate_task() refactor that landed in 48ecb98f — the prior branches all patch the two hardcoded dispatch sites directly, which upstream has since consolidated. The escalation ladder and cost patterns in the skill are adapted from ReqX's model-routing-template skill in #3172 with credit noted. Happy to add anything else that would help the review.

@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from ad69d31 to 6c79e0f Compare April 22, 2026 09:01
@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 2 times, most recently from 6fd7c05 to 62daa7a Compare April 24, 2026 07:09
@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 2 times, most recently from 8762ee1 to e296eeb Compare April 25, 2026 06:11
@thesunofdog (Author)

Rebased onto main after today's 65-commit upstream sync. One conflict in test_delegate.py — both test classes (TestSubagentApprovalCallback from upstream, ours from this branch) preserved. All 22 commits applied cleanly, all declared dependencies present. CI shows UNSTABLE — suspect missing run registration, not a test failure. Currently working on a ground-up observability plugin redesign using pre_tool_call + transform_tool_result that will build on this PR's infrastructure.

@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 2 times, most recently from 6c92b62 to f8a9f03 Compare April 28, 2026 23:50
@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 2 times, most recently from 59eb485 to ea056b2 Compare May 1, 2026 17:25
@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from ea056b2 to 5077248 Compare May 3, 2026 16:41
@thesunofdog changed the title from "feat(delegate_task): expose model/provider overrides; fix dispatch-bypass bug" to "feat(delegate_task): per-subagent model/provider overrides + model observability plugin" May 3, 2026
@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 3 times, most recently from 2b3293c to 48bbd6a Compare May 4, 2026 23:04
@ReqX (Contributor) commented May 24, 2026

+1 on this, happy to test/help. Closed superseded PR #3172.

@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 3 times, most recently from c66ef59 to 852192b Compare May 29, 2026 04:21
@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch 3 times, most recently from c79c3b5 to 64b2c16 Compare June 2, 2026 18:42
@Davidsoff

What is needed to get this over the line? I have a couple of workflows that would really benefit from this!

@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from 64b2c16 to 39c5fd8 Compare June 8, 2026 19:25
@jarodtaylor

Please make this available! This is the ONLY thing that I miss about OpenClaw. It was pretty easy to route tasks, skills, crons, discord channels, etc. to specific providers/models.

Adds per-call model/provider override support for delegate_task, model observability verification, OpenRouter model refresh support, and subagent routing documentation/tests.
Removes missing Grok fast budget slug, adds Gemini 3.1 Flash Lite as a budget option, and adds Grok Build to the coding whitelist.
@thesunofdog thesunofdog force-pushed the feat/delegate-task-model-provider-override branch from 39c5fd8 to c15950f Compare June 12, 2026 19:29