feat(delegate_task): per-subagent model/provider overrides + model observability plugin#12794
Hey! CI hasn't triggered on this one (fork workflow restriction). Could a maintainer approve the workflow run? Happy to address any feedback. Thanks! 🙌
Multiple prior PRs for the same feature.
Yep, noted — and covered in the PR description under Prior Art. #3172 (ReqX), #6771 (GusBot69), and #12715 (ViFigueiredo) all attempted this. This branch is the only one rebased onto the |
Rebased onto main after today's 65-commit upstream sync. One conflict in |
+1 on this, happy to test/help. Closed superseded PR #3172.
What is needed to get this over the line? I have a couple of workflows that would really benefit from this!
Please make this available! This is the ONLY thing that I miss about OpenClaw. It was pretty easy to route tasks, skills, crons, discord channels, etc. to specific providers/models.
Adds per-call model/provider override support for delegate_task, model observability verification, OpenRouter model refresh support, and subagent routing documentation/tests.
Removes missing Grok fast budget slug, adds Gemini 3.1 Flash Lite as a budget option, and adds Grok Build to the coding whitelist.
Summary
Two related additions:
1. Per-subagent model/provider overrides: adds `model` and `provider` parameters to `delegate_task`, allowing the calling agent to route individual subagents (or each task in a batch) to a specific model, independent of `delegation.model` in config.
2. `model_observability` plugin v2: verifies that requested models actually reach the subagent. Surfaces ground-truth routing data (requested vs actual model, mismatch warnings, auto-router resolutions, Pareto-router resolutions) directly in `delegate_task` results via `transform_tool_result`.

These two components are designed together: the override exposes the parameter, the plugin verifies it was honored.
Components
Three pieces that form a complete system:
- Patch (`tools/delegate_tool.py`, `run_agent.py`): exposes the capability. Adds `model` and `provider` to the `delegate_task` schema and threads them through `_dispatch_delegate_task()`. Without this, any `model` arg passed by the LLM is silently discarded.
- Skill (`skills/autonomous-ai-agents/subagent-model-routing/SKILL.md`): tells the agent how to use it. Defines the routing decision matrix: which models belong in which tier, which tasks warrant which tier, and when to specify a model pin vs. let the auto-router decide. The skill is what turns the new parameter from "available" to "used correctly."
- Plugin (`plugins/model_observability/`): verifies it worked. Intercepts every `delegate_task` call, compares requested vs. actual model for each subagent, and injects the result back into the tool output before the LLM sees it. If a pin was silently dropped, the agent knows immediately rather than discovering it through degraded output quality.

Motivation
`_build_child_agent()` already accepted `model=...` (used by `cron/scheduler.py`), but the parameter was never plumbed through the tool schema or dispatch path. The result: any `model` arg passed by the LLM was silently discarded, and every subagent inherited `delegation.model` from config regardless of what the caller specified.

The observability plugin was built specifically to detect this class of silent failure. On `main` (without this patch), the plugin reports a mismatch on every delegation, as confirmed by live instrumentation. On this branch, it reports clean matches for explicit pins and correctly distinguishes router resolutions from override mismatches.

Rebase note (2026-04-26)
Rebased onto `main` after upstream merged:
- `48ecb98f`: `_dispatch_delegate_task()` helper consolidating the two hardcoded dispatch sites this PR originally patched. We accepted that refactor entirely and add `model=`/`provider=` to the single dispatch method instead.
- `9c9d9b7d`: `FileStateRegistry` for concurrent subagent write safety.

`TestRunAgentDispatchForwarding` updated to verify `_dispatch_delegate_task()` directly.

API
Single-agent override

    { "model": "anthropic/claude-haiku-4.5", "goal": "..." }

Batch: per-task overrides

    {
      "model": "anthropic/claude-opus-4.7",  // default for tasks that do not specify
      "tasks": [
        {"goal": "...", "model": "anthropic/claude-haiku-4.5"},  // overrides default
        {"goal": "..."}  // uses default above
      ]
    }

Routing through OpenRouter (multi-provider batch)

`provider` is resolved once per `delegate_task` call, so per-task provider routing is not supported. To route tasks to different providers in a single batch, use `provider="openrouter"` at the top level. OpenRouter accepts provider-prefixed model strings and handles the downstream routing:

    {
      "provider": "openrouter",
      "tasks": [
        {"goal": "...", "model": "anthropic/claude-haiku-4.5"},
        {"goal": "...", "model": "x-ai/grok-4.1-fast"},
        {"goal": "...", "model": "google/gemini-2.5-flash"}
      ]
    }

Router requests
Router-style model IDs are requests for a routing policy, not concrete model pins:
- `openrouter/auto` → report as `auto_router_resolutions`
- `openrouter/pareto-code` → report as `pareto_router_resolutions`

These are expected to return a different concrete backend model. That is not an override mismatch.
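The classification rule above can be sketched in a few lines of Python. This is only an illustration, not the plugin's code: the function name `classify_resolution`, the `"matches"` bucket, and the `same_model` predicate are hypothetical. Exact string equality stands in for whatever slug normalization the plugin applies to dated provider slugs.

```python
from typing import Callable

# Router-style IDs named in this PR, mapped to the result bucket they report under.
ROUTER_BUCKETS = {
    "openrouter/auto": "auto_router_resolutions",
    "openrouter/pareto-code": "pareto_router_resolutions",
}


def classify_resolution(
    requested: str,
    actual: str,
    same_model: Callable[[str, str], bool] = lambda r, a: r == a,
) -> str:
    """Return the report bucket for one subagent's requested/actual model pair.

    `same_model` is a placeholder comparison (assumption); swap in a
    normalizing predicate if dated slugs should count as the same model.
    """
    bucket = ROUTER_BUCKETS.get(requested)
    if bucket is not None:
        # A router request is expected to resolve to some other concrete model.
        return bucket
    return "matches" if same_model(requested, actual) else "override_mismatches"
```

Checking the router table before comparing strings is what keeps a resolved backend model from being counted as a dropped pin.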
Precedence
1. Per-task `model` (batch only)
2. Top-level `model` argument
3. `delegation.model` from `config.yaml`

Error behavior
Model slugs are not validated locally: an unrecognized slug passes through and the provider returns an opaque invalid-model error. The `model_observability` plugin will log the attempt regardless. The `_warn_model_provider_mismatch()` guard catches the most common misconfiguration (provider-prefixed model string used with a non-aggregator provider) and logs a warning before the call fires.

Model Observability Plugin v2
Design
Four hooks, one enforcement chain:
- `pre_tool_call`: scoping anchor only (always returns `None`). Before `delegate_task` executes, captures the current JSONL byte offset and whether a model pin was specified. Both are stored in a `tool_call_id`-keyed stash. This is not a warning hook: the framework only enforces `block` from `pre_tool_call`. Its sole purpose is to give `transform_tool_result` a precise log read boundary, preventing bleed from prior delegations in the same session. Stash entries evict after a 120s TTL.
- `transform_tool_result`: enrichment and soft enforcement. After `delegate_task` returns, reads JSONL entries past the saved offset (scoped to this call only), computes requested-vs-actual per subagent, and injects an observability block before the result reaches the LLM.
- `post_api_request`: JSONL logging backbone. Logs every LLM API call to `~/.hermes/logs/model_usage.jsonl`. Read by the other hooks.
- `on_session_start`: session boundary marker. Writes a boundary record so log readers can distinguish gateway incarnations and session-scoped runs.
All failure modes (missing log, failed delegation, stale stash, bad args) degrade silently — the plugin never interrupts the agent loop.
Why `pre_tool_call` as a scoping anchor rather than filtering by `session_id`

Filtering by session + subagent type still returns all delegations from the session. A long-running session with multiple `delegate_task` calls would aggregate unrelated records. The byte-offset approach costs one `stat()` call and gives exact per-invocation scoping with no false positives.

Live verification (2026-05-03)
Tested end-to-end with the gateway running from this branch:
| Requested | Actual |
| --- | --- |
| `anthropic/claude-haiku-4.5` | `anthropic/claude-4.5-haiku-20251001` |
| `anthropic/claude-opus-4.7` | `anthropic/claude-4.7-opus-20260416` |
| `openrouter/auto` | `google/gemini-2.5-flash-lite` |

Same test on `main` (without this patch): all three tasks report `MISMATCH`, confirming the patch is the differentiator.

Pareto router verification (2026-05-13)
After upstream added the OpenRouter Pareto Code router, the plugin needed to treat `openrouter/pareto-code` as a router request rather than a concrete model pin.

Live smoke after gateway restart:
- Request: `model="openrouter/pareto-code"`, `provider="openrouter"`
- Subagent: `sa-0-7780e3fa`
- Resolved model: `deepseek/deepseek-v4-pro-20260423`
- `observability` contains `pareto_router_resolutions`, not `override_mismatches`

Reader-script output agreed:
Tests
Original PR coverage:
- `tests/plugins/test_model_observability_v2.py`: plugin lifecycle, hook registration, scoping, enrichment, mismatch/auto-router behavior, log isolation, edge cases
- `tests/tools/test_delegate.py`: delegate coverage including credential resolution, model/provider overrides, mismatch guard, OpenRouter auto smoke, and dispatch forwarding

Latest verification after Pareto-router compatibility update:
Files changed
- `tools/delegate_tool.py`: schema fields, handler, `_resolve_delegation_credentials()` overrides, `delegate_task()` signature, observability enrichment helpers, router-aware inline metadata
- `run_agent.py`: `model=`/`provider=` forwarded through `_dispatch_delegate_task()`
- `plugins/model_observability/__init__.py` + `plugin.yaml`: v2 plugin, 4 hooks, auto/Pareto-router classification
- `tests/plugins/test_model_observability_v2.py`: plugin regression tests, including Pareto-router resolution behavior
- `tests/tools/test_delegate.py`: delegate model/provider coverage
- `tests/tools/test_delegate_tool_observability.py`: inline observability metadata regression tests
- `scripts/refresh_openrouter_models.py`: weekly model catalog maintenance script (price delta tracking, tier exclusivity validation)
- `skills/autonomous-ai-agents/subagent-model-routing/SKILL.md`: agent-optimized routing skill

Backwards compatibility
All new parameters are optional. No behavior change when omitted. Observability enrichment degrades gracefully when log is absent.
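The fallback for omitted parameters follows the order given under Precedence and can be illustrated with a minimal sketch. The helper name `resolve_model` and its argument names are hypothetical, not taken from `tools/delegate_tool.py`.

```python
from typing import Optional


def resolve_model(
    task_model: Optional[str],
    call_model: Optional[str],
    config_model: Optional[str],
) -> Optional[str]:
    """Pick the first model that is set: per-task, then top-level, then config.

    Argument names are illustrative; `config_model` stands for
    `delegation.model` from `config.yaml`.
    """
    for candidate in (task_model, call_model, config_model):
        if candidate:
            return candidate
    return None  # nothing configured anywhere
```

With both override arguments omitted the function returns the configured `delegation.model`, which is the pre-patch behavior.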
Router requests are explicitly treated as router requests: `openrouter/auto` and `openrouter/pareto-code` may return concrete backend model names without being classified as override failures.

Prior art
#3172 (ReqX), #6771 (GusBot69), and #12715 (ViFigueiredo) all attempt per-call model overrides and remain open. None have been rebased onto the `_dispatch_delegate_task()` dispatch refactor (`48ecb98f`) that landed in main after they were submitted, so all three would have merge conflicts against current main on their current branches. This PR is the only open implementation rebased onto that refactor.

The skill's escalation patterns are adapted from #3172; credit to ReqX for that framing.