feat(delegate_task): per-call model/provider override (revives #3719)#25530
thestark77 wants to merge 2 commits into
Closes NousResearch#3719 (closed without merge; this is a fresh take with a smaller diff than NousResearch#3794: no delegation pool, just the per-call override that unblocks model-routing plugins).
Adds optional ``model`` and ``provider`` parameters to ``delegate_task``, both at the top level and per-task inside the ``tasks`` array. The pre-existing ``_build_child_agent`` plumbing already accepted these overrides; this PR just wires the schema → handler → ``_build_child_agent`` path.
Precedence: per-task > top-level > delegation.{model,provider} from config.
When ``provider`` is overridden, the credential bundle (base_url, api_key, api_mode) is re-resolved through ``_resolve_delegation_credentials`` so a different provider's endpoint and key are used. When only ``model`` is overridden, the surrounding credentials are preserved. If resolution fails, the batch falls back to the configured delegation credentials rather than aborting.
Motivation: SDD-style workflows route different phases (explore, design, apply, verify) to different models for cost/quality balance. cobalt-agent has been monkey-patching this for ~4 weeks across two production VPS instances (12 measurement runs, 95% routing accuracy on the last 6) by reading ``_routed_model``/``_routed_provider`` fields injected into task dicts. With this PR, the patch can be retired in favour of the official schema.
Tests: 6 new unit tests for ``_resolve_per_task_creds`` covering no-override, model-only, per-task vs top-level precedence, provider re-resolution, and resolution-failure fallback.
Bartok9
left a comment
This is a clean, minimal implementation of the per-call model/provider override. The scoping decision (override only, no delegation pool) is the right call — the previous PR #3794 stalled precisely because it tried to do both at once.
A few observations on the implementation:
Precedence model is clear and well-tested. The per-task > top-level > default fallback chain is explicit and covered by tests. The test_provider_resolution_failure_falls_back_to_default test is especially valuable — failure modes in credential resolution are the kind of thing that bites production users.
One edge case worth considering: When a top-level model override is provided but no provider override, _resolve_per_task_creds() swaps the model name but keeps provider from default_creds. If the top-level model name belongs to a different provider (e.g. gpt-4o with provider=anthropic in config), the task will be routed to the wrong provider. This is by design (the user must also specify provider to trigger re-resolution), but it might be worth a doc note in the schema description for model to avoid confusion.
Schema completeness: The tool description for model and provider in DELEGATE_TASK_SCHEMA is minimal. Would be helpful to note that provider triggers full credential re-resolution (including base_url / api_key), while model-only swaps just the model name within the existing provider bundle.
Verified on current main (0f0e20ef8): the plumbing paths targeted here (_build_child_agent accepting model/provider overrides) are present and intact. The PR applies cleanly.
Bot review on the PR flagged that model-only overrides keep the existing provider's credential bundle. If the new model belongs to a different provider, requests will hit the wrong endpoint. Behavior is intentional (model and provider often share endpoints, and re-resolving on every model swap would be wasteful), but the schema descriptions did not make this contract visible. Schema description for `model` now spells out: model-only swaps the model name within the current provider; pair with `provider` to re-resolve. Schema description for `provider` now spells out: provider override triggers full credential re-resolution and discards the original `base_url` / `api_key`. Applied to both top-level and per-task variants. No behavior change.
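For readers who want to picture the documented contract, here is a minimal sketch of what such schema properties could look like. The description wording and the `SCHEMA_FRAGMENT` name are illustrative assumptions, not the text committed in this PR; only the `model`, `provider`, and `tasks` property names come from the discussion above.

```python
# Hypothetical schema fragment; the description strings are illustrative
# wording, not the actual text in DELEGATE_TASK_SCHEMA.
MODEL_PROPERTY = {
    "type": "string",
    "description": (
        "Model override. On its own this only swaps the model name within "
        "the current provider; pair it with `provider` to re-resolve "
        "credentials for a different provider."
    ),
}

PROVIDER_PROPERTY = {
    "type": "string",
    "description": (
        "Provider override. Triggers full credential re-resolution and "
        "discards the original base_url / api_key."
    ),
}

# The same two properties are reused at the top level and per task, so the
# two variants cannot drift apart.
SCHEMA_FRAGMENT = {
    "type": "object",
    "properties": {
        "model": MODEL_PROPERTY,
        "provider": PROVIDER_PROPERTY,
        "tasks": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "goal": {"type": "string"},
                    "model": MODEL_PROPERTY,
                    "provider": PROVIDER_PROPERTY,
                },
            },
        },
    },
}
```

Sharing one dict per property is just one way to keep the top-level and per-task descriptions identical; the PR may well duplicate the text instead.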
Thanks for the careful review. Both points are addressed in 9597c2f:
Edge case (model-only override with cross-provider model): added an explicit note to the
Schema completeness: the
No code or test changes, pure docs.
Tracked follow-up technical debt from this PR:
Verified
Diff is purely descriptions; no code or test changes, confirmed. LGTM on the docs follow-up. Thanks for the quick turnaround. 🎻
Summary
Adds optional `model` and `provider` parameters to `delegate_task`, both at the top level and per-task inside the `tasks` array. This unblocks model-routing plugins (per-phase models for SDD-style workflows) without forcing them to monkey-patch `delegate_tool.py`.
This is a fresh take on #3719 / #3794. #3794 bundled a delegation pool with per-call override; that PR was closed without merge. This PR keeps only the per-call override (the smaller, lower-risk half) so it can land and unblock downstream work. The pool can land as a follow-up.
Why
SDD-style workflows route different phases to different models for cost/quality balance: cheap models for scout/explore, mid-tier for apply, reasoning models for verify/design. `delegate_task` exposes no way to do that today: `delegation.model`/`delegation.provider` are global per-config.
The plumbing already exists: `_build_child_agent` accepts `model`/`override_provider`/`override_base_url`/`override_api_key`/`override_api_mode` natively. This PR wires the public schema → handler → `_build_child_agent` path.
Evidence
I maintain cobalt-agent, a Hermes plugin that does this routing via a source patch (`apply_routing_patch.py` injects `_routed_*` fields into task dicts before `_build_child_agent` is called). It has been running on two production VPS instances for ~4 weeks: 12 measurement runs, the last 6 at 95% routing accuracy (logs in cobalt-agent CHANGELOG.md).
The patch works, but with Hermes shipping weekly the source-level approach is fragile, which is the whole reason for the upstream request. Once this PR lands I can retire the patch in favour of the official schema.
Diff overview
`model`/`provider` params on `delegate_task()`: `tools/delegate_tool.py` (signature + handler lambda)
`model`/`provider` properties on `DELEGATE_TASK_SCHEMA`
`tasks[].properties`
`_resolve_per_task_creds()` helper
`tools/delegate_tool.py`
`tools/delegate_tool.py`
`tests/tools/test_delegate.py`
Total: +192 / -7 lines.
API
Top-level (whole batch uses the same model)
    {
      "name": "delegate_task",
      "arguments": {
        "goal": "Implement the auth middleware",
        "model": "glm-5.1",
        "provider": "zai"
      }
    }
Per-task (different model per task in one call)
    {
      "name": "delegate_task",
      "arguments": {
        "tasks": [
          {"goal": "Research competitors", "model": "glm-5", "provider": "zai"},
          {"goal": "Write integration tests", "model": "glm-5.1"}
        ]
      }
    }
Precedence
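The commit message states the order as per-task > top-level > `delegation.{model,provider}` from config. A minimal sketch of that lookup follows; `first_set` and `effective_model_and_provider` are hypothetical names used for illustration and are not taken from `tools/delegate_tool.py`.

```python
from typing import Any, Dict, Optional


def first_set(*candidates: Optional[str]) -> Optional[str]:
    """Return the first candidate that is neither None nor empty."""
    for value in candidates:
        if value:
            return value
    return None


def effective_model_and_provider(
    task: Dict[str, Any],
    top_level: Dict[str, Any],
    delegation_cfg: Dict[str, Any],
) -> Dict[str, Optional[str]]:
    """Apply per-task > top-level > config precedence to each field.

    Hypothetical helper: model and provider are resolved independently,
    which is an assumption about how the real handler behaves.
    """
    return {
        field: first_set(
            task.get(field),
            top_level.get(field),
            delegation_cfg.get(field),
        )
        for field in ("model", "provider")
    }
```

With a top-level `{"model": "glm-5.1", "provider": "zai"}` and a task that sets only `"model": "glm-5"`, this sketch yields model `glm-5` with provider `zai`: the task wins for the field it sets and inherits the rest.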
Provider re-resolution
When `provider` is overridden, `_resolve_per_task_creds` re-runs `_resolve_delegation_credentials` with the new provider so `base_url`/`api_key`/`api_mode` match the new provider (instead of dragging the original provider's credentials). The original `base_url`/`api_key` are stripped from the override config before resolution because they belong to the old provider.
When only `model` is overridden, the surrounding credentials are preserved (model and provider often share the same endpoint).
Failure handling
If credential resolution fails for the overridden provider (missing API key, unknown provider, etc.), the task falls back to the configured delegation credentials rather than aborting the batch. The failure is logged at WARN level.
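The re-resolution and fallback rules described above can be sketched roughly as follows. This is not the PR's `_resolve_per_task_creds`: the function name, the `resolve_provider` callback, and the dict layout are assumptions, as are two details the description leaves open (a provider-only override keeps the default model here, and a failed resolution returns the default bundle unchanged).

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)

Creds = Dict[str, Optional[str]]


def resolve_creds_sketch(
    default_creds: Creds,
    model: Optional[str],
    provider: Optional[str],
    resolve_provider: Callable[[str], Creds],
) -> Creds:
    """Hypothetical stand-in for the per-task credential resolution."""
    if not model and not provider:
        # No override: use the configured delegation credentials as-is.
        return dict(default_creds)

    if not provider:
        # Model-only override: keep endpoint and key, swap the model name.
        creds = dict(default_creds)
        creds["model"] = model
        return creds

    try:
        # Provider override: build the bundle from the resolver's result only,
        # so the old provider's base_url / api_key are never carried over.
        fresh = resolve_provider(provider)
    except Exception as exc:
        # Missing key, unknown provider, etc.: warn and fall back instead of
        # aborting the batch.
        logger.warning("Credential resolution failed for %r: %s", provider, exc)
        return dict(default_creds)

    creds = dict(fresh)
    creds["provider"] = provider
    # Assumption: with no model override, the configured model is kept.
    creds["model"] = model or default_creds.get("model")
    return creds
```

Passing the resolver in as a callback keeps the sketch independent of the real `_resolve_delegation_credentials` signature, which is not shown in this thread.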
Backwards compatibility
The new params are optional and default to `None`. Behavior is identical to today when no overrides are supplied; the only difference is that the schema now advertises the two new properties.
Tests
`TestPerTaskCredentialOverride` (6 tests):
- `test_no_override_returns_default_creds`
- `test_model_only_top_level_override_swaps_model_keeps_provider`
- `test_per_task_model_beats_top_level`
- `test_provider_override_re_resolves_full_bundle` (verifies stale `base_url`/`api_key` are stripped)
- `test_per_task_provider_beats_top_level`
- `test_provider_resolution_failure_falls_back_to_default`
Existing `TestDelegationCredentialResolution` and `TestDelegationProviderIntegration` should keep passing unchanged.
Follow-ups (separate PRs)
`base_url` per-call override for direct endpoints