feat(delegate_tool): per-task model/provider override by rafaumeu · Pull Request #43134 · NousResearch/hermes-agent

rafaumeu · 2026-06-09T22:55:54Z

Summary

Allows each task in a delegate_task batch to override the model and/or provider independently. Addresses issue #18591.

Problem

When delegating multiple tasks via delegate_task, all children share the same model/provider resolved from the top-level delegation config. There is no way to route individual tasks to different providers — for example, sending research tasks to a fast/cheap model while keeping code review on a stronger one.

Solution

Add two optional fields to each task item:

Field	Type	Description
`model`	`string`	Model name override (e.g. `gemini-2.5-flash`). Falls back to delegation config, then parent.
`provider`	`string`	Provider name override (e.g. `groq`, `openrouter`). Triggers independent credential resolution.

Fallback chain

per-task field  >>>  delegation config  >>>  parent agent model

If neither field is specified, behavior is identical to before — fully backward compatible.

Implementation

_resolve_task_credentials (new helper):

Fast path: same provider or no provider — reuses delegation credentials with model swapped in.
Slow path: different provider — resolves a fresh credential bundle via _resolve_delegation_credentials.
Returns an error string on failure instead of raising.

Dispatch loop change: before _build_child_agent, checks for per-task fields and resolves credentials when present. Falls through to delegation credentials otherwise.

Files changed

File	Change
`tools/delegate_tool.py`	Schema: `model`/`provider` properties on task items. Dispatch: per-task credential resolution. New `_resolve_task_credentials` helper.
`tests/tools/test_delegate_task_model_override.py`	14 new tests.

Tests

Suite	Result
New tests	14 passed
Existing `test_delegate.py`	140 passed
Total	154 passed, 0 failed

Example

{
  "tasks": [
    {
      "goal": "Research topic X",
      "model": "gemini-2.5-flash",
      "provider": "google"
    },
    {
      "goal": "Write summary",
      "model": "glm-5-turbo"
    },
    {
      "goal": "Review code"
    }
  ]
}

Task 1 routes to Google Gemini. Task 2 overrides the model but keeps the current provider. Task 3 inherits everything as usual.

Closes #18591

Add optional model and provider fields to individual tasks in delegate_tool batch mode. Fallback chain: per-task field > delegation config > parent model. Closes NousResearch#18591

ricardocamiloconsir · 2026-06-09T23:54:56Z

Multi-Provider Code Review — PR #43134

PR: feat(delegate_tool): per-task model/provider override
Author: rafaumeu | +89/-5 (prod), +534 (tests)
Reviewers: 8 independent reviews (DeepSeek, xAI/Grok, Groq/Llama, NVIDIA/Qwen, GitHub/GPT-4.1, Google/Gemini, Anthropic/Claude, Mistral)
Consolidated by: GLM-5.1 via ZAI provider

Verdict: APPROVED WITH WARNINGS

The implementation is well-structured, backward-compatible, and follows existing codebase patterns. All 8 reviewers converge on the same core assessment: the feature is sound but has a confirmed bug and several design-level concerns that warrant attention. No security blockers. One correctness bug (stale reference) and one design concern (batch abort semantics) should be addressed before or shortly after merge.

Consensus Items (all 8 reviewers agree)

Credential handling is correct and safe. Shallow copy via dict(delegation_creds) is sufficient because the credential dict is architecturally flat (all string values). No cross-task mutation risk. — confidence: HIGH
Broad except Exception at line 119 needs narrowing or logging. All 8 reviewers flag this. It catches TypeError, AttributeError, ImportError, etc. and silently converts them to user-facing error strings, masking genuine bugs. At minimum, add logger.exception() before returning the error string; ideally narrow to (ValueError, KeyError, LookupError). — confidence: HIGH
dict | str return type is a code smell. All 8 reviewers note that _resolve_task_credentials returning a credential dict on success or an error string on failure, with the caller discriminating via isinstance(task_creds, str), is fragile and not Pythonic. 6/8 reviewers specifically suggest replacing it with a custom exception (TaskCredentialError(ValueError)) to align with the existing _resolve_delegation_credentials pattern. — confidence: HIGH
Fail-fast batch abort needs documentation or redesign. All 8 reviewers flag that a credential resolution failure for task N in a batch immediately returns tool_error(...) and aborts the entire batch, orphaning already-spawned children from tasks 0..N-1. This should either be documented explicitly in the docstring or restructured (pre-validate all tasks, or collect errors and continue). — confidence: HIGH
Naming is clear and consistent. _resolve_task_credentials, effective_model, task_provider, task_creds all follow codebase conventions and are self-documenting. — confidence: HIGH
Architecture fit is clean. The fallback chain (per-task → delegation config → parent agent) resolves each field independently, reuses _resolve_delegation_credentials for the slow path, and adds zero overhead to the common case (no override). — confidence: HIGH
No security blockers. API keys are never logged or exposed in error messages. Thread safety is acceptable (each task constructs its own credential dict). — confidence: HIGH
Good unit test coverage (14 tests, 534 lines) covering fast path, slow path, error path, and fallback chain combinations. — confidence: HIGH

Warnings (majority flag)

Broad except Exception swallows unexpected errors — 8/8 reviewers flag this. DeepSeek rates it BLOCKING; the rest rate it MEDIUM/HIGH. Recommended fix: narrow to (ValueError, KeyError) or add logger.exception() as a safety net.
Fail-fast aborts entire batch on single-task error — 8/8 reviewers flag this. GitHub/GPT-4.1 calls it a SIGNIFICANT FINDING. Google/Gemini suggests pre-validating all task credentials before spawning any children as the simplest fix.
dict | str return type is an anti-pattern — 8/8 reviewers flag this. Anthropic/Claude calls it "the most significant design concern." Google/Gemini calls it "the most significant smell." Consensus suggestion: raise TaskCredentialError(ValueError) instead of returning error strings.
Missing integration tests — 6/8 reviewers note the 14 tests are all mock-based with no end-to-end validation of credential routing. A single integration test verifying the slow path with a stub provider would close this gap.
Schema changes not included in review diff — 3/8 reviewers (xAI, NVIDIA, Mistral) flag that _build_dynamic_schema_overrides changes are mentioned in the PR description but not in the diff. These should be verified to ensure model and provider are properly declared as optional string fields with descriptions.
Input validation gap — 3/8 reviewers (Google/Gemini, Anthropic/Claude, Mistral) flag that task_model and task_provider are taken directly from user-supplied task dicts with no sanitization. Whitespace-only strings like " " pass the truthiness check and enter credential resolution. Suggested fix: add .strip() or basic format validation.
Test boilerplate — 2/8 reviewers (xAI, Anthropic) note ~38 lines per test with repeated mock setup. Extracting shared fixtures (make_delegation_creds(), make_mock_parent()) would cut test code by ~40%.

Discordances (reviewers disagree)

Stale creds reference bug (unique to xAI/Grok). Reviewer 2 identified that override_acp_command still references creds.get("command") instead of task_creds.get("command") after per-task credential resolution. This means when a task overrides the provider, the ACP command falls back to the OLD provider's command, not the per-task one. No other reviewer caught this bug. If the stale reference is confirmed in the actual code, this is a correctness issue that should be fixed. — 1/8 identified; needs human verification of the actual codebase
Severity of except Exception differs. DeepSeek rates it BLOCKING (merge must wait); 7 others rate it MEDIUM or SHOULD FIX (merge with follow-up). The broader consensus leans toward SHOULD FIX, not a hard blocker.
Thread safety nuance. Google/Gemini raises a theoretical concern: if _build_child_agent mutates the credential dict in-place, concurrent tasks on the same provider could interfere. The other 7 reviewers assert thread safety is fine. This is a theoretical concern, not a confirmed bug — but worth a defensive .copy() comment.
Sentinel value: "" vs None. Anthropic/Claude suggests using None instead of "" as the sentinel for "not specified" (lines 39-40), arguing None is the canonical Python sentinel. The other 7 reviewers accept the or "" pattern with a brief comment. This is a style preference, not a correctness issue.
Pre-validation vs skip-on-error for batch mode. Google/Gemini suggests pre-validating all task credentials before spawning any children (two-pass approach). GitHub/GPT-4.1 and DeepSeek suggest collecting errors and continuing with remaining tasks. xAI/Grok suggests documenting fail-fast as intentional. No consensus on the preferred fix — this needs a team decision.

Action Items (prioritized)

[BLOCKER] Verify and fix the stale creds reference in override_acp_command (xAI/Grok finding). Change creds.get("command") → task_creds.get("command") if confirmed. Add a regression test for per-task provider override with different ACP commands.
[SHOULD FIX] Narrow the except Exception at line 119. Either:
- Catch only (ValueError, KeyError, LookupError) and let unexpected errors propagate, OR
- Add logger.exception("Unexpected error resolving per-task provider %s", task_provider) before the return to preserve full tracebacks in logs.
  (7/8 reviewers would accept merge with a follow-up issue; 1/8 considers it blocking.)
[SHOULD FIX] Document the fail-fast batch abort semantics. Add a comment near line 52 and a note in the docstring explaining that a per-task credential resolution failure aborts the entire batch. If the team decides to change this behavior, file a follow-up issue for pre-validation or partial-failure handling.
[SHOULD FIX] Add logging to the broad exception handler. Even if except Exception is kept temporarily, logger.exception() ensures infrastructure errors are captured in observability tooling rather than silently lost.
[NICE TO HAVE] Refactor _resolve_task_credentials to raise a TaskCredentialError(ValueError) instead of returning error strings. This aligns with the existing _resolve_delegation_credentials pattern and eliminates the isinstance(task_creds, str) check. 6/8 reviewers recommend this as a follow-up.
[NICE TO HAVE] Add .strip() to model/provider inputs (lines 39-40) to handle whitespace-only strings: task_model = (t.get("model") or "").strip() or "".
[NICE TO HAVE] Add 1-2 integration tests or contract tests that verify the full path from task dict → credential resolution → child agent construction. Verify the slow path calls _resolve_delegation_credentials with correct arguments.
[NICE TO HAVE] Extract shared test fixtures to reduce boilerplate. A make_delegation_creds() and make_mock_parent() fixture would cut ~40% of test code.
[NICE TO HAVE] Verify that _build_dynamic_schema_overrides properly declares model and provider as optional string fields with descriptions. This was not included in the review diff.

Positive Observations

Excellent fallback chain design. The per-task → delegation config → parent agent resolution is intuitive, composable, and resolves each field independently. All 8 reviewers praise this.
Zero overhead for existing callers. When neither task_model nor task_provider is set, the code falls through to task_creds = creds — a single else-branch. No performance regression for the common case.
Smart fast-path optimization. When the provider matches, the code reuses the existing credential dict with a shallow copy and model swap, avoiding unnecessary provider re-resolution.
Thorough docstrings. The fallback chain is documented inline with numbered priority levels. The return type contract is stated explicitly.
Strong DRY compliance. The slow path delegates to _resolve_delegation_credentials rather than duplicating provider resolution logic. The helper pattern is a good template for future per-task overrides.
Healthy test-to-code ratio. 534 lines of test for 89 lines of production code (6:1 ratio) demonstrates strong testing discipline.
Clean backward compatibility. Both fields are optional; omitting them preserves existing behavior exactly. No breaking changes to _build_child_agent's signature.

Reviewer Scorecard

Reviewer	Verdict	Unique Finding
DeepSeek	WARNING	Rates `except Exception` as BLOCKING; flags model name validation
xAI/Grok	WARNING	Identifies stale `creds` reference bug in `override_acp_command`
Groq/Llama	WARNING	Notes the pattern is reusable for future per-task overrides
NVIDIA/Qwen	WARNING	Suggests type alias `TaskCredentialResult = dict \| str` as minimal fix
GitHub/GPT-4.1	WARNING	Flags orphaned child agents from already-spawned tasks
Google/Gemini	WARNING	Suggests pre-validation two-pass approach; flags whitespace strings
Anthropic/Claude	WARNING	Suggests `None` sentinel instead of `""`; most detailed exception refactor
Mistral	WARNING	Rates 7.5/10; notes schema diff gap

Consolidated by GLM-5.1 (ZAI provider) from 8 independent reviews on 2026-06-09.

rafaumeu · 2026-06-10T00:28:33Z

Review Feedback Addressed

Thanks for the thorough multi-provider review! All actionable items have been fixed in PR #43185:

Priority	Item	Status
BLOCKER	Stale `creds` → `task_creds` reference	Fixed + regression test
SHOULD FIX	Narrow `except Exception` + `logger.exception()`	Fixed
SHOULD FIX	Fail-fast batch abort documented	Fixed
NICE TO HAVE	`.strip()` on model/provider inputs	Fixed + test

16/16 tests passing. Supersedes this PR.

Follow-up items (not included, can be addressed post-merge):

Refactor _resolve_task_credentials to raise TaskCredentialError(ValueError) instead of returning error strings (6/8 reviewers recommended)
Extract shared test fixtures to reduce boilerplate
Add integration tests for the full credential routing path

yimisunrise · 2026-06-11T05:07:04Z

+1 This feature will help enable cost-effective multi-agent workflows by allowing task-appropriate model selection. Hope to see this land soon.

rafaumeu · 2026-06-12T17:10:47Z

Closing in favor of #43185 which includes the original feature plus all review fixes (stale creds reference, narrowed exception handling, .strip() validation, documented fail-fast semantics). 16/16 tests passing.

feat(delegate_tool): per-task model/provider override

f9bfedf

Add optional model and provider fields to individual tasks in delegate_tool batch mode. Fallback chain: per-task field > delegation config > parent model. Closes NousResearch#18591

alt-glitch added type/feature New feature or request P3 Low — cosmetic, nice to have tool/delegate Subagent delegation labels Jun 9, 2026

This was referenced Jun 9, 2026

feat: agent-callable model_switch tool #43145

Open

[Feature]: Automatic Role-Based Model Routing #38954

Open

rafaumeu mentioned this pull request Jun 10, 2026

fix(delegate_tool): address PR #43134 multi-provider review feedback #43185

Open

rafaumeu closed this Jun 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(delegate_tool): per-task model/provider override#43134

feat(delegate_tool): per-task model/provider override#43134
rafaumeu wants to merge 1 commit into
NousResearch:mainfrom
rafaumeu:feat/per-task-model-override-18591

rafaumeu commented Jun 9, 2026 •

edited

Loading

Uh oh!

ricardocamiloconsir commented Jun 9, 2026

Uh oh!

rafaumeu commented Jun 10, 2026

Uh oh!

yimisunrise commented Jun 11, 2026

Uh oh!

rafaumeu commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

rafaumeu commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Fallback chain

Implementation

Files changed

Tests

Example

Uh oh!

ricardocamiloconsir commented Jun 9, 2026

Multi-Provider Code Review — PR #43134

Verdict: APPROVED WITH WARNINGS

Consensus Items (all 8 reviewers agree)

Warnings (majority flag)

Discordances (reviewers disagree)

Action Items (prioritized)

Positive Observations

Reviewer Scorecard

Uh oh!

rafaumeu commented Jun 10, 2026

Review Feedback Addressed

Uh oh!

yimisunrise commented Jun 11, 2026

Uh oh!

rafaumeu commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

rafaumeu commented Jun 9, 2026 •

edited

Loading