feat(delegate): add acceptance criteria and independent judge to dele… by MorAlekss · Pull Request #17980 · NousResearch/hermes-agent

MorAlekss · 2026-04-30T14:28:34Z

Summary

Implements Phase 1 of #356: acceptance criteria and independent judge for delegate_task.

Root cause

Sub-agent results returned by delegate_task are self-reports with no independent verification. The parent agent must manually evaluate every result, consuming tokens and adding latency. There is no way to specify what "done" looks like when delegating, and no quality signal on whether the output actually meets the goal.

Fix

Added an optional acceptance_criteria parameter to delegate_task. When provided, a new _judge_output() function evaluates the sub-agent's output against the criteria using a cheap auxiliary LLM and returns a PASS/FAIL verdict with reasoning. The result is enriched with a judge field. If no criteria are provided, behavior is unchanged.

What changed

tools/delegate_tool.py: added acceptance_criteria: Optional[str] = None to delegate_task signature
tools/delegate_tool.py: added acceptance_criteria to DELEGATE_TASK_SCHEMA — both top-level and per-task in batch mode
tools/delegate_tool.py: added _judge_output(), which calls call_llm(task="judge"), parses the PASS/FAIL JSON reply, and degrades gracefully on failure
tools/delegate_tool.py: wired judge evaluation in _run_single_child — enriches entry["judge"] when criteria provided
tests/tools/test_delegate.py: added TestJudgeOutput with 5 tests
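
For readers who want a feel for the judge flow described above, here is a minimal sketch, not the code in this PR. The name judge_output_sketch, the injected llm callable (standing in for call_llm(task="judge"), whose real signature is not shown here), the prompt wording, and the "No acceptance criteria provided" reasoning string are all assumptions made for illustration.

```python
import json
from typing import Callable, Optional


def judge_output_sketch(
    output: str,
    acceptance_criteria: Optional[str],
    llm: Callable[[str], str],
) -> dict:
    """Hypothetical stand-in for _judge_output; `llm` replaces the auxiliary LLM call."""
    if not acceptance_criteria:
        # No criteria: nothing to judge, so report PASS without calling the model.
        return {"verdict": "PASS", "reasoning": "No acceptance criteria provided"}

    # Assumed prompt wording; the real prompt is not part of this description.
    prompt = (
        "Evaluate the output against the acceptance criteria.\n"
        'Reply with JSON: {"verdict": "PASS" or "FAIL", "reasoning": "..."}\n\n'
        f"Criteria:\n{acceptance_criteria}\n\nOutput:\n{output}"
    )
    try:
        parsed = json.loads(llm(prompt))
        verdict = str(parsed.get("verdict", "")).upper()
        if verdict not in ("PASS", "FAIL"):
            raise ValueError(f"unexpected verdict: {verdict!r}")
        return {"verdict": verdict, "reasoning": str(parsed.get("reasoning", ""))}
    except Exception as exc:
        # Graceful degrade: any failure becomes a FAIL verdict, never an exception.
        return {"verdict": "FAIL", "reasoning": f"Judge unavailable: {exc}"}
```

Injecting the model call as a plain callable keeps the sketch testable without network access; malformed JSON and unexpected verdict strings fall into the same FAIL path as an unavailable model.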

What is not affected

delegate_task behavior when acceptance_criteria is not provided: unchanged, zero breaking changes
Batch mode: each task can have its own per-task acceptance_criteria
Judge failure: if the auxiliary LLM is unavailable, the judge returns {"verdict": "FAIL", "reasoning": "Judge unavailable: ..."} and raises no exception
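
The "unchanged when no criteria are provided" guarantee can be pictured as a small guard around the judge call. This is a sketch under assumptions, not the wiring in _run_single_child: attach_judge and the judge callable are hypothetical names, and passing the entry's summary as the text to be judged is a guess.

```python
from typing import Any, Callable, Dict, Optional


def attach_judge(
    entry: Dict[str, Any],
    acceptance_criteria: Optional[str],
    judge: Callable[[str, str], Dict[str, str]],
) -> Dict[str, Any]:
    """Hypothetical helper: add a judge field only when criteria were given."""
    enriched = dict(entry)  # shallow copy so the caller's entry is not mutated
    if acceptance_criteria:
        # Assumption: the sub-agent's summary is the output that gets judged.
        enriched["judge"] = judge(entry.get("summary", ""), acceptance_criteria)
    return enriched
```

With no criteria the judge callable is never invoked and the entry comes back without a judge key, which is the backward-compatible path.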

Behavioral change

When acceptance_criteria is provided, delegate_task results now include a judge field:

{
  "task_index": 0,
  "status": "completed",
  "summary": "...",
  "judge": {
    "verdict": "PASS",
    "reasoning": "Output contains required columns"
  }
}
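
One way a caller might consume the judge field is to partition results by verdict before deciding what to retry. The sketch below is illustrative only: split_by_verdict is a hypothetical helper, and treating entries without a judge field as accepted is an assumption rather than behavior defined by this PR.

```python
from typing import Any, Dict, List, Tuple


def split_by_verdict(
    results: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Hypothetical helper: partition delegate results into (accepted, rejected)."""
    accepted: List[Dict[str, Any]] = []
    rejected: List[Dict[str, Any]] = []
    for entry in results:
        judge = entry.get("judge")
        # Assumption: no judge field means no criteria were set, so accept the entry.
        if judge is None or judge.get("verdict") == "PASS":
            accepted.append(entry)
        else:
            rejected.append(entry)
    return accepted, rejected
```

The rejected entries carry the judge's reasoning, which a parent agent could feed into a follow-up delegation.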

Tests

5 new tests in TestJudgeOutput, all pass. No regressions in existing tests.

test_judge_pass - judge returns PASS when criteria are met
test_judge_fail - judge returns FAIL when criteria are not met
test_no_criteria_no_judge - no criteria skips judge, returns PASS
test_judge_unavailable_skips - LLM failure returns FAIL with reasoning, no exception
test_delegate_task_passes_acceptance_criteria - integration: acceptance_criteria flows through to _run_single_child

Phase 2 (think tool for self-verification) is planned as a follow-up PR.

This is a prerequisite for the simplify skill (#379): independent judge evaluation directly addresses the need for quality-gating of parallel review agents.

Part of #356 (Phase 1/3)

alt-glitch · 2026-04-30T14:32:15Z

Implements #356. Related: #479 (best-of-N judge selection).

alt-glitch added labels type/feature (New feature or request), P3 (Low: cosmetic, nice to have), and tool/delegate (Subagent delegation) · Apr 30, 2026

MorAlekss force-pushed the feat/delegate-acceptance-criteria branch from ceccf53 to b2f8f10 · May 7, 2026 22:20

afelipeps mentioned this pull request May 9, 2026

Cron delivery to private DM topics broken after PR #22410 — send_message_tool.py missing three-mode routing #22773

Open

MorAlekss mentioned this pull request Jun 8, 2026

feat(skills): add Phase 2.5 acceptance filter and Best-of-N judge to … #42275

Open

MorAlekss wants to merge 1 commit into NousResearch:main from MorAlekss:feat/delegate-acceptance-criteria
