feat(delegate): add acceptance criteria and independent judge to dele… #17980

Open
MorAlekss wants to merge 1 commit into NousResearch:main from MorAlekss:feat/delegate-acceptance-criteria

@MorAlekss

Contributor

Summary

Implements Phase 1 of #356: acceptance criteria and independent judge for delegate_task.


Root cause

Sub-agent results returned by delegate_task are self-reports with no independent verification. The parent agent must manually evaluate every result, consuming tokens and adding latency. There is no way to specify what "done" looks like when delegating, and no quality signal on whether the output actually meets the goal.


Fix

Added an optional acceptance_criteria parameter to delegate_task. When provided, a new _judge_output() function evaluates the sub-agent's output against the criteria using a cheap auxiliary LLM and returns a PASS/FAIL verdict with reasoning. The result is enriched with a judge field. If no criteria are provided, behavior is unchanged.
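
For readers who want a concrete picture of the flow described above, here is a minimal sketch. It is not the code from this PR: `judge_output` and the injected `call_llm` callable are stand-ins, and the prompt wording, the JSON parsing details, and the reasoning string for the no-criteria case are all assumptions.

```python
import json
from typing import Callable, Optional


def judge_output(
    output: str,
    acceptance_criteria: Optional[str],
    call_llm: Callable[[str], str],  # hypothetical stand-in for the auxiliary LLM call
) -> dict:
    """Illustrative stand-in for _judge_output(); returns a verdict dict."""
    if not acceptance_criteria:
        # No criteria: skip the LLM call entirely (reasoning text is assumed).
        return {"verdict": "PASS", "reasoning": "No acceptance criteria provided"}

    # Assumed prompt format; the real prompt may differ.
    prompt = (
        "Evaluate the output against the acceptance criteria.\n"
        'Reply with JSON: {"verdict": "PASS" or "FAIL", "reasoning": "..."}\n\n'
        f"Criteria:\n{acceptance_criteria}\n\nOutput:\n{output}"
    )
    try:
        data = json.loads(call_llm(prompt))
        verdict = str(data.get("verdict", "")).upper()
        if verdict not in ("PASS", "FAIL"):
            raise ValueError(f"unexpected verdict: {verdict!r}")
        return {"verdict": verdict, "reasoning": str(data.get("reasoning", ""))}
    except Exception as exc:
        # Degrade instead of raising: any LLM or parse failure becomes a FAIL.
        return {"verdict": "FAIL", "reasoning": f"Judge unavailable: {exc}"}
```

Passing the LLM call in as a parameter keeps the sketch runnable with a stub; the PR itself reports calling call_llm(task="judge") directly.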


What changed

  • tools/delegate_tool.py: added acceptance_criteria: Optional[str] = None to delegate_task signature
  • tools/delegate_tool.py: added acceptance_criteria to DELEGATE_TASK_SCHEMA — both top-level and per-task in batch mode
  • tools/delegate_tool.py: added _judge_output() function, which calls call_llm(task="judge"), parses the PASS/FAIL JSON, and degrades gracefully on failure
  • tools/delegate_tool.py: wired judge evaluation in _run_single_child — enriches entry["judge"] when criteria provided
  • tests/tools/test_delegate.py: added TestJudgeOutput with 5 tests
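
The wiring step in the list above can be pictured roughly as follows. This is a sketch under assumptions, not the body of _run_single_child: `attach_judge` is a hypothetical helper, the judge is assumed to evaluate the entry's "summary" field, and the "goal" key in the example tasks is illustrative only.

```python
from typing import Callable, Optional


def attach_judge(
    entry: dict,
    acceptance_criteria: Optional[str],
    judge: Callable[[str, str], dict],  # hypothetical: (output, criteria) -> verdict dict
) -> dict:
    """Add entry["judge"] only when criteria were supplied."""
    if acceptance_criteria:
        # Assumption: the sub-agent's summary is what gets judged.
        entry["judge"] = judge(entry.get("summary", ""), acceptance_criteria)
    return entry


def stub_judge(output: str, criteria: str) -> dict:
    # Stand-in judge so the example runs without an LLM.
    return {"verdict": "PASS", "reasoning": f"checked against: {criteria}"}


# Batch mode: each task may carry its own criteria, or none at all.
tasks = [
    {"goal": "export report", "acceptance_criteria": "CSV has a header row"},
    {"goal": "summarize logs"},
]
entries = [
    {"task_index": i, "status": "completed", "summary": "..."}
    for i in range(len(tasks))
]
results = [
    attach_judge(entry, task.get("acceptance_criteria"), stub_judge)
    for entry, task in zip(entries, tasks)
]
```

Here only the first result gains a judge field; the second is returned exactly as it would be without this change.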

What is not affected

  • delegate_task behavior when acceptance_criteria is not provided: unchanged, zero breaking changes
  • Batch mode: each task can have its own per-task acceptance_criteria
  • Judge failure: if auxiliary LLM is unavailable, returns {"verdict": "FAIL", "reasoning": "Judge unavailable: ..."} — no exception raised

Behavioral change

When acceptance_criteria is provided, delegate_task results now include a judge field:

{
  "task_index": 0,
  "status": "completed",
  "summary": "...",
  "judge": {
    "verdict": "PASS",
    "reasoning": "Output contains required columns"
  }
}
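
As an illustration of how a parent agent might consume this field, the sketch below sorts results into buckets. `triage` is a hypothetical helper and not part of this PR; it assumes the result shape shown above and relies on the "Judge unavailable: " prefix described earlier to tell a genuine FAIL from a judge outage.

```python
def triage(results: list) -> dict:
    """Group task indices by judge outcome (hypothetical helper)."""
    buckets = {"passed": [], "failed": [], "judge_unavailable": [], "unjudged": []}
    for result in results:
        judge = result.get("judge")
        if judge is None:
            # No acceptance_criteria were given, so no judge field exists.
            buckets["unjudged"].append(result["task_index"])
        elif judge["verdict"] == "PASS":
            buckets["passed"].append(result["task_index"])
        elif judge["reasoning"].startswith("Judge unavailable: "):
            # A degraded FAIL: the judge itself could not run.
            buckets["judge_unavailable"].append(result["task_index"])
        else:
            buckets["failed"].append(result["task_index"])
    return buckets


example = [
    {"task_index": 0, "judge": {"verdict": "PASS", "reasoning": "ok"}},
    {"task_index": 1, "judge": {"verdict": "FAIL", "reasoning": "missing column"}},
    {"task_index": 2, "judge": {"verdict": "FAIL", "reasoning": "Judge unavailable: timeout"}},
    {"task_index": 3},
]
buckets = triage(example)
```

Separating the outage case matters because a degraded FAIL says nothing about the sub-agent's output, so a caller may prefer to review that result manually rather than retry the task.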

Tests

5 new tests in TestJudgeOutput, all pass. No regressions in existing tests.

  • test_judge_pass - judge returns PASS when criteria are met
  • test_judge_fail - judge returns FAIL when criteria are not met
  • test_no_criteria_no_judge - with no criteria, the judge call is skipped and PASS is returned
  • test_judge_unavailable_skips - LLM failure returns FAIL with reasoning, no exception
  • test_delegate_task_passes_acceptance_criteria - integration: acceptance_criteria flows through to _run_single_child

Phase 2 (think tool for self-verification) is planned as a follow-up PR.

This is a prerequisite for the simplify skill (#379): independent judge evaluation directly addresses the need for quality-gating of parallel review agents.

Part of #356 (Phase 1/3)

@alt-glitch added the labels type/feature (New feature or request), P3 (Low: cosmetic, nice to have), and tool/delegate (Subagent delegation) on Apr 30, 2026
@alt-glitch

Collaborator

Implements #356. Related: #479 (best-of-N judge selection).

@MorAlekss force-pushed the feat/delegate-acceptance-criteria branch from ceccf53 to b2f8f10 on May 7, 2026 22:20
