Overview
CAMEL-AI's core paradigm is RolePlaying: instead of a single agent working alone on a task, two agents (an "AI User" and an "AI Assistant") work through it in a turn-based dialogue. The AI User keeps pushing, questioning, and refining while the AI Assistant produces and improves the work. The premise is that this adversarial iteration produces higher-quality output than single-agent work on complex tasks.
Hermes Agent's delegate_task currently only supports single-agent delegation: one sub-agent works in isolation and returns a result. There's no concept of two agents iterating with each other. Adding an optional "debate mode" where two sub-agents challenge and refine each other's work could significantly improve output quality for complex tasks — code review, research synthesis, architectural decisions, etc.
Depends on: #344 (Workflow Formulas + Multi-Agent Orchestration, which provides the agent-to-agent communication foundation)
Related: #356 (Acceptance Criteria), #375 (Inception Prompting)
Research Findings
How CAMEL's RolePlaying Works
The RolePlaying paradigm has two agents in a structured conversation loop:
AI User (the challenger):
Drives the conversation by making requests, asking questions, pointing out problems
Evaluates the AI Assistant's responses and pushes for improvement
Decides when the task is complete (sends <CAMEL_TASK_DONE> signal)
AI Assistant (the worker):
Produces the actual work (code, analysis, writing)
Responds to the AI User's challenges and refines output
Has access to tools (terminal, web, files)
Turn-based loop:
AI User: "Write a function to parse CSV files with error handling"
AI Assistant: [writes function]
AI User: "This doesn't handle malformed rows or encoding issues. Also, what about very large files?"
AI Assistant: [rewrites with streaming, error handling, encoding detection]
AI User: "Better. But the error messages aren't actionable — include line numbers and context."
AI Assistant: [adds detailed error reporting]
AI User: <CAMEL_TASK_DONE> — "Final version handles edge cases well."
Key design decisions:
Max turns limit: Prevents infinite iteration (typically 10-20 turns)
Termination signal: Explicit completion token, not just running out of turns
Optional Critic: A third agent can choose among multiple candidate proposals before the conversation commits to one, which gives the loop a tree-search-like selection step
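The loop and its two stopping conditions can be sketched in a few lines. This is a minimal illustration, not CAMEL's actual API: `role_play`, `user_step`, and `assistant_step` are hypothetical names, and both agents are plain callables standing in for LLM calls.

```python
from typing import Callable, List, Tuple

DONE_TOKEN = "<CAMEL_TASK_DONE>"  # termination signal described above

# Hypothetical stand-in for an LLM-backed agent:
# takes the transcript so far and returns the next message.
Agent = Callable[[List[Tuple[str, str]]], str]


def role_play(user_step: Agent, assistant_step: Agent, max_turns: int = 10):
    """Alternate AI User and AI Assistant until the user signals done or turns run out."""
    transcript: List[Tuple[str, str]] = []
    for _ in range(max_turns):
        request = user_step(transcript)
        transcript.append(("user", request))
        if DONE_TOKEN in request:
            return transcript, "done"  # explicit completion by the AI User
        transcript.append(("assistant", assistant_step(transcript)))
    return transcript, "max_turns"  # safety limit hit without a completion signal


# Scripted agents so the sketch runs without a model.
script = iter([
    "Write a CSV parser with error handling",
    "Handle malformed rows and encoding issues",
    DONE_TOKEN + " Final version handles edge cases well.",
])
transcript, reason = role_play(
    user_step=lambda t: next(script),
    assistant_step=lambda t: f"[draft {sum(1 for role, _ in t if role == 'assistant') + 1}]",
)
print(reason, len(transcript))
```

Returning the stop reason alongside the transcript keeps "the AI User was satisfied" distinguishable from "the turn budget ran out", which is the point of having an explicit token.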
Why This Beats Single-Agent
For complex tasks, a single agent has well-known failure modes:
Satisficing — Produces the first working solution, doesn't explore alternatives
Blind spots — Can't critique its own assumptions
Scope drift — Expands or contracts scope without external pressure
Premature convergence — Stops too early on a suboptimal solution
The adversarial debate forces the worker to defend and improve its output against a critical reviewer. It's the same reason code review exists in software development.
Current State in Hermes Agent
delegate_task today:
Single-agent delegation only — one child works alone
Batch mode runs up to 3 agents in parallel, but they can't see each other's work
No concept of iteration or refinement between agents
The parent agent is the only "reviewer" — it reads the summary and decides if it's good enough
mixture_of_agents tool:
Queries 4 frontier models in parallel, then synthesizes
Closest thing to multi-perspective reasoning, but it's one-shot (no iteration)
Models don't critique each other — the aggregator just merges responses
Gap: No mechanism for two agents to iterate on a problem together with back-and-forth dialogue.
Implementation Plan
Skill vs. Tool Classification
This is a codebase change (a tool, not a skill): either an extension to delegate_tool.py or a new debate_tool.py. Orchestrating two agent instances in a shared conversation loop requires custom Python logic for turn management and precise execution control over the turn-based protocol, which instructions alone cannot provide.
What We'd Need
Turn-based orchestration loop — Manages AI User ↔ AI Assistant conversation
Role-specific system prompts — Different prompts for the challenger vs the worker
Termination detection — Recognize when the debate has converged
Result extraction — Pull the final refined output from the conversation
Agent-to-agent communication — Sub-agents need to read each other's messages (currently impossible with isolated contexts)
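Two of these pieces, termination detection and result extraction, are small enough to sketch directly. The version below is illustrative only: the `APPROVED` marker, the similarity threshold, and the function names are assumptions, not existing Hermes Agent conventions. It treats the debate as converged when the reviewer approves explicitly or when two consecutive worker drafts are nearly identical.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

APPROVAL_MARKER = "APPROVED"  # hypothetical reviewer sign-off token

Message = Tuple[str, str]  # (role, text), where role is "worker" or "reviewer"


def worker_drafts(transcript: List[Message]) -> List[str]:
    """All worker messages, oldest first."""
    return [text for role, text in transcript if role == "worker"]


def has_converged(transcript: List[Message], threshold: float = 0.95) -> bool:
    """True if the latest review approves, or the last two drafts barely differ."""
    reviews = [text for role, text in transcript if role == "reviewer"]
    if reviews and APPROVAL_MARKER in reviews[-1]:
        return True
    drafts = worker_drafts(transcript)
    if len(drafts) >= 2:
        # Similarity ratio in [0, 1]; 1.0 means the drafts are identical.
        return SequenceMatcher(None, drafts[-2], drafts[-1]).ratio() >= threshold
    return False


def extract_result(transcript: List[Message]) -> Optional[str]:
    """The final refined output is taken to be the worker's most recent draft."""
    drafts = worker_drafts(transcript)
    return drafts[-1] if drafts else None
```

The similarity check is a cheap guard against the diminishing-returns case where the worker keeps resubmitting essentially the same draft; a real implementation might prefer a semantic comparison.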
Phased Rollout
Phase 1: Simple Debate Mode on delegate_task (Depends on #344)
Add an optional debate=True parameter to delegate_task:
delegate_task(
    goal="Design a database schema for a multi-tenant SaaS application",
    context="PostgreSQL, need row-level security, audit logging...",
    debate=True,   # Enable two-agent iteration
    max_turns=10,  # Limit debate rounds
)
When debate=True:
Spawn two sub-agents: Worker (has tools, does the work) and Reviewer (critiques, pushes for improvement)
Worker produces initial output
Reviewer critiques it with specific feedback
Worker revises based on feedback
Repeat until Reviewer signals satisfaction or max_turns reached
Return the final refined output + the key critique points that drove improvements
Deliverable: Higher quality delegation output for complex tasks
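The Phase 1 flow could be orchestrated roughly as follows. This is a sketch under assumptions: `run_debate`, `DebateResult`, and the `APPROVED` sign-off string are hypothetical, and the worker and reviewer are plain callables standing in for the sub-agents that `delegate_task` would spawn.

```python
from dataclasses import dataclass, field
from typing import Callable, List

SIGN_OFF = "APPROVED"  # hypothetical token the Reviewer emits when satisfied


@dataclass
class DebateResult:
    final_output: str
    critiques: List[str] = field(default_factory=list)  # feedback that drove revisions
    turns_used: int = 0
    approved: bool = False


def run_debate(
    goal: str,
    worker: Callable[[str, str, str], str],  # (goal, previous_draft, feedback) -> new draft
    reviewer: Callable[[str, str], str],     # (goal, draft) -> critique or SIGN_OFF
    max_turns: int = 10,
) -> DebateResult:
    draft = worker(goal, "", "")  # Worker produces the initial output
    result = DebateResult(final_output=draft)
    for turn in range(1, max_turns + 1):
        feedback = reviewer(goal, draft)  # Reviewer critiques the current draft
        result.turns_used = turn
        if SIGN_OFF in feedback:
            result.approved = True  # Reviewer signalled satisfaction
            break
        result.critiques.append(feedback)
        draft = worker(goal, draft, feedback)  # Worker revises based on feedback
        result.final_output = draft
    return result
```

If `max_turns` is exhausted, the last revision is returned unreviewed with `approved=False`, so the parent agent can tell a converged debate from a truncated one.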
Phase 2: Configurable Debate Roles
Allow users to customize the debate:
delegate_task(
    goal="Implement the authentication module",
    debate=True,
    debate_roles={
        "worker": "Senior backend developer focused on correctness",
        "reviewer": "Security engineer focused on vulnerabilities and edge cases"
    }
)
Different reviewer personas produce different critique angles — a security reviewer catches different things than a performance reviewer.
Deliverable: Domain-specific adversarial review
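One way the role descriptions might be turned into role-specific system prompts is sketched below. The template wording, the default personas, and `build_system_prompts` are illustrative assumptions; the proposal does not specify a prompt format.

```python
from typing import Dict, Optional

# Hypothetical defaults used when debate_roles is omitted or partial.
DEFAULT_ROLES = {
    "worker": "Generalist engineer who produces the work",
    "reviewer": "Critical reviewer who looks for gaps and mistakes",
}

# Hypothetical prompt templates, one per role.
TEMPLATES = {
    "worker": (
        "You are the Worker. Persona: {persona}.\n"
        "Goal: {goal}\n"
        "Produce the work and revise it in response to the Reviewer's feedback."
    ),
    "reviewer": (
        "You are the Reviewer. Persona: {persona}.\n"
        "Goal: {goal}\n"
        "Critique the Worker's output with specific, actionable feedback."
    ),
}


def build_system_prompts(goal: str, debate_roles: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge user-supplied personas over the defaults and render one prompt per role."""
    roles = {**DEFAULT_ROLES, **(debate_roles or {})}
    unknown = set(roles) - set(TEMPLATES)
    if unknown:
        raise ValueError(f"Unknown debate roles: {sorted(unknown)}")
    return {
        role: TEMPLATES[role].format(persona=roles[role], goal=goal)
        for role in TEMPLATES
    }


prompts = build_system_prompts(
    "Implement the authentication module",
    {"reviewer": "Security engineer focused on vulnerabilities and edge cases"},
)
print(prompts["reviewer"])
```

Rejecting unknown role keys early avoids silently ignoring a typo such as "reviwer" and running the debate with the default persona instead.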
Phase 3: Multi-Reviewer Panel (Depends heavily on #344)
Multiple reviewers with different perspectives, similar to mixture_of_agents but iterative:
delegate_task(
    goal="Design the API for the new billing system",
    debate=True,
    debate_roles={
        "worker": "API architect",
        "reviewers": [
            "Frontend developer (consumer perspective)",
            "Security engineer",
            "DevOps engineer (operational concerns)"
        ]
    }
)
Worker gets consolidated feedback from all reviewers each round.
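Consolidation could be as simple as labelling each reviewer's critique and dropping points another reviewer already raised, so the Worker receives one message per round. The sketch below assumes reviewers are plain callables keyed by persona; `consolidate_feedback` is a hypothetical helper, not part of `mixture_of_agents`.

```python
from typing import Callable, Dict, List


def consolidate_feedback(draft: str, reviewers: Dict[str, Callable[[str], List[str]]]) -> str:
    """Collect critique points from every reviewer and merge them into one message."""
    seen = set()
    sections = []
    for persona, review in reviewers.items():
        points = []
        for point in review(draft):
            key = point.strip().lower()
            if key and key not in seen:  # skip points another reviewer already raised
                seen.add(key)
                points.append(f"- {point.strip()}")
        if points:
            sections.append(f"[{persona}]\n" + "\n".join(points))
    return "\n\n".join(sections)


# Scripted reviewers so the sketch runs without a model.
panel = {
    "Security engineer": lambda d: ["Add rate limiting", "Validate input"],
    "DevOps engineer (operational concerns)": lambda d: ["add rate limiting ", "Expose a health check"],
}
print(consolidate_feedback("draft API spec", panel))
```

Exact-match de-duplication only catches identically worded points; merging paraphrased critiques would need a summarization step, which is closer to what an aggregator model does.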
Deliverable: Multi-perspective iterative refinement
Pros & Cons
Pros
mixture_of_agents is one-shot multi-perspective; this is iterative refinement. Different tools for different needs.
Cons / Risks
Diminishing returns: Most improvement happens in turns 1-3. Later turns often bike-shed on minor issues.
Reviewer quality matters: A bad reviewer prompt wastes turns on irrelevant critiques.
Complexity: Adds a fundamentally new execution mode to delegation.
Open Questions
Should this be a parameter on delegate_task or a separate tool? Extending delegate_task keeps things unified; a separate debate tool is cleaner but fragments the delegation story.
How to handle tool access? Should only the Worker have tools, or should the Reviewer also be able to run tests, read files, etc. to verify claims?
When should the agent auto-select debate mode? Should the parent agent autonomously decide when a task is complex enough to warrant debate, or should it always be explicit?
What's the right default max_turns? Too few = insufficient iteration; too many = wasted tokens on diminishing returns.
Can we reuse mixture_of_agents infrastructure? MoA already does multi-model parallel querying. Could the aggregation step become iterative?
References
tools/delegate_tool.py: Current single-agent delegation
tools/mixture_of_agents_tool.py: One-shot multi-model reasoning (non-iterative)