Overview
CAMEL-AI's original research contribution (NeurIPS 2023, arXiv:2303.17760) includes Inception Prompting — a systematic set of prompt techniques designed to prevent common failure modes when one LLM agent delegates work to another. These failure modes are real problems in Hermes Agent's delegate_task today, and will become even more critical when multi-agent orchestration (#344) is built.
This is partially applicable right now (improving delegate_tool.py's prompt construction), and becomes a prerequisite for reliable multi-agent workflows.
Prerequisite for full value: #344 (Workflow Formulas + Multi-Agent Orchestration)
Related: #356 (Acceptance Criteria), #374 (Local Browser Backend, same Eigent/CAMEL research thread)
Research Findings
The Four Failure Modes CAMEL Identified
CAMEL's research systematically catalogued how LLM agents fail when communicating with each other. These map directly to problems observable in Hermes delegate_task:
1. Role-Flipping
The sub-agent stops doing the assigned work and starts acting like the parent — asking questions, requesting clarification, or delegating back.
In Hermes today: A delegated sub-agent sometimes responds with "I'd recommend you do X" or "What would you like me to focus on?" instead of actually doing the work. The sub-agent has no one to ask — it's isolated — so these responses are pure waste.
2. Instruction Echoing
The sub-agent restates the task in different words without actually performing it. The summary looks like it did something, but it just paraphrased the goal.
In Hermes today: Sub-agent returns "I analyzed the codebase for security issues and identified several areas of concern" without listing any actual issues or making any tool calls.
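Because an echoed summary typically arrives with no tool calls behind it, the parent could flag it cheaply after the run. The sketch below is a hypothetical heuristic, not existing Hermes code: `looks_like_echo`, its `tool_call_count` argument (assumed to be countable from the child's transcript), and the 0.5 threshold are all assumptions.

```python
import re


def _content_words(text: str) -> set[str]:
    """Lowercased word tokens longer than three characters (drops 'the', 'for', ...)."""
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower()) if len(w) > 3}


def looks_like_echo(goal: str, summary: str, tool_call_count: int,
                    coverage_threshold: float = 0.5) -> bool:
    """Hypothetical post-run check for instruction echoing.

    Flags a summary when the child made no tool calls AND the summary
    repeats most of the goal's content words, i.e. it paraphrased the
    task instead of performing it. No stemming: 'analyzed' != 'analyze'.
    """
    if tool_call_count > 0:
        return False  # the child did at least some observable work
    goal_words = _content_words(goal)
    if not goal_words:
        return True  # nothing to compare, and no tools were used at all
    coverage = len(goal_words & _content_words(summary)) / len(goal_words)
    return coverage >= coverage_threshold
```

With the goal "Analyze the codebase for security issues" and the example summary quoted in this section, three of the goal's four content words reappear and no tools were called, so the check fires; the same summary backed by any tool call passes. A check like this only catches the crudest echoes, which is why the proposal focuses on preventing them in the prompt.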
3. Flake Replies
The sub-agent gives vague, non-committal responses such as "It seems fine," "There might be issues," or "Further investigation needed."
In Hermes today: This is the most common delegation failure. The parent asks for a concrete deliverable and gets a wishy-washy summary.
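A flake reply can be approximated as hedging language with no concrete artifact attached. This hypothetical check uses the three phrases quoted in this issue plus rough regexes for file paths, line references, and fenced code; the names and patterns are illustrative assumptions, and a phrase list this short will miss many variants.

```python
import re

# Hedge phrases quoted in this issue; a real list would need many more.
FLAKE_PHRASES = (
    "it seems fine",
    "there might be issues",
    "further investigation needed",
)

# Rough signals that the reply carries a concrete artifact.
ARTIFACT_PATTERNS = (
    re.compile(r"\b[\w./-]+\.[a-z]{1,4}\b"),     # path with an extension, e.g. tools/delegate_tool.py
    re.compile(r"\bline \d+\b", re.IGNORECASE),  # line reference such as "line 42"
    re.compile(r"`{3}"),                         # opening of a fenced code snippet
)


def is_flake_reply(reply: str) -> bool:
    """Hypothetical check: hedging language with nothing concrete to back it up."""
    lowered = reply.lower()
    has_hedge = any(phrase in lowered for phrase in FLAKE_PHRASES)
    has_artifact = any(pattern.search(reply) for pattern in ARTIFACT_PATTERNS)
    return has_hedge and not has_artifact
```

Hedging that comes with a path or line number is allowed through on purpose: "it seems fine" is acceptable when the reply also shows what was checked.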
4. Infinite Loops
The sub-agent gets stuck retrying the same broken approach, or cycles between two states without converging.
In Hermes today: A sub-agent hits the same error repeatedly and applies the same fix each time until it runs out of iterations. The max_iterations parameter is the only guard, and it is a blunt instrument: it stops the loop only after the whole iteration budget has been spent.
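Compared with max_iterations, a narrower guard would count identical failures. The sketch below is hypothetical (`RepeatFailureGuard`, `STUCK_NUDGE`, the `terminal` tool name, and the argument shape are all assumptions): it keys each failure on tool name, JSON-serialized arguments, and error text, and reports when one exact failure recurs. That is the point where the convergence-pressure directive from the Phase 1 plan could be re-injected.

```python
import json
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical nudge text, echoing the Phase 1 convergence directive.
STUCK_NUDGE = (
    "This exact call has failed the same way more than once. Do not retry it; "
    "try a fundamentally different approach."
)


@dataclass
class RepeatFailureGuard:
    """Hypothetical guard that notices one exact failure recurring."""

    max_repeats: int = 2
    counts: Counter = field(default_factory=Counter)

    def record_failure(self, tool_name: str, args: dict, error: str) -> bool:
        """Record a failed tool call; return True once it has failed max_repeats times."""
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} the same key.
        key = (tool_name, json.dumps(args, sort_keys=True), error.strip())
        self.counts[key] += 1
        return self.counts[key] >= self.max_repeats
```

When `record_failure` returns True, the agent loop could append `STUCK_NUDGE` to the conversation instead of spending the rest of the iteration budget. Exact-match keys are deliberately strict: near-identical retries (a tweaked flag, a different line number in the error) slip through, which is one reason the prompt-level directive is still needed.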
CAMEL's Inception Prompting Solution
CAMEL prevents these failures by injecting specific prompt components:
Role anchoring — Explicit instructions that the agent MUST stay in its assigned role and never flip to the requester role
Output format enforcement — Requiring concrete, actionable outputs rather than commentary
Completion signaling — Clear protocol for when the agent is done vs needs to continue
Anti-echo directives — "Do not restate the task. Perform it."
Convergence pressure — Instructions that push toward completing the task rather than expanding scope
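Completion signaling needs a marker the parent can parse unambiguously. CAMEL's prompts use a dedicated task-done token for this; the sketch below shows the same idea with a placeholder `<TASK_DONE>` sentinel, hypothetical prompt wording, and a hypothetical `parse_completion` helper, none of which exist in Hermes today.

```python
from enum import Enum

# Placeholder sentinel; Hermes does not define this token today.
DONE_TOKEN = "<TASK_DONE>"

# Prompt text the child would receive (hypothetical wording).
COMPLETION_PROTOCOL = (
    f"When the task is fully complete, end your final message with {DONE_TOKEN} "
    f"on its own line. Never emit {DONE_TOKEN} while work remains."
)


class Status(Enum):
    DONE = "done"
    CONTINUE = "continue"


def parse_completion(reply: str) -> tuple[Status, str]:
    """Split a reply into (status, body) using a sentinel on the final line."""
    lines = reply.rstrip().splitlines()
    if lines and lines[-1].strip() == DONE_TOKEN:
        return Status.DONE, "\n".join(lines[:-1]).rstrip()
    return Status.CONTINUE, reply.strip()
```

Requiring the token on its own final line, rather than anywhere in the text, keeps a reply that merely mentions the token from being read as done.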
Current State in Hermes Agent
delegate_tool.py — _build_child_system_prompt():
The current sub-agent system prompt is built from:
The base system prompt (same as parent, trimmed)
The goal/context provided by the parent
Basic instructions about available tools
What's missing:
No explicit role-anchoring ("you are a worker, not a coordinator")
No anti-echo directives
No output format requirements (concrete deliverables, not commentary)
No convergence pressure (finish the task, don't expand scope)
No stuck-detection prompting (if an approach fails twice, try something different)
Implementation Plan
Skill vs. Tool Classification
This is a codebase change to tools/delegate_tool.py, specifically its system prompt construction. It is not a skill or a new tool.
Phased Rollout
Phase 1: Prompt Hardening for Current delegate_task (No dependencies)
Modify _build_child_system_prompt() to include inception prompting techniques
Add role-anchoring: "You are executing a delegated task. Do the work directly — do not ask questions, request clarification, or suggest the requester do it instead."
Add anti-echo: "Do not restate or paraphrase the task. Perform it using your tools and report concrete results."
Add output format guidance: "Your response must include specific findings, file paths, code snippets, or other concrete artifacts. Vague summaries like 'it seems fine' or 'further investigation needed' are not acceptable."
Add convergence pressure: "If an approach fails twice, try a fundamentally different approach rather than retrying the same thing."
Deliverable: Measurably better sub-agent output quality, zero architectural changes
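One way to land Phase 1 without touching the rest of the prompt pipeline is to append a fixed rules block at the end of _build_child_system_prompt(). The helper below is a sketch under that assumption: `append_delegation_directives`, `PHASE1_DIRECTIVES`, and the rules heading are hypothetical, the real function's signature is not assumed, and the directive strings are adapted from the Phase 1 list.

```python
# Hypothetical mapping; directive wording adapted from the Phase 1 plan.
PHASE1_DIRECTIVES = {
    "role_anchoring": (
        "You are executing a delegated task. Do the work directly. Do not ask "
        "questions, request clarification, or suggest the requester do it instead."
    ),
    "anti_echo": (
        "Do not restate or paraphrase the task. Perform it using your tools "
        "and report concrete results."
    ),
    "output_format": (
        "Your response must include specific findings, file paths, code snippets, "
        "or other concrete artifacts. Vague summaries like 'it seems fine' or "
        "'further investigation needed' are not acceptable."
    ),
    "convergence": (
        "If an approach fails twice, try a fundamentally different approach "
        "rather than retrying the same thing."
    ),
}


def append_delegation_directives(system_prompt: str) -> str:
    """Return the child system prompt with the Phase 1 rules appended as a bullet list."""
    rules = "\n".join(f"- {text}" for text in PHASE1_DIRECTIVES.values())
    return f"{system_prompt.rstrip()}\n\n## Delegated task rules\n{rules}\n"
```

Keeping each directive under a name makes it straightforward to toggle individual rules when checking whether the Phase 1 deliverable (better sub-agent output) actually materializes.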
Phase 2: Enhanced Guard Rails for Multi-Agent Workflows (Depends on #344)
Phase 3: Adaptive Prompting (Optional)
Pros & Cons
Pros
Cons / Risks
Open Questions
References
tools/delegate_tool.py: current sub-agent prompt construction