Feature: Multi-Agent Architecture — Orchestration, Cooperation, Specialized Roles & Resilient Workflows

## Overview

This is the umbrella issue for evolving Hermes Agent from a single-agent system with isolated sub-agent delegation into a true multi-agent architecture with orchestration, cooperation, specialized roles, and resilient workflows.

**What "multi-agent Hermes" means:** Today, Hermes is one agent that can spawn throwaway child agents via `delegate_task`. Those children work alone, can't talk to each other, can't share state, and return a summary to the parent. That's delegation, not multi-agent. True multi-agent means:

- **Specialized agent roles** — Agents with distinct identities, toolsets, and expertise (researcher, coder, reviewer, browser agent)
- **Structured workflows** — Task decomposition into dependency-aware DAGs, not just flat parallel dispatch
- **Inter-agent cooperation** — Agents that share context, build on each other's work, and iterate together
- **Resilient execution** — Crash recovery, stuck detection, retry with replanning, health monitoring
- **Cross-platform coordination** — Agents that can operate across different channels (different Discord channels, Telegram groups, etc.)

This issue captures the full vision, architectural decisions, and phased roadmap. Specific implementation details are broken out into focused sub-issues (linked below).

**Subsumes:** #299 (closed — original multi-agent support request)

**Component issues:**
- #356 — Acceptance Criteria & Independent Judge for delegation quality
- #375 — Inception Prompting for sub-agent communication reliability
- #376 — Adversarial Debate Mode (two-agent iterative refinement)
- #377 — Shared Memory Pools between workflow agents

---

## Current State

**`delegate_task` (tools/delegate_tool.py):**
- Spawns ephemeral `AIAgent` children with isolated context
- Two modes: single task (one child) or batch (up to 3 parallel via ThreadPoolExecutor)
- Children get: own conversation, own terminal session, restricted toolsets
- Children CANNOT: talk to each other, access parent memory, share state, communicate mid-task
- Depth limit: MAX_DEPTH=2 (parent → child OK, child → grandchild blocked)
- No dependency awareness — batch tasks all run simultaneously
- No crash recovery — if a child fails, work is lost
- No health monitoring — parent blocks until children complete or time out
- No retry logic — failures are final
- No synthesis step — no aggregation of parallel results

**`mixture_of_agents` (tools/mixture_of_agents_tool.py):**
- Queries 4 frontier models in parallel; an aggregator then synthesizes their responses
- One-shot (no iteration), models don't see each other's responses during generation
- Closest thing to multi-perspective reasoning, but not multi-agent coordination

**What works well today (keep these properties):**
- Sub-agent isolation prevents cascade failures and security issues
- Simple mental model — parent delegates, child works, result comes back
- Toolset restriction prevents children from doing dangerous things
- Fresh context prevents context pollution between tasks

---

## Architecture Design

### Core Building Blocks

**1. Agent Roles & Identities**

Move from ad-hoc "goal + context" delegation to pre-defined agent archetypes with specific capabilities:

```
Coordinator  — Decomposes tasks, assigns to workers, manages workflow
Researcher   — web_search, web_extract, arxiv tools, document analysis
Developer    — terminal, file ops, code execution, git
Browser Agent — browser tools, form filling, visual verification
Reviewer     — code review, acceptance criteria evaluation
Synthesizer  — Aggregates results from multiple agents into unified output
```

Roles would be defined as toolset + system prompt combinations. Either the parent agent or a dedicated coordinator decides which role handles each subtask. Assignment is LLM-based rather than hard-coded routing, so new roles can be added without code changes (inspired by CAMEL-AI's coordinator pattern).
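
A minimal sketch of what "toolset + system prompt" role definitions could look like. Everything here is an assumption for illustration: `AgentRole`, `ROLES`, `describe_roles`, and `resolve_role` are hypothetical names, and the toolset strings other than `web_search` and `web_extract` are placeholders rather than actual Hermes toolset identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    """Hypothetical role record: a name, allowed toolsets, and a system prompt."""
    name: str
    toolsets: tuple[str, ...]
    system_prompt: str


# Illustrative catalog; adding a role means adding an entry, not routing code.
ROLES = {
    role.name: role
    for role in (
        AgentRole("researcher", ("web_search", "web_extract"), "You research and cite sources."),
        AgentRole("developer", ("terminal", "file_ops"), "You write and run code."),
        AgentRole("reviewer", ("file_ops",), "You review work against acceptance criteria."),
    )
}


def describe_roles(roles: dict[str, AgentRole]) -> str:
    """Render the catalog as text that a coordinator prompt could include."""
    return "\n".join(
        f"- {r.name}: tools={', '.join(r.toolsets)}" for r in roles.values()
    )


def resolve_role(choice: str, roles: dict[str, AgentRole], default: str = "developer") -> AgentRole:
    """Map the coordinator's free-text answer to a known role, with a fallback."""
    return roles.get(choice.strip().lower(), roles[default])
```

The coordinator LLM would see the output of `describe_roles` and answer with a role name; `resolve_role` keeps a malformed answer from crashing the dispatch.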

**2. Workflow DAG Engine**

Extend `delegate_task` to support dependency-aware task graphs:

```python
delegate_task(
    workflow=[
        {"id": "research", "goal": "Research the API", "context": "..."},
        {"id": "backend", "goal": "Implement client", "needs": ["research"]},
        {"id": "frontend", "goal": "Build UI", "needs": ["research"]},
        {"id": "integrate", "goal": "Integration test", "needs": ["backend", "frontend"]},
    ]
)
```

Execution engine:
- Parse workflow into a DAG, topological sort, cycle detection
- `ready_steps(completed)` returns steps whose dependencies are satisfied
- Run ready steps in parallel (respecting concurrency limits)
- Downstream steps receive upstream results as context
- Final result aggregates all step summaries
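
The first two bullets can be sketched as follows, assuming the `id`/`needs` step shape from the example above. `validate_workflow` is a hypothetical helper name, and `ready_steps` takes the workflow and an optional `running` set as extra arguments here, which is an assumption beyond the `ready_steps(completed)` signature named in the list.

```python
from collections import deque


def validate_workflow(workflow: list[dict]) -> list[str]:
    """Return step ids in a valid execution order (Kahn's algorithm).

    Raises ValueError on duplicate ids, unknown dependencies, or cycles.
    """
    needs: dict[str, set[str]] = {}
    for step in workflow:
        if step["id"] in needs:
            raise ValueError(f"duplicate step id: {step['id']}")
        needs[step["id"]] = set(step.get("needs", []))
    for step_id, deps in needs.items():
        unknown = deps - set(needs)
        if unknown:
            raise ValueError(f"{step_id} needs unknown steps: {sorted(unknown)}")

    # Count unmet dependencies and index which steps each step unblocks.
    indegree = {step_id: len(deps) for step_id, deps in needs.items()}
    dependents: dict[str, list[str]] = {step_id: [] for step_id in needs}
    for step_id, deps in needs.items():
        for dep in deps:
            dependents[dep].append(step_id)

    queue = deque(step_id for step_id, count in indegree.items() if count == 0)
    order = []
    while queue:
        current = queue.popleft()
        order.append(current)
        for child in dependents[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Any step left unvisited sits on a cycle.
    if len(order) != len(needs):
        raise ValueError("workflow contains a dependency cycle")
    return order


def ready_steps(workflow: list[dict], completed, running=frozenset()) -> list[str]:
    """Return ids of steps that are not done or running and whose needs are all done."""
    done = set(completed)
    return [
        step["id"]
        for step in workflow
        if step["id"] not in done
        and step["id"] not in running
        and set(step.get("needs", [])) <= done
    ]
```

Validating once up front means the scheduler loop only has to call `ready_steps` after each completion and never has to handle a malformed graph mid-run.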

Convoy mode (parallel legs + synthesis):
```python
delegate_task(
    tasks=[...parallel tasks...],
    synthesis="Synthesize all findings into a unified report"
)
```
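
One way convoy mode might work internally is sketched below. `run_convoy` and the `run_agent(goal, context)` callable are hypothetical stand-ins for however a child agent is actually spawned; the default of 3 parallel workers mirrors the current batch limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_convoy(
    tasks: list[dict],
    synthesis: str,
    run_agent: Callable[[str, str], str],
    max_parallel: int = 3,
) -> str:
    """Run tasks in parallel, then hand their summaries to one synthesis agent."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() preserves input order, so summaries line up with tasks.
        summaries = list(
            pool.map(lambda t: run_agent(t["goal"], t.get("context", "")), tasks)
        )
    # Label each summary with its goal so the synthesizer can attribute findings.
    findings = "\n\n".join(
        f"## {task['goal']}\n{summary}" for task, summary in zip(tasks, summaries)
    )
    return run_agent(synthesis, findings)
```

The synthesis step is just another child run whose context is the concatenated leg summaries, so it needs no new agent type.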

**3. Inter-Agent Communication**

Four levels of context sharing (progressively less isolated):

| Level | Mechanism | Use Case |
|-------|-----------|----------|
| **L0: Isolated** (current) | No sharing, parent relays | Simple delegation |
| **L1: Result passing** | Upstream results auto-injected into downstream context | Workflow DAGs |
| **L2: Shared scratchpad** | Read/write shared key-value store | Complex workflows needing fine-grained sharing |
| **L3: Live dialogue** | Turn-based agent-to-agent conversation | Debate/review modes |

L0 exists today. L1 comes with the workflow engine. L2 is #377. L3 is #376.

**4. Failure Recovery**

Three-level escalation (inspired by CAMEL-AI's Workforce and Gas Town):

```
Retry → Replan → Decompose Further
```

1. **Retry** — Same agent, same task, try again (simple transient failure)
2. **Replan** — Meta-agent rewrites the task description based on the failure reason
3. **Decompose** — Break the failed task into smaller, more tractable subtasks
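
The three levels above can be sketched as a single wrapper. This is an illustration under stated assumptions, not the CAMEL-AI or Gas Town implementation: `run_with_escalation` is a hypothetical name, and `execute`, `replan`, and `decompose` stand in for a sub-agent run and two meta-agent calls.

```python
from typing import Callable


def run_with_escalation(
    task: str,
    execute: Callable[[str], str],
    replan: Callable[[str, str], str],
    decompose: Callable[[str], list[str]],
    retries: int = 1,
) -> str:
    """Try a task, escalating through retry, replan, and decompose on failure."""
    last_error = ""

    # Level 1: the initial attempt plus plain retries of the same task.
    for _ in range(1 + retries):
        try:
            return execute(task)
        except Exception as exc:
            last_error = str(exc)

    # Level 2: rewrite the task description using the failure reason.
    try:
        return execute(replan(task, last_error))
    except Exception as exc:
        last_error = str(exc)

    # Level 3: split into subtasks; a failure here propagates to the caller.
    return "\n".join(execute(subtask) for subtask in decompose(task))
```

Each level costs more tokens than the one before it, so ordering them cheapest-first keeps transient failures from triggering a full re-decomposition.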

Plus:
- **Checkpointing** — Persist sub-agent conversation state to `~/.hermes/checkpoints/` after each tool call. On failure, resume from checkpoint instead of restarting.
- **Stuck detection** — Monitor sub-agent activity; if no tool calls for N seconds, intervene (nudge, kill, retry, or escalate to user)
- **Health monitoring** — For long-running workflows, periodic health checks with escalation
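
A rough sketch of the checkpointing and stuck-detection bullets, assuming conversation state is a JSON-serializable list of messages. The function and class names are hypothetical, and the sketch does not address tool results that reference ephemeral resources.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def save_checkpoint(directory: Path, agent_id: str, messages: list[dict]) -> Path:
    """Write the conversation atomically so a crash never leaves a partial file."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{agent_id}.json"
    fd, tmp_name = tempfile.mkstemp(dir=str(directory), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"saved_at": time.time(), "messages": messages}, handle)
    os.replace(tmp_name, target)  # atomic when both paths share a filesystem
    return target


def load_checkpoint(directory: Path, agent_id: str) -> list[dict]:
    """Return the saved messages, or an empty list if no checkpoint exists."""
    target = directory / f"{agent_id}.json"
    if not target.exists():
        return []
    return json.loads(target.read_text(encoding="utf-8"))["messages"]


class ActivityMonitor:
    """Tracks the last tool call per agent to flag agents that have gone quiet."""

    def __init__(self, timeout_seconds: float, clock=time.monotonic) -> None:
        self.timeout = timeout_seconds
        self.clock = clock
        self.last_seen: dict[str, float] = {}

    def record_tool_call(self, agent_id: str) -> None:
        self.last_seen[agent_id] = self.clock()

    def stuck_agents(self) -> list[str]:
        now = self.clock()
        return [a for a, seen in self.last_seen.items() if now - seen > self.timeout]
```

In this sketch the directory would be `~/.hermes/checkpoints/` expanded by the caller, and the parent would poll `stuck_agents()` to decide whether to nudge, kill, retry, or escalate.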

**5. Cross-Platform Agent Distribution**

From the original #299 request: agents operating across different platforms/channels.

- An agent team where the Coordinator runs in CLI, but dispatches a Browser Agent that reports results to a Discord channel
- Different agent roles posting to different Telegram groups or Discord channels
- The `send_message` tool already supports cross-platform messaging — this extends it to agent-to-agent communication across platforms

---

## Research Sources

### Gas Town (Steve Yegge)
[steveyegge/gastown](https://github.com/steveyegge/gastown) — 348K LOC Go system orchestrating 20-50+ concurrent coding agents.

Key patterns extracted:
- **Workflow Formulas** — TOML-defined reusable workflows with 4 types: Convoy (parallel+synthesis), Workflow (sequential+deps), Expansion (template-based), Aspect (AOP cross-cutting). Uses Kahn's algorithm for topological sort.
- **GUPP (Gastown Universal Propulsion Principle)** — "If you find work on your hook, YOU RUN IT." Agent work is durable — separate work state from process state. When an agent dies, work is recoverable.
- **Hierarchical health monitoring** — 3-layer watchdog: Daemon (heartbeat) → Boot (ephemeral checker) → Deacon (persistent monitor). "Idle Town Principle": skip health checks when no active work.
- **Mail vs Nudge** — Persistent messages (survive crashes) vs ephemeral reminders (zero-cost). Agents have "mail budgets."

### CAMEL-AI / Eigent
[camel-ai/camel](https://github.com/camel-ai/camel) — NeurIPS 2023 multi-agent framework. [eigent-ai/Eigent](https://github.com/eigent-ai/Eigent) — Production desktop app built on CAMEL.

Key patterns extracted:
- **Workforce 5-step lifecycle** — Decompose → Assign → Execute → Complete → Handle Failures. The failure recovery escalation (Retry → Replan → Decompose) is the most valuable pattern.
- **LLM-based coordinator routing** — Not rule-based; the coordinator LLM decides which worker handles each subtask. New worker types can be added without changing routing logic.
- **Agent pooling** — Up to 10 pre-created agent clones per worker type with lazy initialization. Reduces spawning overhead for repetitive workflows.
- **Inception Prompting** — Systematic prevention of multi-agent communication failures: role-flipping, instruction echoing, flake replies, infinite loops. See #375.
- **RolePlaying paradigm** — Two-agent adversarial iteration (AI User + AI Assistant) for quality improvement. See #376.
- **Task classification gate** — Cheap classifier decides simple (single-agent) vs complex (multi-agent) before spinning up the workforce.
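
As an illustration of the agent pooling idea only (this is not CAMEL's code), a lazily filled pool could look like the sketch below. `AgentPool` and its `factory` argument are hypothetical, and resetting an agent's conversation before reuse is left to the caller.

```python
import queue
import threading
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class AgentPool(Generic[T]):
    """Reuses idle agents and creates new ones lazily, up to max_size."""

    def __init__(self, factory: Callable[[], T], max_size: int = 10) -> None:
        self._factory = factory
        self._max_size = max_size
        self._idle: "queue.Queue[T]" = queue.Queue()
        self._created = 0
        self._lock = threading.Lock()

    def acquire(self) -> T:
        # Prefer an agent that has already been created and released.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        # Otherwise create one if the pool is still under its cap.
        with self._lock:
            can_create = self._created < self._max_size
            if can_create:
                self._created += 1
        if can_create:
            return self._factory()
        # At capacity: block until another caller releases an agent.
        return self._idle.get()

    def release(self, agent: T) -> None:
        self._idle.put(agent)
```

Pooling trades away some of the fresh-context isolation listed under "What works well today", so it probably only pays off for repetitive workflows where spawn cost dominates.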

---

## Implementation Roadmap

### Phase 1: Workflow DAG + Synthesis (Foundation)

The minimum viable multi-agent upgrade. No new abstractions — just extend `delegate_task`.

- Add `workflow` parameter for dependency-aware task graphs
- DAG execution engine: topological sort, ready-step computation, parallel dispatch
- Result passing: downstream steps receive upstream summaries as context
- Add `synthesis` parameter for parallel tasks: aggregation sub-agent after batch completion
- Keep backward compatibility: existing `goal`/`tasks` calls unchanged

**Effort:** Medium (~300-400 LOC in delegate_tool.py)
**Dependencies:** None
**Unlocks:** #377 (shared memory), structured multi-step workflows

### Phase 2: Resilient Execution

Make multi-agent workflows survive failures.

- Checkpointing: persist sub-agent state after each tool call
- Retry with configurable count: `retry=2` on delegate_task
- Stuck detection: activity monitoring with configurable timeout
- Replan on failure: meta-agent rewrites failed task based on error
- Inception prompting (#375): harden sub-agent prompts against communication failures

**Effort:** Medium-High
**Dependencies:** Phase 1
**Unlocks:** Long-running workflows, unattended agent operation

### Phase 3: Agent Roles & Cooperation

Move from ad-hoc delegation to structured agent teams.

- Pre-defined agent role archetypes (toolset + system prompt combos)
- LLM-based coordinator for auto-assignment (optional — users can still specify manually)
- Shared memory pools (#377): inter-agent context sharing
- Acceptance criteria (#356): quality gating with independent judge
- Agent pooling: reuse agent instances across workflow steps

**Effort:** High
**Dependencies:** Phase 1 + 2
**Unlocks:** #376 (debate mode), specialized agent teams

### Phase 4: Advanced Patterns

- Adversarial debate mode (#376): two-agent iterative refinement
- Cross-platform agent distribution: agents posting to different channels
- Persistent agent teams: teams that survive across sessions
- Auto-orchestration: system decides single vs multi-agent based on task complexity
- Workflow templates: reusable workflow definitions (like Gas Town's formulas)

**Effort:** High
**Dependencies:** Phase 1 + 2 + 3

---

## Pros & Cons

### Pros
- **Fills a real gap** — `delegate_task` is currently the most-requested area for improvement
- **Incremental** — Each phase delivers value independently. Phase 1 alone (DAG + synthesis) is worth doing.
- **Patterns are proven** — Gas Town orchestrates 20-50+ concurrent agents; CAMEL-AI is NeurIPS-published research
- **No new dependencies** — We're extracting patterns, not importing libraries
- **Backward compatible** — All changes extend delegate_task, existing calls unchanged

### Cons / Risks
- **Complexity** — delegate_task is currently simple and well-understood. Multi-agent adds significant complexity.
- **Over-engineering risk** — Most delegate_task usage is 1-3 independent parallel tasks. How often do users actually need DAG workflows?
- **Checkpoint serialization** — Conversation state includes tool results referencing ephemeral resources. Perfect resumption is hard.
- **Cost multiplication** — Multi-agent workflows burn more tokens. A 5-step workflow with debate mode could cost 20-50x a single agent call.
- **Testing difficulty** — Multi-agent interactions are non-deterministic and hard to test reliably.

---

## Open Questions

1. **Should workflows be a new tool or extend delegate_task?** Extending keeps the interface unified but adds complexity. A separate `workflow` tool is cleaner but fragments the story.
2. **How to handle remote backends?** Checkpointing works for local/Docker but is harder for Modal/SSH.
3. **Agent role definitions — config or code?** Should roles be defined in YAML (like MCP server configs) or as Python classes?
4. **Concurrency limits?** Current batch limit is 3. Should workflow steps have a separate, higher limit?
5. **Should the coordinator be opt-in or automatic?** Users might want full control over task assignment vs auto-routing.
6. **How does this interact with the skills system?** Could agent roles be defined as skills? A "researcher" skill that configures the agent's toolset and persona?

---

## References

- [steveyegge/gastown](https://github.com/steveyegge/gastown) — Multi-agent coding orchestration (Go, MIT)
- [camel-ai/camel](https://github.com/camel-ai/camel) — Multi-agent framework, NeurIPS 2023 (Apache-2.0)
- [eigent-ai/Eigent](https://github.com/eigent-ai/Eigent) — Production multi-agent desktop app on CAMEL (Apache-2.0)
- [CAMEL NeurIPS paper](https://arxiv.org/abs/2303.17760) — Inception Prompting, RolePlaying paradigm
- [Gas Town blog post](https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16dd04) — Steve Yegge's architecture overview
- Hermes `tools/delegate_tool.py` — Current sub-agent delegation
- Hermes `tools/mixture_of_agents_tool.py` — One-shot multi-model reasoning
- #299 (closed) — Original multi-agent support request
- #356 — Acceptance Criteria & Independent Judge
- #375 — Inception Prompting for delegation reliability
- #376 — Adversarial Debate Mode
- #377 — Shared Memory Pools between workflow agents

