Feature: Multi-Agent Architecture — Orchestration, Cooperation, Specialized Roles & Resilient Workflows #344

@teknium1

Overview

This is the umbrella issue for evolving Hermes Agent from a single-agent system with isolated sub-agent delegation into a true multi-agent architecture with orchestration, cooperation, specialized roles, and resilient workflows.

What "multi-agent Hermes" means: Today, Hermes is one agent that can spawn throwaway child agents via delegate_task. Those children work alone, can't talk to each other, can't share state, and return a summary to the parent. That's delegation, not multi-agent. True multi-agent means:

  • Specialized agent roles — Agents with distinct identities, toolsets, and expertise (researcher, coder, reviewer, browser agent)
  • Structured workflows — Task decomposition into dependency-aware DAGs, not just flat parallel dispatch
  • Inter-agent cooperation — Agents that share context, build on each other's work, and iterate together
  • Resilient execution — Crash recovery, stuck detection, retry with replanning, health monitoring
  • Cross-platform coordination — Agents that can operate across different channels (different Discord channels, Telegram groups, etc.)

This issue captures the full vision, architectural decisions, and phased roadmap. Specific implementation details are broken out into focused sub-issues (linked below).

Subsumes: #299 (closed — original multi-agent support request)

Component issues:


Current State

delegate_task (tools/delegate_tool.py):

  • Spawns ephemeral AIAgent children with isolated context
  • Two modes: single task (one child) or batch (up to 3 parallel via ThreadPoolExecutor)
  • Children get: own conversation, own terminal session, restricted toolsets
  • Children CANNOT: talk to each other, access parent memory, share state, communicate mid-task
  • Depth limit: MAX_DEPTH=2 (parent → child OK, child → grandchild blocked)
  • No dependency awareness — batch tasks all run simultaneously
  • No crash recovery — if a child fails, work is lost
  • No health monitoring — parent blocks until children complete or time out
  • No retry logic — failures are final
  • No synthesis step — no aggregation of parallel results

mixture_of_agents (tools/mixture_of_agents_tool.py):

  • Queries 4 frontier models in parallel, aggregator synthesizes
  • One-shot (no iteration), models don't see each other's responses during generation
  • Closest thing to multi-perspective reasoning, but not multi-agent coordination

What works well today (keep these properties):

  • Sub-agent isolation prevents cascade failures and security issues
  • Simple mental model — parent delegates, child works, result comes back
  • Toolset restriction prevents children from doing dangerous things
  • Fresh context prevents context pollution between tasks

Architecture Design

Core Building Blocks

1. Agent Roles & Identities

Move from ad-hoc "goal + context" delegation to pre-defined agent archetypes with specific capabilities:

Coordinator  — Decomposes tasks, assigns to workers, manages workflow
Researcher   — web_search, web_extract, arxiv tools, document analysis
Developer    — terminal, file ops, code execution, git
Browser Agent — browser tools, form filling, visual verification
Reviewer     — code review, acceptance criteria evaluation
Synthesizer  — Aggregates results from multiple agents into unified output

Roles would be defined as toolset + system prompt combinations. Either the agent itself or a coordinator decides which role handles each subtask. Routing would not be hard-coded: assignment is LLM-based, so new roles can be added without code changes (inspired by CAMEL-AI's coordinator pattern).
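
A minimal sketch of what "role = toolset + system prompt" could look like, with a helper that builds the prompt a coordinator LLM might see when choosing a role. Everything here is hypothetical: AgentRole, ROLES, build_routing_prompt, the toolset names, and the prompt wording are illustrative only and not part of the current codebase.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRole:
    """Hypothetical role definition: a name, allowed toolsets, and a persona."""
    name: str
    toolsets: tuple[str, ...]
    system_prompt: str


# Illustrative registry; toolset names are placeholders, not real Hermes toolsets.
ROLES = {
    "researcher": AgentRole("researcher", ("web_search", "web_extract"),
                            "You research topics and cite sources."),
    "developer": AgentRole("developer", ("terminal", "file"),
                           "You write and run code."),
}


def build_routing_prompt(subtask: str, roles: dict[str, AgentRole]) -> str:
    """Build the text a coordinator LLM would read to pick a role.

    The role list is rendered from data, so registering a new role changes
    the prompt without touching any routing logic.
    """
    lines = [f"- {r.name}: tools = {', '.join(r.toolsets)}" for r in roles.values()]
    return (
        "Choose exactly one role for the subtask and answer with its name.\n"
        f"Subtask: {subtask}\n"
        "Available roles:\n" + "\n".join(lines)
    )
```

Because the coordinator only sees a rendered list, adding a "reviewer" entry to the registry is enough for it to become a routing candidate.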

2. Workflow DAG Engine

Extend delegate_task to support dependency-aware task graphs:

delegate_task(
    workflow=[
        {"id": "research", "goal": "Research the API", "context": "..."},
        {"id": "backend", "goal": "Implement client", "needs": ["research"]},
        {"id": "frontend", "goal": "Build UI", "needs": ["research"]},
        {"id": "integrate", "goal": "Integration test", "needs": ["backend", "frontend"]},
    ]
)

Execution engine:

  • Parse workflow into a DAG, topological sort, cycle detection
  • ready_steps(completed) returns steps whose dependencies are satisfied
  • Run ready steps in parallel (respecting concurrency limits)
  • Downstream steps receive upstream results as context
  • Final result aggregates all step summaries
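
The first two bullets could be sketched roughly as follows. This assumes the workflow is the list-of-dicts shape shown above; validate_workflow is a hypothetical helper name, and the exact signature of ready_steps is a guess. Kahn's algorithm is used for the topological sort, which gives cycle detection for free.

```python
from collections import deque


def validate_workflow(steps):
    """Return step ids in a valid execution order (Kahn's algorithm).

    Raises ValueError on duplicate ids, unknown dependencies, or cycles.
    """
    ids = [s["id"] for s in steps]
    deps = {s["id"]: set(s.get("needs", [])) for s in steps}
    if len(deps) != len(ids):
        raise ValueError("duplicate step id")
    for sid, needs in deps.items():
        unknown = needs - set(deps)
        if unknown:
            raise ValueError(f"step {sid!r} needs unknown steps: {sorted(unknown)}")

    indegree = {sid: len(needs) for sid, needs in deps.items()}
    dependents = {sid: [] for sid in deps}
    for sid, needs in deps.items():
        for dep in needs:
            dependents[dep].append(sid)

    queue = deque(sid for sid in ids if indegree[sid] == 0)
    order = []
    while queue:
        sid = queue.popleft()
        order.append(sid)
        for child in dependents[sid]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if len(order) != len(deps):
        # Steps left with nonzero indegree are part of a cycle.
        raise ValueError("workflow contains a dependency cycle")
    return order


def ready_steps(steps, completed, running=frozenset()):
    """Ids of steps not yet started whose dependencies are all completed."""
    done = set(completed)
    return [
        s["id"] for s in steps
        if s["id"] not in done
        and s["id"] not in running
        and set(s.get("needs", [])) <= done
    ]
```

For the example workflow above, ready_steps returns only "research" at the start, then "backend" and "frontend" together once research completes.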

Convoy mode (parallel legs + synthesis):

delegate_task(
    tasks=[...parallel tasks...],
    synthesis="Synthesize all findings into a unified report"
)
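
One way convoy mode might work internally, as a sketch only. run_convoy is a hypothetical name, and run_agent stands in for whatever callable spawns a child agent and returns its summary; the real delegate_task internals may differ.

```python
from concurrent.futures import ThreadPoolExecutor


def run_convoy(tasks, synthesis, run_agent, max_workers=3):
    """Run independent legs in parallel, then one synthesis pass over the results.

    tasks      -- list of dicts, each with a "goal" key
    synthesis  -- instruction for the aggregating agent
    run_agent  -- hypothetical callable: prompt string -> summary string
    """
    goals = [t["goal"] for t in tasks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so summaries line up with goals.
        summaries = list(pool.map(run_agent, goals))

    findings = "\n\n".join(
        f"[{i}] {goal}\n{summary}"
        for i, (goal, summary) in enumerate(zip(goals, summaries), start=1)
    )
    final = run_agent(f"{synthesis}\n\nFindings:\n{findings}")
    return {"summaries": summaries, "synthesis": final}
```

The synthesis agent only ever sees leg summaries, so the isolation property of the parallel legs is preserved.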

3. Inter-Agent Communication

Three levels of context sharing (progressively less isolated):

Level                  | Mechanism                                              | Use Case
L0: Isolated (current) | No sharing, parent relays                              | Simple delegation
L1: Result passing     | Upstream results auto-injected into downstream context | Workflow DAGs
L2: Shared scratchpad  | Read/write shared key-value store                      | Complex workflows needing fine-grained sharing
L3: Live dialogue      | Turn-based agent-to-agent conversation                 | Debate/review modes

L0 exists today. L1 comes with the workflow engine. L2 is #377. L3 is #376.
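
To make L1 and L2 concrete, here is a rough sketch of both. build_context and Scratchpad are hypothetical names; the actual L2 design is tracked in #377 and may look quite different.

```python
import threading


def build_context(step, results):
    """L1: prepend each upstream step's result to the step's own context."""
    parts = []
    if step.get("context"):
        parts.append(step["context"])
    for dep in step.get("needs", []):
        parts.append(f"Result of upstream step '{dep}':\n{results[dep]}")
    return "\n\n".join(parts)


class Scratchpad:
    """L2: a thread-safe key-value store shared by agents in one workflow."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (author, value)

    def write(self, key, value, author):
        with self._lock:
            self._data[key] = (author, value)

    def read(self, key, default=None):
        with self._lock:
            entry = self._data.get(key)
        return default if entry is None else entry[1]

    def author_of(self, key):
        """Which agent last wrote this key (None if unset)."""
        with self._lock:
            entry = self._data.get(key)
        return None if entry is None else entry[0]
```

Recording the author of each write is an assumption made here so that conflicting writes can be traced back to a step.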

4. Failure Recovery

Three-level escalation (inspired by CAMEL-AI's Workforce and Gas Town):

Retry → Replan → Decompose Further
  1. Retry — Same agent, same task, try again (simple transient failure)
  2. Replan — Meta-agent rewrites the task description based on the failure reason
  3. Decompose — Break the failed task into smaller, more tractable subtasks
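
The escalation ladder could be sketched like this. All names are hypothetical: run_agent runs one task and raises on failure, replan rewrites a task description given the failure reason, and decompose splits a task into subtasks. In practice replan and decompose would be LLM calls.

```python
def run_with_escalation(task, run_agent, replan, decompose, max_retries=1):
    """Try a task, escalating Retry -> Replan -> Decompose on failure.

    Returns (level, result), where level is "direct", "retry", "replan",
    or "decompose". If subtasks also fail, the exception propagates.
    """
    error = None

    # Level 1: same agent, same task, try again.
    for attempt in range(1 + max_retries):
        try:
            return ("direct" if attempt == 0 else "retry"), run_agent(task)
        except Exception as exc:
            error = exc

    # Level 2: rewrite the task description using the failure reason.
    try:
        return "replan", run_agent(replan(task, str(error)))
    except Exception as exc:
        error = exc

    # Level 3: break the task into smaller subtasks and run each one.
    subtasks = decompose(task, str(error))
    return "decompose", [run_agent(sub) for sub in subtasks]
```

Returning the level alongside the result lets the parent report how much recovery a step needed.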

Plus:

  • Checkpointing — Persist sub-agent conversation state to ~/.hermes/checkpoints/ after each tool call. On failure, resume from checkpoint instead of restarting.
  • Stuck detection — Monitor sub-agent activity; if no tool calls for N seconds, intervene (nudge, kill, retry, or escalate to user)
  • Health monitoring — For long-running workflows, periodic health checks with escalation
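
A sketch of the checkpointing and stuck-detection bullets, under the assumption that conversation state is JSON-serializable (see the serialization risk under Cons). save_checkpoint, load_checkpoint, and ActivityMonitor are hypothetical names; the directory is passed in rather than hard-coded.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def save_checkpoint(directory, task_id, messages):
    """Atomically persist a sub-agent's messages as <task_id>.json."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"task_id": task_id, "messages": messages}, fh)
    final_path = directory / f"{task_id}.json"
    # os.replace is atomic, so a crash never leaves a half-written checkpoint.
    os.replace(tmp_path, final_path)
    return final_path


def load_checkpoint(directory, task_id):
    """Return the saved messages, or None if no checkpoint exists."""
    path = Path(directory) / f"{task_id}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))["messages"]


class ActivityMonitor:
    """Flags a sub-agent as stuck after timeout_s seconds without a tool call."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_activity = clock()

    def record_tool_call(self):
        self._last_activity = self._clock()

    def is_stuck(self):
        return self._clock() - self._last_activity > self._timeout_s
```

What to do once is_stuck() returns True (nudge, kill, retry, escalate) is left to the caller.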

5. Cross-Platform Agent Distribution

From the original #299 request: agents operating across different platforms/channels.

  • An agent team where the Coordinator runs in CLI, but dispatches a Browser Agent that reports results to a Discord channel
  • Different agent roles posting to different Telegram groups or Discord channels
  • The send_message tool already supports cross-platform messaging — this extends it to agent-to-agent communication across platforms

Research Sources

Gas Town (Steve Yegge)

steveyegge/gastown — 348K LOC Go system orchestrating 20-50+ concurrent coding agents.

Key patterns extracted:

  • Workflow Formulas — TOML-defined reusable workflows with 4 types: Convoy (parallel+synthesis), Workflow (sequential+deps), Expansion (template-based), Aspect (AOP cross-cutting). Uses Kahn's algorithm for topological sort.
  • GUPP (Gastown Universal Propulsion Principle) — "If you find work on your hook, YOU RUN IT." Agent work is durable — separate work state from process state. When an agent dies, work is recoverable.
  • Hierarchical health monitoring — 3-layer watchdog: Daemon (heartbeat) → Boot (ephemeral checker) → Deacon (persistent monitor). "Idle Town Principle": skip health checks when no active work.
  • Mail vs Nudge — Persistent messages (survive crashes) vs ephemeral reminders (zero-cost). Agents have "mail budgets."

CAMEL-AI / Eigent

camel-ai/camel — NeurIPS 2023 multi-agent framework. eigent-ai/Eigent — Production desktop app built on CAMEL.

Key patterns extracted:

  • Workforce 5-step lifecycle — Decompose → Assign → Execute → Complete → Handle Failures. The failure recovery escalation (Retry → Replan → Decompose) is the most valuable pattern.
  • LLM-based coordinator routing — Not rule-based; the coordinator LLM decides which worker handles each subtask. New worker types can be added without changing routing logic.
  • Agent pooling — Up to 10 pre-created agent clones per worker type with lazy initialization. Reduces spawning overhead for repetitive workflows.
  • Inception Prompting — Systematic prevention of multi-agent communication failures: role-flipping, instruction echoing, flake replies, infinite loops. See #375 (Inception Prompting: hardened sub-agent prompts against delegation failures).
  • RolePlaying paradigm — Two-agent adversarial iteration (AI User + AI Assistant) for quality improvement. See #376 (Adversarial Debate Mode for Delegation: two-agent iterative refinement).
  • Task classification gate — Cheap classifier decides simple (single-agent) vs complex (multi-agent) before spinning up the workforce.
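
For illustration, the agent pooling pattern (lazy creation up to a cap, then reuse) might be approximated like this. This is not CAMEL's API; AgentPool and its methods are hypothetical, and factory stands in for whatever constructs an agent.

```python
import queue
import threading


class AgentPool:
    """Lazily creates up to max_size agents and reuses released ones."""

    def __init__(self, factory, max_size=10):
        self._factory = factory
        self._max_size = max_size
        self._idle = queue.LifoQueue()
        self._created = 0
        self._lock = threading.Lock()

    def acquire(self):
        # Prefer an idle agent so nothing new is spawned unnecessarily.
        try:
            return self._idle.get_nowait()
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max_size:
                self._created += 1
                return self._factory()
        # Pool is at capacity: block until another caller releases an agent.
        return self._idle.get()

    def release(self, agent):
        self._idle.put(agent)
```

A pooled agent would need its conversation reset on release to keep the "fresh context" property listed under Current State; that step is omitted here.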

Implementation Roadmap

Phase 1: Workflow DAG + Synthesis (Foundation)

The minimum viable multi-agent upgrade. No new abstractions — just extend delegate_task.

  • Add workflow parameter for dependency-aware task graphs
  • DAG execution engine: topological sort, ready-step computation, parallel dispatch
  • Result passing: downstream steps receive upstream summaries as context
  • Add synthesis parameter for parallel tasks: aggregation sub-agent after batch completion
  • Keep backward compatibility: existing goal/tasks calls unchanged
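
A rough sketch of the dispatch loop these bullets describe: ready-step computation, parallel dispatch, and result passing. run_workflow is a hypothetical name, and run_agent(goal, upstream) stands in for spawning a child and returning its summary. Failure handling is deliberately left out, since that is Phase 2.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_workflow(steps, run_agent, max_workers=3):
    """Execute a dependency-aware workflow and return {step_id: summary}.

    Each step starts as soon as all steps in its "needs" list have finished,
    and receives their summaries as upstream context.
    """
    by_id = {s["id"]: s for s in steps}
    results = {}
    running = {}  # future -> step id

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while len(results) < len(by_id):
            # Submit every step whose dependencies are satisfied.
            for sid, step in by_id.items():
                if sid in results or sid in running.values():
                    continue
                needs = step.get("needs", [])
                if all(dep in results for dep in needs):
                    upstream = "\n".join(f"{dep}: {results[dep]}" for dep in needs)
                    future = pool.submit(run_agent, step["goal"], upstream)
                    running[future] = sid

            if not running:
                # Nothing in flight and nothing ready: the graph cannot progress.
                raise ValueError("workflow has a cycle or an unknown dependency")

            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                results[running.pop(future)] = future.result()

    return results
```

Existing goal and tasks calls would bypass this path entirely, which is how backward compatibility is kept in this sketch.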

Effort: Medium (~300-400 LOC in delegate_tool.py)
Dependencies: None
Unlocks: #377 (shared memory), structured multi-step workflows

Phase 2: Resilient Execution

Make multi-agent workflows survive failures.

Effort: Medium-High
Dependencies: Phase 1
Unlocks: Long-running workflows, unattended agent operation

Phase 3: Agent Roles & Cooperation

Move from ad-hoc delegation to structured agent teams.

Effort: High
Dependencies: Phase 1 + 2
Unlocks: #376 (debate mode), specialized agent teams

Phase 4: Advanced Patterns

Effort: High
Dependencies: Phase 1 + 2 + 3


Pros & Cons

Pros

  • Fills a real gap — delegate_task is currently the #1 most-requested area for improvement
  • Incremental — Each phase delivers value independently. Phase 1 alone (DAG + synthesis) is worth doing.
  • Patterns are proven — Gas Town orchestrates 20-50+ concurrent agents; CAMEL-AI is NeurIPS-published research
  • No new dependencies — We're extracting patterns, not importing libraries
  • Backward compatible — All changes extend delegate_task, existing calls unchanged

Cons / Risks

  • Complexity — delegate_task is currently simple and well-understood. Multi-agent adds significant complexity.
  • Over-engineering risk — Most delegate_task usage is 1-3 independent parallel tasks. How often do users actually need DAG workflows?
  • Checkpoint serialization — Conversation state includes tool results referencing ephemeral resources. Perfect resumption is hard.
  • Cost multiplication — Multi-agent workflows burn more tokens. A 5-step workflow with debate mode could cost 20-50x a single agent call.
  • Testing difficulty — Multi-agent interactions are non-deterministic and hard to test reliably.

Open Questions

  1. Should workflows be a new tool or extend delegate_task? Extending keeps the interface unified but adds complexity. A separate workflow tool is cleaner but fragments the story.
  2. How to handle remote backends? Checkpointing works for local/Docker but is harder for Modal/SSH.
  3. Agent role definitions — config or code? Should roles be defined in YAML (like MCP server configs) or as Python classes?
  4. Concurrency limits? Current batch limit is 3. Should workflow steps have a separate, higher limit?
  5. Should the coordinator be opt-in or automatic? Users might want full control over task assignment vs auto-routing.
  6. How does this interact with the skills system? Could agent roles be defined as skills? A "researcher" skill that configures the agent's toolset and persona?
