Feature: Cross-CLI Agent Orchestration — Mixed Workflows with External Agent CLIs (inspired by AgentWorkforce/relay)

## Overview

Today, Hermes Agent's `delegate_task` spawns in-process `AIAgent` children — clones of the Hermes agent runtime with restricted toolsets. This is powerful for tasks that benefit from Hermes's tool ecosystem (terminal, file ops, web, skills), but it means **every subagent is a Hermes instance running the same LLM**.

[AgentWorkforce/relay](https://github.com/AgentWorkforce/relay) takes a fundamentally different approach: it orchestrates **external agent CLIs** (Claude Code, Codex CLI, Gemini CLI, Aider, Goose, OpenCode) by wrapping them in PTY sessions, writing messages to each session's stdin, and capturing its stdout. A single workflow can therefore have Claude Code writing backend code, Codex CLI writing tests, and Gemini CLI handling documentation, each using its own native tool ecosystem.

Hermes already has skills for spawning individual external agents (`claude-code`, `codex`, `hermes-agent`), but these are **sequential, isolated invocations** — one agent at a time, no inter-agent communication, no shared workflow context. This issue proposes extending our multi-agent architecture to support **mixed workflows** where Hermes subagents and external CLI agents coexist and collaborate.

**Inspired by:** relay's PTY-based agent wrapping and cross-CLI orchestration

**Depends on:** #344 (Multi-Agent Architecture — provides the workflow DAG infrastructure)

**Related:** Existing autonomous agent spawning skills (`claude-code`, `codex`, `hermes-agent`)

---

## Research Findings

### How Relay's Cross-CLI Orchestration Works

Relay's broker spawns each agent CLI in a PTY session:

1. **Spawn:** `portable-pty` crate opens a native PTY pair, spawns the CLI (e.g., `claude --model sonnet`) on the slave side
2. **Inject:** Messages are written to PTY stdin as formatted text. The broker applies a human-like cooldown (3s) and message coalescing (500ms window) to mimic natural interaction timing.
3. **Capture:** A background thread reads PTY stdout in 4KB chunks and passes them over an mpsc channel, from which they are forwarded to the SDK as `worker_stream` events
4. **Verify:** Delivery verification checks that agent output echoes the injected message (confirms the agent actually received and processed it)
5. **Auto-approve:** For CLIs that prompt for permission (Claude's "Allow?" dialogs), auto-approval logic sends "y" automatically
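
The spawn, inject, capture, and verify steps above can be approximated with Python's standard library. This is a minimal sketch, not relay's implementation (which is written in Rust on top of `portable-pty`). It assumes a POSIX system, and `run_in_pty` and `delivered` are hypothetical names introduced only for illustration. Cooldown, coalescing, and auto-approval are left out.

```python
import os
import pty
import select
import subprocess
import time


def run_in_pty(argv, message, timeout=5.0):
    """Spawn argv on a PTY, inject one message, and return the captured output."""
    master_fd, slave_fd = pty.openpty()
    proc = subprocess.Popen(
        argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd, close_fds=True
    )
    os.close(slave_fd)  # the child holds its own copy of the slave end
    chunks = []
    try:
        # Inject: write the message as if a user typed it and pressed Enter.
        os.write(master_fd, message.encode() + b"\n")
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            ready, _, _ = select.select([master_fd], [], [], 0.1)
            if ready:
                try:
                    data = os.read(master_fd, 4096)  # 4KB chunks, as relay does
                except OSError:
                    break  # Linux reports EIO once the child side has closed
                if not data:
                    break
                chunks.append(data)
            elif proc.poll() is not None:
                break  # child exited and nothing is left to read
    finally:
        if proc.poll() is None:
            proc.kill()
        proc.wait()
        os.close(master_fd)
    return b"".join(chunks).decode(errors="replace")


def delivered(output, message):
    """Delivery check: the injected text appears in the captured output."""
    return message in output
```

Because the terminal echoes typed input, a substring check like `delivered` passes even if the agent ignored the message, so a real verifier would likely need a stronger signal than the echo alone.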

**Supported CLIs:** claude, codex, gemini, aider, goose, opencode, droid — each with CLI-specific argument injection (e.g., `--model`, auto-approval flags)

**Two transport modes:**
- `pty` — Full PTY wrapping with ANSI capture (most agents)
- `headless` — Subprocess without PTY for agents that support non-interactive mode (Claude, OpenCode)
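
Choosing between the two transport modes could look like the sketch below. The set of headless-capable CLIs is taken from the list above; `pick_transport` and `run_headless` are hypothetical helpers, and the actual argument list for each CLI is deliberately left to the caller rather than guessed here.

```python
import subprocess

# CLIs the relay research lists as supporting a non-interactive mode.
HEADLESS_CAPABLE = {"claude", "opencode"}


def pick_transport(cli):
    """Return "headless" where supported, otherwise fall back to "pty"."""
    return "headless" if cli in HEADLESS_CAPABLE else "pty"


def run_headless(argv, prompt, timeout=60.0):
    """Run a CLI as a plain subprocess, passing the prompt on stdin."""
    done = subprocess.run(
        argv, input=prompt, capture_output=True, text=True, timeout=timeout
    )
    return done.returncode, done.stdout, done.stderr
```

Headless mode avoids ANSI noise entirely, which is one reason to prefer it whenever a CLI offers it.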

### Why This Matters for Hermes

Different agent CLIs have **different strengths:**

| Agent CLI | Strength | Native Capabilities |
|-----------|----------|-------------------|
| Claude Code | Deep codebase understanding, large context | Multi-file editing, bash, MCP tools |
| Codex CLI | Fast iteration, good at tests | Sandboxed execution, auto-apply |
| Gemini CLI | Google ecosystem integration | Search grounding, long context |
| Aider | Git-aware editing, repo maps | Auto-commit, architect mode |
| Hermes Agent | Rich tool ecosystem, skills, memory | Everything in our toolset |

A workflow that assigns each step to the agent best suited to it could produce better results than using Hermes subagents for every step.

**Example workflow:**
```
Step 1: Hermes subagent researches the API (web tools, arxiv)
Step 2: Claude Code implements the backend (deep codebase context)
Step 3: Codex CLI writes tests (fast, sandboxed execution)
Step 4: Hermes subagent reviews and integrates (skills, memory, git workflow)
```

---

## Current State in Hermes Agent

**Native subagents (`delegate_tool.py`):**
- Spawns in-process `AIAgent` instances
- Full access to Hermes tool ecosystem (minus blocked tools)
- All subagents use the same LLM provider/model (inherited from parent)
- In-memory, fast, no PTY overhead

**External agent skills:**
- `claude-code` skill — Delegates to Claude Code CLI via `terminal(pty=true)`
- `codex` skill — Delegates to Codex CLI via `terminal(pty=true)` or subprocess
- `hermes-agent` skill — Spawns additional Hermes instances

**Gap:** These skills are invoked **sequentially by the parent agent**, one at a time. There's no way to:
- Run Claude Code and Codex in parallel on different parts of a workflow
- Have an external agent's output feed into another agent's input (inter-agent communication)
- Mix Hermes subagents and external CLIs in the same workflow DAG
- Apply the orchestration patterns from #344 (fan-out, pipeline, debate) to external agents

---

## Implementation Plan

### Skill vs. Tool Classification

This is a **codebase change** extending `delegate_tool.py` to support external agent backends. It needs custom Python logic for PTY management, process lifecycle, output parsing, and integration with the workflow DAG engine from #344. This is a **tool extension**, not a skill — skills can't manage concurrent PTY sessions with inter-agent routing.

### What We'd Need

1. **Agent backend abstraction:** an `AgentBackend` base class with implementations for `HermesBackend` (current `AIAgent` spawning) and `ExternalCLIBackend` (PTY-based CLI wrapping)
2. **PTY management:** spawn, inject, capture, and verify for external CLIs using Python's `pty` module or `pexpect`
3. **CLI-specific adapters:** model flag injection, auto-approval, and output parsing per CLI
4. **Output extraction:** parse the relevant output from CLI stdout (stripping ANSI codes, tool noise, and prompts)
5. **Integration with workflow DAG:** external agent steps in `delegate_task(workflow=[...])`, selected per step with `cli: "claude"` or `cli: "codex"`
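
One possible shape for the backend abstraction is sketched below. Only the class names come from this proposal; the `run(goal, context)` signature, the `spawn` callable, and the constructor arguments are assumptions. For brevity the external backend uses a plain subprocess where the real one would manage a PTY.

```python
import subprocess
from abc import ABC, abstractmethod


class AgentBackend(ABC):
    """Common interface for anything that can execute one workflow step."""

    @abstractmethod
    def run(self, goal: str, context: str = "") -> str:
        ...


class HermesBackend(AgentBackend):
    """Wraps the existing in-process path; `spawn` is a hypothetical callable."""

    def __init__(self, spawn):
        self._spawn = spawn

    def run(self, goal, context=""):
        return self._spawn(goal, context)


class ExternalCLIBackend(AgentBackend):
    """Runs an external CLI; argv would come from a per-CLI adapter."""

    def __init__(self, argv, timeout=600.0):
        self._argv = list(argv)
        self._timeout = timeout

    def run(self, goal, context=""):
        # Context is passed in the prompt text, one of the options discussed
        # under Open Questions.
        prompt = f"{context}\n\n{goal}" if context else goal
        done = subprocess.run(
            self._argv, input=prompt, capture_output=True, text=True,
            timeout=self._timeout,
        )
        if done.returncode != 0:
            raise RuntimeError(f"{self._argv[0]} exited with {done.returncode}")
        return done.stdout
```

With this split, the workflow engine only ever calls `run`, so adding a new CLI would mean adding an adapter that produces `argv`, not touching the engine.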

### Phased Rollout

**Phase 1: External Agent Steps in Workflows (Depends on #344 Phase 1)**

Add a `cli` parameter to workflow steps in `delegate_task`:

```python
delegate_task(
    workflow=[
        {"id": "research", "goal": "Research the Stripe API",
         "context": "..."},  # Hermes subagent (default)
        {"id": "implement", "goal": "Implement the payment client",
         "needs": ["research"],
         "cli": "claude",  # Use Claude Code CLI
         "model": "sonnet"},
        {"id": "test", "goal": "Write integration tests",
         "needs": ["implement"],
         "cli": "codex",  # Use Codex CLI
         "model": "gpt-5.2-codex"},
        {"id": "review", "goal": "Review and integrate",
         "needs": ["implement", "test"]}  # Back to Hermes subagent
    ]
)
```
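
To illustrate how the `needs` and `cli` fields in a workflow like this could drive scheduling, here is a small sketch using the standard-library `graphlib` module (Python 3.9+). `execution_batches` is a hypothetical helper, and the `"hermes"` default label is an assumption; the real DAG engine is expected to come from #344.

```python
from graphlib import TopologicalSorter


def execution_batches(workflow):
    """Group steps into batches whose dependencies are already satisfied."""
    steps = {step["id"]: step for step in workflow}
    for step in workflow:
        unknown = set(step.get("needs", [])) - steps.keys()
        if unknown:
            raise ValueError(f"step {step['id']!r} needs unknown steps {sorted(unknown)}")
    sorter = TopologicalSorter(
        {sid: step.get("needs", []) for sid, step in steps.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular `needs`
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        # Steps without a `cli` key run as ordinary Hermes subagents.
        batches.append([(sid, steps[sid].get("cli", "hermes")) for sid in ready])
        sorter.done(*ready)
    return batches
```

Steps that land in the same batch have no dependency on each other, which is what would make them candidates for parallel execution in Phase 2.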

When `cli` is specified:
1. Spawn the CLI in a PTY session (using existing `terminal(pty=true)` infrastructure)
2. Inject the task + upstream context as the initial prompt
3. Capture output, strip ANSI codes, extract the meaningful result
4. Pass result downstream via output chaining
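
Steps 2 and 3 can be sketched as two small pure functions. The prompt layout and the names `build_prompt` and `strip_ansi` are assumptions for illustration; the regular expression covers CSI sequences (colors, cursor movement) and OSC sequences (such as window titles), which is likely not everything a given CLI emits.

```python
import re

# CSI sequences, e.g. "\x1b[32m", and OSC sequences, e.g. "\x1b]0;title\x07".
ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
)


def strip_ansi(text):
    """Remove terminal escape sequences and normalize PTY line endings."""
    return ANSI_RE.sub("", text).replace("\r\n", "\n")


def build_prompt(goal, upstream):
    """Prepend each upstream step's result to the task text."""
    parts = [f"## Result of step '{sid}'\n{out}" for sid, out in upstream.items()]
    parts.append(f"## Task\n{goal}")
    return "\n\n".join(parts)
```

Extracting the "meaningful result" from the stripped text is the fragile part and would still need per-CLI handling.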

Initially supported CLIs: `claude`, `codex` (we already have skills with the CLI invocation patterns)

- Deliverable: Mixed Hermes + external agent workflows

**Phase 2: Parallel External Agents**

Allow multiple external CLI agents to run concurrently in the same workflow:
- Separate PTY sessions per agent
- Thread-safe output collection
- Process lifecycle management (timeout, kill, restart)
- CLI-specific auto-approval (Claude's permission prompts, Codex's confirmation dialogs)

- Deliverable: Fan-out patterns with mixed agent types
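
A rough sketch of concurrent execution with per-step timeouts follows. It uses plain subprocesses in place of PTY sessions to stay short, and `run_step` and `fan_out` are hypothetical names. Restart logic and auto-approval are not shown.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_step(argv, prompt, timeout):
    """Run one CLI; subprocess.run kills the child if the timeout expires."""
    try:
        done = subprocess.run(
            argv, input=prompt, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "output": "", "error": "timeout"}
    return {"ok": done.returncode == 0, "output": done.stdout}


def fan_out(steps, timeout=600.0, max_workers=4):
    """steps maps step id -> (argv, prompt); returns step id -> result dict."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            sid: pool.submit(run_step, argv, prompt, timeout)
            for sid, (argv, prompt) in steps.items()
        }
        # Each result is collected from its own future, so no shared mutable
        # state is written from worker threads.
        return {sid: fut.result() for sid, fut in futures.items()}
```

Threads are sufficient here because each worker spends its time blocked on a child process rather than running Python code.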

**Phase 3: Bidirectional Communication with External Agents**

Enable external agents to participate in iterative workflows:
- Inject follow-up messages into running PTY sessions (not just initial prompts)
- Parse agent responses for structured signals (completion, failure, questions)
- Support debate/review-loop patterns between a Hermes subagent and an external CLI
- Handle CLI-specific conversation patterns (Claude's `/clear`, Codex's `--json` mode)

- Deliverable: External agents as full participants in orchestration patterns
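
Parsing structured signals presupposes some convention the external agent is asked to follow. As one hypothetical option, the prompt could instruct the agent to end its reply with a line such as `HERMES_STATUS: done`. Neither this marker nor `parse_signal` exists today; both are assumptions made for this sketch.

```python
import re
from dataclasses import dataclass

# Hypothetical marker line: "HERMES_STATUS: <done|failed|question> [detail]"
STATUS_RE = re.compile(
    r"^HERMES_STATUS:[ \t]*(done|failed|question)\b[ \t]*(.*)$", re.MULTILINE
)


@dataclass
class Signal:
    status: str  # "done", "failed", "question", or "unknown"
    detail: str  # free text after the status word, if any
    body: str    # agent output with marker lines removed


def parse_signal(output):
    """Extract the last status marker from an agent's (ANSI-stripped) output."""
    matches = list(STATUS_RE.finditer(output))
    if not matches:
        return Signal("unknown", "", output.strip())
    last = matches[-1]
    body = STATUS_RE.sub("", output).strip()
    return Signal(last.group(1), last.group(2).strip(), body)
```

A `question` signal would be the trigger for injecting a follow-up message into the running session, while `unknown` leaves interpretation to the parent LLM.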

---

## Pros & Cons

### Pros
- **Best agent for each task** — Claude Code for deep refactoring, Codex for fast tests, Hermes for research and integration
- **Leverages existing infrastructure** — Our `terminal(pty=true)` and agent skills already handle CLI spawning. This extends, not replaces.
- **Model diversity without API complexity** — Each CLI handles its own auth, context, and tool ecosystem. We just orchestrate.
- **Incremental** — Phase 1 is a thin layer over existing skill patterns. No need to build everything at once.
- **Future-proof** — As new agent CLIs emerge (Goose, Aider, OpenCode), adding support is just a new adapter

### Cons / Risks
- **Output parsing is fragile** — CLI output formats change between versions. ANSI stripping, prompt detection, and result extraction need maintenance.
- **No tool-level integration** — External agents use their own tools, not ours. We can't see what files they edited, what commands they ran, etc. (only their stdout).
- **Auth complexity** — Each CLI needs its own API keys configured. Claude needs `ANTHROPIC_API_KEY`, Codex needs `OPENAI_API_KEY`, etc.
- **Latency** — PTY spawning + CLI startup + model loading is slower than in-process AIAgent creation
- **Debugging difficulty** — When an external agent fails, we only see its stdout. No structured error data, no tool call logs.
- **Depends on CLIs being installed** — Users need `claude`, `codex`, etc. installed separately. Preflight checks needed.
- **Version coupling** — CLI behavior changes (new flags, changed output format) can break adapters
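
The preflight check mentioned above can be very small. This sketch assumes the `cli` value doubles as the executable name unless an override is supplied; `missing_clis` and the `executables` mapping are hypothetical.

```python
import shutil


def missing_clis(workflow, executables=None):
    """Return the sorted `cli` values in a workflow that are not on PATH."""
    executables = executables or {}
    wanted = {step["cli"] for step in workflow if "cli" in step}
    return sorted(
        cli for cli in wanted
        if shutil.which(executables.get(cli, cli)) is None
    )
```

Running this at workflow parse time gives fail-fast behavior; calling it per step at execution time would allow partial workflows instead.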

---

## Open Questions

1. **Should this use the existing skill infrastructure?** The `claude-code` and `codex` skills already know how to invoke these CLIs. Could the workflow engine invoke skills as steps rather than building a separate PTY layer?
2. **Output extraction strategy?** Options: (a) capture all stdout and let the parent LLM parse it, (b) structured output parsing per CLI, (c) require external agents to write results to a file that we read. Option (c) is most reliable but least flexible.
3. **How to handle context passing?** External CLIs don't have our `context` parameter. Options: include context in the task prompt text, write context to a file the CLI can read, or use CLI-specific mechanisms (Claude's `/add-context`, Codex's `--file` flag).
4. **Should external agents be able to write to our shared memory (#377)?** They'd need a mechanism to do so (file-based bridge?), or we accept they're output-only participants.
5. **Preflight vs runtime CLI detection?** Should we check for installed CLIs at workflow parse time (fail fast) or at step execution time (allow partial workflows)?
6. **License considerations?** We're orchestrating, not importing. The CLIs are user-installed tools. No license concerns for our codebase, but users need valid subscriptions to each service.
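
For questions 2 and 3, the file-based options could be combined into a single exchange directory per step. The sketch below is one way that might look; the file names `CONTEXT.md` and `RESULT.md` and all three function names are assumptions, and whether a given CLI reliably follows the instruction to write the result file is untested.

```python
import tempfile
from pathlib import Path


def prepare_exchange(context):
    """Create a scratch directory holding the context; return both file paths."""
    workdir = Path(tempfile.mkdtemp(prefix="hermes-step-"))
    context_path = workdir / "CONTEXT.md"
    result_path = workdir / "RESULT.md"
    context_path.write_text(context, encoding="utf-8")
    return context_path, result_path


def exchange_prompt(goal, context_path, result_path):
    """Task text that points the external agent at the exchange files."""
    return (
        f"{goal}\n\nBackground is in {context_path}. "
        f"When finished, write a summary of your result to {result_path}."
    )


def read_result(result_path, fallback_stdout=""):
    """Prefer the result file; fall back to captured stdout if it is missing."""
    if result_path.exists():
        return result_path.read_text(encoding="utf-8")
    return fallback_stdout
```

Falling back to stdout keeps option (a) available when an agent ignores the file instruction.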

---

## References

- [AgentWorkforce/relay](https://github.com/AgentWorkforce/relay) — PTY-based cross-CLI orchestration (Apache-2.0)
- relay `src/pty.rs` — PTY session management with `portable-pty`
- relay `src/inject.rs` — Message injection with retry and verification
- relay `src/spawner.rs` — CLI-specific spawn configuration
- Hermes skills: `claude-code`, `codex`, `hermes-agent` — Existing CLI invocation patterns
- Hermes `tools/delegate_tool.py` — Current in-process subagent spawning
- #344 — Multi-Agent Architecture (prerequisite — provides workflow DAG infrastructure)
- #376 — Adversarial Debate Mode (could include external agents as participants)
- #377 — Shared Memory Pools (context sharing with external agents is an open question)

