Feature: Cross-CLI Agent Orchestration — Mixed Workflows with External Agent CLIs (inspired by AgentWorkforce/relay) #413

@teknium1

Overview

Today, Hermes Agent's delegate_task spawns in-process AIAgent children — clones of the Hermes agent runtime with restricted toolsets. This is powerful for tasks that benefit from Hermes's tool ecosystem (terminal, file ops, web, skills), but it means every subagent is a Hermes instance running the same LLM.

AgentWorkforce/relay takes a fundamentally different approach: it orchestrates external agent CLIs (Claude Code, Codex CLI, Gemini CLI, Aider, Goose, OpenCode) by wrapping them in PTY sessions, injecting messages via stdin, and capturing stdout. This means a single workflow can have Claude Code writing backend code, Codex CLI writing tests, and Gemini CLI handling documentation — each using its own native tool ecosystem.

Hermes already has skills for spawning individual external agents (claude-code, codex, hermes-agent), but these are sequential, isolated invocations — one agent at a time, no inter-agent communication, no shared workflow context. This issue proposes extending our multi-agent architecture to support mixed workflows where Hermes subagents and external CLI agents coexist and collaborate.

Inspired by: relay's PTY-based agent wrapping and cross-CLI orchestration

Depends on: #344 (Multi-Agent Architecture — provides the workflow DAG infrastructure)

Related: Existing skills: claude-code, codex, hermes-agent (autonomous agent spawning skills)


Research Findings

How Relay's Cross-CLI Orchestration Works

Relay's broker spawns each agent CLI in a PTY session:

  1. Spawn: portable-pty crate opens a native PTY pair, spawns the CLI (e.g., claude --model sonnet) on the slave side
  2. Inject: Messages are written to PTY stdin as formatted text. The broker applies human cooldown (3s) and message coalescing (500ms window) to mimic natural interaction timing.
  3. Capture: A background thread reads 4KB chunks from PTY stdout via mpsc channel, forwarding as worker_stream events to the SDK
  4. Verify: Delivery verification checks that agent output echoes the injected message (confirms the agent actually received and processed it)
  5. Auto-approve: For CLIs that prompt for permission (Claude's "Allow?" dialogs), auto-approval logic sends "y" automatically
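The inject and verify steps can be illustrated with a small sketch. This is not relay's code (relay's broker uses the Rust portable-pty crate); the function names `coalesce` and `delivered` and the exact batching rule are assumptions made for illustration. Only the 500ms window comes from the description above.

```python
from typing import List, Optional, Tuple

COALESCE_WINDOW_S = 0.5  # 500ms window, taken from the description above


def coalesce(messages: List[Tuple[float, str]],
             window: float = COALESCE_WINDOW_S) -> List[str]:
    """Merge messages arriving within `window` seconds of the first
    message in the current batch (hypothetical batching rule).

    `messages` is a list of (arrival_time_seconds, text), sorted by time.
    """
    batches: List[List[str]] = []
    batch_start: Optional[float] = None
    for ts, text in messages:
        if batch_start is None or ts - batch_start > window:
            batches.append([text])   # start a new batch
            batch_start = ts
        else:
            batches[-1].append(text)  # still inside the window
    return ["\n".join(batch) for batch in batches]


def delivered(injected: str, captured_output: str) -> bool:
    """Hypothetical delivery check: does the captured output echo the
    injected message, ignoring differences in whitespace?"""
    def normalize(s: str) -> str:
        return " ".join(s.split())
    return normalize(injected) in normalize(captured_output)
```

A Hermes implementation could apply the same two ideas before writing to a PTY, though whether the cooldown and coalescing are needed for every CLI is an open design choice.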

Supported CLIs: claude, codex, gemini, aider, goose, opencode, droid — each with CLI-specific argument injection (e.g., --model, auto-approval flags)

Two transport modes:

  • pty — Full PTY wrapping with ANSI capture (most agents)
  • headless — Subprocess without PTY for agents that support non-interactive mode (Claude, OpenCode)
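The difference between the two transports can be sketched with Python's standard library. This is an illustrative, POSIX-only sketch and not relay's implementation; `run_headless` and `run_in_pty` are hypothetical names, and a real adapter would pass the CLI's own arguments instead of a test command.

```python
import os
import pty
import select
import subprocess
import sys
from typing import List


def run_headless(argv: List[str], prompt: str, timeout: float = 30.0) -> str:
    """Headless transport: plain pipes, prompt on stdin, stdout captured."""
    done = subprocess.run(argv, input=prompt, capture_output=True,
                          text=True, timeout=timeout)
    return done.stdout


def run_in_pty(argv: List[str], prompt: str, idle_timeout: float = 5.0) -> str:
    """PTY transport: the child sees a terminal; we read the master side."""
    master, slave = pty.openpty()
    proc = subprocess.Popen(argv, stdin=slave, stdout=slave, stderr=slave,
                            close_fds=True)
    os.close(slave)  # the child keeps its own copy of the slave end
    os.write(master, prompt.encode() + b"\n")
    chunks = []
    try:
        while True:
            ready, _, _ = select.select([master], [], [], idle_timeout)
            if not ready:
                break  # nothing arrived within idle_timeout
            try:
                data = os.read(master, 4096)  # 4KB chunks
            except OSError:
                break  # Linux raises EIO once the child side is closed
            if not data:
                break
            chunks.append(data)
    finally:
        if proc.poll() is None:
            proc.kill()
        proc.wait()
        os.close(master)
    return b"".join(chunks).decode(errors="replace")
```

Note that PTY output includes the terminal's echo of the injected prompt and uses `\r\n` line endings, which is one reason the PTY path needs more output cleanup than the headless path.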

Why This Matters for Hermes

Different agent CLIs have different strengths:

Agent CLI    | Strength                                   | Native Capabilities
Claude Code  | Deep codebase understanding, large context | Multi-file editing, bash, MCP tools
Codex CLI    | Fast iteration, good at tests              | Sandboxed execution, auto-apply
Gemini CLI   | Google ecosystem integration               | Search grounding, long context
Aider        | Git-aware editing, repo maps               | Auto-commit, architect mode
Hermes Agent | Rich tool ecosystem, skills, memory        | Everything in our toolset

A workflow that leverages the right agent for each step would produce better results than one-size-fits-all Hermes subagents for everything.

Example workflow:

Step 1: Hermes subagent researches the API (web tools, arxiv)
Step 2: Claude Code implements the backend (deep codebase context)
Step 3: Codex CLI writes tests (fast, sandboxed execution)
Step 4: Hermes subagent reviews and integrates (skills, memory, git workflow)

Current State in Hermes Agent

Native subagents (delegate_tool.py):

  • Spawns in-process AIAgent instances
  • Full access to Hermes tool ecosystem (minus blocked tools)
  • All subagents use the same LLM provider/model (inherited from parent)
  • In-memory, fast, no PTY overhead

External agent skills:

  • claude-code skill — Delegates to Claude Code CLI via terminal(pty=true)
  • codex skill — Delegates to Codex CLI via terminal(pty=true) or subprocess
  • hermes-agent skill — Spawns additional Hermes instances

Gap: These skills are invoked sequentially by the parent agent, one at a time. There is no inter-agent communication and no shared workflow context.


Implementation Plan

Skill vs. Tool Classification

This is a codebase change extending delegate_tool.py to support external agent backends. It needs custom Python logic for PTY management, process lifecycle, output parsing, and integration with the workflow DAG engine from #344. This is a tool extension, not a skill — skills can't manage concurrent PTY sessions with inter-agent routing.

What We'd Need

  1. Agent backend abstraction — AgentBackend base class with implementations for HermesBackend (current AIAgent spawning) and ExternalCLIBackend (PTY-based CLI wrapping)
  2. PTY management — Spawn, inject, capture, and verify for external CLIs using Python's pty module or pexpect
  3. CLI-specific adapters — Model flag injection, auto-approval, output parsing per CLI
  4. Output extraction — Parse the relevant output from CLI stdout (stripping ANSI codes, tool noise, prompts)
  5. Integration with workflow DAG — External agent steps in delegate_task(workflow=[...]) with cli: "claude" or cli: "codex"
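A minimal sketch of the backend abstraction from item 1 might look like the following. Only the class names AgentBackend, HermesBackend, and ExternalCLIBackend come from the list above; `StepResult`, `pick_backend`, the constructor arguments, and the `run` signature are assumptions. Both backends take an injected callable so the sketch does not guess at the real AIAgent API or PTY layer.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StepResult:  # hypothetical result container
    step_id: str
    output: str
    ok: bool = True


class AgentBackend(ABC):
    @abstractmethod
    def run(self, step_id: str, goal: str, context: str = "") -> StepResult:
        """Execute one workflow step and return its output."""


class HermesBackend(AgentBackend):
    """Wraps in-process AIAgent spawning; `spawn` stands in for that call."""

    def __init__(self, spawn: Callable[[str, str], str]):
        self._spawn = spawn

    def run(self, step_id: str, goal: str, context: str = "") -> StepResult:
        return StepResult(step_id, self._spawn(goal, context))


class ExternalCLIBackend(AgentBackend):
    """Wraps an external CLI; `runner` stands in for the PTY layer."""

    def __init__(self, argv: List[str],
                 runner: Callable[[List[str], str], str]):
        self._argv = argv
        self._runner = runner

    def run(self, step_id: str, goal: str, context: str = "") -> StepResult:
        # External CLIs have no context parameter, so fold it into the prompt.
        prompt = f"{goal}\n\nContext:\n{context}" if context else goal
        return StepResult(step_id, self._runner(self._argv, prompt))


def pick_backend(step: Dict, backends: Dict[str, AgentBackend]) -> AgentBackend:
    # The "cli" key follows the Phase 1 example; steps without it use Hermes.
    return backends[step.get("cli", "hermes")]
```

Injecting the runner keeps the abstraction testable without any CLI installed, which may also help with the preflight and version-coupling concerns raised later in this issue.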

Phased Rollout

Phase 1: External Agent Steps in Workflows (Depends on #344 Phase 1)

Add a cli parameter to workflow steps in delegate_task:

delegate_task(
    workflow=[
        {"id": "research", "goal": "Research the Stripe API",
         "context": "..."},  # Hermes subagent (default)
        {"id": "implement", "goal": "Implement the payment client",
         "needs": ["research"],
         "cli": "claude",  # Use Claude Code CLI
         "model": "sonnet"},
        {"id": "test", "goal": "Write integration tests",
         "needs": ["implement"],
         "cli": "codex",  # Use Codex CLI
         "model": "gpt-5.2-codex"},
        {"id": "review", "goal": "Review and integrate",
         "needs": ["implement", "test"]}  # Back to Hermes subagent
    ]
)
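The `needs` fields in the call above define a dependency graph. As a rough illustration of how an engine could derive an execution order from them, here is a sketch using the standard library's `graphlib`. The function name `execution_waves` is hypothetical, and the actual DAG engine from #344 may work differently.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def execution_waves(workflow: List[Dict]) -> List[List[str]]:
    """Group step ids into waves; steps in one wave have all of their
    `needs` satisfied and could run concurrently."""
    graph = {step["id"]: set(step.get("needs", [])) for step in workflow}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


workflow = [
    {"id": "research"},
    {"id": "implement", "needs": ["research"], "cli": "claude"},
    {"id": "test", "needs": ["implement"], "cli": "codex"},
    {"id": "review", "needs": ["implement", "test"]},
]
print(execution_waves(workflow))
# [['research'], ['implement'], ['test'], ['review']]
```

This particular workflow is fully sequential because each step depends on the one before it. Dropping `"needs": ["implement"]` from the test step would put `implement` and `test` in the same wave.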

When cli is specified:

  1. Spawn the CLI in a PTY session (using existing terminal(pty=true) infrastructure)
  2. Inject the task + upstream context as the initial prompt
  3. Capture output, strip ANSI codes, extract the meaningful result
  4. Pass result downstream via output chaining
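Steps 2 and 3 above could be sketched as two small helpers. The names `build_prompt` and `clean_output` and the prompt layout are assumptions; the regular expression covers CSI and OSC escape sequences, which is a common subset and not a complete terminal emulator.

```python
import re
from typing import Dict

# CSI sequences (colors, cursor movement) and OSC sequences (window titles).
ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"                # CSI: ESC [ params final-byte
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"     # OSC: ESC ] ... BEL or ESC \
)


def build_prompt(goal: str, upstream: Dict[str, str]) -> str:
    """Combine the step goal with outputs of the steps it depends on."""
    parts = [goal]
    for step_id, output in upstream.items():
        parts.append(f"--- Output of step '{step_id}' ---\n{output}")
    return "\n\n".join(parts)


def clean_output(raw: str) -> str:
    """Strip escape sequences, normalize line endings, trim whitespace."""
    text = ANSI_RE.sub("", raw)
    text = text.replace("\r\n", "\n").replace("\r", "")
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(lines).strip()
```

Extracting "the meaningful result" from the cleaned text remains CLI-specific and is not attempted here.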

Initially supported CLIs: claude, codex (we already have skills with the CLI invocation patterns)

  • Deliverable: Mixed Hermes + external agent workflows

Phase 2: Parallel External Agents

Allow multiple external CLI agents to run concurrently in the same workflow:

  • Separate PTY sessions per agent

  • Thread-safe output collection

  • Process lifecycle management (timeout, kill, restart)

  • CLI-specific auto-approval (Claude's permission prompts, Codex's confirmation dialogs)

  • Deliverable: Fan-out patterns with mixed agent types
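The concurrency and lifecycle requirements above can be illustrated with a thread pool and per-step timeouts. For brevity this sketch uses plain pipes instead of PTY sessions, and the names `run_step` and `run_parallel` and the result dictionary shape are assumptions.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_step(argv: List[str], prompt: str, timeout: float) -> Dict:
    """Run one external process; report a timeout instead of raising."""
    try:
        done = subprocess.run(argv, input=prompt, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return {"ok": False, "output": "", "timed_out": True}
    return {"ok": done.returncode == 0, "output": done.stdout,
            "timed_out": False}


def run_parallel(steps: Dict[str, List[str]], prompt: str,
                 timeout: float = 60.0) -> Dict[str, Dict]:
    """Run every step concurrently, one worker thread per step."""
    with ThreadPoolExecutor(max_workers=max(1, len(steps))) as pool:
        futures = {step_id: pool.submit(run_step, argv, prompt, timeout)
                   for step_id, argv in steps.items()}
        # Each future returns its own dict, so no shared state needs locking.
        return {step_id: fut.result() for step_id, fut in futures.items()}
```

Returning a value per future avoids shared mutable buffers. Streaming output from live PTY sessions would need a different collection strategy, such as a queue per agent.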

Phase 3: Bidirectional Communication with External Agents

Enable external agents to participate in iterative workflows:

  • Inject follow-up messages into running PTY sessions (not just initial prompts)

  • Parse agent responses for structured signals (completion, failure, questions)

  • Support debate/review-loop patterns between a Hermes subagent and an external CLI

  • Handle CLI-specific conversation patterns (Claude's /clear, Codex's --json mode)

  • Deliverable: External agents as full participants in orchestration patterns
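One possible way to get structured signals out of free-form CLI output is to ask the external agent, in its prompt, to end with a marker line. The markers `HERMES_DONE`, `HERMES_FAILED`, and `HERMES_QUESTION` below are invented for this sketch and are not part of any existing CLI or of Hermes.

```python
import re
from dataclasses import dataclass

# Hypothetical marker lines the prompt would instruct the agent to emit.
SIGNAL_RE = re.compile(
    r"^[ \t]*HERMES_(DONE|FAILED|QUESTION)\b:?[ \t]*(.*)$",
    re.MULTILINE,
)


@dataclass
class Signal:
    kind: str         # "done", "failed", "question", or "none"
    detail: str = ""  # free text following the marker, if any


def parse_signal(output: str) -> Signal:
    """Return the last marker found in the agent's output, if any."""
    matches = list(SIGNAL_RE.finditer(output))
    if not matches:
        return Signal("none")
    last = matches[-1]
    return Signal(last.group(1).lower(), last.group(2).strip())
```

A `question` signal could trigger a follow-up message into the running session, while `none` would fall back to treating the full output as the result. Agents may not follow marker instructions reliably, so the fallback path matters.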


Pros & Cons

Pros

  • Best agent for each task — Claude Code for deep refactoring, Codex for fast tests, Hermes for research and integration
  • Leverages existing infrastructure — Our terminal(pty=true) and agent skills already handle CLI spawning. This extends, not replaces.
  • Model diversity without API complexity — Each CLI handles its own auth, context, and tool ecosystem. We just orchestrate.
  • Incremental — Phase 1 is a thin layer over existing skill patterns. No need to build everything at once.
  • Future-proof — As new agent CLIs emerge (Goose, Aider, OpenCode), adding support is just a new adapter

Cons / Risks

  • Output parsing is fragile — CLI output formats change between versions. ANSI stripping, prompt detection, and result extraction need maintenance.
  • No tool-level integration — External agents use their own tools, not ours. We can't see what files they edited, what commands they ran, etc. (only their stdout).
  • Auth complexity — Each CLI needs its own API keys configured. Claude needs ANTHROPIC_API_KEY, Codex needs OPENAI_API_KEY, etc.
  • Latency — PTY spawning + CLI startup + model loading is slower than in-process AIAgent creation
  • Debugging difficulty — When an external agent fails, we only see its stdout. No structured error data, no tool call logs.
  • Depends on CLIs being installed — Users need claude, codex, etc. installed separately. Preflight checks needed.
  • Version coupling — CLI behavior changes (new flags, changed output format) can break adapters
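The installation and auth risks above suggest a preflight check before a workflow starts. The sketch below is one way to do it; `preflight` and `REQUIRED_ENV` are hypothetical names, and the environment variable names are the ones mentioned in the list above. Some CLIs may support other authentication methods, so a missing key is probably better treated as a warning than a hard failure.

```python
import os
import shutil
from typing import Callable, Dict, List, Mapping, Optional

# Variable names taken from the auth item above; extend per adapter.
REQUIRED_ENV = {"claude": "ANTHROPIC_API_KEY", "codex": "OPENAI_API_KEY"}


def preflight(workflow: List[Dict],
              env: Optional[Mapping[str, str]] = None,
              which: Callable[[str], Optional[str]] = shutil.which) -> List[str]:
    """Return human-readable problems for every CLI the workflow uses."""
    env = os.environ if env is None else env
    problems = []
    for cli in sorted({step["cli"] for step in workflow if "cli" in step}):
        if which(cli) is None:
            problems.append(f"{cli}: executable not found on PATH")
        var = REQUIRED_ENV.get(cli)
        if var and not env.get(var):
            problems.append(f"{cli}: {var} is not set")
    return problems
```

Running this at workflow parse time gives fail-fast behavior; calling it per step just before execution would allow partial workflows instead.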

Open Questions

  1. Should this use the existing skill infrastructure? The claude-code and codex skills already know how to invoke these CLIs. Could the workflow engine invoke skills as steps rather than building a separate PTY layer?
  2. Output extraction strategy? Options: (a) capture all stdout and let the parent LLM parse it, (b) structured output parsing per CLI, (c) require external agents to write results to a file that we read. Option (c) is most reliable but least flexible.
  3. How to handle context passing? External CLIs don't have our context parameter. Options: include context in the task prompt text, write context to a file the CLI can read, or use CLI-specific mechanisms (Claude's /add-context, Codex's --file flag).
  4. Should external agents be able to write to our shared memory (#377, Shared Memory Pools Between Sub-Agents in Workflows)? They'd need a mechanism to do so (file-based bridge?), or we accept they're output-only participants.
  5. Preflight vs runtime CLI detection? Should we check for installed CLIs at workflow parse time (fail fast) or at step execution time (allow partial workflows)?
  6. License considerations? We're orchestrating, not importing. The CLIs are user-installed tools. No license concerns for our codebase, but users need valid subscriptions to each service.
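To make questions 2 and 3 more concrete, here is a sketch of the file-based variant: context is written to a file the CLI can read, and the agent is asked to write its result to a known file. The file names `CONTEXT.md` and `RESULT.json`, the JSON keys, and both function names are assumptions for illustration.

```python
import json
from pathlib import Path


def prepare_handoff(workdir: Path, goal: str, context: str) -> str:
    """Write context to a file and return the prompt for the external CLI."""
    (workdir / "CONTEXT.md").write_text(context, encoding="utf-8")
    return (
        f"{goal}\n\n"
        "Read CONTEXT.md for background. When finished, write a JSON object "
        "with the keys 'status' and 'summary' to RESULT.json."
    )


def read_result(workdir: Path, stdout_fallback: str = "") -> dict:
    """Read the agent's result file, falling back to captured stdout."""
    try:
        data = json.loads((workdir / "RESULT.json").read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {"status": "unknown", "summary": stdout_fallback}
    if not isinstance(data, dict):
        return {"status": "unknown", "summary": stdout_fallback}
    return data
```

The stdout fallback combines options (a) and (c): the file is preferred when the agent produces it, and the parent LLM can still parse raw output when it does not.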
