Overview
Today, Hermes Agent's delegate_task spawns in-process AIAgent children: clones of the Hermes agent runtime with restricted toolsets. This is powerful for tasks that benefit from Hermes's tool ecosystem (terminal, file ops, web, skills), but it means every subagent is a Hermes instance running the same LLM.
AgentWorkforce/relay takes a fundamentally different approach: it orchestrates external agent CLIs (Claude Code, Codex CLI, Gemini CLI, Aider, Goose, OpenCode) by wrapping them in PTY sessions, injecting messages as stdin, and capturing stdout. This means a single workflow can have Claude Code writing backend code, Codex CLI writing tests, and Gemini CLI handling documentation, each using its native tool ecosystem.
Hermes already has skills for spawning individual external agents (claude-code, codex, hermes-agent), but these are sequential, isolated invocations — one agent at a time, no inter-agent communication, no shared workflow context. This issue proposes extending our multi-agent architecture to support mixed workflows where Hermes subagents and external CLI agents coexist and collaborate.
Inspired by: relay's PTY-based agent wrapping and cross-CLI orchestration
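Depends on: #344 (Multi-Agent Architecture, which provides the workflow DAG infrastructure)
Related: Existing skills claude-code, codex, hermes-agent (autonomous agent spawning skills)
Research Findings
How Relay's Cross-CLI Orchestration Works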
Relay's broker spawns each agent CLI in a PTY session:
Spawn: The portable-pty crate opens a native PTY pair and spawns the CLI (e.g., claude --model sonnet) on the slave side.
Inject: Messages are written to PTY stdin as formatted text. The broker applies a human cooldown (3s) and message coalescing (500ms window) to mimic natural interaction timing.
Capture: A background thread reads 4KB chunks from the PTY's stdout and passes them through an mpsc channel, from which they are forwarded to the SDK as worker_stream events.
Verify: Delivery verification checks that the agent's output echoes the injected message, confirming that the agent actually received and processed it.
Auto-approve: For CLIs that prompt for permission (e.g., Claude's "Allow?" dialogs), auto-approval logic answers "y" automatically.
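For comparison, the spawn, inject and capture loop above can be approximated in Python with the standard library alone. This is a minimal, POSIX-only sketch. It is not relay's code and not existing Hermes code. The names `spawn_in_pty`, `start_reader`, `inject` and `drain` are hypothetical. It also leaves out relay's cooldown, coalescing, verification and auto-approval. The reader thread feeding a `queue.Queue` stands in for relay's background thread and mpsc channel.

```python
import os
import queue
import subprocess
import sys
import threading

CHUNK_SIZE = 4096  # read size, mirroring relay's 4KB chunks


def spawn_in_pty(argv):
    """Start argv with stdin/stdout/stderr attached to the slave side of a new PTY."""
    master_fd, slave_fd = os.openpty()
    proc = subprocess.Popen(argv, stdin=slave_fd, stdout=slave_fd, stderr=slave_fd,
                            close_fds=True)
    os.close(slave_fd)  # the parent only talks to the master side
    return proc, master_fd


def start_reader(master_fd, out_queue):
    """Background thread: push CHUNK_SIZE reads from the master onto out_queue, then None."""
    def pump():
        while True:
            try:
                chunk = os.read(master_fd, CHUNK_SIZE)
            except OSError:  # Linux raises EIO once every slave fd is closed
                break
            if not chunk:  # other platforms may report EOF instead
                break
            out_queue.put(chunk)
        out_queue.put(None)  # sentinel: the stream has ended

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return thread


def inject(master_fd, message):
    """Write one line to the child's stdin through the PTY master."""
    os.write(master_fd, message.encode() + b"\n")


def drain(out_queue, timeout=10.0):
    """Collect chunks until the reader's sentinel arrives, then decode them."""
    chunks = []
    while (chunk := out_queue.get(timeout=timeout)) is not None:
        chunks.append(chunk)
    return b"".join(chunks).decode(errors="replace")


if __name__ == "__main__":
    q = queue.Queue()
    # Stand-in "agent" that reads one line and answers; a real step would run e.g. claude.
    proc, fd = spawn_in_pty([sys.executable, "-c", "print('got:', input())"])
    start_reader(fd, q)
    inject(fd, "hello")
    proc.wait(timeout=10)
    print(drain(q))
    os.close(fd)
```

Because the PTY echoes input by default, the captured text contains the injected line as well as the agent's reply. An echo-based delivery check relies on that behavior.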
Supported CLIs: claude, codex, gemini, aider, goose, opencode, droid — each with CLI-specific argument injection (e.g., --model, auto-approval flags)
Two transport modes:
pty — Full PTY wrapping with ANSI capture (most agents)
headless — Subprocess without PTY for agents that support non-interactive mode (Claude, OpenCode)
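The per-CLI argument injection, auto-approval and transport choice above suggest a small adapter table. The sketch below is hypothetical. `CLIAdapter`, `ADAPTERS` and the `Allow\?` prompt pattern are assumptions made for illustration. Only the `--model` flag comes from the text above. Real flags, headless options and prompt wording would need to be checked against each CLI's own help output.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Transport(Enum):
    PTY = "pty"            # full PTY wrapping with ANSI capture
    HEADLESS = "headless"  # plain subprocess for CLIs with a non-interactive mode


@dataclass(frozen=True)
class CLIAdapter:
    binary: str
    transport: Transport
    model_flag: str = "--model"
    extra_args: Tuple[str, ...] = ()                 # e.g. auto-approval flags, if any
    approve_pattern: Optional[re.Pattern] = None     # permission prompt to answer
    approve_reply: str = "y"

    def build_argv(self, model: Optional[str] = None) -> List[str]:
        """Assemble the command line for one spawn."""
        argv = [self.binary, *self.extra_args]
        if model:
            argv += [self.model_flag, model]
        return argv

    def auto_reply(self, output_chunk: str) -> Optional[str]:
        """Return the text to inject if this chunk contains a permission prompt."""
        if self.approve_pattern is not None and self.approve_pattern.search(output_chunk):
            return self.approve_reply
        return None


# Hypothetical registry: prompt text and flags are placeholders, not verified CLI behavior.
ADAPTERS = {
    "claude": CLIAdapter("claude", Transport.PTY, approve_pattern=re.compile(r"Allow\?")),
    "codex": CLIAdapter("codex", Transport.PTY),
}
```

With this shape, supporting another CLI means adding one registry entry rather than new control flow.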
Why This Matters for Hermes
Different agent CLIs have different strengths:
| Agent CLI | Strength | Native Capabilities |
|---|---|---|
| Claude Code | Deep codebase understanding, large context | Multi-file editing, bash, MCP tools |
| Codex CLI | Fast iteration, good at tests | Sandboxed execution, auto-apply |
| Gemini CLI | Google ecosystem integration | Search grounding, long context |
| Aider | Git-aware editing, repo maps | Auto-commit, architect mode |
| Hermes Agent | Rich tool ecosystem, skills, memory | Everything in our toolset |
A workflow that leverages the right agent for each step would produce better results than one-size-fits-all Hermes subagents for everything.
Example workflow:
Step 1: Hermes subagent researches the API (web tools, arxiv)
Step 2: Claude Code implements the backend (deep codebase context)
Step 3: Codex CLI writes tests (fast, sandboxed execution)
Step 4: Hermes subagent reviews and integrates (skills, memory, git workflow)
Current State in Hermes Agent
Native subagents (delegate_tool.py):
Spawns in-process AIAgent instances
Full access to Hermes tool ecosystem (minus blocked tools)
All subagents use the same LLM provider/model (inherited from parent)
In-memory, fast, no PTY overhead
External agent skills:
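claude-code skill: Delegates to Claude Code CLI via terminal(pty=true)
codex skill: Delegates to Codex CLI via terminal(pty=true) or subprocess
hermes-agent skill: Spawns additional Hermes instances
Gap: These skills are invoked sequentially by the parent agent, one at a time.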
Implementation Plan
Skill vs. Tool Classification
This is a codebase change extending delegate_tool.py to support external agent backends. It needs custom Python logic for PTY management, process lifecycle, output parsing, and integration with the workflow DAG engine from #344. This is a tool extension, not a skill: skills can't manage concurrent PTY sessions with inter-agent routing.
What We'd Need
Agent backend abstraction — AgentBackend base class with implementations for HermesBackend (current AIAgent spawning) and ExternalCLIBackend (PTY-based CLI wrapping)
PTY management — Spawn, inject, capture, and verify for external CLIs using Python's pty module or pexpect
CLI-specific adapters — Model flag injection, auto-approval, output parsing per CLI
Output extraction — Parse the relevant output from CLI stdout (stripping ANSI codes, tool noise, prompts)
Integration with workflow DAG — External agent steps in delegate_task(workflow=[...]) with backend: "claude" or backend: "codex"
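One way to shape the AgentBackend abstraction is an abstract base class with a single `run` method. The workflow engine then never needs to know which kind of agent executes a step. In this sketch, `StepResult`, `run`, `backend_for` and the two injected callables (`spawn_subagent`, `run_cli`) are hypothetical. A real HermesBackend would wrap today's delegate_tool.py spawning, and a real ExternalCLIBackend would wrap the PTY layer.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class StepResult:
    step_id: str
    output: str
    backend: str


class AgentBackend(ABC):
    """Common interface the workflow engine calls for every step."""

    @abstractmethod
    def run(self, step_id: str, goal: str, context: str) -> StepResult:
        ...


class HermesBackend(AgentBackend):
    def __init__(self, spawn_subagent: Callable[[str, str], str]):
        self._spawn = spawn_subagent  # hypothetical hook into in-process AIAgent spawning

    def run(self, step_id, goal, context):
        return StepResult(step_id, self._spawn(goal, context), "hermes")


class ExternalCLIBackend(AgentBackend):
    def __init__(self, cli: str, run_cli: Callable[[str, str], str]):
        self.cli = cli
        self._run_cli = run_cli  # hypothetical: spawn in PTY, inject prompt, return cleaned output

    def run(self, step_id, goal, context):
        # External CLIs have no `context` parameter, so fold it into the prompt text.
        prompt = f"{goal}\n\nContext:\n{context}" if context else goal
        return StepResult(step_id, self._run_cli(self.cli, prompt), self.cli)


def backend_for(step: Dict, hermes: AgentBackend,
                make_external: Callable[[str], AgentBackend]) -> AgentBackend:
    """Steps without a `cli` key keep today's behavior (a Hermes subagent)."""
    cli = step.get("cli")
    return make_external(cli) if cli else hermes
```

Folding context into the prompt is only one of the options listed under Open Questions. A file-based handoff would change `ExternalCLIBackend.run` but not the interface.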
Phased Rollout
Phase 1: External Agent Steps in Workflows (Depends on #344 Phase 1)
Add a cli parameter to workflow steps in delegate_task:
```python
delegate_task(
    workflow=[
        {"id": "research", "goal": "Research the Stripe API",
         "context": "..."},                          # Hermes subagent (default)
        {"id": "implement", "goal": "Implement the payment client",
         "needs": ["research"],
         "cli": "claude", "model": "sonnet"},        # Use Claude Code CLI
        {"id": "test", "goal": "Write integration tests",
         "needs": ["implement"],
         "cli": "codex", "model": "gpt-5.2-codex"},  # Use Codex CLI
        {"id": "review", "goal": "Review and integrate",
         "needs": ["implement", "test"]},            # Back to Hermes subagent
    ]
)
```
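Before any step runs, the engine has to check each step's `cli` value and order the steps by `needs`. #344 is expected to provide the real DAG engine. The sketch below only illustrates plausible parse-time checks, using the standard library's graphlib (Python 3.9+). `plan_workflow` and `SUPPORTED_CLIS` are hypothetical names, and the CLI set is limited to claude and codex as in Phase 1.

```python
from graphlib import TopologicalSorter

SUPPORTED_CLIS = {"claude", "codex"}  # Phase 1 scope


def plan_workflow(steps):
    """Validate workflow steps and return their ids in dependency order.

    Raises ValueError for duplicate ids, unknown `needs`, unsupported `cli`
    values, or cycles (graphlib.CycleError is a subclass of ValueError).
    """
    ids = [s["id"] for s in steps]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate step ids")
    graph = {}
    for step in steps:
        needs = set(step.get("needs", []))
        unknown = needs - set(ids)
        if unknown:
            raise ValueError(f"step {step['id']!r} needs unknown steps {sorted(unknown)}")
        cli = step.get("cli")
        if cli is not None and cli not in SUPPORTED_CLIS:
            raise ValueError(f"step {step['id']!r} uses unsupported cli {cli!r}")
        graph[step["id"]] = needs  # node -> its predecessors
    return list(TopologicalSorter(graph).static_order())
```

For the example workflow above, the only valid order is research, implement, test, review.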
When cli is specified:
Spawn the CLI in a PTY session (using existing terminal(pty=true) infrastructure)
Inject the task + upstream context as the initial prompt
Capture output, strip ANSI codes, extract the meaningful result
Pass result downstream via output chaining
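The inject and capture steps above can be illustrated with two small helpers: one that appends upstream outputs to the task, and one that strips ANSI codes from what comes back. This is one reasonable approach under stated assumptions, not a spec. The section format in `build_prompt` is invented. The regex only removes CSI and OSC escape sequences, so extracting the meaningful result from prompts and tool noise would still need per-CLI rules.

```python
import re
from typing import Dict

# CSI sequences (colors, cursor movement) and OSC sequences (e.g. window titles).
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)")


def build_prompt(goal: str, upstream: Dict[str, str]) -> str:
    """Start with the goal, then add each upstream step's output as a labeled section."""
    sections = [goal]
    for step_id, output in upstream.items():
        sections.append(f"--- Output of step '{step_id}' ---\n{output}")
    return "\n\n".join(sections)


def clean_output(raw: str) -> str:
    """Strip ANSI escapes and normalize PTY CRLF / CR line endings to LF."""
    text = ANSI_RE.sub("", raw)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.strip()
```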
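Initially supported CLIs: claude, codex (we already have skills with the CLI invocation patterns)
Phase 2: Parallel External Agents
Allow multiple external CLI agents to run concurrently in the same workflow:
Separate PTY sessions per agent
Thread-safe output collection
Process lifecycle management (timeout, kill, restart)
CLI-specific auto-approval (Claude's permission prompts, Codex's confirmation dialogs)
Deliverable: Fan-out patterns with mixed agent types
Phase 3: Bidirectional Communication with External Agents
Enable external agents to participate in iterative workflows:
Inject follow-up messages into running PTY sessions (not just initial prompts)
Parse agent responses for structured signals (completion, failure, questions)
Support debate/review-loop patterns between a Hermes subagent and an external CLI
Handle CLI-specific conversation patterns (Claude's /clear, Codex's --json mode)
Deliverable: External agents as full participants in orchestration patterns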
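Running several external agents at once comes down to giving each step its own process, its own result slot and its own deadline. To stay short, the sketch below uses plain pipes (the headless transport) rather than PTYs. `run_with_timeout` and `fan_out` are hypothetical names, and the 600-second default is an arbitrary placeholder. Each worker returns its own dict, so results are collected without shared mutable state.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_with_timeout(argv, timeout):
    """Run one external agent process; kill it if it exceeds `timeout` seconds."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    try:
        out, _ = proc.communicate(timeout=timeout)
        return {"ok": proc.returncode == 0, "output": out}
    except subprocess.TimeoutExpired:
        proc.kill()
        out, _ = proc.communicate()  # reap the killed process and collect partial output
        return {"ok": False, "output": out, "error": "timeout"}


def fan_out(steps, timeout=600.0, max_workers=4):
    """Run independent steps concurrently; `steps` maps step id -> argv."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {sid: pool.submit(run_with_timeout, argv, timeout)
                   for sid, argv in steps.items()}
        return {sid: fut.result() for sid, fut in futures.items()}


if __name__ == "__main__":
    results = fan_out({
        "fast": [sys.executable, "-c", "print('done')"],
        "slow": [sys.executable, "-c", "import time; time.sleep(3)"],
    }, timeout=1.0)
    for step_id, result in results.items():
        print(step_id, result["ok"], result.get("error"))
```

Threads cannot be killed in Python, so the timeout is enforced on the child process. The worker thread then returns normally with an error result.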
Pros & Cons
Pros
Best agent for each task — Claude Code for deep refactoring, Codex for fast tests, Hermes for research and integration
Leverages existing infrastructure — Our terminal(pty=true) and agent skills already handle CLI spawning. This extends, not replaces.
Model diversity without API complexity — Each CLI handles its own auth, context, and tool ecosystem. We just orchestrate.
Incremental — Phase 1 is a thin layer over existing skill patterns. No need to build everything at once.
Future-proof — As new agent CLIs emerge (Goose, Aider, OpenCode), adding support is just a new adapter
Cons / Risks
Output parsing is fragile — CLI output formats change between versions. ANSI stripping, prompt detection, and result extraction need maintenance.
No tool-level integration — External agents use their own tools, not ours. We can't see what files they edited, what commands they ran, etc. (only their stdout).
Auth complexity — Each CLI needs its own API keys configured. Claude needs ANTHROPIC_API_KEY, Codex needs OPENAI_API_KEY, etc.
Latency — PTY spawning + CLI startup + model loading is slower than in-process AIAgent creation
Debugging difficulty — When an external agent fails, we only see its stdout. No structured error data, no tool call logs.
Depends on CLIs being installed — Users need claude, codex, etc. installed separately. Preflight checks needed.
Version coupling — CLI behavior changes (new flags, changed output format) can break adapters
Open Questions
Should this use the existing skill infrastructure? The claude-code and codex skills already know how to invoke these CLIs. Could the workflow engine invoke skills as steps rather than building a separate PTY layer?
Output extraction strategy? Options: (a) capture all stdout and let the parent LLM parse it, (b) structured output parsing per CLI, (c) require external agents to write results to a file that we read. Option (c) is most reliable but least flexible.
How to handle context passing? External CLIs don't have our context parameter. Options: include context in the task prompt text, write context to a file the CLI can read, or use CLI-specific mechanisms (Claude's /add-context, Codex's --file flag).
Preflight vs runtime CLI detection? Should we check for installed CLIs at workflow parse time (fail fast) or at step execution time (allow partial workflows)?
License considerations? We're orchestrating, not importing. The CLIs are user-installed tools. No license concerns for our codebase, but users need valid subscriptions to each service.
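For the preflight question, a parse-time check is cheap with the standard library. `shutil.which` looks up each binary on PATH, and the environment can be checked for the keys named under Cons / Risks. This is a sketch under assumptions: `REQUIREMENTS` and `preflight` are hypothetical names. An unset key is only a warning sign, because a CLI may have its own stored credentials.

```python
import os
import shutil
from typing import Dict, List, Mapping, Optional, Tuple

# cli name -> (binary expected on PATH, auth environment variable), per Cons / Risks.
REQUIREMENTS: Dict[str, Tuple[str, str]] = {
    "claude": ("claude", "ANTHROPIC_API_KEY"),
    "codex": ("codex", "OPENAI_API_KEY"),
}


def preflight(steps: List[dict], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return readable problems for every CLI the workflow uses (empty list = ready)."""
    env = os.environ if env is None else env
    problems = []
    for cli in sorted({s["cli"] for s in steps if s.get("cli")}):
        if cli not in REQUIREMENTS:
            problems.append(f"{cli}: no adapter configured")
            continue
        binary, key = REQUIREMENTS[cli]
        if shutil.which(binary) is None:
            problems.append(f"{cli}: '{binary}' not found on PATH")
        if not env.get(key):
            problems.append(f"{cli}: {key} is not set")
    return problems
```

A fail-fast mode would raise at workflow parse time whenever the list is non-empty. A runtime mode would call the same function with a single step just before spawning it, which allows partial workflows.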
References
src/pty.rs: PTY session management with portable-pty
src/inject.rs: Message injection with retry and verification
src/spawner.rs: CLI-specific spawn configuration
claude-code, codex, hermes-agent: Existing CLI invocation patterns
tools/delegate_tool.py: Current in-process subagent spawning