Feature: Zeroshot Skill — Multi-Agent Blind Validation Orchestration via CLI #488

## Overview

[Zeroshot](https://github.com/covibes/zeroshot) is an open-source (MIT) multi-agent CLI orchestrator that runs autonomous coding workflows through a **planner → implementer → validator** pipeline with "blind validation." Unlike single-agent tools, Zeroshot spawns independent validator agents that verify code changes by examining the codebase directly, without ever seeing the worker's internal reasoning or output history. The aim is to catch errors that a single agent tends to miss when it reviews its own work.

Hermes already has skills for [Claude Code](https://github.com/NousResearch/hermes-agent) and [Codex](https://github.com/NousResearch/hermes-agent) as individual agent CLIs, but it uses each of them as a single-agent tool. A Zeroshot skill would add a multi-agent orchestration layer on top of those same CLIs, giving Hermes users immediate access to multi-agent coding workflows without waiting for the native multi-agent architecture (#344).

This is a concrete, available-today capability that complements the longer-term native multi-agent work planned in #344, #356, and #406. Use zeroshot now; build native later.

---

## Research Findings

### How Zeroshot Works

**Architecture (63K LOC, Node.js, v5.4.0, 1,274 stars):**

Zeroshot is a message-driven coordination layer built on four primitives:

1. **Template-Driven Agent Topologies** — JSON workflow definitions specify agent roles, triggers, context strategies, and hooks. Built-in templates: `single-worker`, `worker-validator`, `full-workflow`, `heavy-validation`. Users can create custom cluster templates.

2. **Two-Tier Conductor Classification** — A cheap "junior conductor" (Sonnet-class) classifies tasks into a 2D matrix of complexity (TRIVIAL/SIMPLE/STANDARD/CRITICAL) × task type (INQUIRY/TASK/DEBUG). Uncertain cases escalate to a "senior conductor" (Opus-class). This routes to the appropriate template:

   | Complexity | Agents | Validators | Use Case |
   |:---|:---|:---|:---|
   | TRIVIAL | 1 | 0 | Fix typo in README |
   | SIMPLE | 2 | 1 (generic) | Add dark mode toggle |
   | STANDARD | 4 | 2 (reqs + code) | Refactor auth system |
   | CRITICAL | 7 | 5 (security, testing, etc.) | Implement payment flow |

3. **SQLite Event Ledger** — Every message, agent output, and state transition is persisted in a per-cluster SQLite database (`~/.zeroshot/<clusterId>.db`). This enables crash recovery (`zeroshot resume <id>`), post-mortem analysis, and token usage tracking.

4. **Message Bus** — EventEmitter-based pub/sub over the ledger. Topics include `ISSUE_OPENED`, `PLAN_READY`, `IMPLEMENTATION_READY`, `VALIDATION_RESULT`. Agent triggers subscribe to specific topics with optional JS logic conditions.
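
The two-tier routing described in item 2 can be sketched in Python (the language Hermes is written in; Zeroshot itself is Node.js). This is a minimal illustration, not Zeroshot's implementation: the `Classification` type, the `confident` flag, the stub classifiers, and the mapping from complexity level to template name are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable

# Complexity levels and template names come from the issue text. Pairing them
# one-to-one is an ASSUMPTION made for illustration, not confirmed behavior.
TEMPLATE_BY_COMPLEXITY = {
    "TRIVIAL": "single-worker",
    "SIMPLE": "worker-validator",
    "STANDARD": "full-workflow",
    "CRITICAL": "heavy-validation",
}
TASK_TYPES = {"INQUIRY", "TASK", "DEBUG"}


@dataclass(frozen=True)
class Classification:
    complexity: str          # TRIVIAL / SIMPLE / STANDARD / CRITICAL
    task_type: str           # INQUIRY / TASK / DEBUG
    confident: bool = True   # hypothetical flag: False means "escalate"


Classifier = Callable[[str], Classification]


def route(task: str, junior: Classifier, senior: Classifier) -> str:
    """Return a template name, escalating to the senior conductor when unsure."""
    result = junior(task)
    if not result.confident:
        result = senior(task)
    if result.complexity not in TEMPLATE_BY_COMPLEXITY:
        raise ValueError(f"unknown complexity: {result.complexity}")
    if result.task_type not in TASK_TYPES:
        raise ValueError(f"unknown task type: {result.task_type}")
    return TEMPLATE_BY_COMPLEXITY[result.complexity]


# Stub classifiers standing in for the Sonnet-class and Opus-class model calls.
def junior_stub(task: str) -> Classification:
    if "typo" in task.lower():
        return Classification("TRIVIAL", "TASK")
    return Classification("STANDARD", "TASK", confident=False)


def senior_stub(task: str) -> Classification:
    return Classification("CRITICAL", "TASK")
```

With these stubs, a typo fix is routed by the junior classifier alone, while anything else is escalated and ends up on the template the senior classifier selects.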

**The Blind Validation Pattern:**

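Blind validation is Zeroshot's key innovation. It works through context isolation rather than code isolation: validators read the same codebase the worker modified but never receive the worker's messages. Specifically: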

- Validators' `contextStrategy.sources` config explicitly EXCLUDES worker output topics
- Validators receive: the original issue, the plan, and a notification that implementation is ready
- Validators do NOT receive: the worker's code output, reasoning, or internal context
- Validators must independently examine the actual codebase on disk using file read/grep/glob tools
- This means "the validator can't lie about code they didn't write" — they verify ground truth
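
The context isolation above amounts to filtering the ledger by topic before a validator sees it. The Python sketch below shows that idea under stated assumptions: `build_context` and the message shape are illustrative, and `WORKER_OUTPUT` is a hypothetical name for the excluded topic, since the real topic name is not given here.

```python
# Topic names other than WORKER_OUTPUT are taken from the issue text.
# WORKER_OUTPUT is a HYPOTHETICAL name for whatever topic carries the
# worker's code and reasoning.
VALIDATOR_SOURCES = ("ISSUE_OPENED", "PLAN_READY", "IMPLEMENTATION_READY")


def build_context(ledger: list[dict], sources: tuple[str, ...]) -> list[dict]:
    """Keep only ledger messages whose topic is listed in the agent's sources."""
    allowed = set(sources)
    return [msg for msg in ledger if msg["topic"] in allowed]


ledger = [
    {"topic": "ISSUE_OPENED", "body": "Login fails for SSO users"},
    {"topic": "PLAN_READY", "body": "1. Patch token refresh 2. Add test"},
    {"topic": "WORKER_OUTPUT", "body": "I edited auth.py and I am sure it works"},
    {"topic": "IMPLEMENTATION_READY", "body": "ready for validation"},
]

# The validator learns that work was done, but not what the worker claims about it.
validator_context = build_context(ledger, VALIDATOR_SOURCES)
```

Because the worker's own account never reaches the validator, the only way to judge the change is to read the files on disk.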

**Provider Abstraction:**

Zeroshot shells out to existing CLI tools. Each provider implements `isAvailable()`, `buildCommand()`, and `parseEvent()`:
- **Claude Code:** `claude --print --output-format json --dangerously-skip-permissions`
- **Codex:** `codex exec --json --dangerously-bypass-approvals-and-sandbox`
- **Gemini:** `gemini -p <context> --output-format stream-json --yolo`
- **OpenCode:** Similar pattern

Safety hooks block user-interactive prompts and dangerous git commands.
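
To make the three-method provider shape concrete, here is a hedged Python rendering for the Claude Code case. Zeroshot's real providers are Node.js; the class name, the snake_case method names, and passing the prompt as the last positional argument are assumptions of this sketch. Only the flags are taken from the list above.

```python
import json
import shutil
from typing import Optional


class ClaudeProvider:
    """Illustrative Python version of a provider (Zeroshot itself is Node.js)."""

    binary = "claude"

    def is_available(self) -> bool:
        # True when the provider CLI is on PATH.
        return shutil.which(self.binary) is not None

    def build_command(self, prompt: str) -> list[str]:
        # Flags are those quoted in the issue; appending the prompt as the
        # final positional argument is an ASSUMPTION for this sketch.
        return [
            self.binary,
            "--print",
            "--output-format", "json",
            "--dangerously-skip-permissions",
            prompt,
        ]

    def parse_event(self, line: str) -> Optional[dict]:
        # Treat each output line as one JSON object; ignore anything else.
        line = line.strip()
        if not line:
            return None
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            return None
        return event if isinstance(event, dict) else None
```

Keeping command construction separate from execution means the orchestrator can log or inspect the exact argv before spawning a process.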

**Isolation Modes:**
- **None** (default) — Direct filesystem, manual review
- **Git Worktree** (`--worktree`) — Separate branch/directory, clean PR workflow
- **Docker** (`--docker`) — Full container isolation for risky or parallel tasks

**Full Automation:** `zeroshot run 123 --ship` = worktree isolation → implement → validate → create PR → auto-merge
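
A Hermes-side helper could assemble these invocations from a few options. The sketch below uses only the flags named in this issue (`--worktree`, `--docker`, `--pr`, `--ship`); the function name, its parameters, and the assumption that an isolation flag may be combined with `--pr` or `--ship` are illustrative and should be checked against the Zeroshot CLI help.

```python
from typing import Optional

# Only flags named in this issue are used here.
ISOLATION_FLAGS = {"none": None, "worktree": "--worktree", "docker": "--docker"}


def build_run_command(target: str, isolation: str = "none",
                      finish: Optional[str] = None) -> list[str]:
    """Build argv for `zeroshot run`.

    target: issue number, free text, or a markdown file path
    isolation: "none", "worktree", or "docker"
    finish: None, "pr", or "ship"
    """
    if isolation not in ISOLATION_FLAGS:
        raise ValueError(f"unknown isolation mode: {isolation}")
    if finish not in (None, "pr", "ship"):
        raise ValueError(f"unknown finish mode: {finish}")

    argv = ["zeroshot", "run", target]
    flag = ISOLATION_FLAGS[isolation]
    if flag:
        argv.append(flag)
    if finish:
        argv.append(f"--{finish}")
    return argv
```

For example, `build_run_command("123", finish="ship")` yields the argv for the full-automation command quoted above.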

### Key Design Decisions

1. **CLI shelling over API calls** — Zeroshot delegates to existing provider CLIs rather than making API calls directly. This means auth, model selection, and tool execution are handled by each provider's own CLI. Trade-off: simpler integration but heavier process overhead.

2. **Template-driven over code-driven** — Workflow topologies are JSON configs, not code. This makes it easy to create custom workflows but limits programmatic flexibility.

3. **Two-tier conductor** — Avoids paying for expensive models on trivial tasks. The cheap classifier handles 80%+ of routing decisions.

4. **Per-cluster SQLite** — Each cluster gets its own database file, avoiding cross-contamination and enabling clean resume/cleanup.
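
The ledger-plus-bus design can be illustrated with Python's built-in `sqlite3` module. This is a sketch of the general pattern (persist each message, then notify subscribers), not Zeroshot's code: the `Ledger` class, its schema, and its method names are hypothetical, and an in-memory database stands in for the per-cluster file.

```python
import sqlite3
from collections import defaultdict
from typing import Callable


class Ledger:
    """Append-only message log with in-process subscribers.

    The schema and method names are HYPOTHETICAL; only the idea of one
    SQLite database per cluster comes from the issue text.
    """

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT, body TEXT)"
        )
        self.subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, body: str) -> None:
        # Record the message first, then notify, so anything a subscriber
        # has seen is already in the log.
        with self.db:
            self.db.execute(
                "INSERT INTO messages (topic, body) VALUES (?, ?)", (topic, body)
            )
        for handler in self.subscribers[topic]:
            handler(body)

    def replay(self) -> list[tuple[str, str]]:
        # On resume, the full history can be read back in insertion order.
        rows = self.db.execute("SELECT topic, body FROM messages ORDER BY id")
        return rows.fetchall()


ledger = Ledger()
seen: list[str] = []
ledger.subscribe("PLAN_READY", seen.append)
ledger.publish("ISSUE_OPENED", "Fix typo in README")
ledger.publish("PLAN_READY", "Edit README.md line 3")
```

Giving each cluster its own database path keeps one workflow's history from mixing with another's, and cleanup becomes a matter of deleting a single file.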

---

## Current State in Hermes Agent

**Existing capabilities:**
- `claude-code` skill — single-agent delegation to Claude Code CLI
- `codex` skill — single-agent delegation to Codex CLI
- `hermes-agent-spawning` skill — spawn Hermes sub-processes
- `delegate_task` tool — spawn sub-agents within Hermes

**What's missing (the gap):**
- No multi-agent orchestration with validation loops
- No blind validation (independent verification of agent output)
- No complexity-based routing (all tasks get the same treatment)
- No crash recovery for long-running multi-agent workflows
- No automated issue-to-PR pipeline with verification gates

**Related open issues:**
- #344: Multi-Agent Architecture (umbrella — native implementation planned)
- #356: Acceptance Criteria & Independent Judge (blind validation for delegate_task)
- #406: Independent Code Verification & Quality Gates (code-specific validation)
- #413: Cross-CLI Agent Orchestration (external CLI wrapping)

**Relationship to existing issues:** This skill is complementary, not duplicative. Existing issues propose building these patterns natively into Hermes. The Zeroshot skill provides the same capabilities TODAY by wrapping an external tool, serving as a bridge until native multi-agent support lands.

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **skill** (not a tool) because:
- Zeroshot is an external CLI invoked via `terminal` — no custom Python integration needed
- No API key management by Hermes — zeroshot and its providers handle their own auth
- All interaction is through shell commands and text output
- Fits the same pattern as the existing `claude-code` and `codex` skills

**Placement:** Skills Hub (not bundled). Requires Node 18+, npm, and at least one provider CLI — too specialized for the default install.

**Category:** `autonomous-ai-agents` (alongside `claude-code`, `codex`, `hermes-agent-spawning`)

### What We'd Need

- Zeroshot skill (SKILL.md) with:
  - Prerequisites and installation instructions
  - Usage patterns for each workflow type
  - Provider setup guidance
  - Monitoring and management commands
  - Integration with existing Hermes workflows (e.g., issue triage → zeroshot run)

### Phased Rollout

**Phase 1: Basic Skill**
- SKILL.md covering installation, basic usage, and key commands
- Patterns: `zeroshot run <issue>`, `zeroshot run "text"`, `zeroshot run file.md`
- Monitoring: `zeroshot status`, `zeroshot logs`, `zeroshot list`
- Management: `zeroshot resume`, `zeroshot stop`, `zeroshot kill`
- Provider detection and setup guidance
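
The provider detection step in Phase 1 could look roughly like the following. It is a sketch under assumptions: the function names are hypothetical, the binary names `claude`, `codex`, and `gemini` are inferred from the provider commands quoted earlier, and OpenCode is left out because its binary name is not given in this issue.

```python
import re
import shutil
from typing import Callable, Optional

# Binary names inferred from the provider commands quoted in this issue.
PROVIDER_BINARIES = ("claude", "codex", "gemini")
MIN_NODE_MAJOR = 18


def node_major(version_output: str) -> Optional[int]:
    """Parse the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"v(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def check_prerequisites(
    node_version_output: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    major = node_major(node_version_output)
    if major is None or major < MIN_NODE_MAJOR:
        problems.append(f"Node {MIN_NODE_MAJOR}+ is required")
    if which("zeroshot") is None:
        problems.append("zeroshot CLI not found on PATH")
    if not any(which(name) for name in PROVIDER_BINARIES):
        problems.append("no provider CLI found (claude, codex, or gemini)")
    return problems
```

The `which` parameter is injectable so the check can be exercised without the tools installed. Note that finding a provider binary on PATH does not confirm it is authenticated.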

**Phase 2: Advanced Workflows**
- Git worktree and Docker isolation patterns
- PR automation: `--pr` and `--ship` workflows
- Custom cluster templates for Hermes-specific use cases
- Parallel cluster management (multiple issues at once)
- Integration with `schedule_cronjob` for batch issue processing
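
Parallel cluster management could be prototyped with a small thread pool that launches one worktree-isolated run per issue. The sketch assumes that `zeroshot run` blocks until the cluster finishes and that its exit code reflects success; neither is confirmed in this issue. `run_cluster` and `run_batch` are hypothetical helper names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def run_cluster(issue: str) -> int:
    # One worktree-isolated cluster per issue; returns the process exit code.
    # ASSUMPTION: the command blocks until the workflow completes.
    return subprocess.run(["zeroshot", "run", issue, "--worktree"]).returncode


def run_batch(
    issues: Sequence[str],
    runner: Callable[[str], int] = run_cluster,
    max_parallel: int = 2,
) -> dict[str, int]:
    """Run several issues concurrently and map each issue to its exit code."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        codes = list(pool.map(runner, issues))
    return dict(zip(issues, codes))
```

The `runner` argument can be swapped for a stub in tests, and `max_parallel` caps how many provider sessions run at once, which matters for both cost and rate limits.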

**Phase 3: Native Pattern Adoption**
- Extract architectural insights from zeroshot into Hermes' native multi-agent system (#344)
- Blind validation pattern for `delegate_task` (#356)
- Template-driven agent topologies for native workflows
- This phase makes the skill partially redundant as native capabilities mature

---

## Pros & Cons

### Pros
- **Immediate multi-agent capability** — No need to wait for the native #344 implementation. Works today with Claude Code, which users of the existing `claude-code` skill already have installed.
- **Established blind validation design** — 375 commits at v5.4.0, aimed at the "context degradation" problem that single-agent tools suffer from. Independent validators are intended to produce more reliable results.
- **Complexity routing saves costs** — Trivial tasks get 1 agent, critical tasks get 7. No wasted compute on simple fixes.
- **Crash recovery** — SQLite ledger enables `zeroshot resume` after failures. Long-running workflows survive interruptions.
- **Full automation pipeline** — `zeroshot run 123 --ship` takes an issue from description to merged PR with verification gates. Pairs naturally with Hermes' `schedule_cronjob` for batch processing.
- **MIT license** — No licensing concerns for any integration approach.
- **Complements existing skills** — Uses the same provider CLIs (Claude Code, Codex) that Hermes already has skills for, but adds orchestration on top.

### Cons / Risks
- **Heavy dependency chain** — 297 npm packages, including native modules (`better-sqlite3`, `node-pty`). Installation can be fragile on some systems.
- **Young project** — 2.5 months old, 2 primary maintainers. Long-term maintenance uncertain.
- **Architectural concerns** — 3,454-line god object (`orchestrator.js`), 5,322-line CLI file. Code quality may limit community contributions.
- **Redundancy with native plans** — Once #344 and #356 land, the zeroshot skill becomes partially redundant. But that could be months/years away.
- **Provider CLI requirement** — User must have at least one provider CLI installed AND authenticated separately. Not a "just works" experience.
- **Node.js dependency** — Hermes is Python-based. Adding a Node.js tool to the stack increases system requirements. (Node 18+ is needed.)
- **4 moderate npm vulnerabilities** — Flagged during install. Not critical but worth monitoring.

---

## Open Questions

- Should we also create a custom zeroshot cluster template optimized for Hermes-style workflows (e.g., skill-aware validators that check Hermes conventions)?
- Should the skill include guidance for using zeroshot with Hermes' `schedule_cronjob` to auto-process batches of GitHub issues?
- Should we extract zeroshot's blind validation and template patterns into a design doc for #344 to accelerate native implementation?
- Is the Node.js dependency acceptable for the target user base, or should this wait for a future Python-native alternative?

---

## References

- [covibes/zeroshot](https://github.com/covibes/zeroshot) — Source repo (MIT, 1,274 stars, v5.4.0)
- [npm: @covibes/zeroshot](https://www.npmjs.com/package/@covibes/zeroshot) — Package
- Related Hermes issues: #344 (Multi-Agent Architecture), #356 (Acceptance Criteria & Judge), #406 (Code Verification & Quality Gates), #413 (Cross-CLI Agent Orchestration)
- Existing Hermes skills: `claude-code`, `codex`, `hermes-agent-spawning`

