Feature: Zeroshot Skill — Multi-Agent Blind Validation Orchestration via CLI #488

@teknium1

Overview

Zeroshot is an open-source (MIT) multi-agent CLI orchestrator that runs autonomous coding workflows using a planner → implementer → validator pipeline with "blind validation." Unlike single-agent tools, Zeroshot spawns independent validator agents that verify code changes by examining the codebase directly, without ever seeing the worker's internal reasoning or output history. The aim is more reliable code than single-agent workflows produce, since the validators cannot be swayed by the worker's own account of its changes.

Hermes already has skills for Claude Code and Codex as individual agent CLIs, but uses them as single-agent tools. A Zeroshot skill would add a multi-agent orchestration layer on top of those same CLIs, giving Hermes users immediate access to battle-tested multi-agent coding workflows without waiting for native multi-agent architecture (#344).

This is a concrete, available-today capability that complements the longer-term native multi-agent work planned in #344, #356, and #406. Use zeroshot now; build native later.


Research Findings

How Zeroshot Works

Architecture (63K LOC, Node.js, v5.4.0, 1,274 stars):

Zeroshot is a message-driven coordination layer built on four primitives:

  1. Template-Driven Agent Topologies — JSON workflow definitions specify agent roles, triggers, context strategies, and hooks. Built-in templates: single-worker, worker-validator, full-workflow, heavy-validation. Users can create custom cluster templates.

  2. Two-Tier Conductor Classification — A cheap "junior conductor" (Sonnet-class) classifies tasks into a 2D matrix of complexity (TRIVIAL/SIMPLE/STANDARD/CRITICAL) × task type (INQUIRY/TASK/DEBUG). Uncertain cases escalate to a "senior conductor" (Opus-class). This routes to the appropriate template:

    Complexity   Agents   Validators                     Use Case
    TRIVIAL      1        0                              Fix typo in README
    SIMPLE       2        1 (generic)                    Add dark mode toggle
    STANDARD     4        2 (reqs + code)                Refactor auth system
    CRITICAL     7        5 (security, testing, etc.)    Implement payment flow

  3. SQLite Event Ledger — Every message, agent output, and state transition is persisted in a per-cluster SQLite database (~/.zeroshot/<clusterId>.db). This enables crash recovery (zeroshot resume <id>), post-mortem analysis, and token usage tracking.

  4. Message Bus — EventEmitter-based pub/sub over the ledger. Topics include ISSUE_OPENED, PLAN_READY, IMPLEMENTATION_READY, VALIDATION_RESULT. Agent triggers subscribe to specific topics with optional JS logic conditions.
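To make the two-tier routing concrete, here is a minimal Python sketch, not Zeroshot's actual code. The agent and validator counts come from the table above. Everything else is an assumption for illustration: the `route` function, the `Topology` type, the idea that each conductor returns a (complexity, confidence) pair, and the 0.7 escalation threshold. The source does not say how "uncertain" is decided, and the task-type dimension is left out for brevity.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    TRIVIAL = "TRIVIAL"
    SIMPLE = "SIMPLE"
    STANDARD = "STANDARD"
    CRITICAL = "CRITICAL"


@dataclass(frozen=True)
class Topology:
    agents: int
    validators: int


# Agent and validator counts copied from the complexity table.
TOPOLOGIES = {
    Complexity.TRIVIAL: Topology(agents=1, validators=0),
    Complexity.SIMPLE: Topology(agents=2, validators=1),
    Complexity.STANDARD: Topology(agents=4, validators=2),
    Complexity.CRITICAL: Topology(agents=7, validators=5),
}

# Hypothetical cutoff: the source does not define when a case is "uncertain".
ESCALATION_THRESHOLD = 0.7


def route(task, junior, senior):
    """Classify with the cheap conductor and escalate only when it is unsure.

    `junior` and `senior` are callables returning (Complexity, confidence).
    """
    complexity, confidence = junior(task)
    if confidence < ESCALATION_THRESHOLD:
        # The expensive model is consulted only for the uncertain minority.
        complexity, _ = senior(task)
    return TOPOLOGIES[complexity]
```

The point of the design is visible in `route`: the senior conductor is never called when the junior one is confident, which is how the cheap classifier can absorb most routing decisions.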

The Blind Validation Pattern:

This is the key innovation. It works through context isolation, not code isolation:

  • Validators' contextStrategy.sources config explicitly EXCLUDES worker output topics
  • Validators receive: the original issue, the plan, and a notification that implementation is ready
  • Validators do NOT receive: the worker's code output, reasoning, or internal context
  • Validators must independently examine the actual codebase on disk using file read/grep/glob tools
  • This means "the validator can't lie about code they didn't write" — they verify ground truth
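The context-isolation idea can be sketched in a few lines of Python. This is an illustration under assumptions, not Zeroshot's implementation: the `Message` type and `build_validator_context` are hypothetical, and so is the `WORKER_OUTPUT` topic name, which stands in for whatever topics the worker actually publishes on. The sketch uses an allow-list of the topics named earlier, which has the same effect as excluding worker output topics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    topic: str
    sender: str
    content: str


# Topics a validator may see: the issue, the plan, and the "ready" notification.
VALIDATOR_SOURCES = ("ISSUE_OPENED", "PLAN_READY", "IMPLEMENTATION_READY")


def build_validator_context(ledger):
    """Return only the messages whose topic is on the validator's allow-list."""
    return [m for m in ledger if m.topic in VALIDATOR_SOURCES]


ledger = [
    Message("ISSUE_OPENED", "user", "Add dark mode toggle"),
    Message("PLAN_READY", "planner", "Add theme state, then wire up the toggle"),
    # Hypothetical topic name for the worker's own account of its changes.
    Message("WORKER_OUTPUT", "worker", "I edited theme.js and everything works"),
    Message("IMPLEMENTATION_READY", "worker", "ready for validation"),
]

context = build_validator_context(ledger)
print([m.topic for m in context])
```

The worker's claim that "everything works" never reaches the validator, so the only way to confirm it is to read the files on disk.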

Provider Abstraction:

Zeroshot shells out to existing CLI tools. Each provider implements isAvailable(), buildCommand(), and parseEvent():

  • Claude Code: claude --print --output-format json --dangerously-skip-permissions
  • Codex: codex exec --json --dangerously-bypass-approvals-and-sandbox
  • Gemini: gemini -p <context> --output-format stream-json --yolo
  • OpenCode: Similar pattern

Safety hooks block user-interactive prompts and dangerous git commands.
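As a rough picture of the provider interface, the following Python sketch mirrors the three methods named above. Zeroshot itself is Node.js, so this is a translation for illustration only. The Claude Code flags are the ones listed in this issue. Passing the prompt as a trailing positional argument and treating each output line as one JSON object are assumptions, as is the `ClaudeCodeProvider` class name.

```python
import json
import shutil
from typing import Optional, Protocol


class Provider(Protocol):
    """The three operations each provider is described as implementing."""

    def is_available(self) -> bool: ...
    def build_command(self, prompt: str) -> list[str]: ...
    def parse_event(self, line: str) -> Optional[dict]: ...


class ClaudeCodeProvider:
    binary = "claude"

    def is_available(self) -> bool:
        # The provider is usable only if its CLI is on PATH.
        return shutil.which(self.binary) is not None

    def build_command(self, prompt: str) -> list[str]:
        # Flags taken from this issue; the trailing prompt is an assumption.
        return [
            self.binary,
            "--print",
            "--output-format", "json",
            "--dangerously-skip-permissions",
            prompt,
        ]

    def parse_event(self, line: str) -> Optional[dict]:
        # Assumes one JSON object per line; anything else is ignored.
        line = line.strip()
        if not line:
            return None
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            return None
        return event if isinstance(event, dict) else None
```

Because the orchestrator only depends on these three methods, adding another provider means adding another small class rather than touching the workflow logic.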

Isolation Modes:

  • None (default) — Direct filesystem, manual review
  • Git Worktree (--worktree) — Separate branch/directory, clean PR workflow
  • Docker (--docker) — Full container isolation for risky or parallel tasks

Full Automation: zeroshot run 123 --ship = worktree isolation → implement → validate → create PR → auto-merge
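From the Hermes side, choosing an isolation mode comes down to assembling an argument list. The helper below is a hypothetical sketch: `build_run_command` is not part of Zeroshot or Hermes, and it uses only the subcommand and flags mentioned in this issue (`run`, `--worktree`, `--docker`, `--ship`). Whether `--ship` can be combined with `--docker` is not stated in the source, so the sketch does not try to validate combinations.

```python
import shlex


def build_run_command(task, isolation=None, ship=False):
    """Build the argv for `zeroshot run`, without executing anything.

    `task` is an issue number, free text, or a file path.
    `isolation` is None (direct filesystem), "worktree", or "docker".
    """
    argv = ["zeroshot", "run", str(task)]
    if isolation == "worktree":
        argv.append("--worktree")
    elif isolation == "docker":
        argv.append("--docker")
    elif isolation is not None:
        raise ValueError(f"unknown isolation mode: {isolation!r}")
    if ship:
        # Per the description above, --ship already implies worktree isolation.
        argv.append("--ship")
    return argv


# shlex.join renders the argv as a safely quoted shell command line.
print(shlex.join(build_run_command("Add dark mode toggle", isolation="docker")))
```

Returning a list rather than a string keeps free-text tasks safe to pass to `subprocess.run` without shell quoting problems.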

Key Design Decisions

  1. CLI shelling over API calls — Zeroshot delegates to existing provider CLIs rather than making API calls directly. This means auth, model selection, and tool execution are handled by each provider's own CLI. Trade-off: simpler integration but heavier process overhead.

  2. Template-driven over code-driven — Workflow topologies are JSON configs, not code. This makes it easy to create custom workflows but limits programmatic flexibility.

  3. Two-tier conductor — Avoids paying for expensive models on trivial tasks. The cheap classifier handles 80%+ of routing decisions.

  4. Per-cluster SQLite — Each cluster gets its own database file, avoiding cross-contamination and enabling clean resume/cleanup.
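The per-cluster ledger in decision 4 can be illustrated with Python's built-in `sqlite3` module. This is a minimal sketch of an append-only event log, assuming a made-up schema: the `events` table, its columns, and the helper names are all hypothetical, and Zeroshot's real schema may differ.

```python
import json
import sqlite3


def open_ledger(path):
    """Open (or create) one ledger database; one file per cluster."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "topic TEXT NOT NULL, sender TEXT NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def append_event(conn, topic, sender, payload):
    # `with conn` commits on success, so a recorded event survives a crash.
    with conn:
        conn.execute(
            "INSERT INTO events (topic, sender, payload) VALUES (?, ?, ?)",
            (topic, sender, json.dumps(payload)),
        )


def replay(conn):
    """Return every event in insertion order, e.g. to rebuild state on resume."""
    rows = conn.execute("SELECT topic, sender, payload FROM events ORDER BY id")
    return [(topic, sender, json.loads(payload)) for topic, sender, payload in rows]


def last_topic(conn):
    """Return the most recent topic, i.e. where a resumed run would pick up."""
    row = conn.execute("SELECT topic FROM events ORDER BY id DESC LIMIT 1").fetchone()
    return row[0] if row else None
```

Because each cluster writes to its own file, cleaning up a cluster is just deleting that file, and resuming is a matter of replaying its events.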


Current State in Hermes Agent

Existing capabilities:

  • claude-code skill — single-agent delegation to Claude Code CLI
  • codex skill — single-agent delegation to Codex CLI
  • hermes-agent-spawning skill — spawn Hermes sub-processes
  • delegate_task tool — spawn sub-agents within Hermes

What's missing (the gap):

  • No multi-agent orchestration with validation loops
  • No blind validation (independent verification of agent output)
  • No complexity-based routing (all tasks get the same treatment)
  • No crash recovery for long-running multi-agent workflows
  • No automated issue-to-PR pipeline with verification gates

Related open issues:

Relationship to existing issues: This skill is complementary, not duplicative. Existing issues propose building these patterns natively into Hermes. The Zeroshot skill provides the same capabilities TODAY by wrapping an external tool, serving as a bridge until native multi-agent support lands.


Implementation Plan

Skill vs. Tool Classification

This should be a skill (not a tool) because:

  • Zeroshot is an external CLI invoked via terminal — no custom Python integration needed
  • No API key management by Hermes — zeroshot and its providers handle their own auth
  • All interaction is through shell commands and text output
  • Fits the same pattern as the existing claude-code and codex skills

Placement: Skills Hub (not bundled). Requires Node 18+, npm, and at least one provider CLI — too specialized for the default install.

Category: autonomous-ai-agents (alongside claude-code, codex, hermes-agent-spawning)

What We'd Need

  • Zeroshot skill (SKILL.md) with:
    • Prerequisites and installation instructions
    • Usage patterns for each workflow type
    • Provider setup guidance
    • Monitoring and management commands
    • Integration with existing Hermes workflows (e.g., issue triage → zeroshot run)

Phased Rollout

Phase 1: Basic Skill

  • SKILL.md covering installation, basic usage, and key commands
  • Patterns: zeroshot run <issue>, zeroshot run "text", zeroshot run file.md
  • Monitoring: zeroshot status, zeroshot logs, zeroshot list
  • Management: zeroshot resume, zeroshot stop, zeroshot kill
  • Provider detection and setup guidance
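Provider detection and prerequisite checks could look roughly like the Python sketch below. It is an assumption-laden illustration, not an agreed design: the helper names are hypothetical, the binary names `claude`, `codex`, and `gemini` are taken from the provider commands listed earlier, and OpenCode is omitted because its binary name is not given in this issue. The `which` parameter exists only so the check can be exercised without the tools installed.

```python
import re
import shutil

# Provider display name -> CLI binary, as used in the provider commands above.
PROVIDER_BINARIES = {"Claude Code": "claude", "Codex": "codex", "Gemini": "gemini"}


def node_major_version(version_output):
    """Parse the major version from `node --version` output such as 'v18.17.0'."""
    match = re.match(r"v(\d+)\.", version_output.strip())
    return int(match.group(1)) if match else None


def detect_providers(which=shutil.which):
    """Return the names of provider CLIs found on PATH."""
    return [name for name, binary in PROVIDER_BINARIES.items() if which(binary)]


def missing_prerequisites(node_version_output, which=shutil.which):
    """List human-readable problems; an empty list means the skill can run."""
    problems = []
    major = node_major_version(node_version_output or "")
    if major is None or major < 18:
        problems.append("Node 18+ is required")
    if not which("npm"):
        problems.append("npm was not found on PATH")
    if not which("zeroshot"):
        problems.append("zeroshot was not found on PATH")
    if not detect_providers(which):
        problems.append("no provider CLI (claude, codex, gemini) was found on PATH")
    return problems
```

In practice the `node_version_output` argument would come from running `node --version`. Note that finding a provider binary says nothing about whether it is authenticated, which still has to be handled by the user.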

Phase 2: Advanced Workflows

  • Git worktree and Docker isolation patterns
  • PR automation: --pr and --ship workflows
  • Custom cluster templates for Hermes-specific use cases
  • Parallel cluster management (multiple issues at once)
  • Integration with schedule_cronjob for batch issue processing

Phase 3: Native Pattern Adoption


Pros & Cons

Pros

  • Immediate multi-agent capability — No need to wait for the native #344 implementation. Works today with Claude Code (already installed on this system).
  • Battle-tested blind validation — 375 commits, v5.4.0, addressing the "context degradation" problem that single-agent tools suffer from. Independent validators produce more reliable results.
  • Complexity routing saves costs — Trivial tasks get 1 agent, critical tasks get 7. No wasted compute on simple fixes.
  • Crash recovery — SQLite ledger enables zeroshot resume after failures. Long-running workflows survive interruptions.
  • Full automation pipeline — zeroshot run 123 --ship takes an issue from description to merged PR with verification gates. Pairs naturally with Hermes' schedule_cronjob for batch processing.
  • MIT license — No licensing concerns for any integration approach.
  • Complements existing skills — Uses the same provider CLIs (Claude Code, Codex) that Hermes already has skills for, but adds orchestration on top.

Cons / Risks

  • Heavy dependency chain — 297 npm packages, including native modules (better-sqlite3, node-pty). Installation can be fragile on some systems.
  • Young project — 2.5 months old, 2 primary maintainers. Long-term maintenance uncertain.
  • Architectural concerns — 3,454-line god object (orchestrator.js), 5,322-line CLI file. Code quality may limit community contributions.
  • Redundancy with native plans — Once #344 and #356 land, the zeroshot skill becomes partially redundant. That could be months or years away, though.
  • Provider CLI requirement — User must have at least one provider CLI installed AND authenticated separately. Not a "just works" experience.
  • Node.js dependency — Hermes is Python-based. Adding a Node.js tool to the stack increases system requirements. (Node 18+ is needed.)
  • 4 moderate npm vulnerabilities — Flagged during install. Not critical but worth monitoring.

Open Questions

  • Should we also create a custom zeroshot cluster template optimized for Hermes-style workflows (e.g., skill-aware validators that check Hermes conventions)?
  • Should the skill include guidance for using zeroshot with Hermes' schedule_cronjob to auto-process batches of GitHub issues?
  • Should we extract zeroshot's blind validation and template patterns into a design doc for #344 to accelerate native implementation?
  • Is the Node.js dependency acceptable for the target user base, or should this wait for a future Python-native alternative?

References
