Feature: Blackbox CLI Skill — Multi-Model Agent Delegation with Built-in Judge (inspired by Blackbox AI) #475


## Overview

[Blackbox AI](https://www.blackbox.ai/) is a multi-agent coding platform that unifies Claude Code, Codex, Gemini, and Blackbox's own models into a single workflow. Its standout architectural feature is the **"Chairman" pattern** — tasks are dispatched to multiple models simultaneously, and a specialized judge LLM evaluates all outputs to select the best implementation.

The [Blackbox CLI](https://github.com/blackboxaicode/cli) is open-source (GPL-3.0, TypeScript, forked from Gemini CLI) and provides a terminal-based interface for multi-agent code generation with a built-in judge mechanism. It supports non-interactive mode, MCP, checkpointing/resume, and vision model switching — making it viable as another delegatable agent alongside our existing `claude-code` and `codex` skills.

This issue proposes a **Blackbox CLI skill** following the same pattern as the existing autonomous agent skills, enabling Hermes users who subscribe to Blackbox AI to delegate coding tasks through their multi-model pipeline.

---

## Research Findings

### How the Blackbox CLI Works

**Architecture:** The CLI is a Node.js (v20+) application built on the Google Gemini CLI codebase (`gemini.tsx` core), extended with:

- **SubagentManager** (`packages/core/src/subagents/`) — Orchestrates multiple sub-agents with events, hooks, statistics tracking, and validation
- **Built-in agents** (`builtin-agents.ts`) — Pre-configured agent definitions for Claude, Codex, Gemini, and Blackbox models
- **Non-interactive mode** (`nonInteractiveCli.ts`) — Full headless execution with tool calls, checkpointing, and structured output formatting. Thoughts go to stderr and results to stdout
- **Judge mechanism** — Evaluates competing agent outputs and selects the best implementation
- **Checkpoint system** — Save/resume sessions via `.blackboxcli/checkpoint-*.json` files
- **MCP support** — Model Context Protocol integration for external tool connectivity
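As a rough illustration of how a skill might pick up the most recent checkpoint before resuming, the bash sketch below scans `.blackboxcli/` for `checkpoint-*.json` files and returns the newest one by modification time. It assumes checkpoints land in the project's working directory as described above. The helper names `latest_checkpoint` and `checkpoint_tag` are hypothetical, and whether the stripped file name is the ID that `--resume-checkpoint` expects has not been verified against the CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for locating Blackbox CLI checkpoints.
# Assumption: checkpoints are stored as <project>/.blackboxcli/checkpoint-*.json.

# Print the path of the newest checkpoint file; return 1 if none exist.
latest_checkpoint() {
  local dir="${1:-.}/.blackboxcli" newest="" f
  for f in "$dir"/checkpoint-*.json; do
    [[ -e "$f" ]] || continue            # glob matched nothing
    if [[ -z "$newest" || "$f" -nt "$newest" ]]; then
      newest="$f"                        # keep the most recently modified file
    fi
  done
  [[ -n "$newest" ]] || return 1
  printf '%s\n' "$newest"
}

# Strip the directory, the "checkpoint-" prefix, and the ".json" suffix.
# Assumption (unverified): this tag is the ID that --resume-checkpoint accepts.
checkpoint_tag() {
  local base
  base=$(basename "$1" .json)
  printf '%s\n' "${base#checkpoint-}"
}
```

A skill could call `latest_checkpoint .`, pass the resulting tag to the resume command, and fall back to a fresh `--prompt` run when the function returns non-zero.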

**Installation:**
```bash
# From npm (prebuilt)
npm install -g @blackboxai/cli

# From source
git clone https://github.com/blackboxaicode/cli.git
cd cli && npm install && npm install -g .

# Configure
blackbox configure  # Enter API key from app.blackbox.ai/dashboard
```

**Non-interactive usage (key for Hermes delegation):**
```bash
# One-shot task execution
blackbox --prompt "Implement JWT authentication for the Express API"

# Resume from checkpoint
blackbox --resume-checkpoint "task-abc123" --prompt "Add refresh token support"
```
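Non-interactive mode sends thoughts to stderr and results to stdout, so a skill can keep the two apart by redirecting each to its own file. Under `terminal(pty=true)` both streams would otherwise share one terminal. The sketch below relies only on the `--prompt` flag shown above. `run_blackbox_task` and the `BLACKBOX_BIN` override are hypothetical. The override exists only so the function can be exercised with a stub instead of the real CLI.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run one Blackbox task headlessly and split its output.
# BLACKBOX_BIN is an illustrative override (defaults to the real `blackbox`).
run_blackbox_task() {
  local prompt="$1" outdir="$2"
  local bin="${BLACKBOX_BIN:-blackbox}" rc
  mkdir -p "$outdir"
  # stdout carries the result; stderr carries the model's reasoning trace
  "$bin" --prompt "$prompt" >"$outdir/result.txt" 2>"$outdir/thoughts.log"
  rc=$?
  printf '%s\n' "$rc" >"$outdir/exit_code"   # record status for later inspection
  return "$rc"
}
```

For example, `run_blackbox_task "Implement JWT authentication for the Express API" ./bb-run && cat ./bb-run/result.txt` would print only the final result. The reasoning trace stays available in `thoughts.log`.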

**Session commands:** `/compress` (shrink history), `/clear` (reset), `/stats` (token usage)

### Key Design Decisions

1. **Gemini CLI as base** — Rather than building from scratch, Blackbox forked Google's Gemini CLI for its tool execution engine, event handling, and terminal UI (React/Ink). This gave them a production-quality foundation.
2. **Multi-model by design** — The CLI can run the same prompt through multiple models (Blackbox Pro, Claude Sonnet 4.5, GPT-5.2 Codex, Gemini 2.5 Pro) and use a judge to pick the best; multi-agent execution itself is gated behind the Pro Plus tier.
3. **Credit-based pricing** — Free tier has basic access; Pro ($10/mo) adds $30 credits for premium models; Pro Plus ($20/mo) unlocks multi-agent execution.
4. **GPL-3.0 license** — The CLI is copyleft, which is important to note for any integration approach.
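The judge pattern itself is easy to express. The sketch below is a generic best-of-N fan-out, not Blackbox's implementation. It runs one prompt through several agent commands in parallel, waits for all of them, and then hands the candidate files to a judge command that prints the winning path. The agent commands are placeholders. `judge_longest` is a toy stand-in that picks the longest answer, where a real judge would be an LLM call. All function names here are hypothetical.

```bash
#!/usr/bin/env bash
# Generic best-of-N ("Chairman"-style) sketch; not Blackbox's actual judge.
# Usage: fan_out_and_judge PROMPT WORKDIR JUDGE AGENT...
# Each AGENT takes the prompt as $1 and writes its answer to stdout.
# JUDGE receives the prompt followed by candidate file paths and prints the winner.
fan_out_and_judge() {
  local prompt="$1" workdir="$2" judge="$3"
  shift 3
  local agents=("$@") i
  mkdir -p "$workdir"
  for i in "${!agents[@]}"; do
    # launch every agent concurrently, one candidate file each
    "${agents[$i]}" "$prompt" >"$workdir/candidate-$i.txt" 2>/dev/null &
  done
  wait   # block until all agents have exited
  "$judge" "$prompt" "$workdir"/candidate-*.txt
}

# Toy placeholder judge: picks the longest candidate. A real judge would ask an LLM.
judge_longest() {
  shift   # drop the prompt; this toy judge ignores it
  local best="" f
  for f in "$@"; do
    if [[ -z "$best" ]] || (( $(wc -c <"$f") > $(wc -c <"$best") )); then
      best="$f"
    fi
  done
  printf '%s\n' "$best"
}
```

Running N agents multiplies token or credit spend by roughly N. That cost is worth weighing against the credit-based pricing above.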

### Platform Capabilities Beyond the CLI

The Blackbox platform also offers:

- **Multi-Agent Task API** (`POST cloud.blackbox.ai/api/tasks`) — Programmatic dispatch to 2-5 agents with automated judge selection. Requires a `bb_xxxxxxxx` API key.
- **Semantic Knowledge Graph** — Indexes entire repos, commits, and URLs for cross-file reasoning
- **Isolated sandboxes** — Long-running tasks in background environments
- **IDE support** — VS Code extension with 4.2M+ installs, JetBrains, Blackbox IDE

---

## Current State in Hermes Agent

**Existing agent delegation skills:**

| Skill | Agent | Installation | Non-interactive | Auth Required |
|-------|-------|-------------|----------------|---------------|
| `claude-code` | Claude Code CLI | `npm install -g @anthropic-ai/claude-code` | Yes (`--dangerously-skip-permissions`) | Anthropic API key |
| `codex` | OpenAI Codex CLI | `npm install -g @openai/codex` | Yes (`exec` subcommand) | OpenAI API key |
| `hermes-agent` | Hermes Agent | Already installed | Yes (`-q` flag) | Configured LLM provider |
| **`blackbox`** (proposed) | Blackbox CLI | `npm install -g @blackboxai/cli` | Yes (`--prompt`) | Blackbox API key |

**No existing Blackbox integration.** Zero mentions of "blackbox" in the hermes-agent codebase.

**Related issues:**
- #413 — Cross-CLI Agent Orchestration (framework for mixed CLI agent workflows — Blackbox CLI would be another backend)
- #344 — Multi-Agent Architecture (orchestration infrastructure)
- #412 — Consensus & Voting Engine (related but distinct from Blackbox's judge pattern — see sister issue)

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **skill** because:
- It wraps an external CLI (`blackbox`) callable via `terminal(pty=true)`
- It follows the same pattern as the existing `claude-code` and `codex` skills
- It needs no custom Python integration; all interaction goes through the terminal
- API key management is handled by the user running `blackbox configure`

**Not bundled — Skills Hub.** Premium models and multi-agent execution require a paid Blackbox subscription, the CLI is GPL-3.0 licensed, and adoption is modest (191 GitHub stars). That makes it specialized enough to ship through the Skills Hub rather than as a bundled skill.

### What We'd Need

1. **Skill file** (`SKILL.md`) following the `claude-code`/`codex` pattern
2. **Prerequisites section** — Node.js 20+, npm, Blackbox API key
3. **One-shot mode** — `blackbox --prompt "task"` via `terminal(pty=true)`
4. **Background mode** — Long-running sessions with `process(poll/log)` monitoring
5. **Checkpoint/resume** — Leverage Blackbox's built-in checkpoint system
6. **Multi-model hints** — Document how to configure which models run (agent selection)
7. **PR review pattern** — Adapt the existing PR review workflow to Blackbox's capabilities
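Background mode comes down to starting the CLI detached and checking on it later. Inside Hermes that polling would go through `process(poll/log)`. The plain-bash sketch below shows the same flow outside Hermes and relies only on the `--prompt` flag. `start_background_task`, `poll_task`, and the `BLACKBOX_BIN` override are hypothetical names used for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical background launcher and poller for a long-running Blackbox task.
# BLACKBOX_BIN is an illustrative override (defaults to the real `blackbox`).
start_background_task() {
  local prompt="$1" dir="$2" bin="${BLACKBOX_BIN:-blackbox}"
  mkdir -p "$dir"
  rm -f "$dir/exit_code"
  (
    "$bin" --prompt "$prompt" >"$dir/result.txt" 2>"$dir/thoughts.log"
    rc=$?
    # write via a temp file so a poller never reads a half-written status
    printf '%s\n' "$rc" >"$dir/exit_code.tmp"
    mv "$dir/exit_code.tmp" "$dir/exit_code"
  ) &
  printf '%s\n' "$!" >"$dir/pid"   # PID of the background subshell
}

# Print "running" or "finished:<exit code>" for a task directory.
poll_task() {
  local dir="$1"
  if [[ -f "$dir/exit_code" ]]; then
    printf 'finished:%s\n' "$(cat "$dir/exit_code")"
  else
    printf 'running\n'
  fi
}
```

A caller checks back with `poll_task DIR` and reads `result.txt` once it reports `finished:0`.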

### Phased Rollout

**Phase 1: Basic Delegation Skill**
- SKILL.md with one-shot and background delegation patterns
- Prerequisites, configuration, and usage documentation
- Standard PR review workflow adaptation
- Metadata: tags, related_skills (claude-code, codex, hermes-agent)

**Phase 2: Multi-Model Features**
- Document Blackbox's built-in multi-agent/judge mode for skill users
- Add guidance for when to use Blackbox's multi-model mode vs. Hermes's native delegation
- Checkpoint/resume workflow for long-running tasks

**Phase 3: Cross-CLI Integration**
- When #413 lands, register Blackbox as an `ExternalCLIBackend` in the cross-CLI orchestration framework
- Enable mixed workflows: Blackbox + Claude Code + Codex in a single workflow DAG

---

## Pros & Cons

### Pros
- **Consistent pattern** — Follows the established claude-code/codex skill architecture exactly
- **Multi-model value** — Users get access to Blackbox's built-in judge/multi-model pipeline, which is unique among CLI agents
- **Low effort** — ~200 lines of SKILL.md, no code changes needed
- **MCP compatibility** — Blackbox CLI speaks MCP, potentially enabling tool sharing with Hermes
- **Checkpoint/resume** — Built-in session persistence for long-running tasks

### Cons / Risks
- **GPL-3.0 license** — Copyleft license on the CLI; doesn't affect a skill (we're just calling it), but worth noting
- **Paid service** — Requires Blackbox subscription ($10-40/mo) with credit-based consumption
- **Modest adoption** — 191 GitHub stars, forked from Gemini CLI; uncertain long-term maintenance
- **Wrapper-of-wrappers concern** — Blackbox CLI itself wraps Claude Code, Codex, and Gemini. If Hermes already delegates to those directly, the added value is mainly the judge layer
- **Last commit Dec 2025** — ~3 months since last meaningful update; maintenance risk
- **Trustpilot 1.9/5** — Mixed user reviews, primarily around billing transparency and credit consumption

---

## Open Questions

- Is the Blackbox CLI's built-in judge mechanism valuable enough to justify using it over direct claude-code/codex delegation + a native Hermes judge (see related best-of-N issue)?
- Should we wait for #413 (Cross-CLI Orchestration) before creating individual agent skills, or proceed independently?
- Does the GPL-3.0 license on the CLI create any concerns for Skills Hub distribution?

---

## References

- [Blackbox AI Website](https://www.blackbox.ai/)
- [Blackbox CLI GitHub (GPL-3.0)](https://github.com/blackboxaicode/cli) — 191 stars, TypeScript, forked from Gemini CLI
- [Blackbox CLI Docs](https://docs.blackbox.ai/features/blackbox-cli/getting-started)
- [Multi-Agent Execution Docs](https://docs.blackbox.ai/features/blackbox-cloud-multi-agent)
- [Multi-Agent Task API](https://docs.blackbox.ai/api-reference/multi-agent-task)
- [Blackbox AI Review 2026 (Banani)](https://www.banani.co/blog/blackbox-ai-review) — Detailed independent review
- Existing Hermes skills: `claude-code`, `codex`, `hermes-agent` (autonomous-ai-agents category)
- Related issues: #413, #344, #412