___
/ \
( ) ___ ___ ___ _ _ _ _ _ ___ _ _ ___ ___ ___
`~w~` / _ \| _ \| __| \| | || | /_\ | _ \ \| | __/ __/ __|
(( )) | (_) | _/| _|| .` | __ |/ _ \| / .` | _|\__ \__ \
))(( \___/|_| |___|_|\_|_||_/_/ \_\_|_\_|\_|___|___/___/
(( ))
`--`
AI coding agent in your terminal. Works with any LLM -- free local models or cloud APIs.
- Quick Start
- Why OpenHarness?
- Terminal UI
- Tools (44)
- Slash Commands
- Permission Modes
- Hooks
- Checkpoints & Rewind
- Agent Roles
- Headless Mode & CI/CD
- Cybergotchi
- MCP Servers
- Providers
- Auth
- Update
- Evals
- FAQ
- Install
- Development
- Contributing
- Community
npm install -g @zhijiewang/openharness
oh
That's it. OpenHarness auto-detects Ollama and starts chatting. No API key needed.
Python SDK: there's also an official Python SDK for driving oh from Python programs (notebooks, batch scripts, ML pipelines). Install with pip install openharness-sdk after the npm install (the PyPI distribution is openharness-sdk because the unqualified name is taken), then from openharness import query. See python/README.md.
TypeScript SDK: drive oh from Node.js (VS Code extensions, Electron apps, build scripts) with @zhijiewang/openharness-sdk — npm install @zhijiewang/openharness-sdk, then import { query, OpenHarnessClient, tool } from "@zhijiewang/openharness-sdk". Mirrors the Python SDK surface (streaming events, stateful sessions, custom tools, permission callback, session resume). See packages/sdk/README.md.
oh init # interactive setup wizard (provider + cybergotchi)
oh # auto-detect local model
oh --model ollama/qwen2.5:7b # specific model
oh --model gpt-4o # cloud model (needs OPENAI_API_KEY)
oh --trust # auto-approve all tool calls
oh --auto # auto-approve, block dangerous bash
oh -p "fix the tests" --trust # headless mode (single prompt, exit)
oh run "review code" --json # CI/CD with JSON output
In-session commands:
/rewind # undo last AI file change (checkpoint restore)
/roles # list agent specializations
/vim # toggle vim mode
Ctrl+O # flush transcript to scrollback for review
Most AI coding agents are locked to one provider or cost $20+/month. OpenHarness works with any LLM -- run it free with Ollama on your own machine, or connect to any cloud API. Every AI edit is git-committed and reversible with /undo.
OpenHarness features a sequential terminal renderer inspired by Ink/Claude Code's default mode. Completed messages flush to native scrollback (scrollable), while the live area (streaming, spinner, input) rewrites in-place using relative cursor movement.
| Key | Action |
|---|---|
| Enter | Submit prompt |
| Alt+Enter | Insert newline (multi-line input) |
| ↑ / ↓ | Navigate input history |
| Ctrl+C | Cancel current request / exit |
| Ctrl+A / Ctrl+E | Jump to start / end of input |
| Ctrl+O | Toggle thinking block expansion |
| Ctrl+K | Toggle code block expansion in messages |
| Tab | Autocomplete slash commands / file paths / cycle tool outputs |
| /vim | Toggle Vim mode (normal/insert) |
Scrolling is handled by the terminal's native scrollbar. Completed messages flow into the terminal scrollback buffer. Use your terminal's search (e.g., Ctrl+Shift+F in VS Code) to search conversation history.
- Markdown rendering: headings, code blocks, bold, italic, lists, tables, blockquotes, links
- Syntax highlighting: keywords, strings, comments, numbers, types (JS/TS/Python/Rust/Go and 20+ languages)
- Collapsible code blocks: blocks over 8 lines auto-collapse; Ctrl+K to expand all
- Collapsible thinking: thinking blocks collapse to a one-line summary after completion; Ctrl+O to expand
- Shimmer spinner: animated indicator with stage label (Thinking, Running <Tool>, Calling <server>:<tool>, Running N tools) and color transitions (magenta → yellow at 30s → red at 60s)
- Tool call display: args preview, live streaming output, result summaries (line counts, elapsed time), expand/collapse with Tab. Tool name color-coded by category (read tools cyan, mutating tools yellow, exec tools magenta, MCP tools green)
- Rich tool output: JSON files render as a colored static tree (depth-3 collapse, line truncation); markdown files render with full styling (headings, code blocks, tables) instead of plain split-on-newline. Renderer dispatches via the outputType field stamped by FileReadTool / WebFetchTool, with a heuristic fallback for unstamped tools
- Nested tool calls: when Agent or ParallelAgents spawns inner tool calls (Read, Bash, Edit), the children render indented under their spawning parent. ParallelAgents shows per-task Task wrapper rows so child calls group by task instead of flat under the bundled parent. Depth-3 indent limit with a "… (N more level)" collapse marker
- Multi-line input wrap glyph: every non-last line of a multi-line input ends with a dim ↵ continuation marker so the wrap is visually obvious
- Permission prompts: bordered box with risk coloring, bold colored Yes/No/Diff keys, syntax-highlighted inline diffs
- Status line: model name, token count, cost, context usage bar (customizable via config)
- Context warning: yellow alert when context window exceeds 75%
- Native terminal scrollbar: completed messages flow into scrollback; use your terminal's scrollbar and search
- Multi-line input: Alt+Enter for newlines; paste detection auto-inserts newlines
- Autocomplete: slash commands and file paths with descriptions; Tab to cycle
- File path autocomplete: Tab-completes paths with [dir]/[file] indicators
- Session browser: /browse to interactively browse and resume past sessions
- Companion mascot: animated Cybergotchi in the footer (toggle with /companion off|on)
oh --light # light theme for bright terminals
/theme light # switch mid-session (saved automatically)
/theme dark # switch back
Theme preference is saved to .oh/config.yaml and persists across sessions.
Customize the status bar format in .oh/config.yaml:
statusLineFormat: '{model} │ {tokens} │ {cost} │ {ctx}'
Available variables: {model}, {tokens} (input↑ output↓), {cost} ($X.XXXX), {ctx} (context usage bar). Empty sections are automatically collapsed.
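The collapsing behavior can be pictured with a small sketch. This is not the OpenHarness renderer: `renderStatusLine` and `StatusVars` are hypothetical names, and the sketch assumes sections are delimited by the ` │ ` separator used in the example format.

```typescript
// Hypothetical helper, not the OpenHarness source: renders a statusLineFormat template.
type StatusVars = Partial<Record<"model" | "tokens" | "cost" | "ctx", string>>;

function renderStatusLine(format: string, vars: StatusVars, separator = " │ "): string {
  return format
    .split(separator)
    .map((section) =>
      // Replace each {placeholder}; unknown or missing variables become "".
      section.replace(/\{(\w+)\}/g, (_match: string, key: string) => vars[key as keyof StatusVars] ?? ""),
    )
    .filter((section) => section.trim() !== "") // drop sections that rendered empty
    .join(separator);
}

// Only model and tokens are known here, so the cost and ctx sections disappear.
console.log(renderStatusLine("{model} │ {tokens} │ {cost} │ {ctx}", { model: "llama3", tokens: "1200↑ 300↓" }));
```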
| Tool | Risk | Description |
|---|---|---|
| Core | ||
| Bash | high | Execute shell commands with live streaming output (AST safety analysis) |
| PowerShell | high | Execute PowerShell commands (Windows-native scripting) |
| Read | low | Read files with line ranges, PDF support |
| ImageRead | low | Read images/PDFs for multimodal analysis |
| Write | medium | Create or overwrite files |
| Edit | medium | Search-and-replace edits |
| MultiEdit | medium | Atomic multi-file edits (all succeed or none) |
| Glob | low | Find files by pattern |
| Grep | low | Regex content search with context lines |
| LS | low | List directory contents with sizes |
| Web | ||
| WebFetch | medium | Fetch URL content (SSRF-protected) |
| WebSearch | medium | Search the web |
| ExaSearch | medium | Neural web search via Exa (requires EXA_API_KEY) |
| RemoteTrigger | high | HTTP requests to webhooks/APIs |
| Tasks | ||
| TaskCreate | low | Create structured tasks |
| TaskUpdate | low | Update task status |
| TaskList | low | List all tasks |
| TaskGet | low | Get task details |
| TaskStop | low | Stop a running task |
| TaskOutput | low | Get task output |
| TodoWrite | low | Manage session task checklist (Claude Code-compatible) |
| Agents | ||
| Agent | medium | Spawn a sub-agent (with role specialization) |
| ParallelAgent | medium | Dispatch multiple agents with DAG dependencies |
| SendMessage | low | Agent-to-agent peer messaging |
| AskUser | low | Ask user a question with options |
| Scheduling | ||
| CronCreate | medium | Schedule recurring tasks |
| CronDelete | medium | Remove scheduled tasks |
| CronList | low | List all scheduled tasks |
| ScheduleWakeup | low | Self-pace the next /loop iteration (cache-aware) |
| Planning | ||
| EnterPlanMode | low | Enter structured planning mode |
| ExitPlanMode | low | Exit planning mode |
| Pipelines | ||
| Pipeline | medium | Run a sequence of tasks with output passed between steps |
| Code Intelligence | ||
| Diagnostics | low | LSP-based code diagnostics |
| NotebookEdit | medium | Edit Jupyter notebooks |
| Memory & Discovery | ||
| Memory | low | Save/list/search persistent memories |
| Skill | low | Invoke a skill from .oh/skills/ |
| ToolSearch | low | Find tools by description |
| SessionSearch | low | Search prior sessions for relevant context |
| MCP | ||
| ListMcpResources | low | List resources from connected MCP servers |
| ReadMcpResource | low | Read a specific MCP resource by URI |
| Git Worktrees | ||
| EnterWorktree | medium | Create isolated git worktree |
| ExitWorktree | medium | Remove a git worktree |
| Process | ||
| KillProcess | high | Stop processes by PID or name |
| Monitor | medium | Run a background command and stream each output line back to the agent |
Low-risk read-only tools auto-approve. Medium and high risk tools require confirmation in ask mode. Use --trust or --auto to skip prompts.
Over 80 commands are registered. The most-used ones are grouped below; see /help in-session for the full list. Aliases: /q exit, /h help, /c commit, /m model, /s status.
Session:
| Command | Description |
|---|---|
| /clear | Clear conversation history |
| /compact | Compress conversation to free context |
| /export | Export conversation to markdown |
| /copy [n] | Copy the Nth-last assistant response to the system clipboard |
| /history [n] | List recent sessions; /history search <term> to search |
| /browse | Interactive session browser with preview |
| /resume <id> | Resume a saved session |
| /fork | Fork current session |
Git:
| Command | Description |
|---|---|
| /diff | Show uncommitted git changes |
| /undo | Undo last AI commit |
| /commit [msg] | Create a git commit |
| /log | Show recent git commits |
Info:
| Command | Description |
|---|---|
| /help | Show all available commands (categorized) |
| /cost | Show session cost and token usage |
| /status | Show model, mode, git branch, MCP servers |
| /config | Show configuration |
| /files | List files in context |
| /model <name> | Switch model mid-session |
| /memory | View and search memories |
| /doctor | Run diagnostic health checks |
| /hooks | List loaded hooks grouped by event |
| /reload-plugins | Hot-reload plugins, skills, hooks, and MCP server connections without restarting the session |
Settings:
| Command | Description |
|---|---|
| /theme dark\|light | Switch theme (saved to config) |
| /vim | Toggle Vim mode |
| /companion off\|on | Toggle companion visibility |
| /keys | Show keyboard shortcuts |
| /keybindings | Open ~/.oh/keybindings.json in $EDITOR (creates a starter file if missing) |
AI:
| Command | Description |
|---|---|
| /plan <task> | Enter plan mode |
| /review | Review recent code changes |
| /summarize | Summarize the current conversation |
| /recap | One-sentence recap of the session (lighter than /summarize) |
Pet:
| Command | Description |
|---|---|
| /cybergotchi | Feed, pet, rest, status, rename, or reset your companion |
Control how aggressively OpenHarness auto-approves tool calls:
| Mode | Flag | Behavior |
|---|---|---|
| ask | --permission-mode ask | Prompt for medium/high risk operations (default) |
| trust | --trust | Auto-approve everything |
| deny | --deny | Only allow low-risk read-only operations |
| acceptEdits | --permission-mode acceptEdits | Auto-approve file edits, ask for Bash/WebFetch/Agent |
| plan | --permission-mode plan | Read-only mode: block all write operations |
| auto | --auto | Auto-approve all, block dangerous bash (AST-analyzed) |
| bypassPermissions | --permission-mode bypassPermissions | Approve everything unconditionally (CI only) |
Bash commands are analyzed by a lightweight AST parser that detects destructive patterns (rm -rf, git push --force, curl | bash, etc.) and adjusts risk level accordingly.
Set permanently in .oh/config.yaml: permissionMode: 'acceptEdits'
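As a rough mental model, the mode table can be read as a function from (mode, tool call) to a verdict. The sketch below is an interpretation of the table, not the harness's code: `decidePermission` and the `ToolCallInfo` fields are hypothetical, and it assumes that "low risk" and "read-only" coincide.

```typescript
// Hypothetical model of the permission table above; not the OpenHarness implementation.
type Risk = "low" | "medium" | "high";
type Mode = "ask" | "trust" | "deny" | "acceptEdits" | "plan" | "auto" | "bypassPermissions";
type Verdict = "allow" | "ask" | "deny";

interface ToolCallInfo {
  risk: Risk;
  isFileEdit: boolean; // e.g. Write / Edit / MultiEdit
  isDangerousBash: boolean; // flagged by the bash analysis (rm -rf, curl | bash, ...)
}

function decidePermission(mode: Mode, call: ToolCallInfo): Verdict {
  const lowRisk = call.risk === "low";
  switch (mode) {
    case "trust":
    case "bypassPermissions":
      return "allow";
    case "auto":
      return call.isDangerousBash ? "deny" : "allow";
    case "deny":
    case "plan":
      // ASSUMPTION: low-risk tools are exactly the read-only ones.
      return lowRisk ? "allow" : "deny";
    case "acceptEdits":
      return lowRisk || call.isFileEdit ? "allow" : "ask";
    case "ask":
      return lowRisk ? "allow" : "ask";
  }
}

const riskyShell: ToolCallInfo = { risk: "high", isFileEdit: false, isDangerousBash: true };
console.log(decidePermission("ask", riskyShell), decidePermission("auto", riskyShell));
```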
Run shell scripts automatically at key session events by adding a hooks block to .oh/config.yaml:
hooks:
  - event: sessionStart
    command: "echo 'Session started' >> ~/.oh/session.log"
  - event: preToolUse
    command: "scripts/check-tool.sh"
    match: Bash # optional: only trigger for this tool name
  - event: postToolUse
    command: "scripts/after-tool.sh"
  - event: sessionEnd
    command: "scripts/cleanup.sh"
Event types (27 total, matching Claude Code's stable surface):
| Event | When it fires | Can block? |
|---|---|---|
| sessionStart | Session begins | no |
| sessionEnd | Session ends | no |
| turnStart | Top-level agent turn begins (after user prompt accepted) | no |
| turnStop | Top-level agent turn ends (mirrors Claude Code's Stop) | no |
| userPromptSubmit | Before user prompt reaches the LLM | yes (decision: deny) |
| userPromptExpansion | Slash command produces an expanded prompt (audit trail) | no |
| preToolUse | Before each tool call | yes (exit code 1 / decision: deny) |
| postToolUse | After successful tool execution | no |
| postToolUseFailure | After tool throws or returns isError: true | no |
| postToolBatch | Once after a turn's full set of tool calls all resolve, before the next model call | no |
| permissionRequest | When a tool needs approval (between preToolUse and the prompt) | yes (decision: allow\|deny\|ask) |
| permissionDenied | When a tool call is denied (hook / user / headless / policy) | no |
| fileChanged | After a tool modifies a file | no |
| cwdChanged | After working directory changes | no |
| subagentStart | A sub-agent is spawned | no |
| subagentStop | A sub-agent completes | no |
| preCompact | Before conversation compaction | no |
| postCompact | After conversation compaction | no |
| configChange | .oh/config.yaml is modified during the session | no |
| notification | A notification is dispatched | no |
| taskCreated | TaskCreate persists a new task | no |
| taskCompleted | TaskUpdate transitions a task to completed | no |
| worktreeCreate | EnterWorktreeTool creates an isolated git worktree | no |
| worktreeRemove | ExitWorktreeTool removes a git worktree | no |
| elicitation | An MCP server requests user input via elicitation/create | yes (decision: allow\|deny) |
| elicitationResult | After the elicitation response has been decided (audit trail) | no |
| instructionsLoaded | loadRulesAsPrompt rebuilt the system prompt with rules in scope | no |
Set disableAllHooks: true in .oh/config.yaml to globally disable hook execution while keeping definitions on disk for auditability.
Live introspection: run /hooks in-session to see exactly which hooks are loaded, grouped by event.
Environment variables available to hook scripts:
| Variable | Description |
|---|---|
| OH_EVENT | Event type (sessionStart, preToolUse, etc.) |
| OH_TOOL_NAME | Name of the tool being called (tool events only) |
| OH_TOOL_ARGS | JSON-encoded tool arguments (tool events only) |
| OH_TOOL_OUTPUT | JSON-encoded tool output (postToolUse only) |
| OH_TOOL_INPUT_JSON | Full JSON tool input (tool events only) |
| OH_SESSION_ID / OH_MODEL / OH_PROVIDER / OH_PERMISSION_MODE | Current session context |
| OH_COST / OH_TOKENS | Running cost and token totals |
| OH_FILE_PATH | Path that changed (fileChanged only) |
| OH_NEW_CWD | New working directory (cwdChanged only) |
| OH_TURN_NUMBER / OH_TURN_REASON | Turn boundary context (turnStart / turnStop) |
Use match to restrict a hook to a specific tool name (e.g., match: Bash only triggers for the Bash tool). Substring, glob (Cron*), and /regex/flags patterns are all supported.
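To make the three pattern forms concrete, here is one way such matching could work. It is an illustrative sketch only: `hookMatches` is a hypothetical function, the order in which the forms are tried is an assumption, and only `*` is treated as a glob wildcard because that is the only one the text mentions.

```typescript
// Hypothetical matcher illustrating substring, glob, and /regex/flags patterns.
function hookMatches(pattern: string, toolName: string): boolean {
  // 1. /regex/flags form, e.g. "/^task/i" (an invalid regex would throw here)
  const regexForm = /^\/(.+)\/([a-z]*)$/.exec(pattern);
  if (regexForm) {
    const [, source = "", flags = ""] = regexForm;
    return new RegExp(source, flags).test(toolName);
  }
  // 2. Glob form, e.g. "Cron*": escape regex metacharacters, then let "*" match anything
  if (pattern.includes("*")) {
    const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`).test(toolName);
  }
  // 3. Plain substring
  return toolName.includes(pattern);
}

console.log(hookMatches("Cron*", "CronCreate"), hookMatches("Bash", "Read"));
```

Note that under substring semantics a pattern such as `Edit` would also match `MultiEdit`; a regex like `/^Edit$/` is the stricter choice if that matters.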
Set jsonIO: true on a command hook to opt into structured JSON I/O — the harness sends {event, ...context} on stdin and reads {decision, reason, hookSpecificOutput} from stdout. HTTP hooks accept the same response shape. See docs/hooks.md for the full reference.
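A jsonIO hook is essentially a function from the stdin JSON to a response JSON. The sketch below shows that shape for a permissionRequest hook, using only the documented `decision` and `reason` response fields. The `toolArgs.command` key inside the context is an assumption made for illustration; consult docs/hooks.md for the real field names.

```typescript
// Sketch of the decision logic of a jsonIO hook. Not taken from the OpenHarness source.
interface HookInput {
  event: string;
  [contextKey: string]: unknown;
}

interface HookResponse {
  decision: "allow" | "deny" | "ask";
  reason?: string;
}

function handleHookInput(rawStdin: string): string {
  const input = JSON.parse(rawStdin) as HookInput;
  // ASSUMPTION: the shell command is exposed as toolArgs.command in the context.
  const args = input["toolArgs"];
  const command =
    typeof args === "object" && args !== null && "command" in args
      ? String((args as { command: unknown }).command)
      : "";
  const response: HookResponse =
    input.event === "permissionRequest" && /\bgit\s+push\s+--force\b/.test(command)
      ? { decision: "deny", reason: "force pushes are blocked by policy" }
      : { decision: "ask" }; // defer to the normal permission prompt
  return JSON.stringify(response);
}

console.log(handleHookInput('{"event":"permissionRequest","toolArgs":{"command":"git push --force"}}'));
```

A real hook script would read all of stdin, pass it to a function like this, and print the result to stdout.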
OpenHarness ships with a Tamagotchi-style companion that lives in the side panel. It reacts to your session in real time — celebrating streaks, complaining when tools fail, and getting hungry if you ignore it.
Hatch one:
oh init # wizard includes cybergotchi setup
/cybergotchi # or hatch mid-session
Commands:
/cybergotchi feed # +30 hunger
/cybergotchi pet # +20 happiness
/cybergotchi rest # +40 energy
/cybergotchi status # show needs + lifetime stats
/cybergotchi rename # give it a new name
/cybergotchi reset # start over with a new species
Needs decay over time (hunger fastest, happiness slowest). Feed and pet your gotchi to keep it happy.
Evolution — your gotchi evolves based on lifetime milestones:
- Stage 1 (✦ magenta): 10 sessions or 50 commits
- Stage 2 (★ yellow + crown): 100 tasks completed or a 25-tool streak
18 species to choose from: duck, cat, owl, penguin, rabbit, turtle, snail, octopus, axolotl, cactus, mushroom, chonk, capybara, goose, and more.
Connect any MCP (Model Context Protocol) server by editing .oh/config.yaml:
provider: anthropic
model: claude-sonnet-4-6
permissionMode: ask
mcpServers:
  - name: filesystem
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
  - name: github
    command: npx
    args: ["-y", "@modelcontextprotocol/server-github"]
    env:
      GITHUB_PERSONAL_ACCESS_TOKEN: ghp_...
MCP tools appear alongside built-in tools. /status shows connected servers.
MCP server prompts as slash commands — servers that expose prompts/list (e.g., GitHub, Sentry, Linear) get their prompts surfaced as /<server>:<prompt> slash commands automatically. Arguments use a key=value syntax with quoting:
/github:summarize-pr repo=acme/widget pr=42
/sentry:triage-issue issue=ABC-123 severity="high priority"
Required arguments declared by the prompt template surface a usage error if missing (no model call). Run /reload-plugins to re-discover prompts after editing your MCP config.
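The key=value syntax with quoting can be parsed with a single regular expression. This is a minimal sketch under simple assumptions (double quotes only, no escape sequences inside quotes); `parsePromptArgs` and `missingArgs` are hypothetical names, not OpenHarness APIs.

```typescript
// Hypothetical parser for the key=value argument syntax shown above.
function parsePromptArgs(input: string): Record<string, string> {
  const result: Record<string, string> = {};
  // A value is either "double quoted" (may contain spaces) or a run of non-space characters.
  const pairPattern = /(\w[\w-]*)=(?:"([^"]*)"|(\S+))/g;
  let match: RegExpExecArray | null;
  while ((match = pairPattern.exec(input)) !== null) {
    const key = match[1] ?? "";
    result[key] = match[2] ?? match[3] ?? "";
  }
  return result;
}

// Lists required argument names that were not supplied (would trigger the usage error).
function missingArgs(required: string[], args: Record<string, string>): string[] {
  return required.filter((argName) => !(argName in args));
}

const parsed = parsePromptArgs('issue=ABC-123 severity="high priority"');
console.log(parsed, missingArgs(["issue", "assignee"], parsed));
```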
mcpServers:
  - name: linear
    type: http
    url: https://mcp.linear.app/mcp
    headers:
      Authorization: "Bearer ${LINEAR_API_KEY}"
See docs/mcp-servers.md for the full reference.
See docs/mcp-servers.md for OAuth 2.1 setup (auto-triggered on 401; /mcp-login and /mcp-logout commands available).
MCP Server Registry — browse and install from a curated catalog:
/mcp-registry # browse all available servers
/mcp-registry github # show install config for a specific server
/mcp-registry database # search by category
Categories: filesystem, git, database, api, search, productivity, dev-tools, ai.
OpenHarness auto-commits AI edits in git repos:
oh: Edit src/app.ts # auto-committed with "oh:" prefix
oh: Write tests/app.test.ts
- Every AI file change is committed automatically
- /undo reverts the last AI commit (only OH commits, never yours)
- /diff shows what changed
- Your dirty files are safe: they are committed separately before AI edits
Every file modification is automatically checkpointed before execution. If something goes wrong:
/rewind # restore files from the last checkpoint
/undo # revert the last AI git commit
Checkpoints are stored in .oh/checkpoints/ and cover FileWrite, FileEdit, and Bash commands that modify files.
After every file edit (Edit, Write, MultiEdit), OpenHarness automatically runs language-appropriate lint/typecheck commands and feeds the results back into the agent context. This is the single highest-impact harness engineering pattern: research shows 2-3x quality improvement from automated feedback.
Auto-detection — if your project has tsconfig.json, .eslintrc*, pyproject.toml, go.mod, or Cargo.toml, verification rules are detected automatically. No configuration needed.
Custom rules via .oh/config.yaml:
verification:
  enabled: true # default: true (auto-detect)
  mode: warn # 'warn' appends to output, 'block' marks as error
  rules:
    - extensions: [".ts", ".tsx"]
      lint: "npx tsc --noEmit 2>&1 | head -20"
      timeout: 15000
    - extensions: [".py"]
      lint: "ruff check {file} 2>&1 | head -10"
The agent sees [Verification passed] or [Verification FAILED] with the linter output after each edit, enabling self-correction.
On session exit, OpenHarness automatically prunes stale memories using temporal decay:
- Memories not accessed in 30+ days lose 0.1 relevance per 30-day period
- Memories below 0.1 relevance are permanently deleted
- Updated relevance scores are persisted to memory files
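The decay rule above is simple arithmetic, sketched here for illustration. `MemoryEntry` and `pruneMemories` are hypothetical names, and the sketch assumes the penalty is computed from the last-access date in whole 30-day periods; how the real implementation tracks already-applied decay is not specified here.

```typescript
// Illustrative sketch of the temporal-decay rule; not the OpenHarness implementation.
interface MemoryEntry {
  id: string;
  relevance: number;
  lastAccessed: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function pruneMemories(memories: MemoryEntry[], now: Date): MemoryEntry[] {
  return memories
    .map((memory) => {
      const idleDays = (now.getTime() - memory.lastAccessed.getTime()) / DAY_MS;
      const idlePeriods = Math.floor(idleDays / 30); // 0 if accessed within the last 30 days
      return { ...memory, relevance: memory.relevance - 0.1 * idlePeriods };
    })
    .filter((memory) => memory.relevance >= 0.1); // anything below 0.1 is dropped
}

const exitTime = new Date("2026-03-01T00:00:00Z");
const stale: MemoryEntry = { id: "old-note", relevance: 0.15, lastAccessed: new Date("2026-01-01T00:00:00Z") };
console.log(pruneMemories([stale], exitTime).length); // one idle period pushes it below 0.1
```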
This keeps the memory system lean and relevant. Configure in .oh/config.yaml:
memory:
  consolidateOnExit: true # default: true
Create recurring tasks that run automatically in the background:
# Via slash commands
/cron list # show all scheduled tasks
/cron create "check-tests" # create a new task (interactive)
/cron delete <id> # remove a task
Schedule syntax: every 5m, every 2h, every 1d
The cron executor checks every 60 seconds for due tasks and runs them via sub-queries. Results are stored in ~/.oh/crons/history/.
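For intuition, the schedule syntax and the periodic due-check could look like the following. This is a sketch, not the cron executor itself: `parseSchedule` and `isDue` are hypothetical helpers, and only the three units shown above (m, h, d) are handled.

```typescript
// Hypothetical helpers for the "every <n><unit>" schedule syntax.
const UNIT_MS: Record<string, number> = {
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

// Returns the interval in milliseconds, or null if the expression is not understood.
function parseSchedule(expr: string): number | null {
  const match = /^every\s+(\d+)([mhd])$/.exec(expr.trim());
  if (!match) return null;
  const amount = Number(match[1]);
  const unitMs = UNIT_MS[match[2] ?? ""];
  if (unitMs === undefined || amount <= 0) return null;
  return amount * unitMs;
}

// A task is due once at least one full interval has elapsed since its last run.
function isDue(lastRunMs: number, intervalMs: number, nowMs: number): boolean {
  return nowMs - lastRunMs >= intervalMs;
}

console.log(parseSchedule("every 5m"), parseSchedule("every 2h"));
```

With a 60-second polling loop, a task scheduled with `every 5m` would fire on the first check at or after the five-minute mark, so actual run times can lag the nominal interval by up to a minute.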
Dispatch specialized sub-agents for focused tasks:
/roles # list all available roles
| Role | Description | Tools |
|---|---|---|
| code-reviewer | Find bugs, security issues, style problems | Read-only |
| test-writer | Generate unit and integration tests | Read + Write |
| docs-writer | Write documentation and comments | Read + Write + Edit |
| debugger | Systematic bug investigation | Read-only + Bash |
| refactorer | Simplify code without changing behavior | All file tools + Bash |
| security-auditor | OWASP, injection, secrets, CVE scanning | Read-only + Bash |
| evaluator | Evaluate code quality and run tests (read-only) | Read-only + Bash + Diagnostics |
| planner | Design step-by-step implementation plans | Read-only + Bash |
| architect | Analyze architecture and design structural changes (hands off to editor) | Read-only |
| editor | Apply an architect's plan as code edits, no re-planning | Read + Edit + Write + MultiEdit + Bash |
| migrator | Systematic codebase migrations and upgrades | All file tools + Bash |
Each role restricts the sub-agent to only its suggested tools. You can also pass allowed_tools explicitly:
Agent({ subagent_type: 'evaluator', prompt: 'Run all tests and report results' })
Agent({ allowed_tools: ['Read', 'Grep'], prompt: 'Search for all TODO comments' })
For larger changes that span multiple files, dispatch a two-pass architect → editor workflow. The architect (powerful model) reads the codebase and outputs a structured plan; the editor (fast model) applies it mechanically without re-planning. When modelRouter is configured, OH automatically routes the architect role to your powerful tier and the editor role to your fast tier — typical cost reduction is 30-50% on multi-file edits versus running both passes on the powerful model.
Agent({ subagent_type: 'architect', prompt: 'Plan a migration from option A to option B across src/' })
# Hand the resulting plan to:
Agent({ subagent_type: 'editor', prompt: '<paste plan>' })
Each Agent call accepts a permission_mode override that narrows the parent's permission mode (never loosens it). Useful when running in trust and you want a subagent's review/audit pass to stay strictly read-only:
Agent({ subagent_type: 'code-reviewer', prompt: '...', permission_mode: 'plan' })
Agent({ subagent_type: 'security-auditor', prompt: '...', permission_mode: 'deny' })
If a less-restrictive mode is requested (e.g. parent is ask, subagent requests trust), the harness silently clamps to the parent — a model can never use a sub-agent to escape user-approval gates.
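The clamping rule amounts to "pick whichever of the two modes is more restrictive". The sketch below illustrates that idea only. The ordering in `RESTRICTIVENESS` is an assumption made for this example (the text above does not define a total order over modes), and `clampPermissionMode` is a hypothetical name.

```typescript
// ASSUMED ordering, from least to most restrictive. Not taken from the OpenHarness source.
const RESTRICTIVENESS = ["bypassPermissions", "trust", "auto", "acceptEdits", "ask", "plan", "deny"] as const;
type PermissionMode = (typeof RESTRICTIVENESS)[number];

// A sub-agent may narrow the parent's mode but never loosen it.
function clampPermissionMode(parent: PermissionMode, requested?: PermissionMode): PermissionMode {
  if (requested === undefined) return parent;
  const requestedIsStricter = RESTRICTIVENESS.indexOf(requested) >= RESTRICTIVENESS.indexOf(parent);
  return requestedIsStricter ? requested : parent;
}

console.log(clampPermissionMode("trust", "plan")); // narrowing is honored
console.log(clampPermissionMode("ask", "trust")); // loosening is clamped to the parent
```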
Read-only roles default to plan automatically. code-reviewer, evaluator, security-auditor, architect, and planner ship with permissionMode: 'plan' — spawn them under any parent and they're statically read-only, no permission_mode override needed. Markdown-defined agents in .oh/agents/*.md can set their own default with permissionMode: plan (or permission-mode: plan) frontmatter.
Run a single prompt without interactive UI — perfect for CI/CD and scripting:
# Chat command with -p flag (recommended)
oh -p "fix the failing tests" --model ollama/llama3 --trust
oh -p "review src/query.ts" --auto --output-format json
# Run command (alternative)
oh run "fix the failing tests" --model ollama/llama3 --trust
oh run "add error handling to api.ts" --json # JSON output
# Pipe stdin
cat error.log | oh run "what's wrong here?"
git diff | oh run "review these changes"
# Hard cap on session cost — agent halts at the threshold with reason: "budget_exceeded"
oh run "review the diff" --model claude-sonnet-4-6 --max-budget-usd 0.50
oh session --model gpt-4o --max-budget-usd 5

| Flag | Effect |
|---|---|
| --bare | Skip optional startup work (project detection, plugins, memory, skills, MCP). System prompt is just the tool-use baseline. Faster startup on repos with many CLAUDE.md / RULES.md files. |
| --debug [categories] | Enable categorized debug logs. --debug alone enables all; --debug mcp,hooks filters. Falls back to OH_DEBUG env var. |
| --debug-file <path> | Append debug lines to a file instead of stderr. Falls back to OH_DEBUG_FILE. |
| --mcp-config <path> | Load MCP servers from an external JSON file (in addition to .oh/config.yaml). |
| --strict-mcp-config | With --mcp-config, ignore .oh/config.yaml MCP servers entirely. |
| --system-prompt-file <path> / --append-system-prompt-file <path> | File-path variants of --system-prompt / --append-system-prompt. |
| --no-session-persistence | Skip writing the session record to ~/.oh/sessions/ for ephemeral CI runs. |
| --fallback-model <model> | Fallback used when the primary fails with a retriable error. Replaces .oh/config.yaml fallbackProviders for this run. |
| --permission-prompt-tool <mcp_tool> | Delegate tool-permission decisions to a configured MCP tool (e.g. mcp__myperm__check). |
| --init / --init-only | Run the interactive setup wizard before / instead of the command. |
All flags work on both oh run and oh session. See oh run --help and oh session --help for the full surface.
Constrain the model's output to a JSON Schema. Useful for CI scripts that parse model output programmatically without regex heuristics:
oh -p "output {\"ok\": true, \"count\": 3} as JSON" \
--trust \
--json-schema '{"type":"object","properties":{"ok":{"type":"boolean"},"count":{"type":"integer"}},"required":["ok","count"]}'
Behavior:
- stdout: the validated JSON (single line), only when it passes the schema.
- stderr: structured errors on failure, plus the raw model output for debugging.
- Exit codes: 0 valid, 2 malformed schema, 3 model output was not JSON, 4 JSON didn't match the schema.
Supported keywords: type, properties, required, items, enum. For richer validation, pipe the output through a dedicated validator.
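To show how the five supported keywords and the exit codes fit together, here is a small self-contained validator. It is a sketch of the described behavior, not the code behind --json-schema: `matchesSchema` and `exitCodeFor` are hypothetical names, "malformed schema" is reduced to "not a JSON object", and enum comparison uses strict equality, so it only suits primitive values.

```typescript
// Illustrative validator for the keyword subset: type, properties, required, items, enum.
type JsonType = "object" | "array" | "string" | "number" | "integer" | "boolean" | "null";

interface MiniSchema {
  type?: JsonType;
  properties?: Record<string, MiniSchema>;
  required?: string[];
  items?: MiniSchema;
  enum?: unknown[];
}

function typeMatches(value: unknown, type: JsonType): boolean {
  switch (type) {
    case "null": return value === null;
    case "array": return Array.isArray(value);
    case "object": return typeof value === "object" && value !== null && !Array.isArray(value);
    case "integer": return typeof value === "number" && Number.isInteger(value);
    default: return typeof value === type; // "string" | "number" | "boolean"
  }
}

function matchesSchema(value: unknown, schema: MiniSchema): boolean {
  if (schema.type !== undefined && !typeMatches(value, schema.type)) return false;
  if (schema.enum !== undefined && !schema.enum.some((allowed) => allowed === value)) return false;
  if (Array.isArray(value)) {
    const itemSchema = schema.items;
    return itemSchema === undefined || value.every((item: unknown) => matchesSchema(item, itemSchema));
  }
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    if ((schema.required ?? []).some((key) => !(key in record))) return false;
    return Object.entries(schema.properties ?? {}).every(
      ([key, sub]) => !(key in record) || matchesSchema(record[key], sub),
    );
  }
  return true;
}

// Maps the outcome to the exit codes listed above: 0, 2, 3, or 4.
function exitCodeFor(schemaText: string, modelOutput: string): number {
  let schema: MiniSchema;
  try {
    schema = JSON.parse(schemaText) as MiniSchema;
  } catch {
    return 2; // malformed schema
  }
  if (typeof schema !== "object" || schema === null) return 2;
  let parsed: unknown;
  try {
    parsed = JSON.parse(modelOutput);
  } catch {
    return 3; // model output was not JSON
  }
  return matchesSchema(parsed, schema) ? 0 : 4;
}

console.log(exitCodeFor('{"type":"object","required":["ok"]}', '{"ok": true}'));
```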
OpenHarness includes a built-in GitHub Action for automated code review:
# .github/workflows/ai-review.yml
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: ./.github/actions/review
        with:
          model: 'claude-sonnet-4-6'
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
Exit code 0 on success, 1 on failure.
# Local (free, no API key needed)
oh --model ollama/llama3
oh --model ollama/qwen2.5:7b
# Cloud
OPENAI_API_KEY=sk-... oh --model gpt-4o
ANTHROPIC_API_KEY=sk-ant-... oh --model claude-sonnet-4-6
OPENROUTER_API_KEY=sk-or-... oh --model openrouter/meta-llama/llama-3-70b
# llama.cpp / GGUF
oh --model llamacpp/my-model
# LM Studio
oh --model lmstudio/my-model
For direct GGUF support via llama-server, without the overhead of Ollama. Often faster for large models.
Prerequisites:
- Install llama.cpp: brew install llama.cpp, or download from github.com/ggml-org/llama.cpp
- Download a GGUF model (e.g., from HuggingFace)
Start llama-server:
llama-server --model ./your-model.gguf --port 8080 --alias my-model
Configure via oh init:
- Run oh init and select "llama.cpp / GGUF" when prompted
Or configure manually in .oh/config.yaml:
provider: llamacpp
model: my-model
baseUrl: http://localhost:8080
permissionMode: ask
Run:
oh
oh --model llamacpp/my-model
oh models # list available models
Speak the Agent Client Protocol (ACP) over stdin/stdout so that editors that support ACP (Zed, JetBrains via the ACP plugin, Cline, OpenCode) can drive OpenHarness as the underlying agent. No bespoke IDE extension required:
oh acp # uses provider/model from .oh/config.yaml
oh acp --provider anthropic --model claude-sonnet-4-6
Configure your editor's ACP integration to launch oh acp as the agent command. The session-update events (text chunks, tool calls, tool results) are translated automatically from OpenHarness's stream protocol; permission prompts currently use OpenHarness's own flow rather than the ACP requestPermission path (filed for follow-up). The @agentclientprotocol/sdk package is an optionalDependency; if it didn't install, oh acp exits with a clear install hint rather than silently failing.
Provider-agnostic credential management. Local LLMs (Ollama / llama.cpp / LM Studio) need no auth — configure them via oh init.
oh auth login [provider] [--key <value>] # store API key for a provider
oh auth logout [provider] # clear stored API key
oh auth status # show stored providers + env-var overrides
[provider] defaults to your configured default. --key supplies the value inline; otherwise OH prompts (TTY) or reads from stdin (piped).
Avoid storing keys in plaintext / the encrypted store by plugging in a helper script (1Password, pass, vault, cloud secret manager). The configured command runs at credential-fetch time with OH_PROVIDER set, and its trimmed stdout becomes the key.
# .oh/config.yaml
apiKeyHelper: 'op read "op://Personal/Anthropic/key"'
Resolution priority: env var → encrypted store → apiKeyHelper → legacy plaintext config.
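The resolution priority is a first-match-wins chain. A minimal sketch of that pattern follows; `KeySource` and `resolveApiKey` are hypothetical names, and the sources are stand-in lambdas rather than real environment, keystore, or helper lookups.

```typescript
// Illustrative first-match-wins credential chain; not the OpenHarness implementation.
interface KeySource {
  label: string;
  read: () => string | undefined;
}

function resolveApiKey(sources: KeySource[]): { key: string; from: string } | undefined {
  for (const source of sources) {
    // Later sources are only consulted if earlier ones yield nothing.
    const key = source.read()?.trim();
    if (key) return { key, from: source.label };
  }
  return undefined;
}

// Stand-in sources listed in the documented priority order.
const chain: KeySource[] = [
  { label: "env var", read: () => undefined },
  { label: "encrypted store", read: () => undefined },
  { label: "apiKeyHelper", read: () => "sk-example-from-helper\n" }, // helper stdout is trimmed
  { label: "legacy plaintext config", read: () => "sk-example-legacy" },
];
console.log(resolveApiKey(chain));
```

Because the loop stops at the first hit, a helper script such as the `op read` command is only executed when neither an environment variable nor the encrypted store supplies a key.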
oh update # detects how OH was installed (npm-global / npx / local clone) and prints the right upgrade command
Config is loaded in layers (later overrides earlier):
- Global ~/.oh/config.yaml: default provider, model, theme for all projects
- Project .oh/config.yaml: project-specific settings
- Local .oh/config.local.yaml: personal overrides (gitignored)
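Layered loading can be thought of as merging objects left to right. The sketch below assumes a shallow, key-by-key override, which may differ from how nested keys are actually merged; `mergeLayers` is a hypothetical helper and the config values are examples.

```typescript
// Illustrative layering: later layers override earlier ones, key by key (shallow merge assumed).
type ConfigLayer = Record<string, unknown>;

function mergeLayers(...layers: ConfigLayer[]): ConfigLayer {
  return layers.reduce<ConfigLayer>((merged, layer) => ({ ...merged, ...layer }), {});
}

const globalLayer: ConfigLayer = { provider: "ollama", model: "llama3", theme: "dark" };
const projectLayer: ConfigLayer = { model: "codellama" };
const localLayer: ConfigLayer = { theme: "light" };

// provider comes from global, model from project, theme from local.
console.log(mergeLayers(globalLayer, projectLayer, localLayer));
```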
Set your default provider once globally:
# ~/.oh/config.yaml
provider: ollama
model: llama3
permissionMode: ask
theme: dark
language: zh-CN # optional — respond in this language (code stays as-is)
outputStyle: default # optional: "default", "explanatory", "learning", or a custom name
Then per-project configs only need what's different:
# .oh/config.yaml
model: codellama # override just the model
Swap the agent's personality without touching its core instructions. Built-ins:
- default: standard software engineering assistant (no preface)
- explanatory: adds an ## Insights section after each task explaining why the agent made its choices
- learning: leaves 1–3 TODO(human) markers at strategic points so you write the instructive parts yourself
Create your own styles as markdown files with YAML frontmatter. Save to .oh/output-styles/<name>.md (project) or ~/.oh/output-styles/<name>.md (user). Project shadows user shadows built-in.
---
name: code-review
description: Focused code review mode
---
Review rigorously. For every function, ask: is the logic correct, is error handling complete, are there edge cases ignored?
Activate with outputStyle: code-review in .oh/config.yaml.
Create .oh/RULES.md in any repo (or run oh init):
- Always run tests after changes
- Use strict TypeScript
- Never commit to main directly
Rules load automatically into every session.
OpenHarness also reads any of the following project-instruction files if present (additive, parent-first):
- CLAUDE.md (Anthropic convention), including hierarchical CLAUDE.md files from parent dirs, plus ~/.claude/CLAUDE.md for user-global instructions
- AGENTS.md (agents.md cross-tool standard, used by Codex / Cursor / Copilot / Cline / Aider), with the same parent-first walk
- CLAUDE.local.md (gitignored personal overrides)
If a repo has AGENTS.md already configured for another agent, OpenHarness picks it up unchanged; no migration step is needed.
Skills are markdown files with YAML frontmatter that add reusable behaviors:
---
name: deploy
description: Deploy the application to production
trigger: deploy
tools: [Bash, Read]
---
Run the deploy script with health checks...
Locations (searched in order):
- .oh/skills/: project-level skills
- ~/.oh/skills/: global skills (available in all projects)
Skills auto-trigger when the user's message contains the trigger keyword, or can be invoked explicitly with /skill deploy.
Plugins are npm packages that bundle skills, hooks, and MCP servers:
{
  "name": "my-openharness-plugin",
  "version": "1.0.0",
  "skills": ["skills/deploy.md", "skills/review.md"],
  "hooks": {
    "sessionStart": "scripts/setup.sh"
  },
  "mcpServers": [
    { "name": "my-api", "command": "npx", "args": ["-y", "@my-org/mcp-server"] }
  ]
}
Save as openharness-plugin.json in your npm package root. Install with npm install, and OpenHarness discovers it automatically from node_modules/.
oh evals runs SWE-bench-Lite-compatible evaluations against any provider, locally, with mandatory cost caps. Useful for measuring real-world bug-fix performance instead of synthetic benchmarks.
# Run a custom pack with a $5 total cap, 2 parallel agents
oh evals run my-pack --max-cost-usd 5 --concurrency 2
# Run a specific instance
oh evals run my-pack --max-cost-usd 1 --instance django__django-11551
# Random sample of 3
oh evals run my-pack --max-cost-usd 2 --sample 3
# Resume a partial run that hit its cost cap
oh evals run my-pack --max-cost-usd 10 --resume 2026-05-05T14-30-00
# List installed packs
oh evals list-packs
# Show summary of a past run
oh evals show 2026-05-05T14-30-00
Output lives at ~/.oh/evals/runs/<run-id>/:
- results.json: full per-task data with cost, turns, duration, tests_status, error_message.
- predictions.json: submittable to the SWE-bench leaderboard at https://www.swebench.com/.
- transcripts/<instance_id>.jsonl: verbatim subprocess stream-json output per task.
A pluggable pack contract (pack.json + instances.jsonl + fixtures/<id>/) lets you author packs against any test suite. The scripts/build-evals-pack.mjs helper bakes a SWE-bench-Lite-compatible repo at a given base_commit into a fixture; see CONTRIBUTING.md.
A bundled swe-bench-lite-mini pack (10 cherry-picked instances, ready to run out-of-the-box) is shipping in v2.40.2.
graph LR
User[User Input] --> REPL[REPL Loop]
REPL --> Query[Query Engine]
Query --> Provider[LLM Provider]
Provider --> LLM[Ollama / OpenAI / Anthropic]
LLM --> Tools[Tool Execution]
Tools --> Permissions{Permission Check}
Permissions -->|Approved| Execute[Run Tool]
Permissions -->|Blocked| Deny[Deny & Report]
Execute --> Response[Stream Response]
Response --> REPL
Does it work offline? Yes. Use Ollama with a local model — no internet or API key needed.
How much does it cost? Free. OpenHarness is MIT licensed. You bring your own API key (BYOK) for cloud models, or use Ollama for free.
Is it safe?
Yes. 7 permission modes control what tools can do. Bash commands are analyzed by an AST parser that blocks destructive patterns (rm -rf, curl | bash, etc.). Every file change is checkpointed and reversible with /rewind.
Can I use it in CI/CD?
Yes. Use oh -p "prompt" --auto for headless execution, or the built-in GitHub Action for PR reviews.
Does it support my language/framework? Yes. OpenHarness is language-agnostic — it reads, writes, and executes code in any language. Syntax highlighting covers 20+ languages.
How does it compare to Claude Code? ~95% feature parity for CLI use cases. Main advantage: works with ANY LLM (not just Anthropic) and is MIT-licensed. See Why OpenHarness? above.
Requires Node.js 18+.
# From npm
npm install -g @zhijiewang/openharness
# From source
git clone https://github.com/zhijiewong/openharness.git
cd openharness
npm install && npm install -g .
npm install
npx tsx src/main.tsx # run in dev mode
npx tsc --noEmit # type check
npm test # run tests
Create src/tools/YourTool/index.ts implementing the Tool interface with a Zod input schema, and register it in src/tools.ts.
Create src/providers/yourprovider.ts implementing the Provider interface, add a case in src/providers/index.ts.
See CONTRIBUTING.md.
Join the OpenHarness community to get help, share your workflows, and discuss the future of AI coding agents!
| Platform | Details & Links |
|---|---|
| 🟣 Discord | Join our Discord Server to chat with developers and get real-time support. |
| 🔵 Feishu / Lark | Scan the QR code to collaborate with the community. |
| WeChat | Scan the QR code to join our WeChat group. |
MIT



