Feature: OpenHands Coding Agent Skill — Model-Agnostic Sandboxed Code Agent Delegation

## Overview

[OpenHands](https://github.com/All-Hands-AI/OpenHands) (formerly OpenDevin) is an MIT-licensed, open-source AI-powered software development platform with 68.6k+ GitHub stars and $18.8M Series A funding. It provides autonomous coding agents that can edit files, run terminal commands, browse the web, and execute multi-step development tasks end-to-end — similar to Devin but fully open-source and model-agnostic.

We already have skills for delegating coding tasks to [Claude Code](../../skills/autonomous-ai-agents/claude-code/) and [Codex CLI](../../skills/autonomous-ai-agents/codex/). OpenHands fills a critical gap: it would be the **only model-agnostic option** among our coding agent skills, since users can run it with Nous models, DeepSeek, Qwen, Llama, Claude, GPT, or even local Ollama models. It also provides **Docker-sandboxed execution by default**, multi-agent delegation, and built-in browser automation, a combination that neither Claude Code nor Codex offers.

This issue proposes adding an `openhands` skill to the `autonomous-ai-agents` category, following the established pattern of `claude-code` and `codex` skills.

---

## Research Findings

### How OpenHands Works

OpenHands uses an **event stream architecture** where all agent-environment interactions flow as typed events through a central hub:

```
User Message → Agent → LLM → Action → Runtime (sandbox) → Observation → Agent → ...
```

**Key components:**
- **Agent**: Analyzes conversation state, produces Actions (CmdRunAction, FileWriteAction, BrowseURLAction, etc.)
- **Runtime**: Executes Actions in isolated environments, returns Observations
- **EventStream**: Central pub/sub hub for all communication between components
- **LLM**: Brokers model interactions via LiteLLM (100+ provider support)
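
To make the flow above concrete, here is a minimal pub/sub sketch in Python. It is not the OpenHands implementation: `Event`, `CmdOutputObservation`, `EventStream`, and `fake_runtime` are hypothetical names chosen for illustration (only `CmdRunAction` appears in the component list above), and the runtime here merely echoes the command instead of executing it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Base type for everything that flows through the stream."""
    source: str  # e.g. "user", "agent", or "runtime"


@dataclass(frozen=True)
class CmdRunAction(Event):
    command: str = ""


@dataclass(frozen=True)
class CmdOutputObservation(Event):  # hypothetical name for this sketch
    output: str = ""


class EventStream:
    """Central hub: appends every event to a log and notifies subscribers."""

    def __init__(self) -> None:
        self._log: List[Event] = []
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: Event) -> None:
        self._log.append(event)
        for callback in list(self._subscribers):
            callback(event)

    def replay(self) -> Tuple[Event, ...]:
        """Return the full history in publication order."""
        return tuple(self._log)


def fake_runtime(stream: EventStream) -> None:
    """Stand-in runtime: answers each command action with an observation."""

    def handle(event: Event) -> None:
        if isinstance(event, CmdRunAction):
            stream.publish(
                CmdOutputObservation(source="runtime", output=f"ran: {event.command}")
            )

    stream.subscribe(handle)


stream = EventStream()
fake_runtime(stream)
stream.publish(CmdRunAction(source="agent", command="ls"))
history = stream.replay()
for event in history:
    print(type(event).__name__, event)
```

Because every component talks only to the stream, the same log can later be replayed for recovery or debugging.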

**Runtime backends:**
| Backend | Description |
|---------|-------------|
| Docker (default) | Sandboxed container with cap-drop ALL, no-new-privileges |
| Local | Direct host execution, no isolation |
| Kubernetes | Enterprise orchestration across clusters |
| Modal | Cloud GPU execution |
| Remote API | Custom HTTP-based lifecycle management |
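
As a rough illustration of what the Docker row's hardening looks like on a command line, the sketch below builds a `docker run` argument list with `--cap-drop ALL` and `--security-opt no-new-privileges`. This is an assumption about how such a container could be launched by hand, not a copy of how OpenHands starts its runtime; the helper name, image name, and mount layout are hypothetical.

```python
import shlex
from typing import List


def hardened_docker_argv(image: str, workspace: str, command: List[str]) -> List[str]:
    """Build (but do not run) a docker invocation with the two hardening flags."""
    return [
        "docker", "run", "--rm",
        "--cap-drop", "ALL",                    # drop every Linux capability
        "--security-opt", "no-new-privileges",  # block setuid-style escalation
        "-v", f"{workspace}:/workspace",        # hypothetical mount layout
        "-w", "/workspace",
        image,
        *command,
    ]


# "example/sandbox:latest" is a placeholder image, not a real OpenHands image.
argv = hardened_docker_argv("example/sandbox:latest", "/tmp/project", ["ls", "-la"])
print(shlex.join(argv))
```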

**Product tiers:**
1. **Software Agent SDK**: Core Python library (`pip install openhands-sdk`)
2. **CLI**: Terminal interface (`openhands` command, similar to `claude` / `codex`)
3. **Local GUI**: React SPA + FastAPI (similar to Devin/Jules)
4. **Cloud**: Hosted at app.all-hands.dev
5. **Enterprise**: Self-hosted Kubernetes with RBAC

### SWE-Bench Performance

OpenHands reports **77.6% on SWE-bench Verified** using their own harness (with Claude 3.5 Sonnet Thinking). On the standardized mini-SWE-agent harness, scores are typically lower (~72-76%). For context, the leaderboard shows Claude 4.5 Opus at 76.8% and GPT-5-2 Codex at 72.8% on the standard harness.

### Key Design Decisions

1. **V1 SDK redesign** (in progress, V0 deprecated April 2026): Moving from mandatory Docker to optional sandboxing, LocalWorkspace by default for lower friction
2. **LiteLLM for model routing**: Supports 100+ providers without custom integrations, including prompt-based fallback for non-function-calling models (important for open-source/local LLMs)
3. **Event sourcing for state**: Immutable event log enables replay, recovery, and incremental persistence
4. **Context condensation**: LLMSummarizingCondenser replaces old conversation history with summaries to prevent context overflow (~2x cost reduction)
5. **Security analyzer**: LLM-based risk assessment (Low/Medium/High) with configurable confirmation policy for dangerous commands
6. **MCP integration**: First-class MCP support with OAuth flows, tool filtering, and configurable timeouts
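
Two of these decisions, context condensation and the confirmation policy, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not OpenHands code: `condense`, `Risk`, and `needs_confirmation` are hypothetical names, and the summarizer is a plain function standing in for the LLM call that a real summarizing condenser would make.

```python
from enum import IntEnum
from typing import Callable, List


def condense(
    history: List[str],
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace everything except the last `keep_recent` messages with one summary."""
    if keep_recent <= 0:
        raise ValueError("keep_recent must be positive")
    if len(history) <= keep_recent:
        return list(history)  # nothing old enough to condense
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def needs_confirmation(risk: Risk, threshold: Risk = Risk.HIGH) -> bool:
    """Ask the user before running any action at or above the threshold."""
    return risk >= threshold


def stub_summary(messages: List[str]) -> str:
    # Stand-in for an LLM-generated summary.
    return f"[summary of {len(messages)} earlier messages]"


condensed = condense(["m1", "m2", "m3", "m4", "m5"], keep_recent=2, summarize=stub_summary)
print(condensed)
print(needs_confirmation(Risk.MEDIUM), needs_confirmation(Risk.HIGH))
```

Lowering `threshold` to `Risk.MEDIUM` makes the policy stricter without touching the risk assessment itself.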

### Unique Advantages Over Claude Code / Codex

| Feature | OpenHands | Claude Code | Codex CLI |
|---------|-----------|-------------|-----------|
| Model support | Any (LiteLLM, 100+) | Claude only | OpenAI only |
| Local/open models | Yes (Ollama, vLLM) | No | No |
| Docker sandbox | Default, hardened | Basic container | Basic container |
| Browser automation | Yes (Playwright) | No | No |
| Multi-agent | Yes (delegation) | No | No |
| Web GUI | Yes (React SPA) | No | No |
| Self-hosted enterprise | Yes (K8s/RBAC) | No | No |
| MCP support | Yes (native) | Yes | Yes |
| License | MIT | Proprietary | Apache-2.0 |

---

## Current State in Hermes Agent

**Existing coding agent skills:**
- `claude-code` — Delegates to Anthropic's Claude Code CLI (Claude-only)
- `codex` — Delegates to OpenAI's Codex CLI (OpenAI-only)
- `hermes-agent` — Spawns additional Hermes Agent instances

**Related existing features:**
- Robust multi-backend terminal isolation: Docker, SSH, Singularity, Modal, Bubblewrap (in `tools/environments/`)
- `mini-swe-agent/` — Embedded SWE agent with Docker/Modal/Bubblewrap backends
- Native MCP support (`tools/mcp_tool.py`)
- Issue #466 references OpenHands for file transfer patterns
- Issue #404 covers Symphony-style autonomous issue resolution

**Gap:** There is no model-agnostic coding agent delegation. Users are locked to Claude or OpenAI for delegated coding work, with no way to use Nous models, DeepSeek, Qwen, or local models for autonomous coding tasks.

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **skill** because:
- OpenHands has a CLI (`openhands` command) invokable via terminal
- No custom Python integration needed — shell commands + existing tools suffice
- No API key management beyond what the user configures for OpenHands itself
- Follows the exact same pattern as `claude-code` and `codex` skills
- Should be **bundled** (in `skills/`) since model-agnostic coding is broadly useful

### What We'd Need

- OpenHands CLI installed: `pip install openhands-ai` or via Docker
- User's own LLM API key configured for their chosen provider
- Docker for sandboxed execution (optional but recommended)
- Skill SKILL.md following the claude-code/codex pattern

### Phased Rollout

**Phase 1: Basic CLI Skill**
- SKILL.md with installation instructions, one-shot tasks, background mode
- Key flags table (`--model`, `--sandbox`, `--max-iterations`, etc.)
- Usage patterns: one-shot coding tasks, PR reviews, bug fixes
- PTY mode support (pty=true, same as claude-code/codex)
- Model configuration examples (Claude, GPT, DeepSeek, Ollama, Nous models)
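
The model configuration examples could take roughly the following shape. This sketch assumes OpenHands reads `LLM_MODEL`, `LLM_API_KEY`, and `LLM_BASE_URL` from the environment and that model names use LiteLLM-style `provider/model` strings; both should be verified against the OpenHands docs before going into SKILL.md. The helper name, the angle-bracket placeholders, and the choice of OpenRouter for Nous models are illustrative assumptions.

```python
from typing import Dict, Optional


def llm_env(
    model: str,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
) -> Dict[str, str]:
    """Build the environment variables for one provider configuration."""
    env = {"LLM_MODEL": model}  # assumed variable name, see lead-in
    if api_key:
        env["LLM_API_KEY"] = api_key
    if base_url:
        env["LLM_BASE_URL"] = base_url
    return env


# Placeholders in angle brackets must be filled in by the user.
EXAMPLES = {
    "claude": llm_env("anthropic/<model-id>", api_key="<anthropic-key>"),
    "gpt": llm_env("openai/<model-id>", api_key="<openai-key>"),
    "deepseek": llm_env("deepseek/<model-id>", api_key="<deepseek-key>"),
    # Local Ollama needs no key; 11434 is Ollama's default port.
    "ollama": llm_env("ollama/<local-model>", base_url="http://localhost:11434"),
    # Assumption: Nous models reached through an aggregator such as OpenRouter.
    "nous": llm_env("openrouter/<nous-model-id>", api_key="<openrouter-key>"),
}

for name, env in EXAMPLES.items():
    print(name, env)
```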

**Phase 2: Advanced Patterns**
- Parallel issue fixing with git worktrees (like codex skill)
- Docker sandbox configuration guidance
- MCP server passthrough (OpenHands MCP ↔ Hermes MCP)
- Browser automation tasks (QA, scraping, testing)
- Multi-agent delegation patterns
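
For the parallel issue-fixing pattern, one possible approach is to give each issue its own git worktree and branch so that agents never share a working directory. The sketch below only builds the `git worktree add` commands and does not run them; the helper name, branch naming scheme, and directory layout are hypothetical, and the agent launch step is left out because the CLI flags are still an open question.

```python
import shlex
from pathlib import PurePosixPath
from typing import List


def worktree_commands(
    repo_root: str, issue_ids: List[int], base: str = "main"
) -> List[List[str]]:
    """Return one `git worktree add` command per issue (commands are not executed)."""
    root = PurePosixPath(repo_root)
    commands = []
    for issue in issue_ids:
        branch = f"fix/issue-{issue}"                          # hypothetical naming
        path = str(root.parent / f"{root.name}-issue-{issue}")  # sibling directory
        commands.append(
            ["git", "-C", repo_root, "worktree", "add", "-b", branch, path, base]
        )
    return commands


cmds = worktree_commands("/src/hermes", [12, 34])
for cmd in cmds:
    print(shlex.join(cmd))
```

Each resulting directory can then be handed to a separate agent process and cleaned up with `git worktree remove` once the branch is merged.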

**Phase 3: Deep Integration**
- OpenHands as an alternative backend for issue #404 (Symphony-style issue resolution)
- Integration with file transfer (issue #466) for sandboxed workspace access
- Benchmark comparison skill: run same task on OpenHands vs Claude Code vs Codex
- Custom agent configurations (model routing, cost optimization)

---

## Pros & Cons

### Pros
- **Model freedom**: Only coding agent skill that works with ANY LLM provider, including local/open models (Nous, DeepSeek, Qwen, Ollama)
- **Stronger sandbox**: Docker-based isolation with security hardening (cap-drop ALL, no-new-privileges) by default
- **Browser automation**: Can interact with web apps, run QA, fill forms — unique among our coding agent skills
- **Multi-agent**: OpenHands handles task decomposition internally via agent delegation
- **MIT licensed**: No licensing concerns for any use case
- **Community**: 68.6k+ stars, active development, and $18.8M in funding, which suggests the project will remain maintained
- **SWE-bench validated**: 77.6% on Verified (own harness), competitive with commercial agents
- **Completes the trifecta**: Claude Code (Anthropic) + Codex (OpenAI) + OpenHands (any model)

### Cons / Risks
- **Heavy dependency**: `pip install openhands-ai` pulls in 70+ packages, which can conflict with Hermes Agent's own dependencies if installed in the same environment (recommend a separate venv or Docker)
- **V0→V1 migration**: Active architecture transition. V0 deprecated April 2026. CLI/SDK interface may change
- **Docker dependency**: Full sandbox requires Docker daemon running, which may not be available on all systems
- **Installation friction**: Heavier setup than Claude Code (single binary) or Codex (npm install)
- **Performance variability**: Quality depends heavily on which LLM the user configures — Nous/open models may underperform vs Claude/GPT on complex coding tasks
- **Overlap with mini-swe-agent**: Hermes already has `mini-swe-agent/` for SWE-bench-style tasks, though OpenHands is far more feature-complete
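
The separate-venv mitigation for the dependency risk could look roughly like this. The sketch only builds the two commands (create a virtual environment, then install into it) and leaves running them to the caller; the helper name and the example venv location are hypothetical, and the package name is taken from the install command quoted above.

```python
import os
import shlex
import sys
from pathlib import Path
from typing import List


def isolated_install_plan(venv_dir: str, package: str = "openhands-ai") -> List[List[str]]:
    """Return the commands that install `package` into its own virtual environment."""
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    venv_python = str(Path(venv_dir) / bin_dir / exe)
    return [
        [sys.executable, "-m", "venv", venv_dir],          # create the venv
        [venv_python, "-m", "pip", "install", package],    # install inside it
    ]


plan = isolated_install_plan("/tmp/openhands-venv")  # hypothetical location
for step in plan:
    print(shlex.join(step))
```

Keeping the install out of Hermes Agent's own environment means the 70+ transitive packages cannot shadow or downgrade anything Hermes depends on.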

---

## Open Questions

- Should the skill recommend installing OpenHands in a separate virtualenv to avoid dependency conflicts, or via Docker?
- Which OpenHands CLI commands/flags are stable across the V0→V1 transition?
- Should we create a default OpenHands config that uses Hermes Agent's configured LLM provider, or keep configuration independent?
- Should the skill support both `openhands` CLI mode and Docker mode (`docker run` the full platform)?
- How should we handle the browser automation use case — is that better as a separate skill or integrated into this one?

---

## References

- OpenHands Documentation: https://docs.openhands.dev/overview/introduction
- OpenHands GitHub: https://github.com/All-Hands-AI/OpenHands
- OpenHands SDK (V1): https://github.com/OpenHands/software-agent-sdk
- OpenHands Architecture: https://docs.openhands.dev/overview/architecture
- SWE-bench Leaderboard: https://www.swebench.com/
- Related issue #466: File transfer between sandboxed environments
- Related issue #404: Symphony-Style Autonomous Issue Resolution
- Related issue #344: Multi-Agent Architecture
- Existing skills: `claude-code`, `codex`, `hermes-agent`
