
Feature: OpenHands Coding Agent Skill — Model-Agnostic Sandboxed Code Agent Delegation #477

@teknium1


Overview

OpenHands (formerly OpenDevin) is an MIT-licensed, open-source AI-powered software development platform with 68.6k+ GitHub stars and $18.8M Series A funding. It provides autonomous coding agents that can edit files, run terminal commands, browse the web, and execute multi-step development tasks end-to-end — similar to Devin but fully open-source and model-agnostic.

We already have skills for delegating coding tasks to Claude Code and Codex CLI. OpenHands fills a critical gap: it would be the only model-agnostic option among our coding agent skills. Users can run it with Nous models, DeepSeek, Qwen, Llama, Claude, GPT, or even local Ollama models. It also provides hardened Docker-sandboxed execution by default, multi-agent delegation, and built-in browser automation, a combination that neither Claude Code nor Codex offers.

This issue proposes adding an openhands skill to the autonomous-ai-agents category, following the established pattern of claude-code and codex skills.


Research Findings

How OpenHands Works

OpenHands uses an event stream architecture where all agent-environment interactions flow as typed events through a central hub:

User Message → Agent → LLM → Action → Runtime (sandbox) → Observation → Agent → ...

Key components:

  • Agent: Analyzes conversation state, produces Actions (CmdRunAction, FileWriteAction, BrowseURLAction, etc.)
  • Runtime: Executes Actions in isolated environments, returns Observations
  • EventStream: Central pub/sub hub for all communication between components
  • LLM: Brokers model interactions via LiteLLM (100+ provider support)
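To make the flow above concrete, here is a toy model of the Action → Runtime → Observation loop over a pub/sub stream. It is only an illustrative sketch, not OpenHands code: the EventStream class, the CmdOutputObservation name, the field names, and the canned runtime are assumptions made for this example, and no command is actually executed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    """Base type for everything that flows through the stream."""
    source: str  # "agent" or "runtime" in this toy model


@dataclass(frozen=True)
class CmdRunAction(Event):
    """An agent's request to run a shell command (fields are assumed)."""
    command: str


@dataclass(frozen=True)
class CmdOutputObservation(Event):
    """The runtime's reply to an action (hypothetical shape)."""
    content: str


class EventStream:
    """Append-only event log plus pub/sub fan-out to subscribers."""

    def __init__(self) -> None:
        self.log: List[Event] = []
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: Event) -> None:
        self.log.append(event)  # record first, then notify
        for callback in list(self._subscribers):
            callback(event)


def make_toy_runtime(stream: EventStream) -> Callable[[Event], None]:
    """Stand-in runtime: answers each action with a canned observation."""
    def on_event(event: Event) -> None:
        if isinstance(event, CmdRunAction):
            stream.publish(
                CmdOutputObservation(source="runtime", content=f"ran: {event.command}")
            )
    return on_event


stream = EventStream()
stream.subscribe(make_toy_runtime(stream))
stream.publish(CmdRunAction(source="agent", command="pytest -q"))
print([type(e).__name__ for e in stream.log])
```

Because every component talks only to the stream, the log doubles as the full history of the session, which is what makes the replay and recovery described under Key Design Decisions possible.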

Runtime backends:

| Backend | Description |
|---|---|
| Docker (default) | Sandboxed container with cap-drop ALL, no-new-privileges |
| Local | Direct host execution, no isolation |
| Kubernetes | Enterprise orchestration across clusters |
| Modal | Cloud GPU execution |
| Remote API | Custom HTTP-based lifecycle management |
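For reference, the hardening named in the Docker row maps onto standard docker run flags. The sketch below only builds an argument list with those two flags; the image, mount point, and command are placeholders chosen for illustration, and this is not necessarily how OpenHands itself launches its sandbox.

```python
import shlex
from typing import List


def hardened_docker_argv(image: str, workspace: str, command: List[str]) -> List[str]:
    """Build a `docker run` argv with the two hardening options from the table."""
    return [
        "docker", "run", "--rm",
        "--cap-drop=ALL",                    # drop every Linux capability
        "--security-opt=no-new-privileges",  # block setuid-style privilege escalation
        "-v", f"{workspace}:/workspace",     # mount point is an assumption
        "-w", "/workspace",
        image,
        *command,
    ]


argv = hardened_docker_argv("python:3.12-slim", "/tmp/repo", ["python", "--version"])
print(shlex.join(argv))
```

The list can be passed to subprocess.run as-is on a host with a running Docker daemon.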

Product tiers:

  1. Software Agent SDK — Core Python library (pip install openhands-sdk)
  2. CLI — Terminal interface (oh command, similar to claude / codex)
  3. Local GUI — React SPA + FastAPI (similar to Devin/Jules)
  4. Cloud — Hosted at app.all-hands.dev
  5. Enterprise — Self-hosted Kubernetes with RBAC

SWE-Bench Performance

OpenHands reports 77.6% on SWE-bench Verified using their own harness (with Claude 3.5 Sonnet Thinking). On the standardized mini-SWE-agent harness, scores are typically lower (~72-76%). For context, the leaderboard shows Claude 4.5 Opus at 76.8% and GPT-5-2 Codex at 72.8% on the standard harness.

Key Design Decisions

  1. V1 SDK redesign (in progress, V0 deprecated April 2026): Moving from mandatory Docker to optional sandboxing, LocalWorkspace by default for lower friction
  2. LiteLLM for model routing: Supports 100+ providers without custom integrations, including prompt-based fallback for non-function-calling models (important for open-source/local LLMs)
  3. Event sourcing for state: Immutable event log enables replay, recovery, and incremental persistence
  4. Context condensation: LLMSummarizingCondenser replaces old conversation history with summaries to prevent context overflow (~2x cost reduction)
  5. Security analyzer: LLM-based risk assessment (Low/Medium/High) with configurable confirmation policy for dangerous commands
  6. MCP integration: First-class MCP support with OAuth flows, tool filtering, and configurable timeouts
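The context condensation idea in item 4 can be sketched as follows. This is a simplified illustration of replacing older history with a summary, not the LLMSummarizingCondenser implementation: the condense function, its parameters, and the stub summarizer are hypothetical, and a real condenser would call an LLM where the stub is.

```python
from typing import Callable, List


def condense(
    history: List[str],
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace everything except the last `keep_recent` entries with one summary."""
    if keep_recent < 0:
        raise ValueError("keep_recent must be non-negative")
    if len(history) <= keep_recent:
        return list(history)  # nothing old enough to condense
    cut = len(history) - keep_recent
    old, recent = history[:cut], history[cut:]
    return [summarize(old)] + recent


def stub_summarizer(old: List[str]) -> str:
    """Placeholder for an LLM call; it only reports how much was folded away."""
    return f"[summary of {len(old)} earlier events]"


history = [f"event {i}" for i in range(6)]
print(condense(history, keep_recent=2, summarize=stub_summarizer))
# ['[summary of 4 earlier events]', 'event 4', 'event 5']
```

With event sourcing (item 3), the original events can stay in the persisted log while only the condensed view is sent to the model.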

Unique Advantages Over Claude Code / Codex

| Feature | OpenHands | Claude Code | Codex CLI |
|---|---|---|---|
| Model support | Any (LiteLLM, 100+) | Claude only | OpenAI only |
| Local/open models | Yes (Ollama, vLLM) | No | No |
| Docker sandbox | Default, hardened | Basic container | Basic container |
| Browser automation | Yes (Playwright) | No | No |
| Multi-agent | Yes (delegation) | No | No |
| Web GUI | Yes (React SPA) | No | No |
| Self-hosted enterprise | Yes (K8s/RBAC) | No | No |
| MCP support | Yes (native) | Yes | Yes |
| License | MIT | Proprietary | Open Source |

Current State in Hermes Agent

Existing coding agent skills:

  • claude-code — Delegates to Anthropic's Claude Code CLI (Claude-only)
  • codex — Delegates to OpenAI's Codex CLI (OpenAI-only)
  • hermes-agent — Spawns additional Hermes Agent instances

Related existing features:

Gap: There is no model-agnostic coding agent delegation. Users are locked to Claude or OpenAI models for delegated coding work, with no way to use Nous models, DeepSeek, Qwen, or local models for autonomous coding tasks.


Implementation Plan

Skill vs. Tool Classification

This should be a skill because:

  • OpenHands has a CLI (oh command) invokable via terminal
  • No custom Python integration needed — shell commands + existing tools suffice
  • No API key management beyond what the user configures for OpenHands itself
  • Follows the exact same pattern as claude-code and codex skills
  • Should be bundled (in skills/) since model-agnostic coding is broadly useful

What We'd Need

  • OpenHands CLI installed: pip install openhands-ai or via Docker
  • User's own LLM API key configured for their chosen provider
  • Docker for sandboxed execution (optional but recommended)
  • Skill SKILL.md following the claude-code/codex pattern

Phased Rollout

Phase 1: Basic CLI Skill

  • SKILL.md with installation instructions, one-shot tasks, background mode
  • Key flags table (--model, --sandbox, --max-iterations, etc.)
  • Usage patterns: one-shot coding tasks, PR reviews, bug fixes
  • PTY mode support (pty=true, same as claude-code/codex)
  • Model configuration examples (Claude, GPT, DeepSeek, Ollama, Nous models)
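As a starting point for the model configuration examples, the sketch below builds per-provider environment variables using LiteLLM-style provider/model strings. The variable names LLM_MODEL, LLM_API_KEY, and LLM_BASE_URL are an assumption about how OpenHands reads its configuration and should be checked against the current docs before they go into SKILL.md; the openhands_env helper is hypothetical.

```python
from typing import Dict, Optional


def openhands_env(
    model: str,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
) -> Dict[str, str]:
    """Build environment overrides for one provider (variable names are assumed)."""
    env = {"LLM_MODEL": model}
    if api_key:
        env["LLM_API_KEY"] = api_key
    if base_url:
        env["LLM_BASE_URL"] = base_url
    return env


# LiteLLM-style "provider/model" identifiers; API keys are placeholders.
EXAMPLES = {
    "deepseek": openhands_env("deepseek/deepseek-chat", api_key="<DEEPSEEK_API_KEY>"),
    "ollama": openhands_env("ollama/llama3", base_url="http://localhost:11434"),
}

for name, env in EXAMPLES.items():
    print(name, env)
```

A local Ollama model needs a base URL but no API key, which is why both are optional here.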

Phase 2: Advanced Patterns

  • Parallel issue fixing with git worktrees (like codex skill)
  • Docker sandbox configuration guidance
  • MCP server passthrough (OpenHands MCP ↔ Hermes MCP)
  • Browser automation tasks (QA, scraping, testing)
  • Multi-agent delegation patterns
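The worktree pattern in the first bullet could look roughly like this: one branch and one working directory per issue, so parallel agents never touch the same checkout. The branch and directory naming scheme is an assumption for illustration, and the agent invocation is deliberately left out because the stable CLI flags are still an open question.

```python
from pathlib import PurePosixPath
from typing import List


def worktree_plan(repo_root: str, issue_ids: List[int], base: str = "main") -> List[List[str]]:
    """Return one `git worktree add` command per issue (naming scheme is hypothetical)."""
    root = PurePosixPath(repo_root)
    commands = []
    for issue in issue_ids:
        branch = f"fix/issue-{issue}"
        path = str(root.parent / f"{root.name}-issue-{issue}")  # sibling directory
        commands.append(
            ["git", "-C", repo_root, "worktree", "add", "-b", branch, path, base]
        )
    return commands


for cmd in worktree_plan("/src/app", [12, 34]):
    print(" ".join(cmd))
```

Each command can be run with subprocess.run, after which an agent instance is started with its working directory set to the new worktree path.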

Phase 3: Deep Integration


Pros & Cons

Pros

  • Model freedom: Only coding agent skill that works with ANY LLM provider, including local/open models (Nous, DeepSeek, Qwen, Ollama)
  • Stronger sandbox: Docker-based isolation with security hardening (cap-drop ALL, no-new-privileges) by default
  • Browser automation: Can interact with web apps, run QA, fill forms — unique among our coding agent skills
  • Multi-agent: OpenHands handles task decomposition internally via agent delegation
  • MIT licensed: No licensing concerns for any use case
  • Community: 68.6k+ stars, active development, $18.8M funding — not going away
  • SWE-bench validated: 77.6% on Verified (own harness), competitive with commercial agents
  • Completes the trifecta: Claude Code (Anthropic) + Codex (OpenAI) + OpenHands (any model)

Cons / Risks

  • Heavy dependency: pip install openhands-ai pulls 70+ packages, which could conflict with Hermes Agent's own dependencies if installed in the same environment (recommend a separate venv or Docker)
  • V0→V1 migration: Active architecture transition. V0 deprecated April 2026. CLI/SDK interface may change
  • Docker dependency: Full sandbox requires Docker daemon running, which may not be available on all systems
  • Installation friction: Heavier setup than Claude Code (single binary) or Codex (npm install)
  • Performance variability: Quality depends heavily on which LLM the user configures — Nous/open models may underperform vs Claude/GPT on complex coding tasks
  • Overlap with mini-swe-agent: Hermes already has mini-swe-agent/ for SWE-bench-style tasks, though OpenHands is far more feature-complete
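One way to address the dependency-conflict risk in the first bullet is to install OpenHands into its own virtualenv. The sketch below only builds the two commands involved; the isolated_install_commands helper and the venv location are hypothetical, and the package name is the one used elsewhere in this issue.

```python
import sys
from pathlib import Path
from typing import List


def isolated_install_commands(venv_dir: str, package: str = "openhands-ai") -> List[List[str]]:
    """Commands to create a dedicated venv and install the package into it."""
    venv = Path(venv_dir)
    # venv places its interpreter under Scripts/ on Windows and bin/ elsewhere.
    bin_dir = venv / ("Scripts" if sys.platform == "win32" else "bin")
    return [
        [sys.executable, "-m", "venv", str(venv)],
        [str(bin_dir / "python"), "-m", "pip", "install", package],
    ]


for cmd in isolated_install_commands("/tmp/openhands-venv"):
    print(" ".join(cmd))
```

Running pip through the venv's own interpreter keeps the 70+ packages out of the environment Hermes Agent runs in.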

Open Questions

  • Should the skill recommend installing OpenHands in a separate virtualenv to avoid dependency conflicts, or via Docker?
  • Which OpenHands CLI commands/flags are stable across the V0→V1 transition?
  • Should we create a default OpenHands config that uses Hermes Agent's configured LLM provider, or keep configuration independent?
  • Should the skill support both openhands CLI mode and Docker mode (docker run the full platform)?
  • How should we handle the browser automation use case — is that better as a separate skill or integrated into this one?
