Feature: Local Browser Backend via Playwright/CDP — Zero-Cost Alternative to Browserbase (inspired by Eigent)

## Overview

Hermes Agent's browser tools (`tools/browser_tool.py`) currently require [Browserbase](https://www.browserbase.com/) — a paid cloud service — for all browser automation. This creates a hard paywall for browser capabilities: users without a Browserbase subscription cannot use `browser_navigate`, `browser_click`, `browser_snapshot`, or any of the 10 browser tools.

[Eigent](https://github.com/eigent-ai/Eigent) (Apache-2.0, 12.8k stars), an open-source multi-agent desktop app built on [CAMEL-AI](https://github.com/camel-ai/camel), demonstrates a robust local browser approach using Chrome DevTools Protocol (CDP) with a **thread-safe browser pool manager** that allocates separate CDP ports for parallel agent tasks. This pattern could give Hermes a local browser backend alongside the existing Browserbase cloud option.

This would make browser automation accessible to all Hermes users without requiring a paid service, while preserving Browserbase as the premium option for stealth, CAPTCHA solving, and residential proxies.

---

## Research Findings

### How Eigent's Local Browser Works

Eigent uses a `HybridBrowserToolkit` backed by a `CdpBrowserPoolManager`:

**CDP Browser Pool Manager (thread-safe):**
- Manages a pool of Chrome browser instances, each on a separate CDP port
- Thread-safe port allocation for parallel agent tasks (critical since Hermes runs sub-agents via ThreadPoolExecutor)
- Lazy creation — browsers spun up on demand, not pre-allocated
- Up to 10 concurrent browser instances per worker type
- Each cloned agent gets its own CDP port from the pool
- Supports external browser connections (pre-authenticated sessions)

**16 browser actions, including:**
- Navigate, click, type, scroll, console execute
- Sheet read/input (for Google Sheets interaction)
- Page content extraction

**Key Design Pattern — Pool-per-Agent:**
```
Agent A (research task) ──→ CDP Port 9222 ──→ Chrome Instance 1
Agent B (form filling)  ──→ CDP Port 9223 ──→ Chrome Instance 2
Agent C (data scraping) ──→ CDP Port 9224 ──→ Chrome Instance 3
```
Each agent gets an isolated browser session. No cross-contamination of cookies, state, or navigation history.
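
The allocation side of this pattern can be sketched in a few lines. This is a minimal illustration, not Eigent's actual `CdpBrowserPoolManager`: the class name `PortPool`, its methods, and the default base port are assumptions chosen to match the diagram above. It only tracks which task owns which port and does not launch Chrome.

```python
import threading


class PortPool:
    """Hypothetical pool that hands out one CDP port per task, safely across threads."""

    def __init__(self, base_port: int = 9222, size: int = 10) -> None:
        self._lock = threading.Lock()
        self._free = list(range(base_port, base_port + size))
        self._owned: dict[str, int] = {}

    def acquire(self, task_id: str) -> int:
        """Return the port already held by task_id, or claim the lowest free one."""
        with self._lock:
            if task_id in self._owned:
                return self._owned[task_id]
            if not self._free:
                raise RuntimeError("browser pool exhausted")
            port = self._free.pop(0)
            self._owned[task_id] = port
            return port

    def release(self, task_id: str) -> None:
        """Give task_id's port back to the pool; unknown ids are ignored."""
        with self._lock:
            port = self._owned.pop(task_id, None)
            if port is not None:
                self._free.append(port)
                self._free.sort()
```

Allocation is lazy in the sense described above: a port is claimed only when a task first asks for one, and a repeated `acquire` for the same task returns the same port.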

### Why Playwright Over Raw CDP

While Eigent uses raw CDP connections, [Playwright](https://playwright.dev/) would be a better fit for Hermes:

1. **Cross-browser support** — Chromium, Firefox, WebKit (Browserbase is Chromium-only)
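2. **Built-in aria snapshots** — `locator.aria_snapshot()` (Playwright 1.49+) returns a YAML accessibility tree, which should map closely onto the `ariaSnapshot` format Hermes already uses. The older `page.accessibility.snapshot()` is deprecated and returns a nested dict instead.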
3. **Auto-wait mechanics** — Handles dynamic page loading, animations, and network idle detection
4. **Stealth plugins available** — `playwright-stealth` for basic anti-detection
5. **Headless by default** — No GUI needed on servers/VMs
6. **Well-maintained** — Microsoft-backed, active development, excellent Python bindings
7. **Already proven for agents** — Used by [browser-use](https://github.com/browser-use/browser-use) (45k+ stars), Anthropic's computer-use reference, and many agent frameworks
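
To make points 2, 3, and 5 concrete, here is a minimal sketch of a headless navigate-and-snapshot round trip with Playwright's sync API. It assumes Playwright 1.49 or newer (where `Locator.aria_snapshot()` is available) and an installed Chromium binary. The function name `snapshot_page` is hypothetical and not part of Hermes.

```python
def snapshot_page(url: str, headless: bool = True) -> str:
    """Load url in a fresh, isolated context and return its aria snapshot."""
    # Imported lazily so this module still loads when Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        context = browser.new_context()  # separate cookies and storage
        try:
            page = context.new_page()
            # goto() waits for the navigation to reach the requested state.
            page.goto(url, wait_until="domcontentloaded")
            # YAML-formatted accessibility tree of everything under <body>.
            return page.locator("body").aria_snapshot()
        finally:
            context.close()
            browser.close()
```

Calling `snapshot_page("https://example.com")` should return a short YAML string describing the page's headings, paragraphs, and links. Whether that output can be fed to the LLM unchanged would need to be checked against what `agent-browser` produces today.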

---

## Current State in Hermes Agent

**`tools/browser_tool.py` (1608 lines):**
- Tightly coupled to Browserbase cloud API
- Uses `agent-browser` Node.js CLI subprocess for browser commands
- Creates Browserbase sessions via REST API (API key + project ID required)
- Session management: per-task_id isolation, inactivity timeout (5 min), emergency cleanup
- Accessibility tree snapshots for LLM-friendly page representation
- Vision integration via `browser_vision` (screenshot → vision model)
- 10 registered tools: navigate, snapshot, click, type, scroll, back, press, close, get_images, vision

**What's missing:**
- No fallback when Browserbase credentials are absent — all browser tools simply fail
- No local execution option
- The `check` function in browser tool registration gates all browser tools on `BROWSERBASE_API_KEY` being set

**Relevant architecture:**
- Hermes already has a multi-backend pattern for terminals (`environments/` directory with local, docker, ssh, singularity, modal backends)
- The same pattern could apply to browsers: `BrowserBackend` base class with `BrowserbaseBackend` and `PlaywrightBackend` implementations

---

## Implementation Plan

### Skill vs. Tool Classification

This is a **codebase change** to `tools/browser_tool.py` (and potentially a new `tools/browser_backends/` module). It must be a **tool** because it:
- Handles browser session lifecycle management (create, cleanup, error recovery)
- Manages binary data (screenshots for vision)
- Requires precise execution (element selectors, page navigation timing)
- Needs thread-safe resource management (browser pool for concurrent sub-agents)

### What We'd Need

1. **Playwright Python package** — `pip install playwright` + `playwright install chromium`
2. **Backend abstraction layer** — `BrowserBackend` base class with `execute_command()`, `create_session()`, `close_session()`
3. **Playwright backend implementation** — Local browser management with per-task session isolation via browser contexts
4. **Auto-detection** — Use Browserbase if credentials present, fall back to local Playwright
5. **Browser pool manager** — Thread-safe allocation of browser contexts for concurrent sub-agents (inspired by Eigent's `CdpBrowserPoolManager`)
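
One possible shape for the abstraction layer in item 2 is sketched below. The three method names come from the list above. Their signatures, the return type, and the `RecordingBackend` stand-in are assumptions for illustration only, and the real interface would be shaped by what `tools/browser_tool.py` already passes around.

```python
from abc import ABC, abstractmethod
from typing import Any


class BrowserBackend(ABC):
    """Interface each browser backend would implement (signatures are assumed)."""

    @abstractmethod
    def create_session(self, task_id: str) -> None: ...

    @abstractmethod
    def execute_command(self, task_id: str, command: str, **args: Any) -> dict[str, Any]: ...

    @abstractmethod
    def close_session(self, task_id: str) -> None: ...


class RecordingBackend(BrowserBackend):
    """Hypothetical stand-in that records commands instead of driving a browser."""

    def __init__(self) -> None:
        self.sessions: dict[str, list[str]] = {}

    def create_session(self, task_id: str) -> None:
        self.sessions.setdefault(task_id, [])

    def execute_command(self, task_id: str, command: str, **args: Any) -> dict[str, Any]:
        if task_id not in self.sessions:
            self.create_session(task_id)  # sessions are created on first use
        self.sessions[task_id].append(command)
        return {"success": True, "command": command, "args": args}

    def close_session(self, task_id: str) -> None:
        self.sessions.pop(task_id, None)
```

A stand-in like this is also useful for testing the tool layer without a browser. `BrowserbaseBackend` and `PlaywrightBackend` would implement the same three methods.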

### Phased Rollout

**Phase 1: Local Playwright Backend (Core)**
- Add `playwright` as optional dependency
- Create `PlaywrightBackend` class with browser context pool
- Implement the 8 essential browser tools against Playwright: navigate, snapshot, click, type, scroll, back, press, close
- Auto-detect: if `BROWSERBASE_API_KEY` is set → use Browserbase; else → use local Playwright
- Per-task browser context isolation (Playwright's `browser.new_context()`)
- Headless by default, configurable via env var
- Deliverable: Browser tools work for everyone, zero cloud dependency
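
Per-task isolation could be handled by a small registry that creates a browser context the first time a task asks for one. The sketch below is an assumption about how that might look; `ContextRegistry` is a hypothetical name. It accepts any object exposing `new_context()`, which is the method Playwright's `Browser` provides. One caveat: Playwright's documentation states its API is not thread-safe, so the lock here only protects the dictionary, and a real implementation would probably need one Playwright instance per thread or the async API.

```python
import threading
from typing import Any


class ContextRegistry:
    """Maps each task_id to its own browser context, created on first use."""

    def __init__(self, browser: Any) -> None:
        self._browser = browser  # any object with a new_context() method
        self._lock = threading.Lock()
        self._contexts: dict[str, Any] = {}

    def get(self, task_id: str) -> Any:
        """Return the task's context, creating it if this is the first request."""
        with self._lock:
            ctx = self._contexts.get(task_id)
            if ctx is None:
                ctx = self._browser.new_context()
                self._contexts[task_id] = ctx
            return ctx

    def close(self, task_id: str) -> None:
        """Close and forget the task's context; unknown ids are ignored."""
        with self._lock:
            ctx = self._contexts.pop(task_id, None)
        if ctx is not None:
            ctx.close()
```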

**Phase 2: Feature Parity + Vision**
- Implement `browser_get_images` and `browser_vision` with Playwright screenshots
- Add `playwright-stealth` for basic anti-detection on local backend
- Add persistent browser profiles (reuse cookies/auth across sessions)
- Expose backend choice via config: `browser_backend: local|browserbase|auto`
- Deliverable: Full feature parity between local and cloud backends
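
The `browser_backend: local|browserbase|auto` setting, combined with the Phase 1 auto-detection rule, reduces to a small pure function. This is a sketch under assumptions: the name `resolve_backend` is hypothetical, and it checks only `BROWSERBASE_API_KEY`, although the current implementation also requires a project ID.

```python
from collections.abc import Mapping


def resolve_backend(setting: str, env: Mapping[str, str]) -> str:
    """Return "browserbase" or "local" for a browser_backend setting."""
    has_key = bool(env.get("BROWSERBASE_API_KEY"))
    if setting == "auto":
        # Prefer the cloud backend when credentials exist, else fall back.
        return "browserbase" if has_key else "local"
    if setting == "browserbase":
        if not has_key:
            raise ValueError("browser_backend=browserbase requires BROWSERBASE_API_KEY")
        return "browserbase"
    if setting == "local":
        return "local"
    raise ValueError(f"unknown browser_backend: {setting!r}")
```

In practice this would be called as `resolve_backend(config_value, os.environ)`. Taking the environment as a parameter keeps the function easy to test.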

**Phase 3: Advanced Local Features**
- Browser pool manager for sub-agent concurrency (Eigent's pattern)
- Pre-authenticated session support (connect to user's running browser via CDP)
- Recording/replay for debugging (Playwright trace viewer)
- Optional headed mode for user observation
- Deliverable: Production-grade local browser with advanced features

---

## Pros & Cons

### Pros
- **Removes paywall** — Browser tools become accessible to all users, not just Browserbase subscribers
- **Zero cloud dependency** — Works offline, no API keys needed, no usage-based billing
- **Follows existing pattern** — Mirrors the multi-backend approach already used for terminal environments
- **Playwright is mature** — Well-maintained, cross-browser, excellent Python bindings
- **Better for development** — Local browser is faster (no network round-trips to cloud), visible in headed mode
- **Concurrent sub-agent support** — Browser pool pattern (from Eigent) handles the ThreadPoolExecutor concurrency model Hermes uses for batch delegation
- **Apache-2.0 compatible** — Both Eigent and Playwright are permissively licensed

### Cons / Risks
- **Additional dependency** — Playwright + browser binaries (~200MB for Chromium). Should be optional.
- **No stealth/CAPTCHA** — Local browsers lack Browserbase's anti-detection, residential proxies, and CAPTCHA solving. Some sites will block local browsers.
- **Resource consumption** — Each browser context uses ~100-200MB RAM. Concurrent sub-agents with browsers could strain machines with limited RAM.
- **Cross-platform complexity** — Browser installation and management differs across Linux, macOS, Windows. Playwright handles this but edge cases exist.
- **Maintenance burden** — Two browser backends to maintain, test, and keep feature-aligned.
- **Headless limitations** — Some sites detect headless browsers and serve different content. Browserbase handles this; local Playwright may not.

---

## Open Questions

1. **Should Playwright be a required or optional dependency?** Optional seems right (like browser tools are today), but then we need graceful degradation.
2. **How to handle browser binary installation?** `playwright install chromium` needs to run once. Should `hermes doctor` check for this? Should setup prompt for it?
3. **What about Docker/SSH terminal backends?** If Hermes is running in a Docker container or over remote SSH, can Playwright still work? (Tentative answer: yes for Docker, where Chromium typically needs `--no-sandbox` when run as root; untested over SSH, although headless mode should not need a display or X forwarding.)
4. **Should we support connecting to an existing browser?** Eigent supports external CDP connections to pre-authenticated browsers. This is powerful for enterprise use cases but adds complexity.
5. **Pool size limits?** Eigent allows 10 concurrent browsers. What's the right default for Hermes given the 3-subagent batch limit?
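
For questions 1 and 2, graceful degradation could start with a cheap availability check that `hermes doctor` or the tool's `check` function might call. The sketch below is an assumption, with the hypothetical name `local_browser_status`. It only detects whether the `playwright` package is importable and does not verify that the Chromium binary has been downloaded.

```python
import importlib.util


def local_browser_status() -> tuple[bool, str]:
    """Report whether the Playwright package is importable, with a hint either way."""
    if importlib.util.find_spec("playwright") is None:
        return False, "Playwright is not installed. Run: pip install playwright"
    # The package being present does not guarantee the browser binary exists.
    return True, "Playwright found. If launch fails, run: playwright install chromium"
```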

---

## References

- [Eigent](https://github.com/eigent-ai/Eigent) — CDP browser pool management pattern (Apache-2.0)
- [CAMEL-AI](https://github.com/camel-ai/camel) — Underlying framework with browser toolkits (Apache-2.0)
- [Playwright Python](https://playwright.dev/python/) — Cross-browser automation library (Apache-2.0)
- [browser-use](https://github.com/browser-use/browser-use) — Agent browser automation using Playwright (MIT)
- [playwright-stealth](https://github.com/AtuboDad/playwright_stealth) — Anti-detection plugin for Playwright
- Hermes `tools/browser_tool.py` — Current Browserbase-only implementation (1608 lines)
- Hermes `environments/` — Multi-backend pattern reference for terminal backends

