
Feature: Local Browser Backend via Playwright/CDP — Zero-Cost Alternative to Browserbase (inspired by Eigent) #374

@teknium1


Overview

Hermes Agent's browser tools (tools/browser_tool.py) currently require Browserbase — a paid cloud service — for all browser automation. This creates a hard paywall for browser capabilities: users without a Browserbase subscription cannot use browser_navigate, browser_click, browser_snapshot, or any of the 10 browser tools.

Eigent (Apache-2.0, 12.8k stars), an open-source multi-agent desktop app built on CAMEL-AI, demonstrates a robust local browser approach using Chrome DevTools Protocol (CDP) with a thread-safe browser pool manager that allocates separate CDP ports for parallel agent tasks. This pattern could give Hermes a local browser backend alongside the existing Browserbase cloud option.

This would make browser automation accessible to all Hermes users without requiring a paid service, while preserving Browserbase as the premium option for stealth, CAPTCHA solving, and residential proxies.


Research Findings

How Eigent's Local Browser Works

Eigent uses a HybridBrowserToolkit backed by a CdpBrowserPoolManager:

CDP Browser Pool Manager (thread-safe):

  • Manages a pool of Chrome browser instances, each on a separate CDP port
  • Thread-safe port allocation for parallel agent tasks (critical since Hermes runs sub-agents via ThreadPoolExecutor)
  • Lazy creation — browsers spun up on demand, not pre-allocated
  • Up to 10 concurrent browser instances per worker type
  • Each cloned agent gets its own CDP port from the pool
  • Supports external browser connections (pre-authenticated sessions)
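Here is a minimal standard-library sketch of the thread-safe, lazy port allocation described above. CdpPortPool and its acquire()/release() methods are hypothetical names, not Eigent's actual API. The class only hands out port numbers. Launching Chrome on each port is left to the caller.

```python
import threading


class CdpPortPool:
    """Hypothetical thread-safe allocator of CDP ports, one port per agent task.

    Ports are assigned lazily on first request (nothing is pre-launched) and
    capped at max_instances, mirroring the 10-instance limit described above.
    """

    def __init__(self, base_port: int = 9222, max_instances: int = 10) -> None:
        self._base_port = base_port
        self._max_instances = max_instances
        self._lock = threading.Lock()
        self._by_task: dict[str, int] = {}  # task_id -> allocated port

    def acquire(self, task_id: str) -> int:
        """Return task_id's port, allocating the lowest free port on first use."""
        with self._lock:
            if task_id in self._by_task:
                return self._by_task[task_id]
            in_use = set(self._by_task.values())
            for offset in range(self._max_instances):
                port = self._base_port + offset
                if port not in in_use:
                    self._by_task[task_id] = port
                    return port
            raise RuntimeError(f"browser pool exhausted ({self._max_instances} instances)")

    def release(self, task_id: str) -> None:
        """Free task_id's port so a later task can reuse it."""
        with self._lock:
            self._by_task.pop(task_id, None)
```

If acquire() is called again with the same task_id, it returns the same port, so a cloned agent keeps its browser across tool calls. Once the pool is full, the next concurrent task gets an error rather than silently sharing another agent's browser.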

16 Browser Actions, including:

  • Navigate, click, type, scroll, console execute
  • Sheet read/input (for Google Sheets interaction)
  • Page content extraction

Key Design Pattern — Pool-per-Agent:

Agent A (research task) ──→ CDP Port 9222 ──→ Chrome Instance 1
Agent B (form filling)  ──→ CDP Port 9223 ──→ Chrome Instance 2
Agent C (data scraping) ──→ CDP Port 9224 ──→ Chrome Instance 3

Each agent gets an isolated browser session, with no cross-contamination of cookies, state, or navigation history.
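With raw CDP, one way to get this per-agent isolation is to launch each Chrome instance with its own debugging port and its own --user-data-dir, because cookies, storage, and history live in the profile directory. The helper below only builds the command line. chrome_cdp_args and the profile layout are illustrative assumptions, and we have not checked which flags Eigent actually passes.

```python
from pathlib import Path


def chrome_cdp_args(chrome_binary: str, port: int, profile_root: Path) -> list[str]:
    """Build argv for a headless Chrome instance exposing CDP on `port`.

    Each port gets its own --user-data-dir, so cookies, local storage, and
    history are never shared between agents.
    """
    profile_dir = profile_root / f"cdp-{port}"  # one profile directory per instance
    return [
        chrome_binary,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",
        "--headless=new",
        "--no-first-run",
        "--no-default-browser-check",
        "about:blank",
    ]
```

To start the instance, pass the result to subprocess.Popen. Playwright can then attach to it with chromium.connect_over_cdp("http://127.0.0.1:9222") (using the matching port).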

Why Playwright Over Raw CDP

While Eigent uses raw CDP connections, Playwright would be a better fit for Hermes:

  1. Cross-browser support: Chromium, Firefox, and WebKit (Browserbase is Chromium-only)
  2. Built-in ARIA snapshots: page.locator("body").aria_snapshot() (Playwright 1.49+) produces the ariaSnapshot accessibility-tree format Hermes already consumes; the older page.accessibility.snapshot() is deprecated
  3. Auto-wait mechanics: handles dynamic page loading, animations, and network-idle detection
  4. Stealth plugins available: playwright-stealth provides basic anti-detection
  5. Headless by default: no GUI needed on servers/VMs
  6. Well-maintained: Microsoft-backed, actively developed, with excellent Python bindings
  7. Already proven for agents: used by browser-use (45k+ stars) and many other agent frameworks
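This minimal Playwright sketch covers the isolated-context and snapshot points above. It assumes Playwright 1.49+ (needed for Locator.aria_snapshot()) with Chromium installed. snapshot_page is a hypothetical helper, not existing Hermes code. Whether its output lines up exactly with what agent-browser returns today would still need checking.

```python
def snapshot_page(url: str, headless: bool = True) -> str:
    """Open url in a fresh, isolated browser context and return its ARIA snapshot.

    Requires `pip install playwright` and `playwright install chromium`.
    """
    # Imported lazily so this module still loads when Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            context = browser.new_context()  # separate cookies and storage per context
            page = context.new_page()
            page.goto(url)  # waits for the "load" event by default
            return page.locator("body").aria_snapshot()
        finally:
            browser.close()
```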

Current State in Hermes Agent

tools/browser_tool.py (1608 lines):

  • Tightly coupled to Browserbase cloud API
  • Uses agent-browser Node.js CLI subprocess for browser commands
  • Creates Browserbase sessions via REST API (API key + project ID required)
  • Session management: per-task_id isolation, inactivity timeout (5 min), emergency cleanup
  • Accessibility tree snapshots for LLM-friendly page representation
  • Vision integration via browser_vision (screenshot → vision model)
  • 10 registered tools: navigate, snapshot, click, type, scroll, back, press, close, get_images, vision

What's missing:

  • No fallback when Browserbase credentials are absent — all browser tools simply fail
  • No local execution option
  • The check function in browser tool registration gates all browser tools on BROWSERBASE_API_KEY being set

Relevant architecture:

  • Hermes already has a multi-backend pattern for terminals (environments/ directory with local, docker, ssh, singularity, modal backends)
  • The same pattern could apply to browsers: BrowserBackend base class with BrowserbaseBackend and PlaywrightBackend implementations

Implementation Plan

Skill vs. Tool Classification

This is a codebase change to tools/browser_tool.py (and potentially a new tools/browser_backends/ module). It must be a tool because:

  • Handles browser session lifecycle management (create, cleanup, error recovery)
  • Manages binary data (screenshots for vision)
  • Requires precise execution (element selectors, page navigation timing)
  • Needs thread-safe resource management (browser pool for concurrent sub-agents)

What We'd Need

  1. Playwright Python package: pip install playwright, then playwright install chromium
  2. Backend abstraction layer: a BrowserBackend base class with execute_command(), create_session(), close_session()
  3. Playwright backend implementation: local browser management with CDP-like session isolation
  4. Auto-detection: use Browserbase if credentials are present, otherwise fall back to local Playwright
  5. Browser pool manager: thread-safe allocation of browser contexts for concurrent sub-agents (inspired by Eigent's CdpBrowserPoolManager)
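The abstraction in item 2 could look roughly like the sketch below. The three method names come from the plan. Their signatures, the shape of the result dict, and the EchoBackend test double are assumptions for illustration only.

```python
from abc import ABC, abstractmethod
from typing import Any


class BrowserBackend(ABC):
    """Hypothetical interface shared by BrowserbaseBackend and PlaywrightBackend."""

    name: str = "abstract"

    @abstractmethod
    def create_session(self, task_id: str) -> None:
        """Allocate an isolated browser session for task_id."""

    @abstractmethod
    def execute_command(self, task_id: str, command: str, **kwargs: Any) -> dict[str, Any]:
        """Run one browser action (navigate, click, ...) and return a result dict."""

    @abstractmethod
    def close_session(self, task_id: str) -> None:
        """Release every resource held by task_id's session."""


class EchoBackend(BrowserBackend):
    """Toy in-memory backend: records commands instead of driving a browser."""

    name = "echo"

    def __init__(self) -> None:
        self.sessions: dict[str, list[str]] = {}

    def create_session(self, task_id: str) -> None:
        self.sessions.setdefault(task_id, [])

    def execute_command(self, task_id: str, command: str, **kwargs: Any) -> dict[str, Any]:
        if task_id not in self.sessions:
            return {"success": False, "error": f"no session for {task_id}"}
        self.sessions[task_id].append(command)
        return {"success": True, "command": command, "args": kwargs}

    def close_session(self, task_id: str) -> None:
        self.sessions.pop(task_id, None)
```

Each concrete backend would implement the same three methods. The registered tool handlers then never need to know which backend is active, much as with the terminal backends in environments/.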

Phased Rollout

Phase 1: Local Playwright Backend (Core)

  • Add playwright as optional dependency
  • Create PlaywrightBackend class with browser context pool
  • Implement the 8 essential browser tools against Playwright: navigate, snapshot, click, type, scroll, back, press, close
  • Auto-detect: if BROWSERBASE_API_KEY is set → use Browserbase; else → use local Playwright
  • Per-task browser context isolation (Playwright's browser.new_context())
  • Headless by default, configurable via env var
  • Deliverable: Browser tools work for everyone, zero cloud dependency
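The per-task isolation and headless toggle in Phase 1 mostly come down to bookkeeping, sketched below. HERMES_BROWSER_HEADLESS is a hypothetical variable name. ContextPool accepts any context factory, such as a Playwright Browser.new_context bound method, so it can be exercised without a browser.

One caveat the sketch does not solve: Playwright's sync API is not thread-safe, so objects created on one thread should not be used from another. With sub-agents in a ThreadPoolExecutor, a real backend would likely need either one Playwright instance per worker thread or a single dedicated browser thread that the other threads send commands to.

```python
import os
import threading
from typing import Any, Callable, Mapping, Optional


def headless_from_env(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read the hypothetical HERMES_BROWSER_HEADLESS variable; headless unless disabled."""
    env = os.environ if env is None else env
    value = env.get("HERMES_BROWSER_HEADLESS", "true").strip().lower()
    return value not in {"0", "false", "no"}


class ContextPool:
    """One browser context per task_id, created lazily on first use.

    `new_context` is any zero-argument factory; the contexts it returns
    must expose close().
    """

    def __init__(self, new_context: Callable[[], Any]) -> None:
        self._new_context = new_context
        self._contexts: dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, task_id: str) -> Any:
        with self._lock:
            if task_id not in self._contexts:
                self._contexts[task_id] = self._new_context()
            return self._contexts[task_id]

    def close(self, task_id: str) -> None:
        with self._lock:
            ctx = self._contexts.pop(task_id, None)
        if ctx is not None:
            ctx.close()  # closing outside the lock keeps other tasks unblocked
```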

Phase 2: Feature Parity + Vision

  • Implement browser_get_images and browser_vision with Playwright screenshots
  • Add playwright-stealth for basic anti-detection on local backend
  • Add persistent browser profiles (reuse cookies/auth across sessions)
  • Expose backend choice via config: browser_backend: local|browserbase|auto
  • Deliverable: Full feature parity between local and cloud backends
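The Phase 1 auto-detection rule and the Phase 2 browser_backend setting could be resolved in one place, as in this sketch. The function name and its error behavior are assumptions. Browserbase also needs a project ID, so a real check would probably validate that as well. A registration check built on this could pass whenever some backend is usable, instead of failing outright when BROWSERBASE_API_KEY is missing.

```python
import os
from typing import Mapping, Optional


def resolve_browser_backend(setting: str = "auto", env: Optional[Mapping[str, str]] = None) -> str:
    """Map browser_backend (local|browserbase|auto) to the backend that will run."""
    env = os.environ if env is None else env
    setting = setting.strip().lower()
    if setting not in {"local", "browserbase", "auto"}:
        raise ValueError(f"unknown browser_backend: {setting!r}")
    has_key = bool(env.get("BROWSERBASE_API_KEY"))
    if setting == "browserbase" and not has_key:
        raise ValueError("browser_backend=browserbase requires BROWSERBASE_API_KEY")
    if setting == "auto":
        # Prefer the cloud backend when credentials exist, else run locally.
        return "browserbase" if has_key else "local"
    return setting
```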

Phase 3: Advanced Local Features

  • Browser pool manager for sub-agent concurrency (Eigent's pattern)
  • Pre-authenticated session support (connect to user's running browser via CDP)
  • Recording/replay for debugging (Playwright trace viewer)
  • Optional headed mode for user observation
  • Deliverable: Production-grade local browser with advanced features
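Two of the Phase 3 items, attaching to a user's running browser and recording a trace, map onto existing Playwright APIs: BrowserType.connect_over_cdp and context tracing. The sketch assumes the user started Chrome with --remote-debugging-port=9222. cdp_endpoint_info and attach_and_trace are hypothetical helpers. The probe queries Chrome's /json/version endpoint so the helper fails fast when nothing is listening.

```python
import http.client
import json
import urllib.request
from typing import Optional


def cdp_endpoint_info(port: int = 9222, host: str = "127.0.0.1", timeout: float = 1.0) -> Optional[dict]:
    """Return Chrome's /json/version metadata if a debuggable browser listens on port, else None."""
    # Bypass any HTTP proxy settings: the endpoint is always local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None


def attach_and_trace(url: str, port: int = 9222, trace_path: str = "trace.zip") -> None:
    """Attach to a running Chrome over CDP, open url in a new tab, and save a trace."""
    from playwright.sync_api import sync_playwright  # optional dependency, imported lazily

    if cdp_endpoint_info(port) is None:
        raise RuntimeError(f"no browser with remote debugging enabled on port {port}")
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(f"http://127.0.0.1:{port}")
        # The default context carries the user's existing cookies and logins.
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        context.tracing.start(screenshots=True, snapshots=True)
        page = context.new_page()
        try:
            page.goto(url)
        finally:
            context.tracing.stop(path=trace_path)
            page.close()  # close only the tab this helper opened
```

The resulting archive opens in Playwright's trace viewer with playwright show-trace trace.zip.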

Pros & Cons

Pros

  • Removes paywall — Browser tools become accessible to all users, not just Browserbase subscribers
  • Zero cloud dependency — Works offline, no API keys needed, no usage-based billing
  • Follows existing pattern — Mirrors the multi-backend approach already used for terminal environments
  • Playwright is mature — Well-maintained, cross-browser, excellent Python bindings
  • Better for development — Local browser is faster (no network round-trips to cloud), visible in headed mode
  • Concurrent sub-agent support — Browser pool pattern (from Eigent) handles the ThreadPoolExecutor concurrency model Hermes uses for batch delegation
  • Apache-2.0 compatible — Both Eigent and Playwright are permissively licensed

Cons / Risks

  • Additional dependency — Playwright + browser binaries (~200MB for Chromium). Should be optional.
  • No stealth/CAPTCHA — Local browsers lack Browserbase's anti-detection, residential proxies, and CAPTCHA solving. Some sites will block local browsers.
  • Resource consumption — Each browser context uses ~100-200MB RAM. Concurrent sub-agents with browsers could strain machines with limited RAM.
  • Cross-platform complexity — Browser installation and management differs across Linux, macOS, Windows. Playwright handles this but edge cases exist.
  • Maintenance burden — Two browser backends to maintain, test, and keep feature-aligned.
  • Headless limitations — Some sites detect headless browsers and serve different content. Browserbase handles this; local Playwright may not.
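To put the resource estimate in perspective: at the pessimistic 200 MB per context, the 3-subagent batch limit needs about 3 × 200 = 600 MB, while Eigent's cap of 10 would need about 2 GB. Below is a hypothetical helper for deriving a pool limit from a RAM budget. The 200 MB default is this issue's own estimate, not a measured figure.

```python
def max_concurrent_browsers(ram_budget_mb: int, per_context_mb: int = 200, hard_cap: int = 10) -> int:
    """Rough upper bound on concurrent browser contexts that fit in a RAM budget.

    Returns 0 when not even one context fits, so callers can disable the local backend.
    """
    if ram_budget_mb <= 0 or per_context_mb <= 0:
        raise ValueError("RAM figures must be positive")
    return min(hard_cap, ram_budget_mb // per_context_mb)
```

For example, a 1 GB budget allows 5 contexts, while a 4 GB budget hits the 10-instance cap.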

Open Questions

  1. Should Playwright be a required or optional dependency? Optional seems right (like browser tools are today), but then we need graceful degradation.
  2. How to handle browser binary installation? playwright install chromium needs to run once. Should hermes doctor check for this? Should setup prompt for it?
  3. What about Docker/SSH terminal backends? If Hermes is running in a Docker container or over remote SSH, can Playwright still work? (Tentative answer: yes for Docker, with --no-sandbox when Chromium runs as root. Headless mode needs no display server, so SSH should not require X forwarding, but this still needs verification.)
  4. Should we support connecting to an existing browser? Eigent supports external CDP connections to pre-authenticated browsers. This is powerful for enterprise use cases but adds complexity.
  5. Pool size limits? Eigent allows 10 concurrent browsers. What's the right default for Hermes given the 3-subagent batch limit?
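For question 2, a hermes doctor check could be as simple as the sketch below. It tests that the playwright package is importable and that a Chromium build exists in Playwright's browser cache. The cache locations are Playwright's default per-OS paths, which PLAYWRIGHT_BROWSERS_PATH can override. check_local_browser is a hypothetical name, and the special PLAYWRIGHT_BROWSERS_PATH=0 mode (browsers stored inside the package) is not handled.

```python
import importlib.util
import os
import sys
from pathlib import Path


def playwright_browsers_dir() -> Path:
    """Directory where Playwright keeps downloaded browsers (default per-OS locations)."""
    override = os.environ.get("PLAYWRIGHT_BROWSERS_PATH")
    if override and override != "0":  # "0" (in-package browsers) is not handled here
        return Path(override)
    home = Path.home()
    if sys.platform == "darwin":
        return home / "Library" / "Caches" / "ms-playwright"
    if sys.platform == "win32":
        return Path(os.environ.get("LOCALAPPDATA", home / "AppData" / "Local")) / "ms-playwright"
    return home / ".cache" / "ms-playwright"


def check_local_browser() -> dict[str, bool]:
    """Doctor-style check: is Playwright importable, and is a Chromium build downloaded?"""
    browsers = playwright_browsers_dir()
    return {
        "playwright_installed": importlib.util.find_spec("playwright") is not None,
        "chromium_downloaded": browsers.is_dir() and any(browsers.glob("chromium*")),
    }
```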

References

  • Eigent — CDP browser pool management pattern (Apache-2.0)
  • CAMEL-AI — Underlying framework with browser toolkits (Apache-2.0)
  • Playwright Python — Cross-browser automation library (Apache-2.0)
  • browser-use — Agent browser automation using Playwright (MIT)
  • playwright-stealth — Anti-detection plugin for Playwright
  • Hermes tools/browser_tool.py — Current Browserbase-only implementation (1608 lines)
  • Hermes environments/ — Multi-backend pattern reference for terminal backends
