Overview
Hermes Agent's browser tools (tools/browser_tool.py) currently require Browserbase — a paid cloud service — for all browser automation. This creates a hard paywall for browser capabilities: users without a Browserbase subscription cannot use browser_navigate, browser_click, browser_snapshot, or any of the 10 browser tools.
Eigent (Apache-2.0, 12.8k stars), an open-source multi-agent desktop app built on CAMEL-AI, demonstrates a robust local browser approach using Chrome DevTools Protocol (CDP) with a thread-safe browser pool manager that allocates separate CDP ports for parallel agent tasks. This pattern could give Hermes a local browser backend alongside the existing Browserbase cloud option.
This would make browser automation accessible to all Hermes users without requiring a paid service, while preserving Browserbase as the premium option for stealth, CAPTCHA solving, and residential proxies.
Research Findings
How Eigent's Local Browser Works
Eigent uses a HybridBrowserToolkit backed by a CdpBrowserPoolManager:
CDP Browser Pool Manager (thread-safe):
- Manages a pool of Chrome browser instances, each on a separate CDP port
- Thread-safe port allocation for parallel agent tasks (critical since Hermes runs sub-agents via ThreadPoolExecutor)
- Lazy creation — browsers spun up on demand, not pre-allocated
- Up to 10 concurrent browser instances per worker type
- Each cloned agent gets its own CDP port from the pool
- Supports external browser connections (pre-authenticated sessions)
16 Browser Actions:
- Navigate, click, type, scroll, console execute
- Sheet read/input (for Google Sheets interaction)
- Page content extraction
Key Design Pattern — Pool-per-Agent:
Agent A (research task) ──→ CDP Port 9222 ──→ Chrome Instance 1
Agent B (form filling) ──→ CDP Port 9223 ──→ Chrome Instance 2
Agent C (data scraping) ──→ CDP Port 9224 ──→ Chrome Instance 3
Each agent gets an isolated browser session. No cross-contamination of cookies, state, or navigation history.
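The pool-per-agent pattern above can be sketched as a small thread-safe port allocator. This is an illustration only, not Eigent's actual code: the class name CdpPortPool and its methods are hypothetical, while the base port 9222 and the limit of 10 come from the description above. It only hands out port numbers; launching Chrome on each port is left out.

```python
import threading
from typing import Dict, List


class CdpPortPool:
    """Hand out distinct CDP ports to concurrent agents (hypothetical sketch)."""

    def __init__(self, base_port: int = 9222, max_browsers: int = 10) -> None:
        self._lock = threading.Lock()
        self._free: List[int] = list(range(base_port, base_port + max_browsers))
        self._in_use: Dict[str, int] = {}

    def acquire(self, agent_id: str) -> int:
        """Return the port already held by agent_id, or allocate the lowest free one."""
        with self._lock:
            if agent_id in self._in_use:
                return self._in_use[agent_id]
            if not self._free:
                raise RuntimeError("browser pool exhausted")
            port = self._free.pop(0)
            self._in_use[agent_id] = port
            return port

    def release(self, agent_id: str) -> None:
        """Return the agent's port to the pool; unknown agents are ignored."""
        with self._lock:
            port = self._in_use.pop(agent_id, None)
            if port is not None:
                self._free.append(port)
                self._free.sort()
```

In a ThreadPoolExecutor setup, each worker would call acquire() with its own agent ID before launching or connecting to a browser, and release() in a finally block once the task ends.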
Why Playwright Over Raw CDP
While Eigent uses raw CDP connections, Playwright would be a better fit for Hermes:
- Cross-browser support — Chromium, Firefox, WebKit (Browserbase is Chromium-only)
- Built-in aria snapshots — locator.aria_snapshot() returns the ARIA snapshot format Hermes already uses (ariaSnapshot); the older page.accessibility.snapshot() is deprecated and returns a different, dict-based tree
- Auto-wait mechanics — Handles dynamic page loading, animations, and network idle detection
- Stealth plugins available — playwright-stealth for basic anti-detection
- Headless by default — No GUI needed on servers/VMs
- Well-maintained — Microsoft-backed, active development, excellent Python bindings
- Already proven for agents — Used by browser-use (45k+ stars), Anthropic's computer-use reference, and many agent frameworks
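For a sense of how little code the Playwright route needs, here is a minimal sketch using the Playwright Python sync API. It assumes a Playwright release that provides locator.aria_snapshot() and that Chromium has been installed with playwright install chromium; the function name snapshot_page is hypothetical and not part of Hermes.

```python
def snapshot_page(url: str, headless: bool = True) -> str:
    """Open url in a fresh Chromium context and return its ARIA snapshot as text."""
    # Imported lazily so this module still loads when Playwright is not installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            context = browser.new_context()  # isolated cookies and storage
            page = context.new_page()
            page.goto(url, wait_until="domcontentloaded")
            return page.locator("body").aria_snapshot()
        finally:
            browser.close()
```

Calling snapshot_page("https://example.com") would launch a headless browser, load the page, and return the snapshot text.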
Current State in Hermes Agent
tools/browser_tool.py (1608 lines):
- Tightly coupled to Browserbase cloud API
- Uses agent-browser Node.js CLI subprocess for browser commands
- Creates Browserbase sessions via REST API (API key + project ID required)
- Session management: per-task_id isolation, inactivity timeout (5 min), emergency cleanup
- Accessibility tree snapshots for LLM-friendly page representation
- Vision integration via browser_vision (screenshot → vision model)
- 10 registered tools: navigate, snapshot, click, type, scroll, back, press, close, get_images, vision
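The session management described above (per-task_id isolation plus a 5-minute inactivity timeout) could look roughly like the following. This is not the code in tools/browser_tool.py: the class SessionRegistry and its methods are hypothetical, and the injectable clock exists only to make the sketch easy to test.

```python
import threading
import time
from typing import Callable, Dict, List


class SessionRegistry:
    """Track one session per task_id and expire idle ones (illustrative only)."""

    def __init__(self, timeout_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._lock = threading.Lock()
        self._last_used: Dict[str, float] = {}

    def touch(self, task_id: str) -> None:
        """Register the task's session if needed and mark it as just used."""
        with self._lock:
            self._last_used[task_id] = self._clock()

    def reap_idle(self) -> List[str]:
        """Drop sessions idle longer than the timeout and return their task_ids."""
        now = self._clock()
        with self._lock:
            expired = [t for t, ts in self._last_used.items()
                       if now - ts > self._timeout_s]
            for task_id in expired:
                del self._last_used[task_id]
        return expired

    def active(self) -> List[str]:
        """Return the task_ids that still have a live session, sorted."""
        with self._lock:
            return sorted(self._last_used)
```

A real backend would close the underlying browser session for every task_id returned by reap_idle().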
What's missing:
- No fallback when Browserbase credentials are absent — all browser tools simply fail
- No local execution option
- The check function in browser tool registration gates all browser tools on BROWSERBASE_API_KEY being set
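A relaxed gate might accept either Browserbase credentials or a locally importable Playwright package. The sketch below is an assumption about how such a check could be written, not the existing check function: the name browser_tools_available and the local_module parameter are hypothetical, and only the BROWSERBASE_API_KEY variable comes from the text above.

```python
import importlib.util
import os
from typing import Mapping, Optional


def browser_tools_available(env: Optional[Mapping[str, str]] = None,
                            local_module: str = "playwright") -> bool:
    """Return True if a cloud or a local browser backend looks usable."""
    env = os.environ if env is None else env
    # Cloud path: a non-empty Browserbase API key is enough to enable the tools.
    if env.get("BROWSERBASE_API_KEY"):
        return True
    # Local path: enabled when the Playwright package can be imported.
    return importlib.util.find_spec(local_module) is not None
```

Note that find_spec only confirms the Python package is present; it does not verify that browser binaries have been downloaded.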
Relevant architecture:
- Hermes already has a multi-backend pattern for terminals (environments/ directory with local, docker, ssh, singularity, modal backends)
- The same pattern could apply to browsers: BrowserBackend base class with BrowserbaseBackend and PlaywrightBackend implementations
Implementation Plan
Skill vs. Tool Classification
This is a codebase change to tools/browser_tool.py (and potentially a new tools/browser_backends/ module). It must be a tool because:
- Handles browser session lifecycle management (create, cleanup, error recovery)
- Manages binary data (screenshots for vision)
- Requires precise execution (element selectors, page navigation timing)
- Needs thread-safe resource management (browser pool for concurrent sub-agents)
What We'd Need
- Playwright Python package — pip install playwright + playwright install chromium
- Backend abstraction layer — BrowserBackend base class with execute_command(), create_session(), close_session()
- Playwright backend implementation — Local browser management, CDP-like session isolation
- Auto-detection — Use Browserbase if credentials present, fall back to local Playwright
- Browser pool manager — Thread-safe allocation of browser contexts for concurrent sub-agents (inspired by Eigent's CdpBrowserPoolManager)
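The abstraction layer could be as small as an abstract base class with the three methods named above. The signatures, return types, and the InMemoryBackend stand-in below are assumptions made for illustration; a real BrowserbaseBackend or PlaywrightBackend would implement the same interface against an actual browser.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class BrowserBackend(ABC):
    """Common interface for browser backends (signatures are assumed)."""

    @abstractmethod
    def create_session(self, task_id: str) -> str:
        """Start a session for task_id and return a session identifier."""

    @abstractmethod
    def execute_command(self, session_id: str, command: str, **kwargs: Any) -> Dict[str, Any]:
        """Run one browser command and return a result dictionary."""

    @abstractmethod
    def close_session(self, session_id: str) -> None:
        """Release all resources held by the session."""


class InMemoryBackend(BrowserBackend):
    """Stand-in backend that records state instead of driving a browser."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, Any]] = {}

    def create_session(self, task_id: str) -> str:
        session_id = f"mem-{task_id}"
        self._sessions.setdefault(session_id, {"url": None})
        return session_id

    def execute_command(self, session_id: str, command: str, **kwargs: Any) -> Dict[str, Any]:
        session = self._sessions[session_id]
        if command == "navigate":
            session["url"] = kwargs["url"]
            return {"ok": True, "url": session["url"]}
        return {"ok": False, "error": f"unsupported command: {command}"}

    def close_session(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)
```

A stand-in like this also gives the tool layer something to test against without launching a browser.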
Phased Rollout
Phase 1: Local Playwright Backend (Core)
- Add playwright as optional dependency
- Create PlaywrightBackend class with browser context pool
- Implement the 8 essential browser tools against Playwright: navigate, snapshot, click, type, scroll, back, press, close
- Auto-detect: if BROWSERBASE_API_KEY is set → use Browserbase; else → use local Playwright
- Per-task browser context isolation (Playwright's browser.new_context())
- Headless by default, configurable via env var
- Deliverable: Browser tools work for everyone, zero cloud dependency
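Per-task isolation in Phase 1 amounts to keeping one browser context per task_id. The sketch below keeps the factory injectable so it runs without a browser; the class TaskContextPool is hypothetical, and with Playwright the factory would be something like browser.new_context with a matching close callback.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class TaskContextPool(Generic[T]):
    """Lazily create and cache one context per task_id (hypothetical sketch)."""

    def __init__(self, new_context: Callable[[], T],
                 close_context: Callable[[T], None]) -> None:
        self._new_context = new_context
        self._close_context = close_context
        self._lock = threading.Lock()
        self._contexts: Dict[str, T] = {}

    def get(self, task_id: str) -> T:
        """Return the task's context, creating it on first use."""
        with self._lock:
            if task_id not in self._contexts:
                self._contexts[task_id] = self._new_context()
            return self._contexts[task_id]

    def close(self, task_id: str) -> None:
        """Close and forget the task's context if one exists."""
        with self._lock:
            ctx = self._contexts.pop(task_id, None)
        if ctx is not None:
            self._close_context(ctx)
```

One caveat worth checking during implementation: Playwright's documentation states that its API is not thread-safe, so a ThreadPoolExecutor setup may need one Playwright instance per worker thread, or the async API, rather than a single shared browser object.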
Phase 2: Feature Parity + Vision
- Implement browser_get_images and browser_vision with Playwright screenshots
- Add playwright-stealth for basic anti-detection on local backend
- Add persistent browser profiles (reuse cookies/auth across sessions)
- Expose backend choice via config: browser_backend: local|browserbase|auto
- Deliverable: Full feature parity between local and cloud backends
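Resolving the browser_backend setting could follow the rule stated in Phase 1: in auto mode, prefer Browserbase when BROWSERBASE_API_KEY is set and fall back to local otherwise. The function name resolve_backend and its error messages are assumptions; only the three config values and the environment variable come from this proposal.

```python
import os
from typing import Mapping, Optional

VALID_CHOICES = ("local", "browserbase", "auto")


def resolve_backend(choice: str = "auto",
                    env: Optional[Mapping[str, str]] = None) -> str:
    """Map a browser_backend config value to 'local' or 'browserbase'."""
    env = os.environ if env is None else env
    choice = choice.strip().lower()
    if choice not in VALID_CHOICES:
        raise ValueError(f"browser_backend must be one of {VALID_CHOICES}, got {choice!r}")
    has_key = bool(env.get("BROWSERBASE_API_KEY"))
    if choice == "auto":
        # Credentials present: use the cloud backend; otherwise run locally.
        return "browserbase" if has_key else "local"
    if choice == "browserbase" and not has_key:
        raise ValueError("browser_backend is 'browserbase' but BROWSERBASE_API_KEY is not set")
    return choice
```

Failing loudly when browserbase is requested without credentials avoids a silent switch to a backend with different stealth properties.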
Phase 3: Advanced Local Features
- Browser pool manager for sub-agent concurrency (Eigent's pattern)
- Pre-authenticated session support (connect to user's running browser via CDP)
- Recording/replay for debugging (Playwright trace viewer)
- Optional headed mode for user observation
- Deliverable: Production-grade local browser with advanced features
Pros & Cons
Pros
- Removes paywall — Browser tools become accessible to all users, not just Browserbase subscribers
- Zero cloud dependency — Works offline, no API keys needed, no usage-based billing
- Follows existing pattern — Mirrors the multi-backend approach already used for terminal environments
- Playwright is mature — Well-maintained, cross-browser, excellent Python bindings
- Better for development — Local browser is faster (no network round-trips to cloud), visible in headed mode
- Concurrent sub-agent support — Browser pool pattern (from Eigent) handles the ThreadPoolExecutor concurrency model Hermes uses for batch delegation
- Apache-2.0 compatible — Both Eigent and Playwright are permissively licensed
Cons / Risks
- Additional dependency — Playwright + browser binaries (~200MB for Chromium). Should be optional.
- No stealth/CAPTCHA — Local browsers lack Browserbase's anti-detection, residential proxies, and CAPTCHA solving. Some sites will block local browsers.
- Resource consumption — Each browser context uses ~100-200MB RAM. Concurrent sub-agents with browsers could strain machines with limited RAM.
- Cross-platform complexity — Browser installation and management differs across Linux, macOS, Windows. Playwright handles this but edge cases exist.
- Maintenance burden — Two browser backends to maintain, test, and keep feature-aligned.
- Headless limitations — Some sites detect headless browsers and serve different content. Browserbase handles this; local Playwright may not.
Open Questions
- Should Playwright be a required or optional dependency? Optional seems right (like browser tools are today), but then we need graceful degradation.
- How to handle browser binary installation? playwright install chromium needs to run once. Should hermes doctor check for this? Should setup prompt for it?
- What about Docker/SSH terminal backends? If Hermes is running in a Docker container or remote SSH, can Playwright still work? (Answer: yes for Docker with --no-sandbox, unclear for SSH without X forwarding in headless mode)
- Should we support connecting to an existing browser? Eigent supports external CDP connections to pre-authenticated browsers. This is powerful for enterprise use cases but adds complexity.
- Pool size limits? Eigent allows 10 concurrent browsers. What's the right default for Hermes given the 3-subagent batch limit?
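One way to reason about a default is to bound the pool by three numbers already mentioned in this proposal: Eigent's cap of 10, the 3-subagent batch limit, and the rough 100-200MB per browser context. The heuristic below is an assumption offered for discussion, not a measured recommendation; the function name, the extra slot for the parent agent, and the 200MB worst-case figure are all illustrative.

```python
def suggest_pool_size(subagent_limit: int = 3,
                      ram_budget_mb: int = 2048,
                      per_context_mb: int = 200,
                      hard_cap: int = 10) -> int:
    """Pick a pool size bounded by concurrency, memory budget, and a hard cap."""
    by_concurrency = subagent_limit + 1          # sub-agents plus the parent agent
    by_memory = ram_budget_mb // per_context_mb  # contexts that fit in the budget
    # Always allow at least one browser so the tools stay usable.
    return max(1, min(hard_cap, by_concurrency, by_memory))
```

With the defaults, concurrency is the binding limit and the function returns 4; on a machine with only 500MB to spare, memory becomes the limit and it returns 2.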
References
- Eigent — CDP browser pool management pattern (Apache-2.0)
- CAMEL-AI — Underlying framework with browser toolkits (Apache-2.0)
- Playwright Python — Cross-browser automation library (Apache-2.0)
- browser-use — Agent browser automation using Playwright (MIT)
- playwright-stealth — Anti-detection plugin for Playwright
- Hermes tools/browser_tool.py — Current Browserbase-only implementation (1608 lines)
- Hermes environments/ — Multi-backend pattern reference for terminal backends