
browsertap

English | 中文

Tap into your live browser. Close the agent loop. Built in Rust.

browsertap lets AI agents and CLI tools control an already-open, already-authenticated browser session -- screenshots, JS execution, smoke tests, console capture, and more -- without spinning up headless instances or re-logging in.

Why browsertap?

AI agents that interact with web apps need to see and control the real thing. They need to run JS, take screenshots, check console errors, and click buttons -- all in a browser that's already logged in with real cookies, real sessions, and real state.

Existing tools don't fit this workflow:

|  | browsertap | Playwright | Puppeteer |
| --- | --- | --- | --- |
| Attaches to live browser tab | Yes | No (new instance) | No (new instance) |
| Preserves auth state | Yes | No (re-login) | No (re-login) |
| Runtime dependency | None (single binary) | Node.js | Node.js |
| Binary size | ~5 MB | ~100 MB+ | ~100 MB+ |
| Startup time | < 10 ms | > 500 ms | > 500 ms |
| Built-in smoke testing | Yes (parallel) | No | No |
| Session codenames | Yes | No | No |
| Console/network buffering | Yes | Via code only | Via code only |
| Self-signed TLS | Built-in (rcgen + rustls) | N/A | N/A |
| Designed for AI agents | Yes | No (test framework) | No (library) |

The typical AI agent workflow with browsertap:

Developer has web app open in browser (already logged in)
        |
        v
@browsertap/runtime connects the tab to the daemon
        |
        v
AI agent runs: browsertap run-js iron-falcon "document.title"
        |
        v
AI agent runs: browsertap screenshot iron-falcon -o page.jpg
        |
        v
AI agent inspects the screenshot / queries DOM / checks console
        |
        v
AI agent runs: browsertap smoke iron-falcon --preset main
        |
        v
No headless browser. No re-login. No lost state.

Features

  • Attach to live sessions -- Control an already-open, already-authenticated browser tab
  • Daemon architecture -- browsertapd runs as a persistent HTTPS + WebSocket hub; CLI commands talk to it via REST API
  • Session codenames -- Friendly names like iron-falcon or calm-otter instead of UUIDs
  • JavaScript execution -- Run arbitrary JS in the browser context via CLI
  • Screenshot capture -- Full page or element-specific via CSS selector
  • Console capture -- View browser console output with level filtering; buffer survives CLI reconnect
  • Network capture -- Inspect HTTP requests/responses buffered by the runtime
  • Smoke testing -- Automated route sweep with presets, error detection, and progress tracking
  • Selector discovery -- Find interactive elements on the page (buttons, links, inputs)
  • HMAC-SHA256 tokens -- Short-lived session tokens (5 min) and CLI tokens (1 hour)
  • Self-signed TLS -- Auto-generated certificates via rcgen + rustls, zero external tools
  • Auto-reconnect -- Browser runtime reconnects with exponential backoff after disconnects
  • Config file walk-up -- Place browsertap.toml at project root; CLI finds it automatically
  • JSON output -- Machine-readable output for agent integration
  • Cross-platform -- macOS, Linux, and Windows
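
The auto-reconnect bullet above does not say which backoff schedule the runtime uses. As a rough illustration of capped exponential backoff in general, the sketch below assumes a 500 ms base delay and a 30 s cap. Both values, and the names `BackoffOptions` and `backoffDelay`, are hypothetical and not taken from browsertap.

```typescript
// Hypothetical parameters; browsertap's real schedule is not documented here.
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  maxMs: number; // upper bound on any single delay
}

// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, opts: BackoffOptions): number {
  const uncapped = opts.baseMs * 2 ** attempt;
  return Math.min(uncapped, opts.maxMs);
}

const backoff: BackoffOptions = { baseMs: 500, maxMs: 30000 };
const schedule = [0, 1, 2, 3, 4, 5, 6, 7].map((n) => backoffDelay(n, backoff));
console.log(schedule); // delays double each attempt until they reach the cap
```

Real implementations often add random jitter to each delay so that many tabs do not reconnect at the same instant; that is left out here to keep the sketch deterministic.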

Installation

Via Cargo (recommended)

cargo install browsertap-cli
cargo install browsertap-daemon

Pre-built binaries

Download from GitHub Releases:

| Platform | Binary |
| --- | --- |
| Linux x86_64 | browsertap-v*-linux-x86_64.tar.gz |
| Linux ARM64 | browsertap-v*-linux-arm64.tar.gz |
| macOS Intel | browsertap-v*-macos-x86_64.tar.gz |
| macOS Apple Silicon | browsertap-v*-macos-arm64.tar.gz |
| Windows x86_64 | browsertap-v*-windows-x86_64.zip |

Browser runtime SDK

npm install @browsertap/runtime

From source

git clone https://github.com/justinhuangcode/browsertap.git
cd browsertap
cargo build --release
# Binaries at: target/release/browsertap, target/release/browsertapd

Requirements: Rust 1.75+ (for the Cargo and source builds) and a Chromium-based browser for the page you want to control.

Quick Start

1. Start the daemon

browsertapd
# => browsertapd listening on https://127.0.0.1:4455

2. Integrate the browser runtime into your web app

import { createBrowserTapClient, createSessionStorageAdapter } from '@browsertap/runtime';

const client = createBrowserTapClient({
  storage: createSessionStorageAdapter(),
  onStatus: (snap) => console.log('browsertap:', snap.status, snap.codename),
  autoReconnectHandshake: () =>
    fetch('/api/browsertap/handshake', { method: 'POST' }).then(r => r.json()),
});

const handshake = await fetch('/api/browsertap/handshake', { method: 'POST' }).then(r => r.json());
await client.startSession(handshake);
// => "connected as iron-falcon"

3. Control from CLI

browsertap sessions
# CODENAME             URL                                      STATE      HEARTBEAT
# iron-falcon          http://localhost:3000/dashboard           open       2s ago

browsertap run-js iron-falcon "document.title"
# "Dashboard - MyApp"

browsertap screenshot iron-falcon --selector "#analytics" -o card.jpg
# Screenshot saved to card.jpg (45832 bytes)

Commands

| Command | Description |
| --- | --- |
| `daemon` | Start the browsertap daemon (delegates to browsertapd) |
| `sessions` | List active browser sessions with codenames and heartbeat status |
| `run-js <session> <code>` | Execute JavaScript in a browser session |
| `screenshot <session>` | Capture page or element screenshot |
| `click <session> <selector>` | Click an element by CSS selector |
| `navigate <session> <url>` | Navigate a session to a URL |
| `smoke <session>` | Run smoke tests across configured routes |
| `console <session>` | View console logs from a session |
| `selectors <session>` | Discover interactive selectors on the page |

Command Flags

Global Flags

| Flag | Default | Description |
| --- | --- | --- |
| `--daemon-url <url>` | https://127.0.0.1:4455 | Daemon URL (also via BROWSERTAP_DAEMON_URL) |

screenshot Flags

| Flag | Default | Description |
| --- | --- | --- |
| `-s, --selector <sel>` | (full page) | CSS selector of element to capture |
| `-o, --output <path>` | screenshot.jpg | Output file path |
| `--quality <f32>` | 0.85 | JPEG quality (0.0 - 1.0) |

smoke Flags

| Flag | Default | Description |
| --- | --- | --- |
| `--preset <name>` | defaults | Route preset name from browsertap.toml |
| `--routes <list>` | (none) | Comma-separated route list |
| `--parallel <n>` | 1 | Number of parallel workers |

console Flags

| Flag | Default | Description |
| --- | --- | --- |
| `-t, --tail <n>` | 50 | Number of recent events to show |
| `--level <level>` | (all) | Filter by level: log, info, warn, error |
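
An agent that spawns the CLI usually wants to assemble its arguments programmatically. The sketch below builds argument arrays for `screenshot` and `smoke` using only the flag names listed in the tables above. The helper names (`screenshotArgs`, `smokeArgs`) and the option interfaces are hypothetical, not part of browsertap.

```typescript
// Hypothetical helpers; only the flag names come from the tables above.
interface ScreenshotOptions {
  selector?: string; // --selector: element to capture (default: full page)
  output?: string; // --output: file path (default: screenshot.jpg)
  quality?: number; // --quality: JPEG quality, 0.0 to 1.0 (default: 0.85)
}

function screenshotArgs(session: string, o: ScreenshotOptions = {}): string[] {
  const args = ['screenshot', session];
  if (o.selector !== undefined) args.push('--selector', o.selector);
  if (o.output !== undefined) args.push('--output', o.output);
  if (o.quality !== undefined) {
    // Reject values outside the documented range before the CLI sees them.
    if (o.quality < 0 || o.quality > 1) {
      throw new RangeError('quality must be between 0.0 and 1.0');
    }
    args.push('--quality', String(o.quality));
  }
  return args;
}

interface SmokeOptions {
  preset?: string; // --preset: preset name from browsertap.toml
  routes?: string[]; // --routes: joined into a comma-separated list
  parallel?: number; // --parallel: number of workers (default: 1)
}

function smokeArgs(session: string, o: SmokeOptions = {}): string[] {
  const args = ['smoke', session];
  if (o.preset !== undefined) args.push('--preset', o.preset);
  if (o.routes !== undefined && o.routes.length > 0) {
    args.push('--routes', o.routes.join(','));
  }
  if (o.parallel !== undefined) args.push('--parallel', String(o.parallel));
  return args;
}

console.log(screenshotArgs('iron-falcon', { selector: '#analytics', output: 'card.jpg' }));
console.log(smokeArgs('iron-falcon', { preset: 'main', parallel: 4 }));
```

Passing the resulting array to Node's `execFile('browsertap', args)` rather than concatenating a shell string avoids quoting problems with selectors such as `#analytics`.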

How It Works

  1. browsertapd starts an HTTPS + WebSocket server on 127.0.0.1:4455. It auto-generates self-signed TLS certificates on first run and stores them at ~/.browsertap/certs/.

  2. Your web app embeds @browsertap/runtime. When activated, the runtime calls your backend's handshake endpoint, which mints an HMAC-SHA256 signed session token using the shared secret.

  3. The browser runtime opens a WebSocket to the daemon, sends a register message with the signed token, and receives a friendly codename (e.g., iron-falcon). It then patches console.* to capture logs and starts a heartbeat every 5 seconds.

  4. CLI commands (browsertap run-js iron-falcon "...") send HTTPS requests to the daemon's REST API. The daemon forwards the command to the browser via WebSocket, waits for the result, and returns it to the CLI.

  5. Console and network events are buffered in the daemon (500 console events, 200 network events per session). The CLI can retroactively query these buffers, even for events that occurred before the CLI connected.
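
Step 5 describes per-session buffers with a fixed capacity. The sketch below shows one simple way such a bounded buffer can behave: once it is full, the oldest event is evicted, and a `tail` query returns the most recent matching events. It is an illustration only. The daemon itself is written in Rust, and the class name, the eviction policy, and the choice to filter before taking the tail are assumptions.

```typescript
type ConsoleLevel = 'log' | 'info' | 'warn' | 'error';

interface ConsoleEvent {
  level: ConsoleLevel;
  message: string;
}

// Hypothetical bounded buffer: keeps at most `capacity` items, dropping the oldest.
class BoundedBuffer<T> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.capacity) this.items.shift(); // evict oldest
  }

  // Last `n` items that satisfy `keep`, oldest first.
  tail(n: number, keep: (item: T) => boolean = () => true): T[] {
    const kept = this.items.filter(keep);
    return kept.slice(Math.max(0, kept.length - n));
  }

  get size(): number {
    return this.items.length;
  }
}

// 500 matches the console buffer size stated above.
const consoleBuffer = new BoundedBuffer<ConsoleEvent>(500);
for (let i = 0; i < 600; i++) {
  consoleBuffer.push({ level: i % 2 === 0 ? 'error' : 'log', message: `event ${i}` });
}
const lastErrors = consoleBuffer.tail(2, (e) => e.level === 'error');
console.log(consoleBuffer.size, lastErrors.map((e) => e.message));
```

Because the buffer lives with the session rather than with any one CLI invocation, a later `console` query can still see events recorded before the CLI connected, up to the capacity.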

Architecture

                            WebSocket (wss://)
+------------------+                                +------------------+
|  Your Web App    |  ------>  +--------------+     |  CLI / AI Agent  |
|  (logged in)     |  <------  |  browsertapd |     |                  |
|                  |           |              |     |  $ browsertap    |
| @browsertap/     |  register |  Session     | <-- |    run-js        |
|   runtime        |  heartbeat|  Registry    | --> |    screenshot    |
|                  |  console  |  Command     |     |    smoke         |
|                  |  result   |  Router      |     |    console       |
+------------------+           |  TLS (rustls)|     +------------------+
                               +--------------+
                                 HTTPS REST API
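
The diagram shows the daemon relaying a CLI command to the browser and returning the result. A common way to pair each result with its originating command is a map of pending command IDs, sketched below. The class, the `cmd-N` ID format, and the timeout handling are all hypothetical; the real routing lives in the Rust daemon and may work differently.

```typescript
interface CommandResult {
  id: string;
  ok: boolean;
  value?: unknown;
  error?: string;
}

// Hypothetical router: remembers who is waiting for which command ID.
class PendingCommands {
  private waiting = new Map<string, (r: CommandResult) => void>();
  private nextId = 0;

  // `send` stands in for writing the command to the browser's WebSocket.
  dispatch(send: (id: string) => void, timeoutMs: number): Promise<CommandResult> {
    const id = `cmd-${this.nextId++}`;
    return new Promise<CommandResult>((resolve) => {
      const timer = setTimeout(() => {
        this.waiting.delete(id);
        resolve({ id, ok: false, error: 'timeout' });
      }, timeoutMs);
      this.waiting.set(id, (r) => {
        clearTimeout(timer);
        resolve(r);
      });
      send(id);
    });
  }

  // Called when the browser sends a result back; returns false for unknown IDs.
  complete(result: CommandResult): boolean {
    const resolver = this.waiting.get(result.id);
    if (resolver === undefined) return false; // never sent, or already timed out
    this.waiting.delete(result.id);
    resolver(result);
    return true;
  }
}

const router = new PendingCommands();
let lastSentId = '';
const pending = router.dispatch((id) => { lastSentId = id; }, 1000);
router.complete({ id: lastSentId, ok: true, value: 'Dashboard - MyApp' });
void pending.then((r) => console.log(r.id, r.ok, r.value));
```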

Configuration

Create browsertap.toml at your project root. The CLI walks up directories to find it.

app_label = "MyApp"
app_url = "http://localhost:3000"
daemon_url = "https://127.0.0.1:4455"

[daemon]
host = "127.0.0.1"
port = 4455

[smoke]
defaults = ["dashboard", "settings", "profile"]

[smoke.presets]
main = ["dashboard", "settings", "profile", "billing"]
quick = ["dashboard"]

[smoke.redirects]
"/" = "/dashboard"

Resolution order: CLI flags > Environment variables > browsertap.toml > Defaults
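
The walk-up lookup mentioned at the start of this section can be pictured as follows: start in the current directory, check for browsertap.toml, and move to the parent until the file is found or the filesystem root is reached. This is a TypeScript illustration only. The CLI's actual loader is Rust (config.rs), and `findConfig` and its injectable `exists` parameter are hypothetical.

```typescript
import { existsSync } from 'node:fs';
import { dirname, join } from 'node:path';

// Returns the nearest browsertap.toml at or above `startDir`, or null if none exists.
// `exists` is injectable so the lookup can be exercised without touching the disk.
function findConfig(
  startDir: string,
  exists: (path: string) => boolean = existsSync,
): string | null {
  let dir = startDir;
  for (;;) {
    const candidate = join(dir, 'browsertap.toml');
    if (exists(candidate)) return candidate;
    const parent = dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

console.log(findConfig(process.cwd()));
```

With this behaviour, running a command from a nested directory such as a component folder still picks up the config placed at the project root.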

Environment Variables

| Variable | Description |
| --- | --- |
| BROWSERTAP_DAEMON_URL | Daemon URL |
| BROWSERTAP_HOST | Daemon listen host |
| BROWSERTAP_PORT | Daemon listen port |
| BROWSERTAP_SECRET | Shared secret (hex string) |
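
To make the resolution order (CLI flags > environment variables > browsertap.toml > defaults) concrete, here is a minimal sketch for the daemon URL alone. The function name and parameter shapes are hypothetical. The variable name, the `daemon_url` key, and the default URL come from this document.

```typescript
// Earlier sources win; each layer may leave the value unset.
function resolveDaemonUrl(
  flag: string | undefined, // value of --daemon-url, if given
  env: Record<string, string | undefined>, // e.g. process.env
  file: { daemon_url?: string }, // parsed browsertap.toml
): string {
  return (
    flag ??
    env['BROWSERTAP_DAEMON_URL'] ??
    file.daemon_url ??
    'https://127.0.0.1:4455' // documented default
  );
}

console.log(resolveDaemonUrl(undefined, {}, { daemon_url: 'https://127.0.0.1:5000' }));
```

Note that `??` only skips `undefined` and `null`, so an empty-string environment variable would count as set in this sketch. Whether browsertap treats empty values that way is not stated here.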

Backend Handshake Endpoint

Your web app backend needs one endpoint to mint session tokens:

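// POST /api/browsertap/handshake
import { readFileSync } from 'fs';
import { homedir } from 'os';
import { join } from 'path';
import { createHmac, randomUUID } from 'crypto';

const TOKEN_TTL_MS = 5 * 60 * 1000; // session tokens live for 5 minutes

export async function POST() {
  // Prefer the environment variable; fall back to the key file under ~/.browsertap.
  // homedir() also resolves on Windows, where HOME is usually unset.
  const secret = process.env.BROWSERTAP_SECRET
    ?? readFileSync(join(homedir(), '.browsertap', 'secret.key'), 'utf8').trim();

  const sessionId = randomUUID();
  const now = Date.now(); // single timestamp so all expiry fields agree
  const payload = {
    token_id: randomUUID(),
    scope: 'session',
    subject: 'browsertap-web',
    session_id: sessionId,
    issued_at: new Date(now).toISOString(),
    expires_at: new Date(now + TOKEN_TTL_MS).toISOString(),
  };

  // Token format: base64url(payload JSON) + "." + base64url(HMAC-SHA256 of the encoded payload)
  const encoded = Buffer.from(JSON.stringify(payload)).toString('base64url');
  const sig = createHmac('sha256', Buffer.from(secret, 'hex'))
    .update(encoded).digest('base64url');

  return Response.json({
    sessionId,
    sessionToken: `${encoded}.${sig}`,
    socketUrl: 'wss://127.0.0.1:4455/bridge',
    expiresAt: Math.floor((now + TOKEN_TTL_MS) / 1000), // Unix seconds
  });
}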
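
For completeness, here is a sketch of the receiving side: checking a token in the `encoded.signature` format minted above. It recomputes the HMAC, compares it in constant time, and rejects expired tokens. The daemon performs its own verification in Rust, so `verifyToken` and `signToken` are hypothetical helpers for illustration, and the secret below is a made-up value.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

interface TokenPayload {
  token_id: string;
  scope: string;
  subject: string;
  session_id: string;
  issued_at: string;
  expires_at: string;
}

function hmacOf(encoded: string, secretHex: string): Buffer {
  return createHmac('sha256', Buffer.from(secretHex, 'hex')).update(encoded).digest();
}

// Same token format as the handshake endpoint above.
function signToken(payload: TokenPayload, secretHex: string): string {
  const encoded = Buffer.from(JSON.stringify(payload)).toString('base64url');
  return `${encoded}.${hmacOf(encoded, secretHex).toString('base64url')}`;
}

// Returns the payload if the signature is valid and the token is unexpired, else null.
function verifyToken(token: string, secretHex: string, now: Date = new Date()): TokenPayload | null {
  const parts = token.split('.');
  if (parts.length !== 2) return null;
  const encoded = parts[0] ?? '';
  const given = Buffer.from(parts[1] ?? '', 'base64url');
  const expected = hmacOf(encoded, secretHex);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  // Parse only after the signature checks out.
  const payload = JSON.parse(Buffer.from(encoded, 'base64url').toString('utf8')) as TokenPayload;
  return new Date(payload.expires_at).getTime() > now.getTime() ? payload : null;
}

const demoSecret = 'a1'.repeat(32); // hypothetical 32-byte secret, hex-encoded
const demoToken = signToken(
  {
    token_id: 't-1',
    scope: 'session',
    subject: 'browsertap-web',
    session_id: 's-1',
    issued_at: '2024-01-01T00:00:00.000Z',
    expires_at: '2999-01-01T00:00:00.000Z',
  },
  demoSecret,
);
console.log(verifyToken(demoToken, demoSecret)?.session_id);
```

Comparing the raw HMAC bytes with `timingSafeEqual` instead of `===` avoids leaking how many leading bytes of a forged signature were correct.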

Security & Threat Model

browsertap is designed for single-user, local-only use on development machines.

| Layer | Control | Detail |
| --- | --- | --- |
| HTTPS server | Localhost-only | Binds to 127.0.0.1; never exposed to the network |
| TLS | Auto-generated certs | Self-signed via rcgen + rustls at ~/.browsertap/certs/ |
| Session tokens | HMAC-SHA256, short-lived | Browser tokens expire in 5 minutes; CLI tokens in 1 hour |
| Token verification | Constant-time | Uses hmac crate's timing-safe comparison |
| Secret storage | Owner-only permissions | ~/.browsertap/secret.key created with mode 0600 (Unix) |
| Console buffer | Bounded | Max 500 events per session to prevent memory exhaustion |

Not recommended for

  • Multi-user / shared machines -- Other local users with root access can read the session token
  • Production workloads -- browsertap is a development/testing tool; no rate limiting or audit logging
  • Untrusted networks -- Self-signed certificates are not verified by default

Project Structure

browsertap/
├── Cargo.toml                    # Workspace root
├── browsertap.toml               # Example project config
├── crates/
│   ├── shared/                   # Shared library (tokens, protocol, types)
│   │   └── src/
│   │       ├── lib.rs            # Module exports
│   │       ├── token.rs          # HMAC-SHA256 token sign/verify
│   │       ├── protocol.rs       # WebSocket + REST protocol types
│   │       ├── session.rs        # Session state, config types
│   │       └── codename.rs       # Friendly codename generation
│   ├── daemon/                   # Daemon binary (browsertapd)
│   │   └── src/
│   │       ├── main.rs           # Axum HTTPS server + REST routes
│   │       ├── state.rs          # Session registry, command routing
│   │       ├── websocket.rs      # WebSocket handler (register/heartbeat/command)
│   │       └── tls.rs            # Self-signed cert generation (rcgen)
│   └── cli/                      # CLI binary (browsertap)
│       └── src/
│           ├── main.rs           # Clap command definitions
│           ├── client.rs         # HTTP client for daemon REST API
│           └── config.rs         # browsertap.toml loader (walk-up)
└── runtime/
    └── browser/                  # Browser runtime SDK (TypeScript)
        ├── package.json          # @browsertap/runtime
        ├── tsconfig.json
        └── src/
            ├── index.ts          # Public API exports
            ├── client.ts         # WebSocket lifecycle, command executor, console patch
            ├── types.ts          # TypeScript type definitions
            └── storage.ts        # Session persistence adapters

Contributing

Contributions are welcome! Please open an issue to discuss your idea before submitting a PR.

Changelog

See Releases for version history.

License

MIT
