docs: browser-harness integration spike + recommendation#25
User asked whether browser-use/browser-harness could be wired into
Claude Code, openclaw, or zeroclawed for agentic browser automation.
30-minute investigation; this RFC captures findings and recommends
a path.
## Headline conclusion
Browser-harness is ~592 lines of Python that connects an agent to a
user-running Chrome via CDP and ships pre-imported helpers
(`new_tab`, `click_at_xy`, `capture_screenshot`, `js`,
`http_get`, `cdp`). The native model is "agent reads SKILL.md
and shells out to `browser-harness <<PY ... PY`"; Claude Code
already supports this via skills.
Recommended first step: install browser-harness + drop SKILL.md
into ~/.claude/skills/. Five minutes; zero code changes; immediate
value. Don't build an MCP wrapper or port to Rust until the skill
has been used in anger.
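A minimal sketch of the skill drop-in, assuming Claude Code's personal-skill layout of one directory per skill under `~/.claude/skills/`, each holding a `SKILL.md`. The function name `install_skill`, the default directory name `browser-harness`, and the source path are illustrative assumptions; installing the `browser-harness` command itself is left to its upstream instructions.

```shell
#!/usr/bin/env bash
# Copy a SKILL.md into a per-skill directory under a Claude Code skills root.
# Usage: install_skill <path/to/SKILL.md> [skills_root] [skill_name]
install_skill() {
  local src="$1"
  local root="${2:-$HOME/.claude/skills}"
  local name="${3:-browser-harness}"   # hypothetical directory name

  if [ ! -f "$src" ]; then
    echo "install_skill: no such file: $src" >&2
    return 1
  fi

  # Create the skill directory if needed, then place the file under its
  # conventional name.
  mkdir -p "$root/$name" && cp "$src" "$root/$name/SKILL.md"
}
```

Called as `install_skill ./SKILL.md`, this would leave the file at `~/.claude/skills/browser-harness/SKILL.md`; the optional arguments exist so the copy can be rehearsed against a scratch directory first.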
## Four integration options ranked
A. Claude Code skill — recommended first step (5 min, no code)
B. zeroclawed-MCP tool wrapper — half a day, only if multi-agent
support becomes a real need
C. Rust port — DON'T (weeks of work; loses the harness's
edit-helpers-on-the-fly property)
D. Daemon-spawned browser pool — defer until async web automation
from chat channels is a real use case
## Security notes
- Profile mechanism claims cookies-only login state — verify before
trusting for high-value accounts
- `BROWSER_USE_API_KEY` should be a `{{secret:...}}` ref once
the substitution layer ships
- Agents with browser-harness can scrape anything the user's logged
into — categorical capability expansion worth flagging in deploy
docs
Doc explicitly notes what I did NOT do (install on the Mac;
`chrome://inspect` needs human eyeballs anyway), and why no
follow-up task is filed: the next move is "Brian tries A for two
weeks then we triage".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Adds an RFC/spike write-up evaluating whether browser-use/browser-harness should be integrated into this repo’s agent tooling (Claude Code / OpenClaw / zeroclawed), and recommends starting with a Claude Code skill drop-in before building any native integration.
Changes:
- Introduces a new RFC documenting what browser-harness is, its requirements, and integration options (A–D).
- Recommends adopting option (A) first (Claude Code skill), with (B) as a follow-up only if needed.
- Captures initial security considerations and operational tradeoffs.
| Cost: ~half a day. Adds a `browse(action, params)` tool to
| `crates/zeroclawed-mcp` (the MCP server we just scaffolded for secret
| discovery in PR #23). The tool would shell out to `browser-harness`
| exactly as the skill does, but agents would discover it via MCP
| instead of reading a skill prompt.
This section references crates/zeroclawed-mcp and a “PR #23” scaffold, but there is no zeroclawed-mcp crate (and no other mention of that path) in the current workspace. To avoid sending readers on a dead-end, either point to the actual crate/module that would host an MCP server/tool wrapper, or phrase this as a hypothetical new crate without citing a non-existent path/PR.
Suggested change (original):
| Cost: ~half a day. Adds a `browse(action, params)` tool to
| `crates/zeroclawed-mcp` (the MCP server we just scaffolded for secret
| discovery in PR #23). The tool would shell out to `browser-harness`
| exactly as the skill does, but agents would discover it via MCP
| instead of reading a skill prompt.
Suggested change (replacement):
| Cost: ~half a day. Adds a `browse(action, params)` tool to a future
| MCP server crate/module for zeroclawed. The tool would shell out to
| `browser-harness` exactly as the skill does, but agents would
| discover it via MCP instead of reading a skill prompt.
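To make the shell-out in that suggestion concrete, here is a rough sketch of only the step that hands a script to the harness, not of an MCP server. Everything about it is an assumption for illustration: the `browse` function, the `HARNESS_BIN` override, and the idea that a helper call can be rendered as `action(params)`. The allow-list simply reuses the helper names from the RFC summary; their real argument shapes are not confirmed here.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: turn (action, params) into a one-line script and
# pipe it to the harness on stdin, mirroring the skill's heredoc call.
# HARNESS_BIN is an illustrative override so the sketch can be dry-run.
browse() {
  local action="$1" params="${2:-}"
  local bin="${HARNESS_BIN:-browser-harness}"

  # Allow only the helper names listed in the RFC summary.
  case "$action" in
    new_tab|click_at_xy|capture_screenshot|js|http_get|cdp) ;;
    *) echo "browse: unsupported action: $action" >&2; return 2 ;;
  esac

  # Assumption: params is forwarded verbatim as the helper's argument list.
  printf '%s(%s)\n' "$action" "$params" | "$bin"
}
```

A dry run such as `HARNESS_BIN=cat browse new_tab '"https://example.com"'` prints the script that would be sent. Because `params` is passed through unescaped, a real wrapper would need to validate or structure it before forwarding.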
| - Exposes a heredoc API: `browser-harness <<'PY' ... PY`; the script
| body runs in a Python REPL with the helpers pre-imported
The heredoc invocation is shown both as browser-harness <<'PY' ... PY and browser-harness <<PY ... PY. It’d be better to use one consistent form, and prefer a single-quoted heredoc delimiter in examples to prevent accidental shell interpolation/expansion in the embedded script.
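The difference between the two delimiter forms can be seen without the harness installed. The sketch below uses `cat` as a stand-in for `browser-harness` (the `show_script` name and the `url` variable are illustrative only) and captures the script body the command would actually receive.

```shell
#!/usr/bin/env bash
# Stand-in for `browser-harness`: echoes the script body it receives on stdin.
show_script() { cat; }

url="https://example.com"

# Unquoted delimiter: the shell expands $url before the script is delivered.
unquoted="$(show_script <<PY
print("$url")
PY
)"

# Quoted delimiter: the body is passed through literally.
quoted="$(show_script <<'PY'
print("$url")
PY
)"

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

With the unquoted form the embedded script arrives as `print("https://example.com")`; with the quoted form it arrives unchanged as `print("$url")`, which is usually what an embedded Python script wants.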
| - `BROWSER_USE_API_KEY` (cloud-daemon feature) should be a
| `{{secret:BROWSER_USE_API_KEY}}` reference per the substitution
| RFC §3 once we wire it.
This references “the substitution RFC §3” and uses a {{secret:...}} syntax, but there’s no corresponding RFC/link or other usage in-repo to explain the convention. Consider linking to the actual document (path/URL) that defines secret substitution, or rewording this as a future TODO without an unresolvable reference/syntax.
Suggested change (original):
| - `BROWSER_USE_API_KEY` (cloud-daemon feature) should be a
| `{{secret:BROWSER_USE_API_KEY}}` reference per the substitution
| RFC §3 once we wire it.
Suggested change (replacement):
| - If we wire up the cloud-daemon feature, pass
| `BROWSER_USE_API_KEY` through the project's standard secret
| injection/configuration mechanism rather than hard-coding it.
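One small way to honor "rather than hard-coding it" in a launcher script is to refuse to start when the variable has not been injected, instead of falling back to a literal. This is a sketch only; `require_api_key` is a hypothetical helper, and how the value gets into the environment is left to whatever secret mechanism the project settles on.

```shell
#!/usr/bin/env bash
# Fail fast when the key is absent instead of falling back to a literal.
require_api_key() {
  if [ -z "${BROWSER_USE_API_KEY:-}" ]; then
    # Report the missing variable by name only; never print a value.
    echo "BROWSER_USE_API_KEY is not set; inject it via your secret manager" >&2
    return 1
  fi
}
```

A launcher would call `require_api_key || exit 1` before invoking the cloud-daemon feature.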
| Status: SPIKE (30-minute investigation, not a build commitment).
| Filed because the user asked whether `browser-use/browser-harness`
| could be wired into us (Claude Code, openclaw, or zeroclawed) to give
“openclaw” here reads like the product name, which is capitalized as “OpenClaw” elsewhere in docs. If you’re referring to the project (not the adapter-kind string), consider capitalizing for consistency/clarity.
Suggested change (original):
| could be wired into us (Claude Code, openclaw, or zeroclawed) to give
Suggested change (replacement):
| could be wired into us (Claude Code, OpenClaw, or zeroclawed) to give
Codex integration sweep note: I reviewed the inline comments on this PR. GitHub rejected direct inline replies for these older/outdated review comments with HTTP 422, so I am responding top-level instead: 3141257552, 3141257563, 3141257573, 3141257590.

I did not edit this branch. Items that overlap the secure/fnox/host-agent/digest integration work are addressed in draft PR #38 (codex-integration-code), including stdin-based fnox set, bounded fnox waits, whitespace-safe !secure parsing, identity-aware !secure audit logs, valid-input host-agent properties, real WhatsApp HMAC verification, loopback OneCLI default bind, and race-free digest temp paths. Remaining PR-specific findings stay actionable for this branch owner or a follow-up.
Acted on the recommended action (Option A) per the user's directive to follow up on RFC PRs:
Effective immediately in new Claude Code sessions. No work required on Option B (MCP wrapper) until the skill has been used in anger and we know what's missing.
Subsumed by #44 (squashed to |
…de (#56)

* chore(.github): add copilot-instructions.md to tune PR-review behavior

GitHub Copilot supports per-repo review instructions at
.github/copilot-instructions.md (≤2 pages, applied to every Copilot PR
review automatically). Adds calciforge-specific guidance to improve
signal-to-noise:
- Skip what pre-commit already gates (fmt, clippy, gitleaks)
- Prioritize HIGH-severity classes that bit us in past reviews: secret
  leakage in logs, substitution-boundary correctness, unwrap/expect
  outside tests, missing unsafe around set_var (edition 2024), blocking
  I/O in async, auth bypass paths
- Tell Copilot what's NOT a bug despite looking like one:
  {{secret:NAME}} sentinel syntax, post-history-scrub fake test values,
  FnoxClient subprocess-by-design, clashd/zeroclaw_* upstream
  references, mixed Rust edition (known)
- Past-mistake checklist (6 classes from real findings that landed and
  were caught later: substitution-after-bypass, None dest_host,
  bearer-in-info-log, fnox set argv leak, 0.0.0.0 default, hardcoded
  fallback URLs)
- Skip even-if-correct: 'consider adding tests' without specifics,
  rename suggestions vs. functional convention, feature-creep proposals

Cross-references AGENTS.md (host-agent coding standards) and CLAUDE.md
(public-repo secret discipline) so Copilot follows both. 83 lines, well
under the documented 2-page cap.

* chore(.github): tighten copilot-instructions + add path-scoped Rust file

V1 of `.github/copilot-instructions.md` was ~970 words and read more like
documentation than reviewer guidance. Two issues that hurt signal:
1. **Length**: Copilot's per-repo instructions read window is ~4000
   chars; v1 was over that, so the trailing past-mistake list and
   skip-list were getting truncated.
2. **Format**: long bulleted exposition reads less like a rule and
   more like prose, which Copilot treats as background context rather
   than as constraints to apply.

V2 changes:
- Cuts to ~3500 chars by condensing the prioritization tiers and
  removing per-class HIGH/MED/LOW exposition (the priority-order list
  carries the same info in 5 lines).
- Leads with a single review philosophy line ("if uncertain, do not
  comment"), the highest-leverage rule borrowed from deno's
  copilot-instructions.md.
- Names specific past Copilot noise patterns from this repo's PR
  history (env-mutex/serial_test repeated 8+ times across #19/#22/#23;
  dead-doc-reference 4x across #20/#23/#25) so the "don't repeat
  across PRs" rule has teeth.
- Cross-references a new path-scoped file at
  `.github/instructions/rust.instructions.md` (`applyTo: "**/*.rs"`),
  which carries the Rust-specific review nits (`#[expect]` over
  `#[allow]`, `// SAFETY:` requirement, `Mutex` across `.await`,
  `select!` cancellation safety, `kill_on_drop`, `&str` over
  `&String`, `LazyLock<Regex>` for hot paths, etc.).

Path-scoped instructions are loaded only when a PR touches a file
matching `applyTo`, so Rust-specific rules don't burn the global
4000-char budget on PRs that only touch docs / TOML / shell.

* chore(.github): restore AGENTS.md + CLAUDE.md cross-refs in copilot instructions

Verified GitHub's copilot-instructions docs do not specify the
~4000-char read window I'd assumed in the previous commit; that was the
older Copilot Chat feature, not the Copilot code-review one. With no
real length pressure, the AGENTS.md / CLAUDE.md pointers (dropped in v2
to save chars) are worth restoring. CLAUDE.md's "never commit these"
list is exactly the kind of leakage Copilot should be enforcing on diff.

* docs: split AGENTS.md into workspace-wide root + host-agent crate file

The root `AGENTS.md` was titled "Calciforge Host-Agent" and carried
host-agent-specific build/architecture rules, at the repo root, where
agents (Claude Code, Codex, Copilot cloud agent, OpenClaw) read it as
workspace-wide guidance. The mismatch meant agents working in any
non-host-agent crate were getting irrelevant rules ("ZFS snapshot
delegation", "mTLS CN→Unix user") and missing the actually-shared ones
(substitution-boundary order, sentinel string contract, public-repo
secret discipline pointer).

Restructure:
- Move the existing host-agent content verbatim to
  `crates/host-agent/AGENTS.md` (`git mv` so history is preserved).
- New root `AGENTS.md` covers the whole workspace: crate inventory,
  project vocabulary, mandatory rules every agent must follow
  (CLAUDE.md secret discipline, pre-commit gate, sentinel contract,
  substitution boundary order, no-secret-values-in-logs, fnox stdin
  mode), workspace build/test commands, and pointers into per-area
  files (`crates/host-agent/AGENTS.md`, `docs/rfcs/`,
  `docs/security-gateway.md`, `docs/model-gateway.md`). Cross-refs
  `.github/copilot-instructions.md` and
  `.github/instructions/rust.instructions.md` so agents that find
  AGENTS.md first can pick up the Copilot-specific tuning if relevant.

Pairs with the copilot-instructions tightening earlier on this branch.

* fix(.github): correct gitleaks allowlist description in copilot-instructions

Copilot's review caught a real factual error in v2: the line claimed
specific literals (`+15555550100`, `7000000001`, `eyJ0eXAi...`) were
allowlisted in `.gitleaks.toml`. They aren't; `7000000001` is even used
in non-allowlisted source (`crates/calciforge/src/auth.rs`). The real
allowlist mechanism is path-based (`tests/**/fixtures/`,
`docs/rfcs/*.md`, lockfiles, etc.) plus a small regex list (loopback,
RFC 5737, a few inherited-from-main values). Replace the misleading
"these specific literals are allowlisted" claim with an accurate
description of how the allowlist actually works, so Copilot doesn't
downgrade real findings on the assumption they fall under a
non-existent literal-match exemption.

Pleasingly meta: this is exactly the "verify against the codebase
before commenting" rule from the same file working as intended on the
PR that introduced the file.
Summary
User asked whether `browser-use/browser-harness` could be wired
into Claude Code, openclaw, or zeroclawed for agentic browser
automation. 30-min investigation; this RFC captures findings + a
ranked recommendation.
Headline
Browser-harness is ~592 lines of Python connecting an agent to a
user-running Chrome via CDP. Native model: agent reads
`SKILL.md` and shells out to `browser-harness <<PY ... PY` with
helpers like `new_tab`, `click_at_xy`, `capture_screenshot`,
`js`, `http_get`, `cdp` pre-imported. Claude Code already
supports this pattern via skills.
Recommended path
(A) Install + drop SKILL.md into `~/.claude/skills/`. ~5 min,
zero code changes, immediate browser automation in any Claude Code
session. Use for two weeks, then triage whether (B)–(D) are needed.
Other options ranked
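- (B) zeroclawed MCP tool wrapper: about half a day; only if other
  agents need browser access too
- (C) Rust port: don't (weeks of work; loses the harness's
  edit-helpers-on-the-fly property)
- (D) Daemon-spawned browser pool for async web automation
  tasks: defer until that's a real use case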
Security flags
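- Profile mechanism claims cookies-only login state; verify before
  trusting for high-value accounts
- `BROWSER_USE_API_KEY` should be a `{{secret:...}}` ref once the
  substitution layer ships
- Categorical capability expansion (anything the user is logged into is
  scrapeable): worth flagging in deploy docs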
What I did NOT do
- Install on the Mac (`chrome://inspect`
  needs human eyeballs)
🤖 Generated with Claude Code