BrowserTrace

Debug Browser Use failures with a local trace timeline

Browser Use agents fail in browser state, not just in logs. BrowserTrace records each step locally so you can inspect screenshots, URL, action, model I/O, status, and the first failed step.


Improving this guide or a Browser Use adapter note? Use the First PR Recipe to keep the first contribution small and reviewable.

Why this exists

When a Browser Use run fails late in a task, a stack trace usually tells you which exception happened. It often does not show what the agent saw, which URL it was on, which model decision selected the target, or whether the wrong assumption came from an earlier step.

BrowserTrace keeps that missing context in a local SQLite database plus screenshot files. No signup or cloud service is required.

Try it before wiring Browser Use

uvx --from "browsertrace[ui]" browsertrace doctor
uvx --from "browsertrace[ui]" browsertrace demo
uvx --from "browsertrace[ui]" browsertrace

Open http://127.0.0.1:3000, then inspect demo: Browser Use local HTML upload navigation failure. From a source checkout, python examples/browser_use_callback_demo.py records Browser Use-shaped callback steps without installing Browser Use.

To verify that callback demo from the terminal, run browsertrace list --limit 5 and expect demo: browser-use callback flow. Then run browsertrace show <run_id> and expect two steps: search_google(query=BrowserTrace) and click(selector=#result-1).

Attach it to a Browser Use agent

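import asyncio

# Recent Browser Use releases export ChatOpenAI; older ones may need langchain_openai instead.
from browser_use import Agent, ChatOpenAI
from browsertrace import Tracer
from browsertrace.integrations.browser_use import attach_tracer

async def main():
    tracer = Tracer()
    agent = Agent(task="...", llm=ChatOpenAI(model="gpt-4o"))

    # Every agent step inside this block is recorded into the local trace.
    with attach_tracer(agent, tracer, name="browser-use checkout run"):
        await agent.run()

asyncio.run(main())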

The adapter hooks Browser Use step callbacks and records URL, screenshot, action summary, compact browser-state context, model thought/actions, status, and errors into the same local timeline.

For adapter field requests, use the Browser Use feedback issue and include the Browser Use version, failure shape, and which context your logs missed.

Callback compatibility

attach_tracer supports Browser Use agents that expose register_new_step_callback, plus older or forked agents with on_step_start, on_step, or _new_step_callback attributes.
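The probing order described above can be pictured with a minimal sketch. This is not the adapter's actual code: `find_step_hook`, `HOOK_CANDIDATES`, and `LegacyAgent` are hypothetical names used only for illustration, and the attribute names are the ones listed in the paragraph above.

```python
from typing import Optional

# Attribute names from the paragraph above, in the order it lists them.
HOOK_CANDIDATES = (
    "register_new_step_callback",
    "on_step_start",
    "on_step",
    "_new_step_callback",
)


def find_step_hook(agent: object) -> Optional[str]:
    """Return the first hook attribute name the agent exposes, or None."""
    for name in HOOK_CANDIDATES:
        if hasattr(agent, name):
            return name
    return None


class LegacyAgent:
    """Stand-in for an older or forked agent; not a real Browser Use class."""

    def __init__(self) -> None:
        # The hook slot exists even though nothing is attached yet.
        self.on_step = None


print(find_step_hook(LegacyAgent()))  # on_step
print(find_step_hook(object()))       # None
```

Running a check like this against your own agent object is a quick way to gather the "callback surface" detail that a compatibility report needs.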

Current Browser Use examples may also pass on_step_start or on_step_end directly to agent.run(...). For that run-hook-only path, use create_run_hooks:

from browsertrace import Tracer
from browsertrace.integrations.browser_use import create_run_hooks

async def run_with_hooks(agent):
    # `agent` is a Browser Use Agent built as in the attach_tracer example above.
    tracer = Tracer()
    hooks = create_run_hooks(tracer, name="browser-use checkout run")

    with hooks:
        await agent.run(on_step_start=hooks.on_step_start, on_step_end=hooks.on_step_end)

The run-hook helper reads Browser Use history and browser-session summaries when they are available, then records the latest thought, action, extracted content, URL, title, tabs, and screenshot flag into the same local timeline. If your Browser Use version exposes a different hook shape, comment on issue #11 with the version and callback surface.

To try this path without installing Browser Use, run python examples/browser_use_run_hooks_demo.py from a source checkout.

For a quick terminal check, run browsertrace list --limit 5 and expect demo: browser-use run hooks flow. Then run browsertrace show <run_id> and expect the same two demo step labels: search_google(query=BrowserTrace) and click(selector=#result-1).

Debug icon-only click targets

If the screenshot shows the target but Browser Use clicks a nearby toolbar button, treat it as a visible-target versus accessible-target mismatch. Icon-only buttons often rely on hover tooltips, and that tooltip text may not be present in the accessibility tree when the agent ranks candidate elements.

The best app-side fix is an accessible name on the real button, for example aria-label="Create Test". Until the app can change, make the task prompt structural, such as "click the plus icon immediately next to the search field in the Functional toolbar", or use a deterministic selector for that step.
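To check a page snapshot for this mismatch before blaming the agent, a rough heuristic is to list buttons that have neither an aria-label nor visible text. The sketch below uses only the Python standard library. It is a simplification, not a full accessible-name computation, and `find_unnamed_buttons` is a hypothetical helper, not part of BrowserTrace or Browser Use.

```python
from html.parser import HTMLParser
from typing import Optional


class UnnamedButtonFinder(HTMLParser):
    """Collect <button> elements with no aria-label and no visible text."""

    def __init__(self) -> None:
        super().__init__()
        self.unnamed = []                      # attribute dicts of suspicious buttons
        self._attrs: Optional[dict] = None     # attributes of the button being read
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self._attrs = dict(attrs)
            self._text = ""

    def handle_data(self, data):
        if self._attrs is not None:
            self._text += data

    def handle_endtag(self, tag):
        if tag == "button" and self._attrs is not None:
            has_label = bool((self._attrs.get("aria-label") or "").strip())
            if not has_label and not self._text.strip():
                self.unnamed.append(self._attrs)
            self._attrs = None


def find_unnamed_buttons(html: str) -> list:
    finder = UnnamedButtonFinder()
    finder.feed(html)
    finder.close()
    return finder.unnamed


toolbar = (
    '<div><button class="icon-plus"><svg></svg></button>'
    '<button aria-label="Create Test"><svg></svg></button>'
    "<button>Save</button></div>"
)
print(find_unnamed_buttons(toolbar))  # [{'class': 'icon-plus'}]
```

Any button this reports is a candidate for the structural prompt or deterministic selector workaround described above.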

Related community case: browser-use/browser-use#4801.

Debug stable selector replay gaps

When a Browser Use run succeeds once, the clicked selector is evidence, not automatically reusable automation. The model may have chosen an observed element index, Browser Use may have reconstructed XPath from the current DOM, and Playwright may store an internal role selector that is useful for replay but not a durable CSS/XPath contract.

This makes deterministic replay a second step: inspect the trace, pick the strongest selector contract, then promote it into a test or scripted action. BrowserTrace should preserve the evidence bundle first instead of pretending every Browser Use click has a reusable selector string.
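Picking the strongest selector contract can be sketched as a ranking over the evidence a trace preserved. The ranking below is an assumption for illustration, not a BrowserTrace or Browser Use rule, and `SelectorEvidence`, `DURABILITY`, and `strongest_selector` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical durability ranking, strongest first. Adjust it to your app's conventions.
DURABILITY = ("test_id", "aria_label", "role", "css", "xpath", "element_index")


@dataclass(frozen=True)
class SelectorEvidence:
    kind: str   # one of DURABILITY
    value: str  # the selector text or index observed in the trace


def strongest_selector(evidence: Sequence[SelectorEvidence]) -> Optional[SelectorEvidence]:
    """Return the most durable piece of evidence, or None if nothing is usable."""
    known = [item for item in evidence if item.kind in DURABILITY]
    if not known:
        return None
    return min(known, key=lambda item: DURABILITY.index(item.kind))


observed = [
    SelectorEvidence("element_index", "17"),
    SelectorEvidence("xpath", "/html/body/div[2]/button[1]"),
    SelectorEvidence("aria_label", "Create Test"),
]
print(strongest_selector(observed))
```

The point of keeping the whole evidence list is that a weak winner, such as an element index alone, signals that the step is not yet ready to be promoted into a test.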

Related community case: browser-use/browser-use#3856.

Debug multi-step form drift

Long Browser Use form runs tend to fail after the useful state has already moved on. A dependent dropdown may not resolve, a validation message may appear after the next action, or one wrong field can poison the rest of a 20-step transcript.

Treat each committed form step as its own trace boundary instead of relying on one long prompt to explain everything.

In BrowserTrace terms, a good failure report should say "country selection passed, tax-id stayed disabled" instead of "the agent failed after 24 actions".
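One way to produce that kind of report is to attach a postcondition to each committed form step and stop at the first one that does not hold. This is a minimal sketch under assumed names: `FormCheckpoint`, `first_failed_checkpoint`, and the state keys are hypothetical, and the form states stand in for whatever your run observes after each step.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class FormCheckpoint:
    label: str                       # e.g. "country selection"
    passed: Callable[[dict], bool]   # postcondition on the observed form state
    failure_note: str                # what to report when the postcondition fails


def first_failed_checkpoint(checkpoints: Sequence[FormCheckpoint], states: Sequence[dict]) -> str:
    """Check each step's state in order and describe the first failing boundary."""
    passed_labels = []
    for checkpoint, state in zip(checkpoints, states):
        if not checkpoint.passed(state):
            prefix = f"{passed_labels[-1]} passed, " if passed_labels else ""
            return prefix + checkpoint.failure_note
        passed_labels.append(checkpoint.label)
    return "all checkpoints passed"


checkpoints = [
    FormCheckpoint("country selection", lambda s: s.get("country") == "DE",
                   "country selection did not commit"),
    FormCheckpoint("tax-id", lambda s: s.get("tax_id_enabled") is True,
                   "tax-id stayed disabled"),
]
states = [
    {"country": "DE"},
    {"country": "DE", "tax_id_enabled": False},
]
print(first_failed_checkpoint(checkpoints, states))
# country selection passed, tax-id stayed disabled
```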

Related community case: browser-use/browser-use#4476.

Debug new-tab desync

If a click or Enter action opens a new tab, Browser Use may keep reasoning from the stale page context unless the action result makes that browser-state delta explicit. The symptom is usually repeated retries against the old page while the expected element exists in the new tab.

In BrowserTrace terms, treat this as a browser topology change. The trace should explain whether the agent switched to the new page, stayed on the stale page, or attempted later actions before the new tab finished loading.
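The first two outcomes can be told apart by comparing tab snapshots taken before and after the action. The sketch below assumes you can obtain the active URL and the list of open tab URLs at each point. `TabSnapshot` and `classify_tab_change` are hypothetical names, and the sketch does not cover the third case, where actions run before the new tab finishes loading.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TabSnapshot:
    active_url: str
    open_urls: Tuple[str, ...]


def classify_tab_change(before: TabSnapshot, after: TabSnapshot) -> str:
    """Describe how the tab topology changed across one action."""
    new_urls = [url for url in after.open_urls if url not in before.open_urls]
    if not new_urls:
        return "no new tab"
    if after.active_url in new_urls:
        return "switched to new tab"
    return "stayed on stale page"


before = TabSnapshot("https://example.com/list", ("https://example.com/list",))
after = TabSnapshot(
    "https://example.com/list",
    ("https://example.com/list", "https://example.com/detail"),
)
print(classify_tab_change(before, after))  # stayed on stale page
```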

Related community case: browser-use/browser-use#4758.

Debug remote CDP hangs

For Browserless or other remote-CDP providers, a failed Browser Use run may not be only a screenshot problem. A stale remote browser session can make one CDP request stop returning while the websocket still looks connected. If recovery holds a shared event-bus lock during that wait, one degraded browser session can delay unrelated sessions.

When you see screenshot capture, DOM snapshot, or browser-state collection timeouts, collect timing evidence before changing retry policy.

In BrowserTrace terms, treat this as method-timing and browser-session evidence, not just a red screenshot step. The trace should explain whether the CDP method failed fast, never returned, or blocked later state collection through recovery and lock timing.
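A minimal way to gather that timing evidence is to wrap each suspect call in a deadline and record how it ended. The sketch below uses only `asyncio`. `timed_call` and the stand-in coroutines are hypothetical, and the CDP method names appear only as labels, so nothing here talks to a real browser.

```python
import asyncio
import time


async def timed_call(label, coro_factory, timeout_s):
    """Run one awaitable with a deadline and record how it ended."""
    started = time.monotonic()
    try:
        await asyncio.wait_for(coro_factory(), timeout=timeout_s)
        outcome = "returned"
    except asyncio.TimeoutError:
        outcome = "timed out"  # the "never returned" case, bounded by the deadline
    except Exception as exc:
        outcome = f"raised {type(exc).__name__}"  # the "failed fast" case
    elapsed = round(time.monotonic() - started, 3)
    return {"method": label, "outcome": outcome, "elapsed_s": elapsed}


async def hangs():
    # Stand-in for a CDP request that stops returning.
    await asyncio.sleep(10)


async def fails():
    # Stand-in for a CDP request that errors immediately.
    raise ConnectionError("session closed")


async def main():
    print(await timed_call("Page.captureScreenshot", hangs, timeout_s=0.05))
    print(await timed_call("DOMSnapshot.captureSnapshot", fails, timeout_s=0.05))


if __name__ == "__main__":
    asyncio.run(main())
```

Recording the elapsed time next to the outcome is what lets you see whether later state collection was delayed by a call that sat at its deadline.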

Related community case: browser-use/browser-use#4579.

Debug local HTML upload navigation mistakes

A local HTML upload can be misread as a navigation target before the intended upload action runs. The security watchdog may correctly block the bad URL, but the useful debugging boundary is earlier: why did the planner or model-visible context turn an attachment name into navigate at step 0?

Treat this as a planner/action validation boundary and future adapter boundary, not as proof of a low-level file upload bug. It also does not mean BrowserTrace already captures every internal Browser Use field.
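A validation check at that boundary might flag any navigate action whose URL ends in the name of a file that was meant to be uploaded. This is a sketch under assumed shapes: the action dictionary keys, the `upload_file` action name, and `navigates_to_attachment` are hypothetical and do not describe Browser Use internals.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def navigates_to_attachment(action: dict, attachment_names: list) -> bool:
    """True when a navigate action's URL ends in one of the attachment file names."""
    if action.get("action") != "navigate":
        return False
    path = unquote(urlparse(action.get("url", "")).path)
    return PurePosixPath(path).name in set(attachment_names)


attachments = ["browsertrace-report.html"]
step_zero = {"action": "navigate", "url": "file:///tmp/browsertrace-report.html"}
print(navigates_to_attachment(step_zero, attachments))  # True
```

A hit at step 0 points at the planner or the model-visible context, which is the earlier boundary worth reporting.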

Related community case: browser-use/browser-use#4794.

Debug action schema validation boundaries

Action schema coercion can hide why a Browser Use step targeted the wrong element. For example, a raw model action may put a boolean in an element-index field, then one validation path coerces it while another rejects it. After normalization, the final executed target can look deliberate unless the trace preserves both sides of the boundary.

Treat this as a schema validation and future adapter boundary. It does not mean BrowserTrace already captures every internal Browser Use field.
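The boolean case is easy to reproduce in plain Python, because `bool` is a subclass of `int`. The sketch below shows a lenient path and a strict path disagreeing on the same raw value, and a record that keeps both sides. The function names are hypothetical and are not Browser Use validators.

```python
def coerce_index(value):
    """Lenient path: anything int() accepts becomes an index."""
    return int(value)


def strict_index(value):
    """Strict path: reject booleans even though bool is a subclass of int."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"element index must be an int, got {type(value).__name__}")
    return value


def record_boundary(raw_value):
    """Keep both sides of the validation boundary for the trace."""
    record = {"raw": repr(raw_value), "coerced": coerce_index(raw_value)}
    try:
        strict_index(raw_value)
        record["strict"] = "accepted"
    except TypeError as exc:
        record["strict"] = f"rejected: {exc}"
    return record


print(record_boundary(True))
# {'raw': 'True', 'coerced': 1, 'strict': 'rejected: element index must be an int, got bool'}
print(record_boundary(3))
# {'raw': '3', 'coerced': 3, 'strict': 'accepted'}
```

Without the `raw` field, the first record would read as a deliberate click on element 1.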

Related community case: browser-use/browser-use#4796.

Debug empty model responses

A parser exception can occur when the model provider returns an empty response (for example, input_value='') before Browser Use validation runs. In that case the parser received no assistant JSON content at all, which is a different failure from malformed JSON.
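The distinction between an empty response and malformed JSON can be made explicit before parsing. This is a minimal sketch. `classify_model_response` is a hypothetical helper, and the raw string stands in for whatever text the provider returned.

```python
import json
from typing import Optional


def classify_model_response(raw: Optional[str]) -> str:
    """Separate 'nothing came back' from 'something came back but is not JSON'."""
    if raw is None or not raw.strip():
        return "empty"
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return "malformed"
    return "parsed"


print(classify_model_response(""))                     # empty
print(classify_model_response('{"action": '))          # malformed
print(classify_model_response('{"action": "click"}'))  # parsed
```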

BrowserTrace does not capture every internal Browser Use field, so preserve whatever evidence you can at the boundary between provider response, parsing, and execution.

Related community case: browser-use/browser-use#4786.

Compare failed and successful Browser Use runs

When the same Browser Use task has one failed run and one known-good run, compare them from the terminal to find the first divergent step before opening the local UI.

browsertrace list
browsertrace compare <failed_run_id> <success_run_id>
browsertrace compare <failed_run_id> <success_run_id> --json

While the local BrowserTrace UI is running, local wrappers and tools can request the same first-divergence payload through the web API:

curl http://127.0.0.1:3000/api/compare/<failed_run_id>/<success_run_id>

Prefer the local API endpoint over the CLI output when a script, dashboard, or automation preflight check needs the first-divergence payload as JSON. Examples include a CI step that reports the exact divergent action back to a tracking system, or a local debugging UI that visualises the diff without shelling out.
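From Python, the same request can be made with the standard library. The sketch below assumes the local UI is running on the address shown above and that the endpoint returns a JSON body. The payload's field names are not documented here, so the code returns the decoded object as-is. `compare_url` and `fetch_comparison` are hypothetical helper names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:3000"


def compare_url(failed_run_id: str, success_run_id: str, base_url: str = BASE_URL) -> str:
    """Build the compare endpoint URL, escaping the run ids for use in a path."""
    failed = quote(failed_run_id, safe="")
    success = quote(success_run_id, safe="")
    return f"{base_url}/api/compare/{failed}/{success}"


def fetch_comparison(failed_run_id: str, success_run_id: str, timeout_s: float = 5.0):
    """Fetch and decode the comparison payload; requires the local UI to be running."""
    with urlopen(compare_url(failed_run_id, success_run_id), timeout=timeout_s) as response:
        return json.load(response)


print(compare_url("failed-local-html-upload", "good-local-html-upload"))
```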

Example compare output:

$ browsertrace compare failed-local-html-upload good-local-html-upload
First divergent step: 3
action: navigate
url: file:///tmp/browsertrace-report.html
status: failed
error: upload preview did not appear

Use --json when you need structured comparison output for automation or CI checks.

The CLI compares action, URL, status, and error fields and reports the first divergent step. It does not replace the local UI: use the comparison to locate the boundary, then open the failed run timeline to inspect screenshots, model input/output, and surrounding context.
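The idea of a first divergent step can be illustrated in a few lines. This is a sketch of the concept, not BrowserTrace's implementation: the step dictionaries, the zero-based indexing, and the handling of runs with different lengths are all assumptions.

```python
from typing import Optional, Sequence

COMPARED_FIELDS = ("action", "url", "status", "error")


def first_divergent_step(failed_steps: Sequence[dict], success_steps: Sequence[dict]) -> Optional[dict]:
    """Return the first index where any compared field differs, or None if none do."""
    for index, (failed, success) in enumerate(zip(failed_steps, success_steps)):
        differing = [f for f in COMPARED_FIELDS if failed.get(f) != success.get(f)]
        if differing:
            return {"step": index, "fields": differing}
    if len(failed_steps) != len(success_steps):
        # One run simply ended earlier than the other.
        return {"step": min(len(failed_steps), len(success_steps)), "fields": ["length"]}
    return None


good_run = [
    {"action": "open", "url": "https://example.com/", "status": "ok", "error": None},
    {"action": "upload", "url": "https://example.com/", "status": "ok", "error": None},
]
failed_run = [
    {"action": "open", "url": "https://example.com/", "status": "ok", "error": None},
    {"action": "navigate", "url": "file:///tmp/browsertrace-report.html",
     "status": "failed", "error": "upload preview did not appear"},
]
print(first_divergent_step(failed_run, good_run))
# {'step': 1, 'fields': ['action', 'url', 'status', 'error']}
```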

Run comparison is most useful when the failed run and a successful run carry enough shared metadata to explain where they diverged. Preserve that metadata in both runs when you can.

Feature tracker: aaronlab/browsertrace#369.

Share only what is safe

browsertrace list
browsertrace export <run_id> -o full.html
browsertrace export <run_id> --redact -o public.html
browsertrace export <run_id> --public -o public.html

The full export includes model input and output, screenshots, and URLs. Use --public to omit all three kinds of sensitive data before public sharing, or use individual redaction flags when you want to keep some fields visible.

Troubleshooting Browser Use traces

Could not attach to this Agent
BrowserTrace first tries register_new_step_callback, then common step callback attributes used by older or forked Browser Use agents. If your version exposes a different hook, comment on issue #11 with the Browser Use version and callback surface.
No screenshots appear
Some Browser Use states do not expose a screenshot for every step. BrowserTrace still records the URL, action summary, model thought/actions, status, and error when those fields are available.
The trace includes private page or prompt data
Keep the full trace local. Before attaching anything to a public issue or community post, run browsertrace export <run_id> --public -o public.html to omit prompt/model I/O, screenshots, and URLs.

What to inspect first

  1. Did the screenshot match the model's assumption?
  2. Did the selected action target the right element?
  3. Did the URL change earlier than expected?
  4. Did the model output mention a selector or label that was stale?
  5. Was the red step wrong, or did an earlier step poison the state?

BrowserTrace is MIT licensed and local-first. Browser Use adapter feedback is tracked in issue #11.