Debug Browser Use failures with a local trace timeline

Browser Use agents fail in browser state, not just in logs. BrowserTrace records each step locally so you can inspect screenshots, URL, action, model I/O, status, and the first failed step.


Improving this guide or a Browser Use adapter note? Use the First PR Recipe to keep the first contribution small and reviewable.

Why this exists

When a Browser Use run fails late in a task, a stack trace usually tells you which exception happened. It often does not show what the agent saw, which URL it was on, which model decision selected the target, or whether the wrong assumption came from an earlier step.

BrowserTrace keeps that missing context in a local SQLite database plus screenshot files. No signup or cloud service is required.

Try it before wiring Browser Use

uvx --from "browsertrace[ui]" browsertrace doctor
uvx --from "browsertrace[ui]" browsertrace demo
uvx --from "browsertrace[ui]" browsertrace

Open http://127.0.0.1:3000, then inspect the run named "demo: Browser Use local HTML upload navigation failure". From a source checkout, python examples/browser_use_callback_demo.py records Browser Use-shaped callback steps without installing Browser Use.

To verify the callback demo from the terminal, run browsertrace list --limit 5 and look for a run named "demo: browser-use callback flow". Then run browsertrace show <run_id> and expect two steps: search_google(query=BrowserTrace) and click(selector=#result-1).

Attach it to a Browser Use agent

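import asyncio

from browser_use import Agent, ChatOpenAI  # older releases: from langchain_openai import ChatOpenAI
from browsertrace import Tracer
from browsertrace.integrations.browser_use import attach_tracer


async def main():
    tracer = Tracer()
    agent = Agent(task="...", llm=ChatOpenAI(model="gpt-4o"))

    # Record every Browser Use step into the local BrowserTrace timeline.
    with attach_tracer(agent, tracer, name="browser-use checkout run"):
        await agent.run()


asyncio.run(main())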

The adapter hooks Browser Use step callbacks and records URL, screenshot, action summary, compact browser-state context, model thought/actions, status, and errors into the same local timeline.

For adapter field requests, use the Browser Use feedback issue and include the Browser Use version, failure shape, and which context your logs missed.

Callback compatibility

attach_tracer supports Browser Use agents that expose register_new_step_callback, plus older or forked agents with on_step_start, on_step, or _new_step_callback attributes.
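The fallback order above can be pictured as a plain attribute probe. The following is a minimal sketch, not BrowserTrace's implementation: `find_step_callback_surface` and the stand-in agent classes are hypothetical, and the real adapter may check these surfaces differently. The sketch only reports which of the listed attributes an agent object exposes.

```python
# Hypothetical sketch: report which step-callback surface an agent exposes.
# The attribute names come from the compatibility note; the helper and the
# stand-in classes are illustrative, not BrowserTrace's implementation.
from typing import Optional

CALLBACK_SURFACES = (
    "register_new_step_callback",  # current Browser Use agents
    "on_step_start",               # older or forked agents
    "on_step",
    "_new_step_callback",
)


def find_step_callback_surface(agent: object) -> Optional[str]:
    """Return the first known callback attribute present on agent, or None."""
    for name in CALLBACK_SURFACES:
        if hasattr(agent, name):
            return name
    return None


class ForkedAgent:
    # Stand-in for an older agent that only defines an on_step attribute.
    on_step = None


class UnknownAgent:
    # Stand-in for an agent with none of the known surfaces.
    pass


print(find_step_callback_surface(ForkedAgent()))   # on_step
print(find_step_callback_surface(UnknownAgent()))  # None
```

A `None` result corresponds to the "Could not attach to this Agent" case under Troubleshooting. In that case, report the Browser Use version and callback surface on issue #11.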

Current Browser Use examples may also pass on_step_start or on_step_end directly to agent.run(...). For that run-hook-only path, use create_run_hooks:

from browsertrace import Tracer
from browsertrace.integrations.browser_use import create_run_hooks


async def run_with_hooks(agent):
    # agent is a Browser Use Agent created as in the attach_tracer example.
    tracer = Tracer()
    hooks = create_run_hooks(tracer, name="browser-use checkout run")

    with hooks:
        await agent.run(on_step_start=hooks.on_step_start, on_step_end=hooks.on_step_end)

The run-hook helper reads Browser Use history and browser-session summaries when they are available, then records the latest thought, action, extracted content, URL, title, tabs, and screenshot flag into the same local timeline. If your Browser Use version exposes a different hook shape, comment on issue #11 with the version and callback surface.

To try this path without installing Browser Use, run python examples/browser_use_run_hooks_demo.py from a source checkout.

For a quick terminal check, run browsertrace list --limit 5 and expect demo: browser-use run hooks flow. Then run browsertrace show <run_id> and expect the same two demo step labels: search_google(query=BrowserTrace) and click(selector=#result-1).

Debug icon-only click targets

If the screenshot shows the target but Browser Use clicks a nearby toolbar button, treat it as a visible-target versus accessible-target mismatch. Icon-only buttons often rely on hover tooltips, and that tooltip text may not be present in the accessibility tree when the agent ranks candidate elements.

Capture the live button HTML, not only the intended fixed markup.
Compare the accessibility snapshot before hover and after hover.
Record candidate bounding boxes for the intended icon, nearby toolbar buttons, and the element that was actually clicked.
Check whether the tooltip node appears only after hover and whether it is connected with aria-describedby.
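After you capture that evidence, the hover and bounding-box checks reduce to a small comparison. The following is a minimal sketch that assumes you already extracted accessible names from snapshots taken before and after hover. The `Candidate` fields, element ids, and coordinates are hypothetical sample data, not output from Browser Use or BrowserTrace.

```python
# Hypothetical sketch: compare candidate evidence for an icon-only click.
# The Candidate shape and the sample data are illustrative assumptions;
# fill them from your own DOM and accessibility captures.
from dataclasses import dataclass
from math import hypot


@dataclass
class Candidate:
    element_id: str
    name_before_hover: str   # accessible name in the snapshot before hover
    name_after_hover: str    # accessible name in the snapshot after hover
    bbox: tuple              # (x, y, width, height) in CSS pixels


def center(bbox):
    x, y, w, h = bbox
    return (x + w / 2, y + h / 2)


def hover_only_names(candidates):
    """Element ids whose accessible name exists only after hover."""
    return [c.element_id for c in candidates
            if not c.name_before_hover and c.name_after_hover]


def click_offset(intended, clicked):
    """Distance in pixels between the intended and clicked element centers."""
    (ix, iy), (cx, cy) = center(intended.bbox), center(clicked.bbox)
    return hypot(cx - ix, cy - iy)


plus_icon = Candidate("create-test", "", "Create Test", (400, 10, 24, 24))
filter_icon = Candidate("filter", "", "Filter", (430, 10, 24, 24))
search = Candidate("search", "Search", "Search", (200, 10, 190, 24))

print(hover_only_names([plus_icon, filter_icon, search]))  # ['create-test', 'filter']
print(click_offset(plus_icon, filter_icon))                # 30.0
```

Two unnamed icons 30 px apart fit the mismatch pattern described above. Both an accessible name on the real button and a structural prompt remove that ambiguity.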

The best app-side fix is an accessible name on the real button, for example aria-label="Create Test". Until the app can change, make the task prompt structural, such as "click the plus icon immediately next to the search field in the Functional toolbar", or use a deterministic selector for that step.

Related community case: browser-use/browser-use#4801.

Debug stable selector replay gaps

When a Browser Use run succeeds once, the clicked selector is evidence, not automatically reusable automation. The model may have chosen an observed element index, Browser Use may have reconstructed XPath from the current DOM, and Playwright may store an internal role selector that is useful for replay but not a durable CSS/XPath contract.

Record the action type and observed action, for example click(index=5) versus the final Playwright action.
Preserve the selected element summary: role, accessible name, text snippet, href/name/type attributes, and bounding box when available.
Store candidate selectors such as aria role/name, data-testid, stable id, CSS, XPath, and text selectors, but label which ones were generated after observation.
Keep a selector validation result: matched count, chosen selector, reason rejected, and whether it still matched after the action.
Link the selector evidence to screenshot or DOM snapshot references, URL pattern, page title, and the reason the agent chose that element when available.

This makes deterministic replay a second step: inspect the trace, pick the strongest selector contract, then promote it into a test or scripted action. BrowserTrace should preserve the evidence bundle first instead of pretending every Browser Use click has a reusable selector string.
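The selection step can be sketched as a ranking over validated candidates. This sketch makes several assumptions: the `SelectorEvidence` record, the ranking order (role, then test id, id, CSS, text, XPath), and the sample selectors are all hypothetical. BrowserTrace does not necessarily store selectors in this shape, and your app may call for a different ordering.

```python
# Hypothetical sketch: pick the strongest replay selector from trace evidence.
# The kind ranking and record fields are assumptions for illustration.
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Lower rank = more durable contract (assumed ordering).
KIND_RANK = {"role": 0, "testid": 1, "id": 2, "css": 3, "text": 4, "xpath": 5}


@dataclass
class SelectorEvidence:
    kind: str                          # role, testid, id, css, text, xpath
    selector: str
    matched_count: int                 # matches when validated against the page
    matched_after_action: bool
    generated_after_observation: bool  # kept as evidence; not used for ranking


def choose_selector(
    candidates: List[SelectorEvidence],
) -> Tuple[Optional[SelectorEvidence], List[str]]:
    """Return the best candidate plus a rejection reason for every other one."""
    rejected, valid = [], []
    for c in candidates:
        if c.matched_count != 1:
            rejected.append(f"{c.selector}: matched {c.matched_count} elements")
        elif not c.matched_after_action:
            rejected.append(f"{c.selector}: stopped matching after the action")
        else:
            valid.append(c)
    valid.sort(key=lambda c: KIND_RANK.get(c.kind, len(KIND_RANK)))
    for c in valid[1:]:
        rejected.append(f"{c.selector}: weaker than {valid[0].selector}")
    return (valid[0] if valid else None), rejected


evidence = [
    SelectorEvidence("xpath", "/html/body/div[2]/button[3]", 1, True, True),
    SelectorEvidence("css", "button.btn", 4, True, True),
    SelectorEvidence("role", 'role=button[name="Checkout"]', 1, True, True),
]
best, reasons = choose_selector(evidence)
print(best.selector)  # role=button[name="Checkout"]
```

The function keeps every rejection reason alongside the winner. This matches the evidence-first approach: the trace explains why a selector was promoted, not just which one.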

Related community case: browser-use/browser-use#3856.

Debug multi-step form drift

Long Browser Use form runs tend to fail after the useful state has already moved on. A dependent dropdown may not resolve, a validation message may appear after the next action, or one wrong field can poison the rest of a 20-step transcript.

Treat each committed form step as its own trace boundary instead of relying on one long prompt to explain everything.

Keep the canonical form payload outside the agent and pass only the fields needed for the current segment.
Record URL, title, submitted field labels, visible validation errors, and submit disabled/enabled state after each segment.
Capture screenshot reference, selected element summary, model/tool output, status, retry count, and checkpoint id.
Store whether the next step started from the previous checkpoint or restarted from the beginning.
Compare failed and known-good runs to find the first form segment where URL, status, validation text, or target evidence diverged.

In BrowserTrace terms, a good failure report should say "country selection passed, tax-id stayed disabled" instead of "the agent failed after 24 actions".
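A segment-level report like that can be derived from per-checkpoint records. The following is a minimal sketch that assumes you store one `FormSegment` per committed form step. The class, its fields, and the sample checkout data are hypothetical, chosen to mirror the checklist above.

```python
# Hypothetical sketch: turn per-segment form evidence into a failure report.
# Segment fields mirror the checklist above; names and data are illustrative.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FormSegment:
    checkpoint_id: str
    label: str                  # e.g. "country selection"
    url: str
    status: str                 # "passed" or "failed"
    validation_errors: List[str] = field(default_factory=list)
    submit_enabled: bool = True


def first_divergent_segment(failed: List[FormSegment], good: List[FormSegment]) -> Optional[int]:
    """Index of the first segment whose URL, status, validation text, or submit state differs."""
    for i, (f, g) in enumerate(zip(failed, good)):
        if (f.url, f.status, f.validation_errors, f.submit_enabled) != (
            g.url, g.status, g.validation_errors, g.submit_enabled
        ):
            return i
    if len(failed) != len(good):
        return min(len(failed), len(good))
    return None


def describe(failed: List[FormSegment], good: List[FormSegment]) -> str:
    i = first_divergent_segment(failed, good)
    if i is None:
        return "no divergence"
    done = [f"{s.label} passed" for s in failed[:i]]
    if i >= len(failed):
        return ", ".join(done + ["failed run stopped early"])
    seg = failed[i]
    state = "stayed disabled" if not seg.submit_enabled else seg.status
    return ", ".join(done + [f"{seg.label} {state}"])


good = [
    FormSegment("cp1", "country selection", "/checkout/address", "passed"),
    FormSegment("cp2", "tax-id", "/checkout/tax", "passed"),
]
failed = [
    FormSegment("cp1", "country selection", "/checkout/address", "passed"),
    FormSegment("cp2", "tax-id", "/checkout/tax", "failed",
                ["Tax ID is required"], submit_enabled=False),
]
print(describe(failed, good))  # country selection passed, tax-id stayed disabled
```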

Related community case: browser-use/browser-use#4476.

Debug new-tab desync

If a click or Enter action opens a new tab, Browser Use may keep reasoning from the stale page context unless the action result makes that browser-state delta explicit. The symptom is usually repeated retries against the old page while the expected element exists in the new tab.

Record page ids and tab indexes before and after the action.
Capture pages_before, pages_after, and new_pages with URL/title probe status.
Link the new page back to the action id that created it.
Store the focused page before and after the action, not only the currently selected tab index.
Keep any recommended next action, such as switch_tab, as trace evidence instead of inferring it from later retries.

In BrowserTrace terms, treat this as a browser topology change. The trace should explain whether the agent switched to the new page, stayed on the stale page, or attempted later actions before the new tab finished loading.
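The before/after comparison can be sketched as a set difference over page ids, plus a check of where focus landed. In this minimal sketch, the page ids, the `TopologyDelta` record, and the `switch_tab(...)` recommendation string are hypothetical. A real record would also carry the URL/title probe status for each new page, which is omitted here.

```python
# Hypothetical sketch: record a browser topology change around one action.
# Page ids and the recommendation string are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TopologyDelta:
    action_id: str
    pages_before: List[str]
    pages_after: List[str]
    new_pages: List[str]
    focused_before: str
    focused_after: str
    recommended_next_action: Optional[str]


def topology_delta(action_id, pages_before, pages_after, focused_before, focused_after):
    before = set(pages_before)
    new_pages = [p for p in pages_after if p not in before]
    # A new page that did not receive focus is the stale-context symptom.
    stale = bool(new_pages) and focused_after not in new_pages
    return TopologyDelta(
        action_id=action_id,
        pages_before=list(pages_before),
        pages_after=list(pages_after),
        new_pages=new_pages,
        focused_before=focused_before,
        focused_after=focused_after,
        recommended_next_action=f"switch_tab({new_pages[0]})" if stale else None,
    )


delta = topology_delta("step-4:click", ["page-1"], ["page-1", "page-2"], "page-1", "page-1")
print(delta.new_pages)                # ['page-2']
print(delta.recommended_next_action)  # switch_tab(page-2)
```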

Related community case: browser-use/browser-use#4758.

Debug remote CDP hangs

For Browserless or other remote-CDP providers, a failed Browser Use run may not be only a screenshot problem. A stale remote browser session can make one CDP request stop returning while the websocket still looks connected. If recovery holds a shared event-bus lock during that wait, one degraded browser session can delay unrelated sessions.

When you see screenshot capture, DOM snapshot, or browser-state collection timeouts, collect timing evidence before changing retry policy:

Event id plus browser/session/target id.
CDP method, request id, start/end/duration, and result, error, or timeout.
Websocket ping/pong timestamps near the stuck request.
Event-bus lock timing: wait, acquire, release, and whether recovery waits run while the lock is held.
Whether the browser session was marked unhealthy, retried, or reused after the failed state request.

In BrowserTrace terms, treat this as method-timing and browser-session evidence, not just a red screenshot step. The trace should explain whether the CDP method failed fast, never returned, or blocked later state collection through recovery and lock timing.
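The timing evidence above can be illustrated with plain asyncio. This is a minimal sketch and does not reproduce Browser Use's event bus: `timed_request` is hypothetical, `hung_screenshot` stands in for a CDP request that never returns, and the method names and 0.2 s timeout are arbitrary. It shows how a request held under a shared lock delays an unrelated call, and which timings expose that delay. Websocket ping/pong timestamps would come from your CDP client and are not modeled here.

```python
# Hypothetical sketch: time a CDP-style request and the shared lock around it.
# hung_screenshot stands in for a real CDP call; names and timeouts are
# illustrative assumptions, not Browser Use internals.
import asyncio
import time


async def timed_request(method, coro, lock, timeout):
    record = {"method": method}
    requested = time.monotonic()
    async with lock:
        acquired = time.monotonic()
        record["lock_wait_s"] = acquired - requested
        try:
            record["result"] = await asyncio.wait_for(coro, timeout)
            record["outcome"] = "ok"
        except asyncio.TimeoutError:
            record["outcome"] = "timeout"        # never returned within the budget
        except Exception as exc:
            record["outcome"] = f"error: {exc}"  # failed fast with an error
        # Time this request kept the shared lock held.
        record["lock_held_s"] = time.monotonic() - acquired
    return record


async def hung_screenshot():
    await asyncio.sleep(10)  # simulates a request that never comes back


async def demo():
    lock = asyncio.Lock()
    slow = asyncio.create_task(
        timed_request("Page.captureScreenshot", hung_screenshot(), lock, 0.2)
    )
    await asyncio.sleep(0)  # let the slow request take the lock first
    other = await timed_request("Runtime.evaluate", asyncio.sleep(0, result="ok"), lock, 0.2)
    return await slow, other


slow, other = asyncio.run(demo())
print(slow["outcome"], other["outcome"])  # timeout ok
print(other["lock_wait_s"] > 0.1)         # True: blocked behind the hung call
```

The unrelated `Runtime.evaluate` call succeeds but spends most of its time waiting for the lock. Without lock timing in the trace, that delay would look like a slow browser rather than a blocked one.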

Related community case: browser-use/browser-use#4579.

Debug local HTML upload navigation

A local HTML upload can be misread as a navigation target before the intended upload action runs. The security watchdog may correctly block the bad URL, but the useful debugging boundary is earlier: why did the planner or model-visible context turn an attachment name into a navigate action at step 0?

Treat this as a planner/action validation boundary and future adapter boundary, not as proof of a low-level file upload bug. It also does not mean BrowserTrace already captures every internal Browser Use field.

Keep the task prompt and model-visible file or attachment context before step 0.
Record local filename, extension, and MIME type when safe to log.
Preserve the raw model action before validation and the parsed action type, bad URL, or upload target after parsing.
Check whether the bad URL came from the local filename, file contents, or surrounding observation text.
Capture the security/watchdog block reason and allowed-domains state that rejected the navigation.
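The filename check can be sketched with the standard library. In this minimal sketch, the raw action shape `{"navigate": {"url": ...}}` and the attachment path are assumptions for illustration, not a guaranteed Browser Use format. The watchdog block reason and allowed-domains state would be recorded separately from whatever the security layer reports.

```python
# Hypothetical sketch: check whether a blocked navigation URL was derived
# from a local attachment name. The action dict shape is an assumption.
import mimetypes
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def upload_navigation_evidence(raw_action: dict, attachment_path: str) -> dict:
    filename = PurePosixPath(attachment_path).name
    mime_type, _ = mimetypes.guess_type(filename)
    url = raw_action.get("navigate", {}).get("url", "")
    parsed = urlparse(url)
    url_name = PurePosixPath(unquote(parsed.path)).name
    return {
        "attachment_filename": filename,
        "attachment_extension": PurePosixPath(filename).suffix,
        "attachment_mime_type": mime_type,
        "action_type": next(iter(raw_action), None),
        "bad_url": url or None,
        "url_scheme": parsed.scheme or None,
        # True when the navigate target reuses the attachment's file name.
        "url_matches_filename": bool(url_name) and url_name == filename,
    }


raw = {"navigate": {"url": "file:///tmp/browsertrace-report.html"}}
evidence = upload_navigation_evidence(raw, "/tmp/browsertrace-report.html")
print(evidence["action_type"], evidence["url_matches_filename"])  # navigate True
print(evidence["attachment_mime_type"])                           # text/html
```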

Related community case: browser-use/browser-use#4794.

Debug action schema validation boundaries

Action schema coercion can hide why a Browser Use step targeted the wrong element. For example, a raw model action may put a boolean in an element-index field, then one validation path coerces it while another rejects it. After normalization, the final executed target can look deliberate unless the trace preserves both sides of the boundary.

Treat this as a schema validation and future adapter boundary. It does not mean BrowserTrace already captures every internal Browser Use field.

Keep the raw model action before validation.
Record the validated or normalized action that Browser Use actually executed.
Capture any schema or normalization warning, validation error, or coercion note.
Preserve selected element metadata for the final executed target, including index, role, text, selector, and URL when available.
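The boolean case is easy to reproduce in Python because `bool` is a subclass of `int`. A naive `isinstance(value, int)` check therefore accepts `True` as element index 1. The following minimal sketch records both sides of the boundary. `validate_click` and its strict/lenient modes are hypothetical and are not Browser Use's validators. A full trace record would also carry the final target's role, text, selector, and URL.

```python
# Hypothetical sketch: keep both sides of an action-validation boundary.
# validate_click and its two modes are illustrative, not Browser Use's code.

def validate_click(raw_action: dict, strict: bool) -> dict:
    record = {"raw_action": dict(raw_action), "executed": None, "notes": [], "error": None}
    value = raw_action.get("index")
    if isinstance(value, bool):
        if strict:
            record["error"] = f"index must be an integer, got bool {value!r}"
            return record
        # Lenient path: coerce, but leave a note so the trace shows it.
        record["notes"].append(f"coerced bool {value!r} to index {int(value)}")
        value = int(value)
    elif not isinstance(value, int):
        record["error"] = f"index must be an integer, got {type(value).__name__}"
        return record
    record["executed"] = {"click": {"index": value}}
    return record


raw = {"index": True}
print(isinstance(True, int))                           # True: bool is a subclass of int
print(validate_click(raw, strict=False)["executed"])   # {'click': {'index': 1}}
print(validate_click(raw, strict=True)["error"])       # index must be an integer, got bool True
```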

Related community case: browser-use/browser-use#4796.

Debug empty model responses

If a parser exception reports input_value='', the model provider likely returned an empty response before Browser Use validation ran. That points to the parser receiving no assistant JSON content at all, rather than malformed JSON.

BrowserTrace does not capture every internal Browser Use field, but the following evidence is worth preserving at the boundary between provider response, parsing, and execution:

Provider request id or response metadata when available.
finish_reason, HTTP status, and token usage when safe to log.
Raw assistant content length before parsing.
Whether the response used tool calls or structured output fields instead of text content.
Step number, model name, and approximate message/context size.
Parse error and the Browser Use action that was expected next, if known.
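Separating "empty" from "malformed" can happen before JSON parsing. The following minimal sketch assumes a provider response already normalized into a dict with `content`, `tool_calls`, and `finish_reason` keys. That shape is hypothetical, and real provider SDKs expose these fields differently.

```python
# Hypothetical sketch: classify a provider response before JSON parsing so
# an empty completion is not reported as malformed JSON. The response dict
# fields (content, tool_calls, finish_reason) are assumptions.
import json


def classify_response(response: dict) -> dict:
    content = response.get("content") or ""
    evidence = {
        "content_length": len(content),
        "finish_reason": response.get("finish_reason"),
        "has_tool_calls": bool(response.get("tool_calls")),
        "parse_error": None,
    }
    if not content.strip():
        evidence["kind"] = "tool_calls_only" if evidence["has_tool_calls"] else "empty"
        return evidence
    try:
        json.loads(content)
        evidence["kind"] = "json"
    except json.JSONDecodeError as exc:
        evidence["kind"] = "malformed_json"
        evidence["parse_error"] = str(exc)
    return evidence


print(classify_response({"content": "", "finish_reason": "stop"})["kind"])          # empty
print(classify_response({"content": '{"action": ['})["kind"])                      # malformed_json
print(classify_response({"content": None, "tool_calls": [{"id": "c1"}]})["kind"])  # tool_calls_only
```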

Related community case: browser-use/browser-use#4786.

Compare failed and successful Browser Use runs

When the same Browser Use task has one failed run and one known-good run, compare them from the terminal to find the first divergent step before opening the local UI.

browsertrace list
browsertrace compare <failed_run_id> <success_run_id>
browsertrace compare <failed_run_id> <success_run_id> --json

While the local BrowserTrace UI is running, local wrappers and tools can request the same first-divergence payload through the web API:

curl http://127.0.0.1:3000/api/compare/<failed_run_id>/<success_run_id>

Prefer the local API endpoint over CLI output when a script, dashboard, or automation preflight check needs the first-divergence payload as JSON: for example, a CI step that reports the exact divergent action to a tracking system, or a local debugging UI that visualizes the diff without shelling out.

Example compare output:

$ browsertrace compare failed-local-html-upload good-local-html-upload
First divergent step: 3
action: navigate
url: file:///tmp/browsertrace-report.html
status: failed
error: upload preview did not appear

Use --json when you need structured comparison output for automation or CI checks.

The CLI compares action, URL, status, and error fields and reports the first divergent step. It does not replace the local UI: use the comparison to locate the boundary, then open the failed run timeline to inspect screenshots, model input/output, and surrounding context.
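The first-divergence idea can be sketched over the same four fields. This is a minimal sketch, not BrowserTrace's implementation, and it does not reproduce its --json payload. The step dict shape, the 0-based step positions, and the sample runs are assumptions.

```python
# Hypothetical sketch of a first-divergence check over the four fields the
# CLI compares. The step dict shape and 0-based numbering are assumptions.
from typing import List, Optional

COMPARED_FIELDS = ("action", "url", "status", "error")


def first_divergent_step(failed: List[dict], good: List[dict]) -> Optional[dict]:
    for i in range(max(len(failed), len(good))):
        f = failed[i] if i < len(failed) else {}
        g = good[i] if i < len(good) else {}
        diffs = {k: (f.get(k), g.get(k)) for k in COMPARED_FIELDS if f.get(k) != g.get(k)}
        if diffs:
            return {"step": i, "diffs": diffs}
    return None


good = [
    {"action": "navigate", "url": "https://example.test/upload", "status": "ok", "error": None},
    {"action": "upload_file", "url": "https://example.test/upload", "status": "ok", "error": None},
]
failed = [
    {"action": "navigate", "url": "https://example.test/upload", "status": "ok", "error": None},
    {"action": "navigate", "url": "file:///tmp/browsertrace-report.html",
     "status": "failed", "error": "upload preview did not appear"},
]
result = first_divergent_step(failed, good)
print(result["step"])  # 1
```

Missing steps compare as empty, so a run that stops early still produces a divergence at the first step the other run has.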

Run comparison is most useful when the failed run and a successful run carry enough shared metadata to explain where they diverged. Preserve these fields when you can:

Keep a stable task or run id when Browser Use or your app exposes one.
Record the Browser Use version and BrowserTrace version for each run.
Capture model/provider and prompt/template version when known.
Preserve URL/title/action summary for each step.
Include selected element or target summary when Browser Use exposes it.
Store extracted content or final result summary when available.
Mark the error boundary for the first failed step instead of only the final exception.

Feature tracker: aaronlab/browsertrace#369.

Share only what is safe

browsertrace list
browsertrace export <run_id> -o full.html
browsertrace export <run_id> --redact -o public.html
browsertrace export <run_id> --public -o public.html

The full export includes model input, model output, screenshots, and URLs. Use --public to omit model I/O, screenshots, and URLs before public sharing, or use individual redaction flags when you want to keep some fields visible.
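The same rule applies if you paste a step from your own logs into an issue instead of attaching an export. The following minimal sketch drops the categories --public is described as omitting. The step keys and sample values are hypothetical, and the sketch does not replace the real browsertrace export flags.

```python
# Hypothetical sketch: strip the field categories --public omits from a
# hand-copied step record. Key names and sample values are assumptions.
SENSITIVE_KEYS = {"model_input", "model_output", "screenshot", "url"}


def public_step(step: dict) -> dict:
    redacted = {k: v for k, v in step.items() if k not in SENSITIVE_KEYS}
    # List what was removed so readers know the record is incomplete.
    redacted["redacted_fields"] = sorted(step.keys() & SENSITIVE_KEYS)
    return redacted


step = {
    "index": 2,
    "action": "click(index=5)",
    "status": "failed",
    "url": "https://internal.example/checkout",
    "screenshot": "shots/2.png",
    "model_input": "prompt text",
    "model_output": "model reply",
}
public = public_step(step)
print(public["redacted_fields"])  # ['model_input', 'model_output', 'screenshot', 'url']
```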

Troubleshooting Browser Use traces

Could not attach to this Agent: BrowserTrace first tries register_new_step_callback, then common step callback attributes used by older or forked Browser Use agents. If your version exposes a different hook, comment on issue #11 with the Browser Use version and callback surface.
No screenshots appear: Some Browser Use states do not expose a screenshot for every step. BrowserTrace still records the URL, action summary, model thought/actions, status, and error when those fields are available.
The trace includes private page or prompt data: Keep the full trace local. Before attaching anything to a public issue or community post, run browsertrace export <run_id> --public -o public.html to omit prompt/model I/O, screenshots, and URLs.

What to inspect first

Did the screenshot match the model's assumption?
Did the selected action target the right element?
Did the URL change earlier than expected?
Did the model output mention a selector or label that was stale?
Was the red step wrong, or did an earlier step poison the state?

BrowserTrace is MIT licensed and local-first. Browser Use adapter feedback is tracked in issue #11.