BrowserTrace: local trace viewer for Browser Use failures #4816
Replies: 4 comments
-
This is a useful direction. Browser agents are one of those places where the final answer is usually not enough to debug the failure. For a local trace viewer, I would make a few things first-class:
The most useful viewer is probably not just “watch the run.” It is “compare this failed run to the last successful run and see where the path diverged.” That comparison view would be valuable for anyone trying to turn browser-use from a demo into a repeatable workflow.
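To make "where the path diverged" concrete, here is a minimal sketch of a first-divergence check over two runs. The step shape (a list of dicts) and the key names `action`, `url`, `status`, and `error` are assumptions for illustration, not BrowserTrace's actual schema.

```python
# Hypothetical step shape: each step is a dict. These keys are assumed for
# illustration only and are not taken from BrowserTrace's real trace format.
COMPARE_FIELDS = ("action", "url", "status", "error")


def first_divergence(failed_steps, good_steps, fields=COMPARE_FIELDS):
    """Return (step_index, field, failed_value, good_value) for the first
    mismatch between two runs, or None if the runs match on every field."""
    for index, (failed, good) in enumerate(zip(failed_steps, good_steps)):
        for field in fields:
            if failed.get(field) != good.get(field):
                return index, field, failed.get(field), good.get(field)
    # One run is a prefix of the other: report where the shorter run stops.
    if len(failed_steps) != len(good_steps):
        shorter = min(len(failed_steps), len(good_steps))
        return shorter, "length", len(failed_steps), len(good_steps)
    return None
```

Walking both runs in lockstep only works when the two paths share a common prefix; runs that take a different route from the first step would need some form of alignment instead.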
-
Thanks, this is useful feedback. I agree that for Browser Use debugging the real workflow is often “where did this fail relative to the last passing run?”, not only replaying one trace. I opened a BrowserTrace issue to track the first local version: aaronlab/browsertrace#369
For v0.1 I would keep it explicit and conservative:
The detail I am still sorting out is the Browser Use final-result shape: separate fields for extracted content, tool output, and final result, or one normalized summary. If you have a concrete pass/fail pair you would expect to compare, that would help make the UI less vague.
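The two result shapes can be put side by side in a small sketch. `RunResult`, `summary`, and `changed_fields` are hypothetical names for illustration. They are not part of Browser Use or BrowserTrace.

```python
from dataclasses import dataclass
from typing import List, Optional

RESULT_FIELDS = ("extracted_content", "tool_output", "final_result")


@dataclass
class RunResult:
    """Option A (hypothetical): keep the three result fields separate."""
    extracted_content: Optional[str] = None
    tool_output: Optional[str] = None
    final_result: Optional[str] = None

    def summary(self) -> str:
        """Option B (hypothetical): collapse to one normalized summary,
        preferring the most final non-empty field."""
        for value in (self.final_result, self.extracted_content, self.tool_output):
            if value:
                return value.strip()
        return ""


def changed_fields(failed: RunResult, good: RunResult) -> List[str]:
    """List which separate fields differ between a failed and a good run."""
    return [
        name for name in RESULT_FIELDS
        if getattr(failed, name) != getattr(good, name)
    ]
```

With separate fields, a comparison can say that only `final_result` changed. With a single summary, it can only say that the summaries differ.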
-
Follow-up: I shipped the first small version of the failed-vs-good comparison path in BrowserTrace v0.1.19. It is intentionally explicit for now:
browsertrace compare <failed_run_id> <success_run_id>
browsertrace compare <failed_run_id> <success_run_id> --json
The first slice compares existing step fields ( Release notes:
The next Browser Use-specific question is still the final-result shape: for a pass/fail pair, would you expect comparison to separate extracted content, tool output, final result, and retry/repair attempts, or normalize those into one summary first? A concrete failed run + known-good run shape would help keep the UI honest.
No stars/upvotes requested; I am looking for workflow feedback from people turning Browser Use runs into repeatable workflows.
-
Small follow-up: BrowserTrace v0.1.20 now exposes the same failed-vs-good comparison payload through the local UI server too:
So the current Browser Use debugging path is:
browsertrace compare <failed_run_id> <success_run_id>
browsertrace compare <failed_run_id> <success_run_id> --json
curl http://127.0.0.1:3000/api/compare/<failed_run_id>/<success_run_id>
The API is meant for local dashboards, scripts, or automation preflight checks that want the first divergent action, URL, status, or error before opening the full trace UI.
Useful feedback is still very concrete: for real Browser Use failed-vs-good pairs, should the next comparison field be final result, extracted content, tool output, retry/repair attempts, selected element summaries, or version/config metadata?
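For the scripting use case, a minimal Python equivalent of the curl call might look like the sketch below. It assumes the endpoint returns JSON and that the server listens on the address from the curl example. The helper names `compare_url` and `fetch_compare` are made up for illustration, and the payload's field names are not assumed.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Address taken from the curl example; adjust if the local server runs elsewhere.
BASE_URL = "http://127.0.0.1:3000"


def compare_url(failed_run_id, success_run_id, base_url=BASE_URL):
    """Build the compare endpoint URL, percent-encoding both run IDs."""
    failed = quote(failed_run_id, safe="")
    success = quote(success_run_id, safe="")
    return f"{base_url}/api/compare/{failed}/{success}"


def fetch_compare(failed_run_id, success_run_id, timeout=5.0):
    """Fetch the comparison payload. Assumes the response body is JSON."""
    url = compare_url(failed_run_id, success_run_id)
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

A preflight script could call `fetch_compare(...)` and print the result with `json.dumps(payload, indent=2)` to inspect the payload before relying on any particular key.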
-
I built BrowserTrace, a small MIT-licensed local trace viewer for failed Browser Use and browser-agent runs.
Why I am sharing it here: BrowserTrace includes a Browser Use run-hook path for apps that call:
It also keeps the callback-style attach_tracer(agent, ...) path for agents that expose register_new_step_callback or compatible callback attributes.
A minimal run-hook setup looks like this:
BrowserTrace keeps traces local by default and records the Browser Use failure timeline when fields are available: URL, screenshot flag, latest thought, model action, extracted content, status, and errors. It can also export a standalone public-safe HTML trace with prompts/model I/O, screenshots, and URLs omitted.
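As a rough illustration of what "public-safe" could mean for an exported trace, the sketch below drops prompt, model I/O, screenshot, and URL fields from each step before export. The key names are assumptions for illustration. This is not BrowserTrace's export code.

```python
# Hypothetical key names for the fields a public-safe export would omit.
SENSITIVE_KEYS = frozenset(
    {"prompt", "model_input", "model_output", "screenshot", "url"}
)


def redact_step(step, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of one step with sensitive keys removed."""
    return {key: value for key, value in step.items() if key not in sensitive_keys}


def redact_trace(steps, sensitive_keys=SENSITIVE_KEYS):
    """Return a redacted copy of a whole trace, leaving the input untouched."""
    return [redact_step(step, sensitive_keys) for step in steps]
```

A denylist like this only covers named keys. Sensitive values embedded in free-text fields such as error messages would still need separate handling.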
Quick no-API trial path from PyPI:
Persistent install from PyPI:
pip install "browsertrace[ui]"
Repo: https://github.com/aaronlab/browsertrace
PyPI: https://pypi.org/project/browsertrace/
Browser Use guide: https://aaronlab.github.io/browsertrace/browser-use-debugging.html
Feedback I am looking for from Browser Use users: when a run fails, which fields matter most for debugging: task text, memory, extracted content, selected element, retry state, screenshots, model actions, final result, or something else?
No star/upvote ask; I am trying to make the Browser Use failure report useful before adding more adapter surface.