Debug Stagehand act/extract runs with a local timeline
Stagehand combines Playwright browser control with LLM-driven decisions. BrowserTrace wraps the page object you already use and records each high-level action as a step in a local trace.
Improving this guide or a Stagehand adapter note? Use the First PR Recipe to keep the first contribution small and reviewable.
Try the trace viewer first
```shell
uvx --from "browsertrace[ui]" browsertrace doctor
uvx --from "browsertrace[ui]" browsertrace demo
uvx --from "browsertrace[ui]" browsertrace
```
Open http://127.0.0.1:3000 and inspect the failed checkout-agent run before wiring a real Stagehand page.
Wrap a Stagehand page
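```python
import asyncio
from stagehand import Stagehand
from browsertrace import Tracer
from browsertrace.integrations.stagehand import wrap_stagehand

async def main():
    tracer = Tracer()
    stagehand = Stagehand(...)  # pass your usual Stagehand config here
    await stagehand.init()
    # Wrap the page you already use; the wrapped calls become trace steps.
    page = wrap_stagehand(stagehand.page, tracer, name="stagehand checkout run")
    try:
        await page.goto("https://example.com")
        await page.act("click the checkout button")
        await page.extract("get the order total")
    finally:
        page.bt_run.close()  # close the trace run even if an action raised
        await stagehand.close()

asyncio.run(main())
```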
The wrapper records goto, act, extract, observe, and click calls while preserving the original page methods.
Arguments and keyword arguments are saved as model_input. Successful Stagehand return values are written back to the same trace step as model_output.
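To make the recording model concrete, here is a minimal sketch of how a method-wrapping proxy can capture `model_input` and `model_output` for a fixed set of method names. It is not BrowserTrace's implementation. `RecordingPage`, `FakePage`, and the step dict layout are hypothetical, and the fake page stands in for a real Stagehand page.

```python
import asyncio
import functools
from typing import Any

TRACED_METHODS = {"goto", "act", "extract", "observe", "click"}


class RecordingPage:
    """Hypothetical proxy: forwards attributes, records calls to traced methods."""

    def __init__(self, page: Any, steps: list[dict]) -> None:
        self._page = page
        self._steps = steps

    def __getattr__(self, name: str) -> Any:
        attr = getattr(self._page, name)
        if name not in TRACED_METHODS or not callable(attr):
            return attr  # untraced attributes pass through unchanged

        @functools.wraps(attr)
        async def traced(*args: Any, **kwargs: Any) -> Any:
            step = {
                "method": name,
                "model_input": {"args": list(args), "kwargs": dict(kwargs)},
                "status": "running",
            }
            self._steps.append(step)
            try:
                result = await attr(*args, **kwargs)
            except Exception as exc:
                step["status"] = "error"
                step["error"] = repr(exc)
                raise  # the caller still sees the original exception
            step["status"] = "ok"
            step["model_output"] = result  # written back to the same step
            return result

        return traced


class FakePage:
    """Stand-in for a Stagehand page, for demonstration only."""

    async def goto(self, url: str) -> None:
        return None

    async def extract(self, instruction: str) -> dict:
        return {"total": "$42.00"}

    async def act(self, instruction: str) -> None:
        raise RuntimeError("element not found")


async def demo() -> list[dict]:
    steps: list[dict] = []
    page = RecordingPage(FakePage(), steps)
    await page.goto("https://example.com")
    await page.extract("get the order total")
    try:
        await page.act("click the checkout button")
    except RuntimeError:
        pass  # the failure is still recorded on the step
    return steps


steps = asyncio.run(demo())
```

Because the proxy only intercepts the traced names and re-raises exceptions, callers get the same return values and errors they would from the unwrapped page.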
What the trace captures
- The Stagehand method and instruction.
- The current page URL.
- A screenshot before the action when available.
- The successful Stagehand result, including observe or extract output when returned by the wrapped method.
- Step status and exception text if the action fails.
- Exportable HTML with optional model I/O redaction.
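A standalone HTML export with optional model I/O redaction can be sketched with the standard library alone. The step fields, the table layout, and the `[redacted]` placeholder are assumptions for illustration. BrowserTrace's real export format and redaction rules may differ.

```python
import html
import json
from typing import Any

MODEL_IO_FIELDS = ("model_input", "model_output")  # fields hidden when redacting


def render_steps(steps: list[dict[str, Any]], redact_model_io: bool = False) -> str:
    """Render trace steps as a single HTML table (hypothetical format)."""
    rows = []
    for i, step in enumerate(steps):
        cells = [str(i), step.get("method", ""), step.get("url", ""), step.get("status", "")]
        for name in MODEL_IO_FIELDS:
            if redact_model_io:
                cells.append("[redacted]")
            else:
                cells.append(json.dumps(step.get(name), default=str))
        cells.append(step.get("error") or "")  # errors stay visible in both modes
        # Escape every cell so instructions or page text cannot inject markup.
        rows.append("<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in cells) + "</tr>")
    header = "".join(
        f"<th>{h}</th>"
        for h in ("#", "method", "url", "status", "model_input", "model_output", "error")
    )
    return f"<!doctype html><table><tr>{header}</tr>{''.join(rows)}</table>"


steps = [
    {"method": "act", "url": "https://example.com/cart", "status": "ok",
     "model_input": {"args": ["click the checkout button"]}, "model_output": None},
    {"method": "extract", "url": "https://example.com/checkout", "status": "error",
     "model_input": {"args": ["get the order total"]}, "error": "element <button> not found"},
]
full_html = render_steps(steps)
public_html = render_steps(steps, redact_model_io=True)
```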
Debug custom tool replay gaps
If a Stagehand cache can replay normal page actions but skips a custom_tool, separate the replay contract from the diagnostic trace contract before changing cache behavior.
For replay, the cache needs enough data to call the current tool implementation again: tool name, serialized arguments, stable tool-call or step id, original status or error, and whether the tool is replay-safe. Credential-fill, payment, submit, or other side-effectful tools should require explicit opt-in before replay.
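The replay contract above can be sketched as a small record plus a gate that refuses side-effectful tools unless the caller opts in. `ToolCallRecord`, `may_replay`, and the `allow_side_effects` parameter are hypothetical names. They are not part of Stagehand's cache or BrowserTrace. Skipping calls whose original status was not `"ok"` is also an assumption of this sketch.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolCallRecord:
    """What a cache would need to call the current tool implementation again."""
    tool_name: str
    arguments_json: str          # serialized arguments
    step_id: str                 # stable tool-call or step id
    status: str                  # original status, e.g. "ok" or "error"
    error: Optional[str] = None  # original error text, if any
    replay_safe: bool = False    # only side-effect-free tools should be marked safe


def may_replay(record: ToolCallRecord, allow_side_effects: bool = False) -> bool:
    """Replay successful calls; side-effectful tools need explicit opt-in."""
    if record.status != "ok":
        return False
    return record.replay_safe or allow_side_effects


lookup = ToolCallRecord("lookup_sku", json.dumps({"sku": "A-1"}), "step-3", "ok", replay_safe=True)
payment = ToolCallRecord("submit_payment", json.dumps({"amount": 42}), "step-7", "ok")
```

Defaulting `replay_safe` to `False` means a newly added tool is never replayed by accident.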
For debugging, the trace can preserve a richer boundary even when replay is disabled:
- Tool name and redacted argument summary.
- Returned result summary, status, and error.
- Timestamp or step index.
- URL or page id before and after the tool call.
- Optional screenshot or observation id before and after the tool call.
That makes a skipped custom tool visible instead of letting the run continue with missing page state. Avoid storing raw credentials or sensitive tool args by default; prefer a redacted shape plus a runtime hook that can rehydrate secrets when replay is explicitly allowed.
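A minimal sketch of the diagnostic side follows. The trace stores a redacted argument shape plus before/after page state, and secrets are resolved only through a runtime hook when replay is explicitly allowed. `SENSITIVE_KEYS`, `redact_args`, `ToolBoundary`, and `rehydrate` are hypothetical, and the sketch handles flat argument dicts only. A real integration would choose its own sensitive-key policy.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

SENSITIVE_KEYS = {"password", "card_number", "cvv", "token", "secret"}  # assumed policy
PLACEHOLDER = "<redacted>"


def redact_args(args: dict[str, Any]) -> dict[str, Any]:
    """Keep the argument shape, replace sensitive values with a placeholder."""
    return {k: PLACEHOLDER if k.lower() in SENSITIVE_KEYS else v for k, v in args.items()}


@dataclass
class ToolBoundary:
    """Diagnostic record for one custom tool call (hypothetical layout)."""
    tool_name: str
    args_summary: dict[str, Any]
    step_index: int
    url_before: str
    url_after: Optional[str] = None
    status: str = "pending"
    result_summary: Optional[str] = None
    error: Optional[str] = None
    observation_before: Optional[str] = None  # screenshot or observation id
    observation_after: Optional[str] = None
    timestamp: float = field(default_factory=time.time)


def rehydrate(
    args_summary: dict[str, Any],
    resolve_secret: Callable[[str], str],
    replay_allowed: bool,
) -> dict[str, Any]:
    """Fill placeholders from a runtime secret store, only when replay is opted in."""
    if not replay_allowed:
        raise PermissionError("replay not explicitly allowed for this tool")
    return {k: resolve_secret(k) if v == PLACEHOLDER else v for k, v in args_summary.items()}


boundary = ToolBoundary(
    tool_name="fill_credentials",
    args_summary=redact_args({"username": "ada", "password": "hunter2"}),
    step_index=4,
    url_before="https://example.com/login",
)
```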
Related community case: browserbase/stagehand#1558.
Debug semantic verification boundaries
If you add a semantic verification layer around act, keep the verifier result as an inspectable action boundary, not only a boolean. The useful debugging question is not just whether verification passed; it is why the target was authorized, blocked, or marked ambiguous.
- Record the action proposal: instruction, action type, selected selector, role, text, and confidence when available.
- Record the target evidence: URL, screenshot id, DOM snapshot id, candidate elements, and semantic endpoint evidence.
- Record the verification result: verifier type, status, and reason.
- Record the execution outcome: executed, blocked, escalated, failed, URL after, and error.
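The four records above can be kept as one structured boundary per `act` call rather than collapsed into a boolean. This is a sketch with hypothetical dataclass and field names taken from the list. The verification status values (`passed`, `failed`, `ambiguous`, `skipped`) are assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ActionProposal:
    instruction: str
    action_type: str
    selector: Optional[str] = None
    role: Optional[str] = None
    text: Optional[str] = None
    confidence: Optional[float] = None


@dataclass
class TargetEvidence:
    url: str
    screenshot_id: Optional[str] = None
    dom_snapshot_id: Optional[str] = None
    candidate_elements: list[str] = field(default_factory=list)
    semantic_endpoint_evidence: Optional[str] = None


@dataclass
class VerificationResult:
    verifier_type: str
    status: str  # assumed values: "passed", "failed", "ambiguous", "skipped"
    reason: str


@dataclass
class ExecutionOutcome:
    outcome: str  # "executed", "blocked", "escalated", or "failed"
    url_after: Optional[str] = None
    error: Optional[str] = None


@dataclass
class VerificationBoundary:
    proposal: ActionProposal
    evidence: TargetEvidence
    verification: VerificationResult
    execution: ExecutionOutcome

    def to_dict(self) -> dict:
        return asdict(self)  # nested dataclasses become nested dicts for the trace


boundary = VerificationBoundary(
    proposal=ActionProposal("click the checkout button", "click", selector="button#checkout",
                            role="button", text="Checkout", confidence=0.93),
    evidence=TargetEvidence("https://example.com/cart", screenshot_id="shot-12",
                            candidate_elements=["button#checkout", "a.checkout-link"]),
    verification=VerificationResult("semantic", "ambiguous", "two candidates match 'checkout'"),
    execution=ExecutionOutcome("executed", url_after="https://example.com/cart"),
)
record = boundary.to_dict()
```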
That split makes failure classes visible later: high-confidence proposal with failed verification, passed verification with wrong post-action state, ambiguous verification that still executed, or skipped verification before a later page-state failure.
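Once boundaries are stored as plain dicts, the failure classes listed above become simple queries. This sketch rests on several assumptions. It expects the hypothetical shape `{"proposal": {"confidence": ...}, "verification": {"status": ...}, "execution": {"outcome": ..., "url_after": ...}}`. It uses an arbitrary 0.8 high-confidence threshold and a caller-supplied `expected_url`. It also approximates "wrong post-action state" as an unexpected URL after the action.

```python
from typing import Any, Optional

HIGH_CONFIDENCE = 0.8  # assumed threshold


def classify(boundary: dict[str, Any], expected_url: Optional[str] = None) -> Optional[str]:
    """Return a failure class for one act boundary, or None if nothing stands out."""
    confidence = boundary["proposal"].get("confidence") or 0.0
    status = boundary["verification"]["status"]
    outcome = boundary["execution"]["outcome"]
    url_after = boundary["execution"].get("url_after")

    if status == "failed" and confidence >= HIGH_CONFIDENCE:
        return "high-confidence proposal, failed verification"
    if status == "passed" and expected_url is not None and url_after != expected_url:
        return "passed verification, wrong post-action state"
    if status == "ambiguous" and outcome == "executed":
        return "ambiguous verification, still executed"
    return None


def skipped_before_failure(boundaries: list[dict[str, Any]]) -> list[int]:
    """Indexes of steps that skipped verification before some later step failed."""
    flagged = []
    for i, b in enumerate(boundaries):
        if b["verification"]["status"] != "skipped":
            continue
        if any(later["execution"]["outcome"] == "failed" for later in boundaries[i + 1:]):
            flagged.append(i)
    return flagged


def _b(confidence: float, status: str, outcome: str, url_after: Optional[str] = None) -> dict:
    """Build a minimal boundary dict for the example run below."""
    return {
        "proposal": {"confidence": confidence},
        "verification": {"status": status},
        "execution": {"outcome": outcome, "url_after": url_after},
    }


run = [
    _b(0.95, "failed", "blocked"),
    _b(0.70, "skipped", "executed"),
    _b(0.60, "ambiguous", "executed"),
    _b(0.90, "passed", "executed", url_after="https://example.com/cart"),
    _b(0.50, "passed", "failed"),
]
```

The skipped-verification class is a property of the run rather than a single step, which is why it gets its own function.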
Related community case: browserbase/stagehand#1880.
Share only what is safe
```shell
browsertrace list
browsertrace export <run_id> -o full.html
browsertrace export <run_id> --redact -o public.html
browsertrace export <run_id> --public -o public.html
```
Use --public before attaching a real trace to a public issue or community thread. Use individual redaction flags when you want to keep some fields visible.
For a compact checklist, see the share-safe export recipe.