Debug Stagehand act/extract runs with a local timeline

Stagehand layers LLM-driven decisions on top of Playwright. BrowserTrace wraps the page you already use and records each high-level action as a local trace step.


Improving this guide or a Stagehand adapter note? Use the First PR Recipe to keep the first contribution small and reviewable.

Try the trace viewer first

uvx --from "browsertrace[ui]" browsertrace doctor
uvx --from "browsertrace[ui]" browsertrace demo
uvx --from "browsertrace[ui]" browsertrace

Open http://127.0.0.1:3000 and inspect the failed checkout-agent run before wiring a real Stagehand page.

Wrap a Stagehand page

import asyncio

from stagehand import Stagehand
from browsertrace import Tracer
from browsertrace.integrations.stagehand import wrap_stagehand


async def main():
    tracer = Tracer()
    stagehand = Stagehand(...)
    await stagehand.init()
    # Wrap the existing page; the original page methods keep working.
    page = wrap_stagehand(stagehand.page, tracer, name="stagehand checkout run")
    try:
        await page.goto("https://example.com")
        await page.act("click the checkout button")
        await page.extract("get the order total")
    finally:
        # Close the trace run even if a step raises.
        page.bt_run.close()


asyncio.run(main())

The wrapper records goto, act, extract, observe, and click calls while preserving the original page methods.

Positional and keyword arguments are saved as model_input. When a call succeeds, the Stagehand return value is written back to the same trace step as model_output.
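To make the recording pattern concrete, here is a minimal sketch of how such a wrapper could work. It is not BrowserTrace's implementation: RecordingPage, TraceStep, and FakePage are hypothetical names, and FakePage only stands in for a Stagehand page so the example runs without a browser.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Optional

# Methods the guide says are recorded; everything else passes through.
RECORDED = {"goto", "act", "extract", "observe", "click"}


@dataclass
class TraceStep:
    method: str
    model_input: dict
    model_output: Any = None
    status: str = "running"
    error: Optional[str] = None


class RecordingPage:
    """Hypothetical wrapper: records selected async calls, delegates the rest."""

    def __init__(self, page):
        self._page = page
        self.steps = []

    def __getattr__(self, name):
        target = getattr(self._page, name)
        if name not in RECORDED:
            return target  # original method, untouched

        async def recorded(*args, **kwargs):
            step = TraceStep(name, {"args": list(args), "kwargs": kwargs})
            self.steps.append(step)
            try:
                # Write the return value back onto the same step.
                step.model_output = await target(*args, **kwargs)
                step.status = "ok"
                return step.model_output
            except Exception as exc:
                step.status = "error"
                step.error = str(exc)
                raise  # the caller still sees the original exception

        return recorded


class FakePage:
    """Stand-in page for the demo (assumption, not a Stagehand class)."""

    async def extract(self, instruction):
        return {"total": "$42.00"}

    async def act(self, instruction):
        raise RuntimeError("checkout button not found")


async def demo():
    page = RecordingPage(FakePage())
    await page.extract("get the order total")
    try:
        await page.act("click the checkout button")
    except RuntimeError:
        pass
    return page.steps


steps = asyncio.run(demo())
```

The failed act call still produces a step, with its status and exception text filled in, which is the property that makes a broken run inspectable afterwards.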

What the trace captures

The Stagehand method and instruction.
The current page URL.
A screenshot before the action when available.
The successful Stagehand result, including observe or extract output when returned by the wrapped method.
Step status and exception text if the action fails.
Exportable HTML with optional model I/O redaction.

Debug custom tool replay gaps

If a Stagehand cache can replay normal page actions but skips a custom_tool, separate the replay contract from the diagnostic trace contract before changing cache behavior.

For replay, the cache needs enough data to call the current tool implementation again: tool name, serialized arguments, stable tool-call or step id, original status or error, and whether the tool is replay-safe. Credential-fill, payment, submit, or other side-effectful tools should require explicit opt-in before replay.
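The replay contract above can be sketched as a small record plus a gate. This is an illustration under assumptions, not Stagehand's cache format: ToolReplayRecord, replay, and the allow_unsafe parameter are hypothetical names, and the two tools are toy functions.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolReplayRecord:
    tool_name: str
    arguments_json: str        # serialized arguments
    step_id: str               # stable tool-call or step id
    original_status: str       # e.g. "ok" or "error"
    original_error: Optional[str] = None
    replay_safe: bool = False  # side-effectful tools stay False


def replay(record, tools, allow_unsafe=frozenset()):
    """Call the current tool implementation again, or say why it was skipped."""
    if record.tool_name not in tools:
        return {"step_id": record.step_id, "status": "skipped",
                "reason": "unknown tool"}
    if not record.replay_safe and record.tool_name not in allow_unsafe:
        return {"step_id": record.step_id, "status": "skipped",
                "reason": "opt-in required"}
    result = tools[record.tool_name](**json.loads(record.arguments_json))
    return {"step_id": record.step_id, "status": "replayed", "result": result}


# Toy tools for the demo (assumptions).
tools = {
    "lookup_sku": lambda sku: {"sku": sku, "in_stock": True},
    "submit_payment": lambda amount: {"charged": amount},
}
safe = ToolReplayRecord("lookup_sku", json.dumps({"sku": "A1"}),
                        "step-3", "ok", replay_safe=True)
risky = ToolReplayRecord("submit_payment", json.dumps({"amount": 42}),
                         "step-4", "ok")

safe_result = replay(safe, tools)
blocked_result = replay(risky, tools)
opted_in_result = replay(risky, tools, allow_unsafe={"submit_payment"})
```

Returning an explicit "skipped" result with a reason, rather than silently dropping the call, keeps the gap visible to whatever consumes the replay.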

For debugging, the trace can preserve a richer boundary even when replay is disabled:

Tool name and redacted argument summary.
Returned result summary, status, and error.
Timestamp or step index.
URL or page id before and after the tool call.
Optional screenshot or observation id before and after the tool call.

That makes a skipped custom tool visible instead of letting the run continue with missing page state. Avoid storing raw credentials or sensitive tool args by default; prefer a redacted shape plus a runtime hook that can rehydrate secrets when replay is explicitly allowed.
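One way to build that diagnostic boundary with a redacted shape is sketched below. The function names, the SENSITIVE_KEYS list, and the secret_for hook are all assumptions made for illustration; a real adapter would choose its own field names and redaction rules.

```python
REDACTED = "[REDACTED]"
# Assumed key list; extend it per tool.
SENSITIVE_KEYS = {"password", "token", "card_number", "cvv"}


def redact_args(args):
    """Replace sensitive values so the trace never stores them."""
    return {k: REDACTED if k.lower() in SENSITIVE_KEYS else v
            for k, v in args.items()}


def tool_boundary(step_index, tool_name, args, status, url_before, url_after,
                  result_summary=None, error=None):
    """Build the diagnostic record for one custom tool call."""
    return {
        "step_index": step_index,
        "tool_name": tool_name,
        "args": redact_args(args),
        "status": status,  # e.g. "ok", "error", or "skipped"
        "result_summary": result_summary,
        "error": error,
        "url_before": url_before,
        "url_after": url_after,
    }


def rehydrate(redacted_args, secret_for, replay_allowed=False):
    """Restore secrets at runtime, only when replay was explicitly allowed."""
    if not replay_allowed:
        raise PermissionError("replay not enabled for this tool")
    return {k: secret_for(k) if v == REDACTED else v
            for k, v in redacted_args.items()}


record = tool_boundary(
    step_index=4,
    tool_name="fill_login",
    args={"username": "ada", "password": "hunter2"},
    status="skipped",
    url_before="https://example.com/login",
    url_after="https://example.com/login",
)
# The secret comes from a runtime lookup, never from the stored trace.
restored = rehydrate(record["args"], {"password": "from-vault"}.get,
                     replay_allowed=True)
```

Here the skipped tool call is still a first-class step with a status and URLs, and the stored arguments contain only the redacted shape.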

Related community case: browserbase/stagehand#1558.

Debug semantic verification boundaries

If you add a semantic verification layer around act, keep the verifier result as an inspectable action boundary, not only a boolean. The useful debugging question is not just whether verification passed; it is why the target was authorized, blocked, or marked ambiguous.

Record the action proposal: instruction, action type, selected selector, role, text, and confidence when available.
Record the target evidence: URL, screenshot id, DOM snapshot id, candidate elements, and semantic endpoint evidence.
Record the verification result: verifier type, status, and reason.
Record the execution outcome: executed, blocked, escalated, failed, URL after, and error.

That split makes failure classes visible later: high-confidence proposal with failed verification, passed verification with wrong post-action state, ambiguous verification that still executed, or skipped verification before a later page-state failure.
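The failure classes can be derived mechanically once the boundary is recorded. The sketch below collapses the four records into one flat dataclass for brevity; ActionBoundary, failure_class, the "skipped" status, the post_state_ok field, and the 0.8 confidence threshold are assumptions for illustration, not part of Stagehand or BrowserTrace.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActionBoundary:
    instruction: str
    confidence: Optional[float]
    verification_status: str  # "authorized", "blocked", "ambiguous", "skipped"
    verification_reason: str
    outcome: str              # "executed", "blocked", "escalated", "failed"
    post_state_ok: Optional[bool] = None  # did the page reach the expected state?


def failure_class(b, high_confidence=0.8):
    """Map a recorded boundary to one of the failure classes, or None."""
    if b.verification_status == "skipped" and b.post_state_ok is False:
        return "skipped verification before page-state failure"
    if b.verification_status == "ambiguous" and b.outcome == "executed":
        return "ambiguous verification that still executed"
    if b.verification_status == "authorized" and b.post_state_ok is False:
        return "passed verification with wrong post-action state"
    if b.verification_status == "blocked" and (b.confidence or 0) >= high_confidence:
        return "high-confidence proposal with failed verification"
    return None


blocked = ActionBoundary("click pay", 0.95, "blocked",
                         "target is outside the checkout form", "blocked")
ambiguous = ActionBoundary("click pay", 0.6, "ambiguous",
                           "two matching buttons", "executed", post_state_ok=True)
wrong_state = ActionBoundary("click pay", 0.9, "authorized",
                             "role and text match", "executed", post_state_ok=False)
clean = ActionBoundary("click pay", 0.9, "authorized",
                       "role and text match", "executed", post_state_ok=True)

classes = [failure_class(b) for b in (blocked, ambiguous, wrong_state, clean)]
```

Because the reason string travels with the status, a reviewer can see why a target was blocked or marked ambiguous without rerunning the verifier.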

Related community case: browserbase/stagehand#1880.

Share only what is safe

browsertrace list
browsertrace export <run_id> -o full.html
browsertrace export <run_id> --redact -o public.html
browsertrace export <run_id> --public -o public.html

Use --public before attaching a real trace to a public issue or community thread. Use individual redaction flags when you want to keep some fields visible.

For a compact checklist, see the share-safe export recipe.

Deeper Stagehand adapter feedback is tracked in issue #8.