Skip to content

fix: cap DOM build and screenshot tasks per BrowserStateRequestEvent (#4579)#4862

Open
wtfashwin wants to merge 3 commits into
browser-use:mainfrom
wtfashwin:fix/4579-state-task-timeouts
Open

fix: cap DOM build and screenshot tasks per BrowserStateRequestEvent (#4579)#4862
wtfashwin wants to merge 3 commits into
browser-use:mainfrom
wtfashwin:fix/4579-state-task-timeouts

Conversation

@wtfashwin

@wtfashwin wtfashwin commented May 18, 2026

Copy link
Copy Markdown

Summary

Closes #4579.

DOMWatchdog.on_BrowserStateRequestEvent awaited the DOM build and clean-screenshot tasks without a per-task cap. TimeoutWrappedCDPClient already caps individual CDP calls (BROWSER_USE_CDP_TIMEOUT_S, default 60s), but a single DOM build fans out into many sequential CDP calls — so the aggregate await can still blow the 30s BrowserStateRequestEvent budget when a remote browser goes silent mid-build (stale WebSocket: sends succeed, responses never arrive, no close frame).

This change wraps each of the two awaits with asyncio.wait_for using configurable caps:

  • BROWSER_USE_DOM_BUILD_TIMEOUT_S (default 15s)
  • BROWSER_USE_SCREENSHOT_TIMEOUT_S (default 8s)

On timeout, the handler returns a minimal state for the slow component (empty selector_map / null screenshot) instead of holding the event-bus slot for the full budget. The healthy path is unchanged.

Env parsing mirrors the defensive guard used for BROWSER_USE_CDP_TIMEOUT_S in browser_use/browser/_cdp_timeout.py: only finite positive values take effect; nan / inf / non-positive / unparseable values fall back to the hardcoded default with a warning. Without this guard a typo would silently make every state build either time out immediately (nan/0) or never (inf/negative).

Why per-task and not per-CDP-call

The existing per-CDP-call timeout is needed but not sufficient:

  • Inner cap (60s): protects a single CDP method from a silent socket.
  • Outer cap (this PR, 15s / 8s): protects the aggregate of many sequential CDP calls inside one task from exceeding the event-level budget.

Both must hold for the handler to fail fast end to end. The defaults stay well below the 30s BrowserStateRequestEvent budget so the per-task cap actually fires before the event-level kill — otherwise it would add nothing over what the event bus already enforces.

Out of scope

The issue thread also describes a bubus global event-bus lock that serializes unrelated browser sessions during recovery. That's a separate lock-scope problem in bubus itself and is not addressed here.

Test plan

New file: tests/ci/browser/test_dom_watchdog_timeouts.py.

  • _parse_env_task_timeout unit tests — unset/empty falls back, valid value used, nan/inf/0/-5/abc rejected, small positives (0.001) accepted.
  • test_default_timeouts_are_sensible — invariants: 0 < SCREENSHOT < DOM_BUILD < 30s.
  • test_dom_build_timeout_returns_minimal_state — real BrowserSession + pytest-httpserver; monkeypatch _build_dom_tree_without_highlights to await an asyncio.Event that is never set (the exact shape of a stale-WS hang). Asserts the handler returns selector_map == {}, the screenshot survives independently, and total time < 10s (not the 30s the unwrapped await would have caused).
  • test_screenshot_timeout_returns_state_without_screenshot — symmetric for the screenshot path; asserts screenshot is None, the DOM survives with a real non-empty selector_map, and elapsed < 10s.
  • test_healthy_path_preserves_results — sanity: with default caps the wrap is transparent (real DOM + real screenshot returned).
  • uv run pytest -vxs tests/ci/browser/test_dom_watchdog_timeouts.py — 9/9 passing.
  • uv run pytest -vx tests/ci/browser/ — 60 passing, 3 skipped (no regressions).
  • uv run pyright — 0 errors on changed files; full repo 0 errors.
  • uv run ruff check --fix && uv run ruff format — clean.

Tests follow the project convention (CLAUDE.md): no mocking — real BrowserSession, real event bus, real Chromium via pytest-httpserver. Only the leaf coroutine that normally fans into CDP is swapped via monkeypatch.setattr for one that awaits a released asyncio.Event, which is the precise shape of the failure mode described in the issue.


Summary by cubic

Caps the DOM build and clean screenshot tasks in DOMWatchdog.on_BrowserStateRequestEvent with per-task timeouts so a silent browser can’t block the 30s event budget. Fixes #4579 by failing fast and returning a minimal state instead of hanging.

  • Bug Fixes
    • Wrap awaits with asyncio.wait_for: BROWSER_USE_DOM_BUILD_TIMEOUT_S (15s) and BROWSER_USE_SCREENSHOT_TIMEOUT_S (8s).
    • Defensive env parsing: only finite positive values apply; others fall back to defaults with a warning.
    • On timeout: return empty selector_map for DOM or None for screenshot; frees the event-bus slot quickly.
    • Complements the per-CDP-call cap and keeps defaults below the 30s event budget.
    • Add regression tests for timeouts and healthy path; fix minor typo in the timeout parser to satisfy codespell.

Written for commit 49b041b. Summary will update on new commits. Review in cubic

wtfashwin added 2 commits May 18, 2026 23:55
Closes browser-use#4579. on_BrowserStateRequestEvent awaited the DOM build and
clean-screenshot tasks without a per-task cap. TimeoutWrappedCDPClient
caps individual CDP calls, but a DOM build is many sequential CDP calls,
so the aggregate could blow the 30s event budget when a remote browser
goes silent mid-build (stale WebSocket — sends succeed, responses never
arrive). Wrap each await with asyncio.wait_for using configurable caps
(BROWSER_USE_DOM_BUILD_TIMEOUT_S=15s, BROWSER_USE_SCREENSHOT_TIMEOUT_S=8s),
with the same defensive env-parsing guard as BROWSER_USE_CDP_TIMEOUT_S so
nan/inf/non-positive values fall back instead of breaking every state
build. On timeout the handler returns a minimal state for the slow
component, freeing the event-bus slot fast.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 2 files

Re-trigger cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: CDP connection instability causing indefinite hangs with remote browsers

1 participant