
Feature Request: Improved Detection of New Tabs Triggered by Click and Enter Events #4758

@BruceZX


What is the problem that your feature request solves?

Background / Description
Currently, when the browser-use CLI is driven by an external LLM, the model often cannot tell whether a click action or a keys: "enter" command has opened a new tab.

Since the CLI output doesn't explicitly signal the creation of a new window or tab, the LLM may continue sending commands based on the context of the original page. This leads to execution failures because the expected elements are actually located on the newly opened tab, requiring a switch_tab call that the LLM is unaware it needs to make.

The Problem
  • Ambiguity in State: External LLMs cannot "see" the browser's tab list unless explicitly told.
  • Execution Desync: When an action opens a target="_blank" link or a JS-triggered window, the Agent remains on the old page context.
  • Token Waste: The model attempts multiple retries on the wrong page before failing.

What is your proposed solution?

I have implemented a detection mechanism within the action execution flow:

  • Pre-action Snapshot: Capture the list of active tab IDs before executing click or keys: "enter".
  • Post-action Comparison: Capture the tab list again after the action and compare it with the snapshot.
  • CLI Feedback: If a new tab is detected, the CLI explicitly returns a notification (e.g., New tab detected with index: X) in the observation result.

This allows the LLM to immediately recognize the state change and decide whether to call switch_tab to continue its task.
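
The three steps above can be sketched roughly as follows. This is only an illustration of the snapshot-and-compare idea, not the actual patch: `find_new_tabs`, `run_with_tab_detection`, `list_tab_ids`, and `action` are hypothetical names, and the tab list is passed in as a plain callable rather than taken from any real browser-use API.

```python
from typing import Callable, List, Sequence


def find_new_tabs(before: Sequence[str], after: Sequence[str]) -> List[int]:
    """Return the indices (within `after`) of tab IDs that were not in `before`."""
    known = set(before)
    return [i for i, tab_id in enumerate(after) if tab_id not in known]


def run_with_tab_detection(
    list_tab_ids: Callable[[], Sequence[str]],  # hypothetical: returns current tab IDs
    action: Callable[[], str],                  # hypothetical: runs click / enter, returns its output
) -> str:
    """Run `action` and append a notice for every tab that appeared during it."""
    before = list(list_tab_ids())   # pre-action snapshot (copied so later changes don't affect it)
    result = action()
    after = list(list_tab_ids())    # post-action tab list
    notices = [f"New tab detected with index: {i}" for i in find_new_tabs(before, after)]
    return "\n".join([result] + notices)


# Simulated browser state, standing in for a real session.
tabs = ["tab-a"]


def fake_click() -> str:
    tabs.append("tab-b")            # pretend the click opened a target="_blank" link
    return "Clicked element 5"


print(run_with_tab_detection(lambda: tabs, fake_click))
```

Comparing by tab ID rather than by tab count means a tab that closes while another opens is still reported. A real implementation would probably also need a short wait before the second snapshot, since a new tab can appear slightly after the click or key press returns.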

What hacks or alternative solutions have you tried to solve the problem?

No response

What version of browser-use are you currently using?

0.12.6

How badly do you want this new feature?

  • It's an urgent deal-breaker, I can't live without it
  • It's important to add it in the near-mid term future
  • It would be nice to add it sometime in the next 2 years
  • 💪 I'm willing to start a PR to work on this myself
  • 💼 My company would spend >$5k on Browser-Use Cloud if it solved this reliably for us


Labels: enhancement
