What is the problem that your feature request solves?
Background / Description
Currently, when using the browser-use CLI with external LLMs, the model often struggles to determine whether a click action or a keys: "enter" command has triggered a page navigation into a new tab.
Since the CLI output doesn't explicitly signal the creation of a new window or tab, the LLM may continue sending commands based on the context of the original page. This leads to execution failures because the expected elements are actually located on the newly opened tab, requiring a switch_tab call that the LLM is unaware it needs to make.
The Problem
Ambiguity in State: External LLMs cannot "see" the browser's tab list unless explicitly told.
Execution Desync: When an action opens a target="_blank" link or a JS-triggered window, the Agent remains on the old page context.
Token Waste: The model attempts multiple retries on the wrong page before failing.
What is your proposed solution?
I have implemented a detection mechanism within the action execution flow:
Pre-action Snapshot: Capture the list of active tab IDs before executing click or press enter.
Post-action Comparison: Compare the tab list after the action.
CLI Feedback: If a new tab index is detected, the CLI explicitly returns a notification (e.g., New tab detected with index: X) in the observation result.
This allows the LLM to immediately recognize the state change and decide whether to call switch_tab to continue its task.
What hacks or alternative solutions have you tried to solve the problem?
No response
What version of browser-use are you currently using?
0.12.6
How badly do you want this new feature?
What is the problem that your feature request solves?
Background / Description
Currently, when using the browser-use CLI with external LLMs, the model often struggles to determine whether a click action or a keys: "enter" command has triggered a page navigation into a new tab.
Since the CLI output doesn't explicitly signal the creation of a new window or tab, the LLM may continue sending commands based on the context of the original page. This leads to execution failures because the expected elements are actually located on the newly opened tab, requiring a switch_tab call that the LLM is unaware it needs to make.
The Problem
Ambiguity in State: External LLMs cannot "see" the browser's tab list unless explicitly told.
Execution Desync: When an action opens a target="_blank" link or a JS-triggered window, the Agent remains on the old page context.
Token Waste: The model attempts multiple retries on the wrong page before failing.
What is your proposed solution?
I have implemented a detection mechanism within the action execution flow:
Pre-action Snapshot: Capture the list of active tab IDs before executing click or press enter.
Post-action Comparison: Compare the tab list after the action.
CLI Feedback: If a new tab index is detected, the CLI explicitly returns a notification (e.g., New tab detected with index: X) in the observation result.
This allows the LLM to immediately recognize the state change and decide whether to call switch_tab to continue its task.
What hacks or alternative solutions have you tried to solve the problem?
No response
What version of browser-use are you currently using?
0.12.6
How badly do you want this new feature?