Can browser-use expose or record stable selectors (XPath/CSS) discovered during an LLM-guided run? #3856
Replies: 6 comments 2 replies
-
|
Exposing stable selectors from LLM-guided runs is valuable for building reusable automation. Here is how to implement it:

Selector Recording Architecture

    from dataclasses import dataclass, field
    from typing import Dict, List, Optional
    import json

    @dataclass
    class StableSelector:
        # Multiple selector strategies for resilience
        xpath: str
        css: Optional[str] = None
        text_content: Optional[str] = None
        aria_label: Optional[str] = None
        data_testid: Optional[str] = None
        # Metadata
        element_type: str = ""      # button, input, link
        action_performed: str = ""  # click, type
        confidence: float = 1.0

        def to_playwright(self) -> str:
            # Prefer the most stable strategy. Attribute values are quoted so
            # that labels containing spaces still form a valid CSS selector.
            if self.data_testid:
                return f'[data-testid="{self.data_testid}"]'
            if self.aria_label:
                return f'[aria-label="{self.aria_label}"]'
            if self.css:
                return self.css
            return f"xpath={self.xpath}"

    @dataclass
    class RecordedFlow:
        name: str
        url_pattern: str
        steps: List[Dict] = field(default_factory=list)
        selectors: Dict[str, StableSelector] = field(default_factory=dict)

        def add_step(self, action: str, selector: StableSelector, value=None):
            # Each step gets its own selector entry, keyed by the step id.
            step_id = f"step_{len(self.steps)}"
            self.steps.append({
                "id": step_id,
                "action": action,
                "selector_id": step_id,
                "value": value
            })
            self.selectors[step_id] = selector

        def export(self) -> str:
            return json.dumps({
                "name": self.name,
                "url_pattern": self.url_pattern,
                "steps": self.steps,
                "selectors": {k: vars(v) for k, v in self.selectors.items()}
            }, indent=2)

Recording During LLM Run

    class SelectorRecorder:
        def __init__(self, flow: RecordedFlow):
            self.current_flow = flow

        # get_xpath() and get_unique_css() are helpers you need to supply;
        # `element` is assumed to be a Playwright element handle.
        async def record_action(self, page, element, action: str, value=None):
            selector = StableSelector(
                xpath=await self.get_xpath(element),
                css=await self.get_unique_css(element),
                text_content=await element.text_content(),
                aria_label=await element.get_attribute("aria-label"),
                data_testid=await element.get_attribute("data-testid"),
                element_type=await element.evaluate("el => el.tagName.toLowerCase()"),
                action_performed=action
            )
            self.current_flow.add_step(action, selector, value)
            return selector

Export Format (abridged: step values and empty selector fields omitted)

    {
      "name": "login_flow",
      "url_pattern": "https://example.com/login",
      "steps": [
        {"id": "step_0", "action": "type", "selector_id": "step_0"},
        {"id": "step_1", "action": "click", "selector_id": "step_1"}
      ],
      "selectors": {
        "step_0": {"xpath": "//input[@type='email']", "data_testid": "email-input"},
        "step_1": {"xpath": "//button[@type='submit']", "text_content": "Sign In"}
      }
    }

This lets you replay flows without an LLM, with fallback selectors for resilience. More on automation patterns: https://github.com/KeepALifeUS/autonomous-agents
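To make the "fallback selectors" idea concrete, here is a minimal sketch of a replay planner that reads the exported JSON and, for each step, picks the first selector that matches exactly one element. The function names and the `count_matches` callback are assumptions for illustration, not part of browser-use; with Playwright's sync API the callback could be something like `lambda s: page.locator(s).count()`.

```python
import json
from typing import Callable, Dict, List, Optional


def candidate_selectors(sel: Dict[str, Optional[str]]) -> List[str]:
    """Return every usable selector string for one recorded element, best first.

    The order mirrors StableSelector.to_playwright: test id, ARIA label, CSS, XPath.
    """
    out: List[str] = []
    if sel.get("data_testid"):
        out.append(f'[data-testid="{sel["data_testid"]}"]')
    if sel.get("aria_label"):
        out.append(f'[aria-label="{sel["aria_label"]}"]')
    if sel.get("css"):
        out.append(sel["css"])
    if sel.get("xpath"):
        out.append(f'xpath={sel["xpath"]}')
    return out


def resolve_step(sel: Dict[str, Optional[str]],
                 count_matches: Callable[[str], int]) -> Optional[str]:
    """Pick the first candidate that matches exactly one element, else None."""
    for candidate in candidate_selectors(sel):
        if count_matches(candidate) == 1:
            return candidate
    return None


def plan_replay(exported: str, count_matches: Callable[[str], int]) -> List[dict]:
    """Turn an exported flow into (step id, action, resolved selector) records."""
    flow = json.loads(exported)
    plan = []
    for step in flow["steps"]:
        sel = flow["selectors"][step["selector_id"]]
        plan.append({
            "id": step["id"],
            "action": step["action"],
            "selector": resolve_step(sel, count_matches),
        })
    return plan
```

A step whose `selector` comes back as `None` is one where every recorded strategy failed, which is a natural point to hand control back to the LLM rather than guess.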
-
|
Yes, you can extract the selectors browser-use discovers during a run. The action history includes the element references used for each interaction. After ...

For stable selectors specifically, the LLM-discovered selectors are often fragile because they are tied to the DOM structure at that moment. A more robust approach: after the LLM finds the right element, generate multiple selector strategies (id, data attributes, aria labels, relative XPath) and test them against the current page to find the most resilient one. This gives you selectors you can reuse in traditional automation without the LLM.
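As a rough sketch of that approach, the code below builds candidate selectors from an element's attributes and returns the first one that is unique on the page. Everything here is hypothetical scaffolding: `looks_generated` is a crude heuristic, `count_matches` stands in for whatever page query you use, and ids are assumed to be simple enough not to need CSS escaping.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


def looks_generated(value: str) -> bool:
    """Heuristic (an assumption): long digit or hex runs suggest a build-generated id."""
    return bool(re.search(r"\d{4,}|[0-9a-f]{8,}", value))


def build_candidates(attrs: Dict[str, str], tag: str) -> List[Tuple[str, str]]:
    """Return (strategy, selector) pairs, most resilient strategy first."""
    candidates: List[Tuple[str, str]] = []
    if attrs.get("id") and not looks_generated(attrs["id"]):
        candidates.append(("id", f'#{attrs["id"]}'))
    for name, value in attrs.items():
        if name.startswith("data-"):
            candidates.append(("data-attribute", f'[{name}="{value}"]'))
    if attrs.get("aria-label"):
        candidates.append(("aria-label", f'{tag}[aria-label="{attrs["aria-label"]}"]'))
    if attrs.get("name"):
        # Relative XPath: anchored on an attribute, not on the element's position.
        candidates.append(("relative-xpath", f'xpath=//{tag}[@name="{attrs["name"]}"]'))
    return candidates


def most_resilient(candidates: List[Tuple[str, str]],
                   count_matches: Callable[[str], int]) -> Optional[Tuple[str, str]]:
    """Test candidates in order and keep the first that matches exactly one element."""
    for strategy, selector in candidates:
        if count_matches(selector) == 1:
            return strategy, selector
    return None
```

Testing uniqueness at record time only proves the selector works on that page state, so it is worth re-running the same check before each replay.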
-
|
I would not assume there is a 100% exact Playwright locator string to recover from every Browser Use action. The practical boundary looks like this:
For a compiler-style workflow, I would split it into two stages instead of trying to replay the raw action exactly:
That also gives you a clean failure mode: "the LLM found this element, but no stable locator could be proven," instead of silently exporting fragile code.

For BrowserTrace I use Browser Use hooks to preserve the run evidence around each step, but I would still treat selector synthesis as a separate validation pass.

A useful Browser Use-native hook, if maintainers want this workflow, would probably emit the selected DOM/history element plus action metadata immediately before execution, not just expose a final post-run summary.
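To illustrate what such a pre-execution event might carry, here is a small sketch. It is not the Browser Use API: `PreActionEvidence`, `EvidenceLog`, and `run_step` are hypothetical names, and the `execute` callable stands in for whatever actually performs the action.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class PreActionEvidence:
    """Snapshot of what was selected, captured before the action runs."""
    step: int
    action: str
    action_args: Dict[str, Any]
    element: Dict[str, Any]  # e.g. tag, attributes, visible text
    url: str


class EvidenceLog:
    def __init__(self) -> None:
        self.events: List[PreActionEvidence] = []

    def before_action(self, action: str, action_args: Dict[str, Any],
                      element: Dict[str, Any], url: str) -> PreActionEvidence:
        # Copy the dicts so later mutation by the caller cannot alter the record.
        event = PreActionEvidence(len(self.events), action,
                                  dict(action_args), dict(element), url)
        self.events.append(event)
        return event


def run_step(log: EvidenceLog, execute: Callable[[str, Dict[str, Any]], Any],
             action: str, action_args: Dict[str, Any],
             element: Dict[str, Any], url: str) -> Any:
    """Record evidence first, then execute, so a failed action still leaves a trail."""
    log.before_action(action, action_args, element, url)
    return execute(action, action_args)
```

Because the event is written before `execute` is called, a step that raises still leaves its evidence behind for the later selector-synthesis pass.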
-
|
Stable selector recording would be very useful, but I would avoid treating the selector the agent clicked as automatically reusable. A pattern that has worked well for browser-agent debugging is to store both the observed action and the evidence around it:
Then a deterministic replay system can choose the strongest selector later, rather than replaying the exact fragile XPath from the original run.

I would also keep the recorded flow separate from the live agent trace. The trace says “what happened.” The reusable flow says “what we believe should happen again.” That distinction makes it easier to repair selectors without rewriting history.
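One way to keep that separation honest in code is to make the trace immutable and derive the flow from it. The sketch below uses hypothetical types (`TraceEvent`, `FlowStep`) and shows that repairing a selector produces a new flow while the trace stays untouched.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class TraceEvent:
    """What happened during the original agent run. Never edited afterwards."""
    step: int
    action: str
    observed_selector: str
    target_text: str


@dataclass(frozen=True)
class FlowStep:
    """What we believe should happen again. Points back at its trace step."""
    trace_step: int
    action: str
    selector: str


def derive_flow(trace: Tuple[TraceEvent, ...]) -> Tuple[FlowStep, ...]:
    """Seed the reusable flow from the trace, one flow step per trace event."""
    return tuple(FlowStep(e.step, e.action, e.observed_selector) for e in trace)


def repair(flow: Tuple[FlowStep, ...], trace_step: int,
           new_selector: str) -> Tuple[FlowStep, ...]:
    """Return a new flow with one selector replaced; the trace is not involved."""
    return tuple(
        replace(s, selector=new_selector) if s.trace_step == trace_step else s
        for s in flow
    )
```

Keeping `trace_step` on each flow step means a repaired selector can always be compared against what the agent originally observed.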
-
|
Agree with the distinction between trace and reusable flow. For Browser Use specifically, I would preserve the trace boundary as the source of truth:
Then a separate replay/export pass can synthesize candidate locators and mark each one as validated, ambiguous, or unproven. That avoids turning a positional XPath from a lucky run into a false contract.

The Browser Use hook that would make this much cleaner is a pre-action evidence event: selected element/history node, candidate list or selector_map entry, action args, frame/page identity, and model-visible context before execution. A final history object is useful, but pre-action evidence is what lets a debugger explain why the wrong element was chosen.

For BrowserTrace I am treating selector data as evidence first, not as generated Playwright code. If anyone has a concrete Browser Use run where the agent found the right element once but the exported selector later broke, that would be the most useful shape to test against.
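The three labels could be assigned with a check along these lines. This is a sketch under assumptions: the names are made up, the caller is expected to supply the match count and whether the match is the recorded element, and treating any index-based XPath as unproven is one possible policy rather than an established rule.

```python
import re
from enum import Enum


class LocatorStatus(Enum):
    VALIDATED = "validated"
    AMBIGUOUS = "ambiguous"
    UNPROVEN = "unproven"


def is_positional_xpath(selector: str) -> bool:
    """True for index-based steps such as //div[3]/button."""
    return bool(re.search(r"\[\d+\]", selector))


def classify(selector: str, match_count: int,
             hits_recorded_element: bool) -> LocatorStatus:
    """Label one candidate locator after probing it against the page."""
    if match_count > 1:
        return LocatorStatus.AMBIGUOUS
    # Assumed policy: a unique positional XPath may just reflect a lucky page
    # state, so it is never promoted to VALIDATED.
    if (match_count == 1 and hits_recorded_element
            and not is_positional_xpath(selector)):
        return LocatorStatus.VALIDATED
    return LocatorStatus.UNPROVEN
```

An exporter built on this would only emit code for `VALIDATED` locators and report the rest, which keeps the failure explicit.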
-
|
I would be careful with this idea. Recording selectors is useful. Treating them as production truth is where it gets risky.

The selector that worked during an LLM run may be tied to one page state, one viewport, one experiment bucket, or one logged-in account. If you compile that directly into Playwright, it can pass today and fail quietly next week.

The safer version is to record more than the selector. Save the action, target text, URL, frame, nearby labels, screenshot, and result. Then use that to build a Playwright script with assertions.

So yes, a recorder mode makes sense. But I would make it an audit artifact first, not a deterministic script generator on day one.
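As a sketch of "a Playwright script with assertions", the generator below turns one audit record into Playwright Python statements that check the recorded context before acting. The `AuditRecord` shape is an assumption and covers only part of the evidence listed above (no frame, nearby labels, or screenshot); the emitted lines assume `page` and `expect` from `playwright.sync_api` are in scope.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AuditRecord:
    action: str                      # "click" or "fill"
    selector: str
    target_text: str                 # visible text seen at record time, may be ""
    url: str                         # page URL when the action was taken
    value: Optional[str] = None      # text typed, for "fill"
    expected_url_after: Optional[str] = None


def to_playwright_lines(r: AuditRecord) -> List[str]:
    """Emit guarded Playwright statements for one recorded action."""
    lines = [
        f"expect(page).to_have_url({r.url!r})",
        f"target = page.locator({r.selector!r})",
        "expect(target).to_have_count(1)",  # fail loudly if no longer unique
    ]
    if r.target_text:
        lines.append(f"expect(target).to_contain_text({r.target_text!r})")
    if r.action == "fill":
        lines.append(f"target.fill({r.value!r})")
    elif r.action == "click":
        lines.append("target.click()")
    else:
        raise ValueError(f"unsupported action: {r.action}")
    if r.expected_url_after:
        lines.append(f"expect(page).to_have_url({r.expected_url_after!r})")
    return lines
```

With the assertions in place, a selector that has drifted fails at the `expect` line instead of clicking the wrong element and passing quietly.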
-
I am using browser-use together with an LLM as a high-level “pathfinder” to discover how to automate a specific web task.
The goal is not only to complete the task successfully, but to extract and persist the stable selectors (XPath/CSS) that were used during the successful run. The idea is to “compile” the LLM’s expensive reasoning phase into a lightweight, deterministic Playwright script that can run later without an LLM.
Concretely:
Does browser-use currently expose the selectors it resolves or uses internally during actions (clicks, inputs, navigation)?
Is there a built-in or recommended way to record or hook into selector resolution during a run?
If not, would extending browser-use to emit these selectors (e.g., via callbacks, logs, or a recorder mode) align with its design goals?
I am trying to bridge exploratory, LLM-driven automation with production-grade, selector-based scripts, and would like to understand whether browser-use supports—or could support—this workflow.
Thanks for any guidance or design insight.