fix(agent): harden _extract_start_url URL gating — skip local paths (incl. quoted) and match file extensions on the path, not as a substring (#4794) by r266-tech · Pull Request #4983 · browser-use/browser-use

r266-tech · 2026-06-07T01:14:18Z

_extract_start_url (used when directly_open_url=True, the default) scans the task text and force-injects an initial navigate action before the LLM acts. This PR hardens that URL-gating against two false-decision classes, in both the agent and rust copies, with regression tests covering both implementations.

1. Local filesystem paths were force-navigated (#4794).
A local file path in the task — e.g. an uploaded /app/x_capabilities.html — matched the domain regex (the domain char class [a-zA-Z0-9-] excludes _, so x_capabilities.html is captured as capabilities.html) and was navigated to as the agent's first action, then blocked by the SecurityWatchdog, derailing the run. Such candidates are now skipped. Quoted/parenthesised local paths (e.g. "/app/x.html") are skipped too — leading quote/paren delimiters are stripped before the local-path check.
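A minimal sketch of the failure mode and the guard, for illustration only. `DOMAIN_RE`, `LOCAL_PATH_RE`, `LEADING_DELIMS`, `enclosing_token` and `is_local_path` are hypothetical, simplified stand-ins and not the actual code in `_extract_start_url`; only the character class `[a-zA-Z0-9-]` and the local-path prefixes (`/`, `~/`, `./`, `../`, drive letter) come from this PR.

```python
import re

# Hypothetical, simplified stand-ins for the patterns used by _extract_start_url.
DOMAIN_RE = re.compile(r'[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,}')
LOCAL_PATH_RE = re.compile(r'^(?:/|~/|\./|\.\./|[A-Za-z]:[\\/])')
LEADING_DELIMS = '\'"(<[{'  # quote/paren delimiters stripped before the check


def enclosing_token(text: str, start: int, end: int) -> str:
    """Return the whitespace-delimited token that contains text[start:end]."""
    while start > 0 and not text[start - 1].isspace():
        start -= 1
    while end < len(text) and not text[end].isspace():
        end += 1
    return text[start:end]


def is_local_path(text: str, match: re.Match) -> bool:
    """True if the URL candidate sits inside a local filesystem path token."""
    token = enclosing_token(text, match.start(), match.end())
    return LOCAL_PATH_RE.match(token.lstrip(LEADING_DELIMS)) is not None


task = 'Summarise the uploaded /app/x_capabilities.html file'
m = DOMAIN_RE.search(task)
print(m.group(0))              # capabilities.html ('_' is outside the char class)
print(is_local_path(task, m))  # True, so the candidate is skipped
```

The check runs on the whole whitespace-delimited token rather than on the regex match, because the match itself (`capabilities.html`) no longer carries the leading `/app/` that identifies it as a path.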

2. Everyday sites were wrongly excluded as "files".
The excluded-extension filter tested f'.{ext}' as a substring of the whole URL, so any host/path merely containing a short extension token was dropped from auto-navigation: docs.python.org (.py), www.python.org (.py), my.docs.google.com (.doc), and any .css/.js/.md host. Exclusion is now decided from the final path segment (scheme/query/fragment stripped, trailing slash removed, percent-decoded, ;path-params split), so genuine downloadable-file URLs still drop (report.pdf, report.pdf/, data.json;v=1, report%2Epdf, archive.tar.gz) while real pages are kept (index.html, example.com/report.pdf/view — a page, not a file).
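The difference between the two checks can be sketched as follows. This is an illustration under assumptions, not the code in this PR: `EXCLUDED_EXTENSIONS` is a small illustrative subset, and `old_is_file`, `final_path_segment` and `new_is_file` are hypothetical names.

```python
from urllib.parse import unquote

# Illustrative subset only; the real excluded_extensions list may differ.
EXCLUDED_EXTENSIONS = {'pdf', 'json', 'gz', 'py', 'doc', 'css', 'js', 'md'}


def old_is_file(url: str) -> bool:
    """Old behaviour: '.ext' anywhere in the URL counts as a file."""
    return any(f'.{ext}' in url for ext in EXCLUDED_EXTENSIONS)


def final_path_segment(url: str) -> str:
    """Last path segment with scheme, query, fragment and ;params removed."""
    rest = url.split('://', 1)[-1]                  # strip scheme
    rest = rest.split('#', 1)[0].split('?', 1)[0]   # strip fragment, then query
    _host, sep, path = rest.partition('/')
    if not sep:                                     # host only, no path at all
        return ''
    segment = path.rstrip('/').rsplit('/', 1)[-1]   # drop trailing slash, take last part
    return unquote(segment).split(';', 1)[0]        # percent-decode, drop ;path-params


def new_is_file(url: str) -> bool:
    """New behaviour: only the extension of the final path segment counts."""
    _stem, dot, ext = final_path_segment(url).rpartition('.')
    return bool(dot) and ext.lower() in EXCLUDED_EXTENSIONS


for url in ('docs.python.org', 'example.com/report%2Epdf', 'example.com/report.pdf/view'):
    print(url, old_is_file(url), new_is_file(url))
```

Splitting off the host before looking for an extension is what keeps `docs.python.org` navigable: a bare host has no path segment, so there is nothing for the extension test to match.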

Deliberate scope note: exclusion keys on the path, not the query string. A URL like example.com/download?file=report.pdf (a download endpoint) or example.com/view?doc=report.pdf (a viewer page) is kept — the old substring check dropped both, which over-excluded navigable viewer/search pages. Navigation policy is still enforced downstream by the SecurityWatchdog.
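To make the scope note concrete, here is a small standard-library sketch showing that an extension in the query string never reaches a path-based check. `path_extension` is a hypothetical helper written for this illustration; the PR's own implementation may parse URLs differently.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def path_extension(url: str) -> str:
    """Extension of the URL path only; the query string is ignored."""
    return PurePosixPath(urlsplit(url).path).suffix.lower()


print(repr(path_extension('https://example.com/download?file=report.pdf')))  # ''
print(repr(path_extension('https://example.com/view?doc=report.pdf')))       # ''
print(repr(path_extension('https://example.com/report.pdf?download=1')))     # '.pdf'
```

Both query-string URLs yield an empty extension and are therefore kept for auto-navigation, while a file extension on the path itself is still detected.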

…_start_url (browser-use#4794) A local file path in the task (e.g. an uploaded /app/x_capabilities.html) matched the domain regex and was force-navigated as the agent's first action. Skip URL candidates whose whitespace-delimited token is a local filesystem path (/, ~/, ./, ../, or a drive). Fixes both the agent and rust _extract_start_url copies; adds a regression test. Avoids the extension-only approach, which would regress legit .html/.htm URLs.

cubic-dev-ai

1 issue found across 3 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="browser_use/agent/service.py">

<violation number="1" location="browser_use/agent/service.py:2391">
P2: Quoted/parenthesized local file paths bypass the local-path filter because surrounding delimiters are included in `local_path_token`, causing the anchored regex to fail.</violation>
</file>


…ract_start_url The excluded-extension filter tested f'.{ext}' as a SUBSTRING of the whole URL, so everyday sites whose host/path merely contains a short extension token were wrongly dropped from auto-navigation (e.g. '.py' in docs.python.org, '.doc' in my.docs.google.com, '.js'/'.css' hosts). Decide exclusion from the final path segment instead (scheme/query/fragment stripped, trailing slash removed, percent-decoded, path-params split), so genuine downloadable-file URLs (report.pdf, data.json;v=1, archive.tar.gz) still drop. Also strip leading quote/paren delimiters before the local-path guard so quoted local paths are skipped too. Mirrored in agent + rust copies; regression tests exercise both implementations.

cubic-dev-ai

1 issue found across 3 files (changes from recent commits).


…_extract_start_url The local-path guard lstrip()'d quote/paren delimiters ('"(<[{) but not the Markdown backtick, so a backtick-wrapped local path (e.g. `/app/x.html`) kept its leading backtick, the anchored filesystem-path regex failed, and since 'html' is intentionally not an excluded extension the path was auto-navigated as a bare URL. Add ` to the stripped delimiter set in both agent + rust copies; regression tests cover backtick-wrapped local paths in both implementations.

r266-tech · 2026-06-07T08:49:08Z

Good catch, addressed in 4267497. The local-path guard's delimiter strip set covered '"(<[{ but not the Markdown backtick, so a backtick-wrapped local path (`/app/x_capabilities.html`) kept its leading backtick, the anchored filesystem-path regex didn't match, and since html is intentionally not in excluded_extensions it was treated as a URL and auto-navigated. Added the backtick to the stripped delimiter set in both the agent and rust copies, with regression tests covering backtick-wrapped local paths in both implementations.
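The effect of the missing backtick can be reproduced with a short sketch. The names `LOCAL_PATH_RE`, `OLD_DELIMS`, `NEW_DELIMS` and `looks_local` are hypothetical and the regex is a simplified assumption; only the delimiter characters are taken from the comment above.

```python
import re

# Simplified, assumed form of the anchored filesystem-path regex.
LOCAL_PATH_RE = re.compile(r'^(?:/|~/|\./|\.\./|[A-Za-z]:[\\/])')

OLD_DELIMS = '\'"(<[{'        # delimiter set before the fix
NEW_DELIMS = OLD_DELIMS + '`'  # backtick added by the fix


def looks_local(token: str, delims: str) -> bool:
    """Strip leading delimiters, then test the anchored local-path regex."""
    return LOCAL_PATH_RE.match(token.lstrip(delims)) is not None


token = '`/app/x.html`'
print(looks_local(token, OLD_DELIMS))  # False: the leading backtick defeats the anchor
print(looks_local(token, NEW_DELIMS))  # True: the path is now recognised and skipped
```

Because the regex is anchored at the start of the token, any leading character that is not in the strip set is enough to make the guard miss the path.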

cubic-dev-ai Bot reviewed Jun 7, 2026


Comment thread browser_use/agent/service.py

r266-tech changed the title ~~fix(agent): don't auto-navigate to local filesystem paths in _extract_start_url (#4794)~~ fix(agent): harden _extract_start_url URL gating — skip local paths (incl. quoted) and match file extensions on the path, not as a substring (#4794) Jun 7, 2026

cubic-dev-ai Bot reviewed Jun 7, 2026


Comment thread browser_use/rust/service.py Outdated

r266-tech wants to merge 3 commits into browser-use:main from r266-tech:fix-extract-start-url-local-paths-4794
