Add anti bot detection via patchright #857

Merged
pirate merged 18 commits into browser-use:main from neo773:bot-detection-batch
Apr 21, 2025

Conversation

@neo773
Contributor

@neo773 neo773 commented Feb 24, 2025

This PR adds anti-bot-detection via rebrowser patchright.

As per their documentation, they don't recommend applying patches manually because it's error-prone; they recommend using their drop-in package instead.

This achieves a trust score of 69; the previous score was 0.

a.mp4

/claim #852
/closes #852

@CLAassistant

CLAassistant commented Feb 24, 2025

CLA assistant check
All committers have signed the CLA.

@neo773
Contributor Author

neo773 commented Feb 24, 2025

@gregpr07 this is ready to be reviewed

@gregpr07
Member

Will review this! Thank you so much for the contribution :)

@gregpr07
Member

gregpr07 commented Feb 25, 2025

Why did you remove the security arguments? This breaks cross-site iframes...

import os
import sys

from langchain_anthropic import ChatAnthropic

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from browser_use import Agent

llm = ChatAnthropic(model_name='claude-3-7-sonnet-20250219', temperature=0.0, timeout=30, stop=None)

task = """
go to https://csreis.github.io/tests/cross-site-iframe.html
click on Go cross-site (complex page)
and click on builders button
return what's new
"""
# task = 'go to https://abrahamjuliot.github.io/creepjs/ and tell me what trust score is'

agent = Agent(
	task=task,
	llm=llm,
)


async def main():
	await agent.run(max_steps=10)


asyncio.run(main())

For example, running this script without the security arguments doesn't work. Could you please fix this? Then I'll merge it.

@neo773
Contributor Author

neo773 commented Feb 25, 2025

@gregpr07

The detection site does not load with web security disabled; it throws an API Access Denied error.

Given that this is an edge case and iframes are quite rare these days, maybe we could default to setting disable_security to false?

Let me know your thoughts

@gaurav-cointab

@neo773 the anti-bot-detection is not working for me.

I am trying to log in to the Grubhub website, and it says it thinks this is a bot. Something else must be required for proper anti-bot-detection.

@gregpr07
Member

@neo773 no, actually cross-site iframes are a really, really important use case. Websites like Salesforce and other “legacy” providers that are extremely valuable need them. I can’t merge this: we absolutely need cross-site iframe support!

@gaurav-cointab I guess not all websites will be fixed, but we will be one step closer with rebrowser (if it works).

@neo773
Contributor Author

neo773 commented Feb 25, 2025

@gregpr07

Pushed a new commit that adds back the extra args. Since we're not able to verify the actual score with this enabled, maybe you can try it with some real-world workflows that failed previously and report back?

@browser-use browser-use deleted a comment from codebeaver-ai bot Feb 26, 2025
@gregpr07
Member

Why does it not work? Pinged rebrowser founder

@nwebson

nwebson commented Feb 26, 2025

> Detection site does not load with web security disabled it throws API Access Denied error

Could you share more details on this? How do you test it, and where exactly do you see this error?

Rebrowser fixes a few different things, but the most important one is the Runtime.Enable leak, which screams that you're using an automated browser. It shouldn't break any iframes though.
Besides this leak, there are many other things that could mark you as a bot... I would suggest starting with rebrowser-bot-detector (sources).
@gaurav-cointab ^^

P.S. @neo773 actually, iframes are everywhere :)

@gaurav-cointab

gaurav-cointab commented Feb 27, 2025

@neo773 it worked this time. It was my bad: I did not pick up the line that changed the user_agent. Sorry for taking misinformed steps and reporting here that it was not working.

--update after a full day of tinkering
It works sometimes, but most of the time it does not. When it works, the website is somehow not detecting it as a bot.
When it does not work, I get this on https://bot-detector.rebrowser.net/:

image

What steps should I take to fix these? I had to add back the "args" that you had removed from the browser/browser.py file, because the link said to add '--disable-blink-features=AutomationControlled'.

Now these 3 remain; please let me know what to do about them.

I believe that fixing the userAgentNavigator should fix things. @neo773 @nwebson

@gaurav-cointab

I ran the agent via the Chrome instance path: I supplied the chrome.exe path with guest as the profile, thinking that this would fix the userAgent issue. It did. When I run the agent in chrome.exe, the bot detection link now only highlights:

image

The website still thinks this is a bot. Is there a way to fix the item highlighted above?

I am already using rebrowser-playwright@1.49.1; what else remains?

@gregpr07
Member

gregpr07 commented Mar 2, 2025

/tip $50 @gaurav-cointab
awesome work man :)

@neo773 are you going to fix it according to @gaurav-cointab and @nwebson?

@neo773
Contributor Author

neo773 commented Mar 2, 2025

@gregpr07
Pushed a new commit

Using the chrome channel instead of chromium helps with the user agent and other subtle differences from Chromium.
Also added the flag disable-blink-features=AutomationControlled for the webdriver test.

SCR-20250303-czia
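
The two changes described in this comment can be sketched roughly as below. This is a minimal illustration, not the PR's actual code: `build_launch_kwargs` and `launch_and_check` are hypothetical helper names, and the sketch assumes patchright exposes the same `async_api` entry point as Playwright (it is described in this thread as a drop-in package).

```python
from typing import List, Optional

# Flag named in the comment above, added for the webdriver test.
STEALTH_ARGS = ['--disable-blink-features=AutomationControlled']


def build_launch_kwargs(headless: bool = False, extra_args: Optional[List[str]] = None) -> dict:
    """Return launch keyword arguments that select the installed Chrome channel."""
    args = list(STEALTH_ARGS)
    for arg in extra_args or []:
        if arg not in args:  # avoid passing the same flag twice
            args.append(arg)
    return {'headless': headless, 'channel': 'chrome', 'args': args}


async def launch_and_check(url: str = 'https://bot-detector.rebrowser.net/') -> str:
    """Open the detector page in a patched Chrome and return the page title."""
    # Imported lazily so build_launch_kwargs works without patchright installed.
    # Assumption: patchright mirrors playwright's async_api module.
    from patchright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(**build_launch_kwargs())
        page = await browser.new_page()
        await page.goto(url)
        title = await page.title()
        await browser.close()
        return title
```

Running `asyncio.run(launch_and_check())` would need Chrome and patchright installed locally; the kwargs helper can be checked on its own.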

@gaurav-cointab

gaurav-cointab commented Mar 3, 2025

I understand that this is a start for many websites, but this is not working for me. I also tried #805, which did not work for me either. Let me lay out what came up in my testing. https://bot-detector.rebrowser.net/ only works for testing control via rebrowser: it now shows all green for every session, whether or not that session works for my use case. It does not seem to cover all the checks used for bot detection (most of which even I don't know about). Then I came across the link below, which is easy to configure and use for detecting bots. It is a test link that showcases how detection might work.

https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php

I then tested this in Chromium with the old Playwright and with rebrowser Playwright; nothing works.
Then I opened the above link in Chrome's guest mode (without browser-use) and found that it gives me proper output. It also worked for logging in to the Grubhub website: the website did not detect a bot and allowed me to log in.

image

Then I opened guest mode in Chrome from browser-use (with and without rebrowser), and it failed as shown below.

image

This matched my own testing of opening the Grubhub links in browser-use, where it gave a false result.

So something is not working with rebrowser. This might be an issue with Playwright that the websites are able to detect and we are not. It might be that the websites can detect that the browser was opened in debug mode, with a debug URL, or something else. I don't know what to make of it; I leave it to the experts to figure out.

@nwebson

nwebson commented Mar 3, 2025

🕵️ Just my two cents here, I've been working in this niche for 5+ years. There is no bulletproof antidetect solution; it's an infinite cat and mouse game, and once some solution becomes open source, it stops working as all antidetect companies follow all these public repos and even some private channels/repos.
So, you can make it less detectable to pass some very obvious tests, but it doesn't make sense to try to pass reCAPTCHA and all other commercial solutions. It's just not going to work reliably in the long run.

@algora-pbc

algora-pbc bot commented Mar 6, 2025

@gaurav-cointab: You just got a $50 tip! 👉 Complete your Algora onboarding to collect your payment.

@algora-pbc

algora-pbc bot commented Mar 7, 2025

🎉🎈 @gaurav-cointab has been awarded $50! 🎈🎊

@gaurav-cointab

gaurav-cointab commented Mar 9, 2025

Hi @nwebson @gregpr07 @neo773 ,

I got it working for websites that do other checks as well. Here is what needs to be done:

  1. rebrowser needs to be used in the current form raised in this PR.
  2. It needs a few extensions; I have been working with a PDF viewer and ublock-lite, which were present in another PR. I modified it to work with as many extensions as needed.
  3. We need to specify the user data directory (which will happen anyway if you specify extensions).
  4. With that, it works flawlessly for chrome_instance_path and chromium. I have not checked cdp and wss, and have no idea how to make it work with them.

reCAPTCHA now works as well, which means that was not an issue with rebrowser or anything else. Since I was the one who raised concerns about this PR, I think we can move ahead and merge it. This PR works for many, many websites; where it fails, taking the steps above should make it work.
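
Steps 2 and 3 above might look roughly like the sketch below. It is an illustration under assumptions, not the commenter's code: `extension_args` and `launch_with_profile` are hypothetical names, the extension directories are placeholders, and it assumes patchright mirrors Playwright's `launch_persistent_context`. The two Chromium flags are the ones Playwright's own extension guide uses; recent branded Chrome builds may ignore them.

```python
from typing import Iterable, List


def extension_args(extension_dirs: Iterable[str]) -> List[str]:
    """Build the Chromium flags that load unpacked extensions from the given directories."""
    paths = ','.join(extension_dirs)
    if not paths:
        return []
    return [f'--disable-extensions-except={paths}', f'--load-extension={paths}']


async def launch_with_profile(user_data_dir: str, extension_dirs: Iterable[str]):
    """Start a persistent browser context with a user data directory and extensions."""
    # Lazy import: patchright is only needed when a browser is actually launched.
    # Assumption: patchright mirrors playwright's async_api module.
    from patchright.async_api import async_playwright

    p = await async_playwright().start()
    # A persistent context stores the profile in user_data_dir between runs.
    context = await p.chromium.launch_persistent_context(
        user_data_dir,
        headless=False,  # headed mode is an assumption made for this sketch
        args=extension_args(extension_dirs),
    )
    return p, context
```

The caller is responsible for closing the returned context and stopping the returned driver handle when done.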

@pirate
Contributor

pirate commented Apr 4, 2025

Sweet thanks for those last fixes! 🎉 I think it's ready.

I'm going to target merging this into the upcoming 0.1.42 version. We are currently just finishing testing on 0.1.41, and after that is fully released I will merge this into main.

@pirate pirate added this to the v0.1.42 milestone Apr 4, 2025

browser = await browser_class.launch(
headless=self.config.headless,
channel='chrome',
Contributor

Why is channel set to chrome? The browser_class could be firefox or webkit.

Patchright only patches Chromium-based browsers. Firefox and WebKit are not supported.

If you want to use Firefox or WebKit, I would recommend using normal Playwright.

Contributor

FWIW anti-bot-detection is more important to us right now than firefox and safari support. The google haters still have other options with chromium-based browsers like Brave, Edge, Ungoogled Chromium, Opera, etc.

@Vinyzu does patchright still allow connecting to firefox/Safari normally without any patches? It would be annoying to have to import a different library everywhere depending on which browser we connect to.

In theory, yes, it does. But files/functions used commonly across browsers are also modified (to be used with Chrome), so I would expect (major) bugs.

@pirate
Contributor

pirate commented Apr 6, 2025

@Vinyzu do you have any plans to add something like your shadow-dom piercing code for cross-origin iframes? We'd love to be able to select elements using xpaths that traverse cross-origin iframes from python, without having to disable web security or select inside the frames manually.

@Vinyzu

Vinyzu commented Apr 6, 2025

I'm not sure if I understand you correctly.
Do you mean piercing iframes so you don't have to use FrameLocators?
If so, no, I don't. Patchright is supposed to stay as close to the Playwright API/functionality as possible, so piercing iframes isn't a feature I plan to add.

But I guess it shouldn't be too hard to just iterate through the site's frames and run a Locator on each of them, so I also don't really see the problem there...
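
The "iterate through the frames" idea could be sketched as follows. `find_in_frames` is a hypothetical helper name; it only relies on the Playwright-style `page.frames`, `frame.locator(...)` and `locator.count()` calls, so it should apply to a patchright page as well if the API is the same. Note that it cannot tell two frames with identical URLs apart; it simply reports every frame where the selector matches.

```python
async def find_in_frames(page, selector: str):
    """Return (frame, locator) pairs for every frame in which the selector matches."""
    matches = []
    for frame in page.frames:  # main frame plus all attached child frames
        locator = frame.locator(selector)
        try:
            if await locator.count() > 0:
                matches.append((frame, locator))
        except Exception:
            # A frame may detach or navigate while we iterate; skip it.
            continue
    return matches
```

A caller would typically `await find_in_frames(page, 'button')` and then decide which of the returned frames is the intended one.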

@pirate
Contributor

pirate commented Apr 6, 2025

Basically, the problem we face now is that pages can have multiple (even nested) cross-origin frames with identical URLs but different contents. The only way to directly select a specific sub-frame in Playwright is by frame URL or name, but those are not guaranteed to be unique, and you can't easily figure out what the containing iframe element is from a given frame object (AFAIK), so there's no way to build an xpath that can select within a specific nested frame.

We'd love to be able to use xpaths like /div/section[1]/iframe[3]/body/main/iframe[2]/body/main/div to deterministically select all the way into a specific nested frame directly from the top frame, without having to iterate through all frames and try the locator blindly on each. This is sort of possible currently if you pass --disable-web-security and implement your own xpath resolver & generator in buildDOMTree.js that looks inside frame.contentDocument whenever it encounters a frame, but it's super dangerous to run that in production if the user has real cookies loaded. If we could do it safely at the Python level without having to disable browser security, that would be amazing. No modification of existing Playwright methods is needed; a new method like page.super_locate(...) could be added that takes xpaths and drills through cross-origin iframes.
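
As a rough illustration of the first step such a method would need, the sketch below splits a frame-crossing xpath into one segment per document. `split_xpath_at_frames` is a hypothetical helper, not part of Playwright, patchright, or browser-use, and it only handles plain child steps (no `//` axes, and no predicates containing `/`).

```python
import re
from typing import List

# Matches a step that names an iframe/frame element, with an optional index.
_FRAME_STEP = re.compile(r'^i?frame(\[\d+\])?$')


def split_xpath_at_frames(xpath: str) -> List[str]:
    """Split an xpath into per-document segments, cutting after each (i)frame step."""
    segments: List[str] = []
    current: List[str] = []
    for step in xpath.strip('/').split('/'):
        current.append(step)
        if _FRAME_STEP.match(step):
            # Everything up to and including the frame element lives in one document.
            segments.append('/' + '/'.join(current))
            current = []
    if current:
        # Trailing steps address the target element inside the innermost frame.
        segments.append('/' + '/'.join(current))
    return segments
```

Each leading segment identifies a frame element inside its parent document and the last one identifies the target element. Resolving those segments against live frames without disabling web security is the hard part the comment describes, and this sketch does not attempt it.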

@Vinyzu

Vinyzu commented Apr 6, 2025

I understand your point/problem, and it would likely be possible to implement something like this, but it goes against the goal Patchright is trying to achieve.

Patchright's/my most important goal is to provide a stealthy Playwright version, not to be the "Dream Automation Library" that implements every useful functionality.
Furthermore, my time is quite limited. I support many other OSS projects that need some attention right now, and I planned on working on them from now on. A functionality like the one you described is not easy or quick to implement.

TL;DR: It would be possible, but hard, and I have neither the time nor the motivation nor the ambition to implement this. But after all, contributions to my projects are always welcome...

@pirate
Contributor

pirate commented Apr 7, 2025

Ok cool makes sense, no worries I was just wondering if it was already something you had considered in the past.

@mosaictheory-jt
Contributor

mosaictheory-jt commented Apr 13, 2025

There's an error when running headless; pushing a minor fix here.

@pirate
Contributor

pirate commented Apr 21, 2025

Alright, let's do it! It's time. If everyone can help test this, that would be awesome!

Thank you to everyone who's been involved with this work so far!

I'm aiming to ship this in the next full release, 0.1.42.

@pirate pirate merged commit 1ac623f into browser-use:main Apr 21, 2025
1 check passed
@trevorstr

I can help test this against the issue I reported here: #1287

I'm guessing I can just install it with this command?

uv remove browser-use
uv add git+https://github.com/browser-use/browser-use --branch main

@trevorstr

I tried to install it from the main branch and got this error:

Resolved 241 packages in 1.56s
   Updating https://github.com/browser-use/browser-use (main)
  × Failed to download and build `browser-use @ git+https://github.com/browser-use/browser-use@ae3bfadc0b7d58c928393e0df58d43da3b8fe5a0`
  ├─▶ Git operation failed
  ╰─▶ process didn't exit successfully: `/opt/homebrew/bin/git reset --hard ae3bfadc0b7d58c928393e0df58d43da3b8fe5a0` (exit status: 128)
      --- stderr
      Downloading static/kayak.gif (3.5 MB)
      Error downloading object: static/kayak.gif (ab32dca): Smudge error: Error downloading static/kayak.gif (ab32dca74aff21e80c1d05457e61204216fd50109c6c8bdd158de4231c3cdaf0): error transferring
      "ab32dca74aff21e80c1d05457e61204216fd50109c6c8bdd158de4231c3cdaf0": [0] remote missing object ab32dca74aff21e80c1d05457e61204216fd50109c6c8bdd158de4231c3cdaf0

      Errors logged to '/Users/trevor.sullivan/.cache/uv/git-v0/checkouts/6fb0cbb493015030/ae3bfad/.git/lfs/logs/20250421T112508.868766.log'.
      Use `git lfs logs last` to view the log.
      error: external filter 'git-lfs filter-process' failed
      fatal: static/kayak.gif: smudge filter lfs failed

  help: If you want to add the package regardless of the failed resolution, provide the `--frozen` flag to skip locking and syncing.

@stevelizcano

This is what you want to do; it works for me:

GIT_LFS_SKIP_SMUDGE=1 uv add git+https://github.com/browser-use/browser-use.git@main --upgrade

@evgeny-kim
Contributor

evgeny-kim commented Apr 29, 2025

Chrome is started with a custom set of args, which is different from vanilla patchright.

Some of these args are specifically mentioned in patchright command flags leaks: --disable-popup-blocking, --disable-component-update, and --disable-default-apps

Removing all of these flags gave me a +16 trust score bump on CreepJS and, more importantly, a Cloudflare captcha bypass.

Here's a list of additional args:

--allow-legacy-extension-manifests
--allow-pre-commit-input
--ash-no-nudges
--block-new-web-contents
--deny-permission-prompts
--disable-client-side-phishing-detection
--disable-component-update
--disable-cookie-encryption
--disable-datasaver-prompt
--disable-default-apps
--disable-desktop-notifications
--disable-domain-reliability
--disable-external-intent-requests
--disable-features=AutofillServerCommunication,BackForwardCache,CalculateNativeWinOcclusion,CrashReporting,HeavyAdPrivacyMitigations,InfiniteSessionRestore,InterestFeedContentSuggestions,OptimizationHints,OverscrollHistoryNavigation,PrivacySandboxSettings4,ProcessPerSiteUpToMainFrameThreshold
--disable-focus-on-load
--disable-infobars
--disable-notifications
--disable-popup-blocking
--disable-print-preview
--disable-session-crashed-bubble
--disable-speech-api
--disable-speech-synthesis-api
--disable-sync
--disable-window-activation
--enable-experimental-extension-apis
--enable-features=NetworkService
--enable-logging=stderr
--generate-pdf-document-outline
--hide-crash-restore-bubble
--install-autogenerated-theme=0,0,0
--log-level=2
--metrics-recording-only
--no-pings
--noerrdialogs
--safebrowsing-disable-auto-update
--silent-debugger-extension-api
--simulate-outdated-no-au="Tue, 31 Dec 2099 23:59:59 GMT"
--suppress-message-center-popups
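
A minimal sketch of filtering such an argument list before launch is shown below. `strip_flagged_args` and `FLAGGED` are hypothetical names; by default only the three flags cited above from patchright's command-flag leaks are removed, whereas the comment reports its gains after removing the whole list.

```python
from typing import Iterable, List

# The three flags the comment above cites from patchright's list of command-flag leaks.
FLAGGED = ('--disable-popup-blocking', '--disable-component-update', '--disable-default-apps')


def strip_flagged_args(args: Iterable[str], flagged: Iterable[str] = FLAGGED) -> List[str]:
    """Return args without any flag whose name (the part before '=') is flagged."""
    flagged_names = {f.split('=', 1)[0] for f in flagged}
    return [a for a in args if a.split('=', 1)[0] not in flagged_names]
```

Comparing on the flag name rather than the full string means a flagged option is dropped regardless of its value.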

@pirate
Contributor

pirate commented Apr 29, 2025

ah you're right, we forgot to remove the custom args after merging patchright, good catch!

@pirate
Contributor

pirate commented May 2, 2025

should be fixed now @evgeny-kim #1550

dharam1291 pushed a commit to dharam1291/browser-use that referenced this pull request May 3, 2025
@imamousenotacat

🕵️ Just my two cents here, I've been working in this niche for 5+ years. There is no bulletproof antidetect solution; it's an infinite cat and mouse game, and once some solution becomes open source, it stops working as all antidetect companies follow all these public repos and even some private channels/repos. So, you can make it less detectable to pass some very obvious tests, but it doesn't make sense to try to pass reCAPTCHA and all other commercial solutions. It's just not going to work reliably in the long run.

Yep, I've just read this comment, which is full of wisdom indeed, but we have to keep trying. Long live the mice 😜

Just my own mouse two cents here: this does for playwright and browser-use the equivalent of something like puppeteer-real-browser.

nopecha_cloudflare.py

Development

Successfully merging this pull request may close these issues.

Avoid fingerprint bot detection (patched client)