Add anti bot detection via patchright #857
|
@gregpr07 this is ready to be reviewed |
This comment was marked as outdated.
|
Will review this! Thank you so much for the contribution :) |
|
Why did you remove the security arguments? This breaks cross-site iframes...

import os
import sys

from langchain_anthropic import ChatAnthropic

# Make the local browser_use checkout importable before importing it
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import asyncio

from browser_use import Agent

llm = ChatAnthropic(model_name='claude-3-7-sonnet-20250219', temperature=0.0, timeout=30, stop=None)

task = """
go to https://csreis.github.io/tests/cross-site-iframe.html
click on Go cross-site (complex page)
and click on builders button
return what's new
"""
# task = 'go to https://abrahamjuliot.github.io/creepjs/ and tell me what trust score is'

agent = Agent(
    task=task,
    llm=llm,
)


async def main():
    await agent.run(max_steps=10)


asyncio.run(main())

For example, running this script without the security arguments doesn't work. Could you please fix this, and I'll merge it. |
|
The detection site does not load with web security disabled; it throws … Given that this is an edge case and iframes are quite rare these days, maybe we could default to setting … Let me know your thoughts. |
|
@neo773 the anti bot detection is not working for me. I am trying to log in to the Grubhub website, and it says it thinks this is a bot. Something else must be required for proper anti bot detection. |
|
@neo773 no, cross-site iframes are actually a really, really important use case. Websites like Salesforce and other “legacy” providers that are extremely valuable need them. I can’t merge this - we absolutely need cross-site iframe support! @gaurav-cointab I guess not all websites will be fixed, but we will be one step closer with rebrowser (if it works) |
|
Pushed a new commit that adds back the extra args. Since we're not able to verify the actual score with this enabled, maybe you can try some real-world workflows that failed previously and report back? |
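A minimal sketch of how the security arguments discussed here could be made optional rather than removed outright. The three switches are standard Chromium command-line flags, but the helper name `build_launch_args`, the `disable_security` parameter, and the exact set of flags are assumptions for illustration, not the code in this PR.

```python
from typing import List

# Chromium switches that relax cross-origin isolation (assumed set, for illustration).
SECURITY_ARGS = [
    "--disable-web-security",
    "--disable-site-isolation-trials",
    "--disable-features=IsolateOrigins,site-per-process",
]


def build_launch_args(base_args: List[str], disable_security: bool) -> List[str]:
    """Return a copy of base_args, adding the security switches only when requested."""
    args = list(base_args)
    if disable_security:
        # Skip switches that are already present so the list stays duplicate-free.
        missing = [a for a in SECURITY_ARGS if a not in args]
        args.extend(missing)
    return args
```

With a toggle like this, cross-site iframe workflows could keep the switches enabled while detection tests run without them.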
|
Why does it not work? Pinged rebrowser founder |
Rebrowser fixes a few different things but the most important one is P.S. @neo773 actually, iframes are everywhere :) |
|
@neo773 it worked this time. It was my bad: I did not pick up the line which changed the user_agent. Sorry for taking misinformed steps and for mentioning here that it is not working. --update after a full day of tinkering: what steps should I take to fix these? I had to add the "args" that you had removed from the browser/browser.py file, because the link mentioned to add the … Now these 3 are remaining; please let me know what to do for these 3. I believe that fixing the userAgentNavigator should fix things. @neo773 @nwebson |
|
I ran the agent against a Chrome instance: I supplied the chrome.exe path, with guest as the profile, thinking that the userAgent issue would be fixed. It was. When I run the agent in chrome.exe, the bot detection link now only highlights … And the website still thinks this is a bot. Is there a way to fix the highlighted item above? I am already using rebrowser-playwright@1.49.1; what else is remaining? |
|
/tip $50 @gaurav-cointab @neo773 are you going to fix it according to @gaurav-cointab and @nwebson? |
|
@gregpr07 Using
|
|
I understand that this is a start for many websites, but this is not working for me. I also tried #805, and that did not work for me either. Here is what came out of my testing. https://bot-detector.rebrowser.net/ only works for testing control via rebrowser: it now shows all green for every session, whether or not the session works for my use case. It does not seem to cover all the other things that are done for bot detection (most of which even I don't know about). Then I came across this link, which is easy to configure and use to detect bots. It is a test link that showcases how detection might work: https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php I tested this in Chromium with old Playwright and with rebrowser Playwright; nothing works. Then I opened guest mode in Chrome from browser-use (with and without rebrowser), and it failed as shown below. This matched my own testing of opening the Grubhub links in browser-use, where it gave a false result. So something is not working with rebrowser. This might be an issue with Playwright that the websites are able to detect and we are not. It might be that the websites can detect that the browser was opened in debug mode, with a debug URL, or something else. I don't know what to make of it; I leave it to the experts to figure out. |
|
🕵️ Just my two cents here, I've been working in this niche for 5+ years. There is no bulletproof antidetect solution; it's an infinite cat and mouse game, and once some solution becomes open source, it stops working as all antidetect companies follow all these public repos and even some private channels/repos. |
|
@gaurav-cointab: You just got a $50 tip! 👉 Complete your Algora onboarding to collect your payment. |
|
🎉🎈 @gaurav-cointab has been awarded $50! 🎈🎊 |
|
Hi @nwebson @gregpr07 @neo773, I got it working for websites that are doing other checks as well. What needs to be done is all this.
Now reCAPTCHA is also working, which means that was not an issue with rebrowser or anything else. We can move ahead and merge this PR; I was the one who raised concerns about it. This PR works for many, many websites, and where it fails, taking the above steps will make it work. |
|
Sweet thanks for those last fixes! 🎉 I think it's ready. I'm going to target merging this into the upcoming |
|
|
browser = await browser_class.launch(
    headless=self.config.headless,
    channel='chrome',
Why is channel set to chrome? The browser_class could be firefox or webkit.
Patchright only patches Chromium-based browsers. Firefox and WebKit are not supported.
If you want to use Firefox or WebKit, I would recommend using normal Playwright.
FWIW anti-bot-detection is more important to us right now than firefox and safari support. The google haters still have other options with chromium-based browsers like Brave, Edge, Ungoogled Chromium, Opera, etc.
@Vinyzu does patchright still allow connecting to firefox/Safari normally without any patches? It would be annoying to have to import a different library everywhere depending on which browser we connect to.
In theory, yes, it does, but files/functions used commonly across browsers are also modified (to be used with Chrome), so I would expect (major) bugs.
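One way to avoid scattering conditional imports, sketched under assumptions: route Chromium to patchright and the other engines to stock Playwright behind a single helper. `patchright.async_api` and `playwright.async_api` are the real module paths of the two packages; the helper names `pick_driver_module` and `load_async_playwright` are hypothetical and not part of browser-use.

```python
import importlib

# Playwright browser types, mapped to the package that should drive them.
# Patchright only patches Chromium, so the other engines use stock Playwright.
DRIVER_BY_BROWSER = {
    "chromium": "patchright.async_api",
    "firefox": "playwright.async_api",
    "webkit": "playwright.async_api",
}


def pick_driver_module(browser_name: str) -> str:
    """Return the module path that should drive the given browser type."""
    try:
        return DRIVER_BY_BROWSER[browser_name.lower()]
    except KeyError:
        raise ValueError(f"unsupported browser type: {browser_name!r}") from None


def load_async_playwright(browser_name: str):
    """Import the chosen package lazily and return its async_playwright factory."""
    module = importlib.import_module(pick_driver_module(browser_name))
    return module.async_playwright
```

Because the import happens inside `load_async_playwright`, only the package that is actually needed has to be installed.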
|
@Vinyzu do you have any plans to add something like your shadow-dom piercing code for cross-origin iframes? We'd love to be able to select elements using xpaths that traverse cross-origin iframes from python, without having to disable web security or select inside the frames manually. |
|
I'm not sure I understand you correctly, but I guess it shouldn't be too hard to just iterate through the site's frames and run a Locator on each of them, so I don't really see the problem there... |
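The frame-iteration idea suggested here can be sketched roughly as follows. It relies only on `page.frames`, `frame.locator()`, and `locator.count()` from the Playwright async API; the function name `find_in_all_frames` is hypothetical. It returns every frame that contains a match, so it does not by itself tell apart frames that share a URL or name.

```python
import asyncio
from typing import Any, List, Tuple


async def find_in_all_frames(page: Any, selector: str) -> List[Tuple[Any, Any]]:
    """Run the selector in every frame of the page and collect the frames that match."""
    matches = []
    # page.frames lists the main frame plus all attached child frames, nested ones included.
    for frame in page.frames:
        locator = frame.locator(selector)
        if await locator.count() > 0:
            matches.append((frame, locator))
    return matches
```

A caller would then pick one of the returned `(frame, locator)` pairs and act on it, for example with `await locator.first.click()`.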
|
Basically the problem we face now is that pages can have multiple (even nested) cross-origin frames with identical URLs but different contents. The only way to directly select a specific sub-frame in Playwright is by frame URL or name, but those are not guaranteed to be unique, and you can't figure out what the containing … We'd love to be able to use xpaths like … |
|
I understand your point/problem, and it would likely be possible to implement something like this, but it goes against the goal patchright is trying to achieve. Patchright's (and my) most important goal is to provide a stealthy Playwright version, not to be the "Dream Automation Library" that implements every useful functionality. TL;DR: It would be possible, but hard, and I have neither the time, the motivation, nor the ambition to implement this. But after all, contributions to my projects are always welcome... |
|
Ok cool makes sense, no worries I was just wondering if it was already something you had considered in the past. |
|
There's an error when running headless - pushing minor fix here. |
Co-authored-by: JT <162653111+mosaictheory-jt@users.noreply.github.com>
|
Alright, let's do it! It's time. If everyone can help test this, that would be awesome! Thank you to everyone who's been involved with this work so far! I'm aiming to ship this in the next full release 1.42.0 |
|
I can help test this against the issue I reported here: #1287 I'm guessing I can just install it with this command? |
|
I tried to install it from the main branch and got this error: |
This is what you want to do, that works for me: GIT_LFS_SKIP_SMUDGE=1 uv add git+https://github.com/browser-use/browser-use.git@main --upgrade |
|
Chrome is started with a custom set of args, which differs from vanilla patchright. Some of these args are specifically mentioned in patchright's list of command flag leaks: --disable-popup-blocking, --disable-component-update, and --disable-default-apps. Removing all of these flags gave me a +16 trust score bump from CreepJS and, more importantly, a cf captcha bypass. Here's a list of additional args:
|
|
ah you're right, we forgot to remove the custom args after merging patchright, good catch! |
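The flag cleanup described above can be sketched as a small filter over the launch args. The three flags are the ones named in this thread; the helper name `strip_leaky_args` is hypothetical, and this is only an illustration, not the change made in #1550.

```python
from typing import Iterable, List

# Flags called out in this thread as detectable command-flag leaks.
LEAKY_FLAGS = {
    "--disable-popup-blocking",
    "--disable-component-update",
    "--disable-default-apps",
}


def strip_leaky_args(args: Iterable[str]) -> List[str]:
    """Drop the leaky flags from a launch arg list, preserving the order of the rest."""
    # Compare on the flag name only, so a form like "--flag=value" is also matched.
    return [arg for arg in args if arg.split("=", 1)[0] not in LEAKY_FLAGS]
```

For example, `strip_leaky_args(["--no-first-run", "--disable-default-apps"])` keeps only `--no-first-run`.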
|
should be fixed now @evgeny-kim #1550 |
Yeap, I've just read this comment full of wisdom indeed, but we have to keep trying. Long live to the mice 😜 Just my own mouse two cents here: this is doing for playwright and browser-use the equivalent to something like puppeteer-real-browser |






This PR adds anti bot detection via ~~rebrowser~~ patchright.
As per their documentation, they don't recommend applying patches manually, as it's error-prone; they recommend using their drop-in package instead.
This achieves a trust score of 69; the previous score was 0.
/claim #852
/closes #852