Skip to content

Update libcurl-impersonate usage#4535

Merged
dvikan merged 1 commit intoRSS-Bridge:masterfrom
wrobelda:master
May 17, 2025
Merged

Update libcurl-impersonate usage#4535
dvikan merged 1 commit intoRSS-Bridge:masterfrom
wrobelda:master

Conversation

@wrobelda
Copy link
Contributor

@wrobelda wrobelda commented May 2, 2025

Overhaul the usage of libcurl-impersonate, which was not, in fact, being used properly, as the code was overriding the headers set by it to prevent detection.

  • update the libcurl-impersonate to an actively managed lexiforest fork
  • impersonate Chrome 131
  • move the defaultHttpHeaders to http.php, where it belongs
  • only set defaultHttpHeaders if curl-impersonate is not detected
  • make useragent ini setting optional and disabled by default
  • add necessary documentation updates

@wrobelda wrobelda force-pushed the master branch 5 times, most recently from 1f447cd to 2f5de87 Compare May 2, 2025 00:34
@wrobelda wrobelda marked this pull request as draft May 2, 2025 08:16
@wrobelda wrobelda marked this pull request as ready for review May 2, 2025 12:18
@dvikan
Copy link
Contributor

dvikan commented May 9, 2025

i was hoping we could keep CurlHttpClient cleanly separated by treating it as a thirdparty http client

this change removes the http user agent for non-curl-impersonator usage which might break some fetching. (default ua is Curl v1.2.3 or simply the empty string i dont remember)

this change breaks the usage of custom http headers for non-curl-impersonate uses (no it doesnt, i read code wrong)

@wrobelda wrobelda force-pushed the master branch 2 times, most recently from 6273e0f to 462dbf8 Compare May 17, 2025 16:06
@wrobelda
Copy link
Contributor Author

wrobelda commented May 17, 2025

i was hoping we could keep CurlHttpClient cleanly separated by treating it as a thirdparty http client

I can see that, but you could also make a point that getContents() in contents.php could be kept clean of the actual underlying anti-bot measures, which is the purpose of the default headers you had there already.

Also, I would imagine any alternative client would need to incorporate some of their own anti-bot measures, possibly with another set of headers, so if we consider the current set of default headers as curl-specific, I think it makes sense to keep them all sealed off in CurlHttpClient.

It would be hard to fix the libcurl-impersonate usage without the refactoring I did.

this change removes the http user agent for non-curl-impersonator usage which might break some fetching. (default ua is Curl v1.2.3 or simply the empty string i dont remember)

Indeed, fixed it.

libcurl-impersonate was not being used properly, as the code was
overriding the headers set by it to prevent detection.

- update the libcurl-impersonate to an actively managed lexiforest
  fork
- impersonate Chrome 131
- move the defaultHttpHeaders to http.php, where it belongs
- only set defaultHttpHeaders if curl-impersonate is not detected
- make useragent ini setting optional and disabled by default
- add necessary documentation updates
@dvikan dvikan merged commit b7c04f8 into RSS-Bridge:master May 17, 2025
3 of 9 checks passed
@dvikan
Copy link
Contributor

dvikan commented May 17, 2025

thanks for helping to keep this up to date.

@wrobelda
Copy link
Contributor Author

wrobelda commented Jun 4, 2025

@dvikan I found by accident that https://github.com/RSS-Bridge/rss-bridge/blob/master/bridges/FB2Bridge.php reads the useragent value from configuration and uses that to issue a file_get_contents() call . I believe the bridge might be failing now because the useragent config value is now empty by default.

I don't know why file_get_contents is used here. From what I gathered, that function does not use curl internally, IMHO it should be rewritten to use CurlHttpClient.

@dvikan
Copy link
Contributor

dvikan commented Jun 4, 2025

yes file_get_contents should not be used. Im considering deleting this bridge. But im unsure if it works or is in use.

dont worry about it, I will check whether the bridge works or not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants