Disable cURL encoding/compression handling when a bad Content-Encoding header is received#962
Conversation
Merge upstream @mmeier86 * simplepie#960 * simplepie#962
…LOPT_ENCODING (#8376) * Replace deprecated CURLOPT_ENCODING The CURLOPT_ENCODING setting has been deprecated in favor of CURLOPT_ACCEPT_ENCODING. Signed-off-by: Michael Meier <mmeier1986@gmail.com> * Sync with our SimplePie fork PR FreshRSS/simplepie#67 simplepie/simplepie#960 simplepie/simplepie#962 * Our SimplePie PR merged --------- Signed-off-by: Michael Meier <mmeier1986@gmail.com> Co-authored-by: Alexandre Alapetite <alexandre@alapetite.fr>
|
Thanks. Looks like this was introduced in f5d76cd to fix specific bug in another site but the feed mentioned in the issue no longer works so hard to tell if it is needed nowadays.
It would also be helpful to include a test. Something like the following (with #966): --- a/tests/Integration/SimplePieTest.php
+++ b/tests/Integration/SimplePieTest.php
@@ -553,4 +553,37 @@ HTML
self::assertLessThanOrEqual(1, count($feed->get_links('self') ?? []), 'Link rel=self should not be promoted from HTML when it is already present in headers');
self::assertSame($bogoUrl, $feed->get_link(0, 'bogo'), 'Link rel=bogo does not match');
}
+
+ public function testSimplePieIgnoresBadContentEncodingHeader(): void
+ {
+ $server = new MockWebServer();
+ $server->start();
+
+ $filepath = dirname(__FILE__, 2) . '/data/feed_rss-2.0.xml';
+ $body = file_get_contents($filepath);
+ \assert($body !== false); // For PHPStan
+
+ $url = $server->setResponseOfPath(
+ '/bad-content-encoding',
+ new MockWebServerResponse($body, [
+ 'content-type: application/rss+xml',
+ 'content-encoding: aws-chunked',
+ ], 200)
+ );
+
+ $feed = new SimplePie();
+ $feed->enable_cache(false);
+
+ $feed->set_feed_url($url);
+
+ // For some reason, without this, Sniffer thinks we have text/plain on error.
+ $feed->force_feed(true);
+
+ $return = $feed->init();
+ $server->stop();
+
+ $error = implode("\n", (array) ($feed->error() ?? '')); // For PHPStan
+ self::assertTrue($return, 'Failed fetching feed: ' . $error);
+ self::assertSame(200, $feed->status_code());
+ }
} |
|
I'm not exactly fluent in PHP, but I will give it a good try. |
6340688 to
74c5496
Compare
|
It looks like the fix doesn't work for older PHP versions. I'm trying to figure out what the issue is there. I've also added an assert to make sure that cURL is actually available in the test, as I initially didn't have cURL support in my local PHP setup and couldn't get the test to fail at all, as it used the non-cURL code path without me realising. |
|
Looks like setting NULL does not work on PHP < 8.1 because of php/php-src#11433, which leads to curl returning: |
|
I managed to work around it by creating a new curl handle from scratch but PHPStan does not like that on older versions of PHP where |
|
I think I see what you're getting at with that commit, let me see whether I can stretch my non-existent PHP skills far enough to make it work in this PR. |
|
I managed to get it working using conditional PHPStan config, you can just cherry pick both commits if you want. |
On PHP < 8.1.21, we will not be able to disable `CURLOPT_ACCEPT_ENCODING` by setting it to `NULL`: php/php-src#11433 `curl_setopt` with such values will be a no-op so the secound `curl_exec` will still fail with `CURLE_BAD_CONTENT_ENCODING` and return an empty body. Even worse, it will still report 200 in `CURLINFO_HTTP_CODE` so `FileClient` will not throw an exception because the `status_code` property is not 0. Let’s extract the code so that we can create a fresh `CurlHandle` and later make the `CURLOPT_ACCEPT_ENCODING` conditional. Unfortunately, since PHP 8.0 changed `curl_init` to return `CurlHandle` instead of `resource`, and PHPStan has no simple way to make the return type depend on PHP version, we have to resort to an ugly hack using `typeAliases`.
Setting the CURLOPT_ENCODING to 'none' is not officially supported by cURL. Setting it in case of cURL receiving an invalid Content-Encoding header from the server and re-trying would not fix the issue. The current implementation already sets cURL up to send an Accept-Encoding header with all supported encodings a couple of lines above this change. If the fetch still returns a BAD_CONTENT_ENCODING error, the server already ignored the Accept-Encoding headers once. This change, instead of sending 'none' in a re-try, disables cURL's content encoding handling (in practice, handling compression). Co-authored-by: Jan Tojnar <jtojnar@gmail.com>
74c5496 to
98fb5c3
Compare
|
That seems to have done the trick. |
On PHP < 8.1.21, we will not be able to disable `CURLOPT_ACCEPT_ENCODING` by setting it to `NULL`: php/php-src#11433 `curl_setopt` with such values will be a no-op so the secound `curl_exec` will still fail with `CURLE_BAD_CONTENT_ENCODING` and return an empty body. Even worse, it will still report 200 in `CURLINFO_HTTP_CODE` so `FileClient` will not throw an exception because the `status_code` property is not 0. Let’s extract the code so that we can create a fresh `CurlHandle` and later make the `CURLOPT_ACCEPT_ENCODING` conditional. Unfortunately, since PHP 8.0 changed `curl_init` to return `CurlHandle` instead of `resource`, and PHPStan has no simple way to make the return type depend on PHP version, we have to resort to an ugly hack using `typeAliases`.
This PR relates to #961 and FreshRSS/FreshRSS#8374.
When cURL has already returned a bad encoding error, this is a good indication that the server ignores
Accept-Encodingheaders, as theCURLOPT_ENCODINGoption is already set to''for the first request. This results in cURL sending anAccept-Encodingheader with all supported encodings (compression algorithms, in practice) already in the first request. And the fact that this request results in aCURLE_BAD_CONTENT_ENCODINGerror indicates that the server ignored that header already. So trying again, but this time withCURLOPT_ENCODING = 'none'seems unlikely to result in success?Instead, my proposal here is to set
CURLOPT_ENCODINGtonull, which tells cURL to disable compression handling altogether and to ignore the badContent-Encodingheader send by the server.I was able to confirm that this works in tests with FreshRSS against a test setup where the server sends a valid
index.xmlwith an RSS feed, but with theContent-Encodingheader set toaws-chunked. WithCURLOPT_ENCODING = 'none', the fetch fails and the body is empty, leading to FreshRSS not being able to add the feed. But withCURLOPT_ENCODING = nullset, the fetch succeeds and the feed can be added.