Skip to content

Prevent cache polution#675

Merged
mblaney merged 1 commit intosimplepie:masterfrom
FreshRSS:prevent-cache-polution
Apr 12, 2021
Merged

Prevent cache polution#675
mblaney merged 1 commit intosimplepie:masterfrom
FreshRSS:prevent-cache-polution

Conversation

@Alkarex
Copy link
Contributor

@Alkarex Alkarex commented Mar 28, 2021

Retrieving a feed with distinct cURL parameters such as user-agent might give different responses, which should not share the same cache.

Follow-up and generalisation of #643

The code to get the cache filename is extracted in its own new function SimplePie::get_cache_filename($url) so that clients can easily get this information as well.

Downstream PR: FreshRSS/FreshRSS#3502

Retrieving a feed with distinct cURL parameters such as user-agent might give different responses, which should not share the same cache.

Follow-up and generalisation of simplepie#643

The code to get the cache filename is extracted in its own new fuction `SimplePie::get_cache_filename($url)` so that clients can easily get this information as well.

Downstream PR: FreshRSS/FreshRSS#3502
// Append custom parameters to the URL to avoid cache pollution in case of multiple calls with different parameters.
$url .= $this->force_feed ? '#force_feed' : '';
$options = array();
if ($this->timeout != 10)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those tests are to keep the best possible compatibility with existing cache filenames: if not using custom cURL options, the cache filenames should be unchanged

if (!empty($options))
{
ksort($options);
$url .= '#' . urlencode(var_export($options, true));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modified URL is still a valid URL, in case a custom cache_name_function() might have such an expectation

@mblaney
Copy link
Member

mblaney commented Apr 11, 2021

could this be optional? I don't feel that changing your agent necessarily means you want to cache both versions, but this is enforcing it. I think you will also need to move force_feed to your options array, because you could end up with two # segments in the url.

@Alkarex
Copy link
Contributor Author

Alkarex commented Apr 11, 2021

  1. Yes, we can make an option if you want. I am not sure what use-case would be addressed by such an option though. In any case, I think it would be more standard and less error prone to not share cache by default. As standard, HTTP caches are not supposed to be shared with different query parameters, and should even obey things such as Vary in the response. If you confirm that you rather want an option, and if yes what the default should be, I will edit accordingly.

  2. Extra hashes are legal in an URL fragment, e.g. https://example.net/#one#two#three . See e.g. https://tools.ietf.org/html/rfc3986#appendix-B , any character is allowed after the first hash sign. The corresponding PHP function parses it just fine:

<?php
$url = 'https://example.net/#one#two#three';
print_r(parse_url($url));
Array
(
    [scheme] => https
    [host] => example.net
    [path] => /
    [fragment] => one#two#three
)

@mblaney
Copy link
Member

mblaney commented Apr 12, 2021

ok fair enough, I was thinking caches might grow unexpectedly from this change but that's really only a problem if you're changing the agent options a lot. thanks for the update!

@mblaney mblaney merged commit 950bd46 into simplepie:master Apr 12, 2021
@Alkarex Alkarex deleted the prevent-cache-polution branch April 12, 2021 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants