Split SimplePie\File class into HTTP client and response#774
Split SimplePie\File class into HTTP client and response#774mblaney merged 27 commits intosimplepie:masterfrom
Conversation
|
looks good @Art4, before I merge can someone confirm that the tests are running? I'm seeing |
|
@mblaney Github seems to have some problems at the moment. Github Actions is experiencing degraded performance. See https://www.githubstatus.com/ |
|
I will try to review it later today. |
|
Thank you, I appreciate it. I just found another error and stabilized it with a test. |
| * @license http://www.opensource.org/licenses/bsd-license.php BSD License | ||
| */ | ||
|
|
||
| namespace SimplePie\Exception; |
There was a problem hiding this comment.
Any thoughts on having it under SimplePie\HTTP? That way we can remove just a single namespace later.
There was a problem hiding this comment.
I have no strong opinions about the place for exceptions. We will potentially get more Exceptions while fixing #755.
|
@mblaney Is there any chance that this will be merged soon? |
|
I am still reviewing this. Since my head is too small to look at everything at once, I am extracting some of the unrelated cleanups into separate pull requests and rebased and squashed the fixes into the original commits. I have also added some notes about the changes into the commit messages: master...jtojnar:simplepie:split-file-into-client-and-response I have also switched to SPDX syntax since the long comments make it difficult to fit multiple files on the screen at the same time. |
jtojnar
left a comment
There was a problem hiding this comment.
Sorry it took so long, I have finally managed to review this in detail it deserves.
You did a great job with this and convinced me that this approach is a reasonable interim solution.
While reviewing, I did some rebase surgery to make it easier to handle with my attention span, and while at it fixed some issues. You can see that in #815.
| * While header names are not case-sensitive, get_headers() will preserve the | ||
| * exact case in which headers were originally specified. |
There was a problem hiding this comment.
This is not true for the FileClient. I do not feel like it is important for now so I have removed it from #815.
src/HTTP/Response.php
Outdated
| declare(strict_types=1); | ||
| /** | ||
| * SimplePie | ||
| * |
There was a problem hiding this comment.
I have credited PHP Framework Interoperability Group in #815.
| if (strpos($header, ',') === false) { | ||
| $header = [$header]; | ||
| } else { | ||
| $header_lines = explode(',', $header); |
There was a problem hiding this comment.
I worry that this might disrupt some headers. Instead, in #815 I have decided to add support for HTTP\Parser to produce PSR-7 compatible headers, use $parsed_headers internally in File as you did initially, and recompute $parsed_headers when $headers change (which should not really happen outside of tests but better safe than sorry).
| * @see http://tools.ietf.org/html/rfc3986#section-4.1 | ||
| * @return string the original (first requested) URL before following redirects (expect 301) | ||
| */ | ||
| public function get_permanent_uri(): string |
There was a problem hiding this comment.
I am still not wholly convinced the ResponseInterface-like object should contain RequestInterface-specific properties like URI but I guess it is fine for now.
| } | ||
|
|
||
| if ($this->file->method & \SimplePie\SimplePie::FILE_SOURCE_REMOTE) { | ||
| if (preg_match('/^http(s)?:\/\//i', $this->file->get_requested_uri())) { |
There was a problem hiding this comment.
I have extracted the regex to Misc::is_remote_uri() in #815.
| /** | ||
| * Get a HTTP client | ||
| */ | ||
| private function get_http_client(): Client |
There was a problem hiding this comment.
I wonder if it would be worth it to add support for singletons maintained by Registry. Or maybe we could just create ClientFactory and get it in constructor.
There was a problem hiding this comment.
It is not worth it. This method is private and will be removed in v2 again. Moving this to Registry will lead to more BC issues.
| $orig_timeout = $this->timeout; | ||
| $this->timeout = 1; | ||
| $file = $this->get_http_client()->request(Client::METHOD_GET, $this->feed_url, $headers); |
There was a problem hiding this comment.
This is weird side-channel signalling. It might be nicer to pass $this->timeout/10 (or 1 for whatever reason you have decided to change it) as an argument to the get_http_client() factory. Went with that in #815
There was a problem hiding this comment.
This request will check if the cache has been changed. The timeout was to 1 sec. I assume that the intention was to not wait to long for a response for this check.
It then was changed to $this->timeout/10 in fe7e333, but this could lead to a float instead of int, if e.g. $this->timeout is set to 5. That's why I changed it back to 1 sec. Alternative we could use ceil() but this will be in most cases 1 or (if $this->timeout = 11-20) 2. So I decided to use 1 directly.
About the side-channel signalling I decided to use this instead of a new argument because the timeout is moved to PSR-18 HTTP client and $this->timeout will be deprecated.
There was a problem hiding this comment.
Thanks for the context. Peeking further into history, it looks like it was 1 since the introduction of timeouts along with SimplePie_File in 5bf1814.
But I think it is still useful to keep it proportional to the master timeout to allow people with slow connection to increase it. Even if we assume majority of cached resources will not get outdated, waiting for server to reply HTTP 304 Not Modified can still take time (domain name resolution can be particularly slow). And if it turns out that the cached version is outdated, the web server will try to send an updated version. Having too low timeout can mean the request will be terminated, only to be started again with the original timeout.
So I do not actually think reducing the timeout makes much sense at all. The only explanation I can come up with is that it prevents essentially doubling the timeout when fetching a cached resource from an unresponsive server without force_cache_fallback due to re-running the failed request.
I have split the change to a separate commit in #815, including the explanation.
| try { | ||
| $response = $this->get_http_client()->request(Client::METHOD_GET, $href, $headers); | ||
| } catch (HttpException $th) { | ||
| continue; |
There was a problem hiding this comment.
This terminates the loop early, skipping $done[] = $href;. Fortunately, the success case is short so should be fine to move it inside the try block. Fixed in #815
| try { | ||
| $response = $this->get_http_client()->request(Client::METHOD_GET, $value, $headers); | ||
| } catch (HttpException $th) { | ||
| continue; |
There was a problem hiding this comment.
Similarly, this skips unset($array[$key]);. Fixed in #815
| * @see http://tools.ietf.org/html/rfc3986#section-4.1 | ||
| * @return string the final requested url after following redirects | ||
| */ | ||
| public function get_requested_uri(): string; |
There was a problem hiding this comment.
Also in #815, I renamed this to get_final_requested_uri so it is clearer what this does.
|
I propose we go with #815 instead. This change set is complex enough to warrant not squashing everything into a single megacommit, where refactorings like the switch to |
sorry @Art4 you're right, this PR should've been dealt with before merging smaller patches. I'm happy with it so didn't need to comment, but was waiting for approval from @jtojnar before merging. I'll wait for the two of you to decide how to continue before merging anything else. |
@mblaney I just wanted to kindly let you know that I have already shared all my thoughts and comments in this and other PRs/issues. I believe I have covered everything that needs to be said. I think it's now time for you to make a decision. Once you have made a decision, I can move forward with the next steps in #731. Thank you so much for your time and consideration. |
|
@Art4 Just to be clear, are you okay with merging #815 in this PR’s stead? Or do you want to address the comments mentioned in above (e.g.#774 (comment)) in this PR? |
|
@jtojnar I think I have addressed all the comments relevant to this PR. Some issues in this PR are addressed in #777, because they can't be fixed here. All open comments are either "not important for now", "fine for now" or you have already suggested improvements in #815. As long as this is not explicitly requested, I don't see the need to replicate the improvements here, but would rather discuss them in #815 (and yes, I still have some thoughts on them). The discussion is otherwise even more scattered than it already is because of the many open issues. @mblaney This PR is the base of or mentioned by #777, #813, #815, #812 and #810. These issues are partly the basis for further issues like #826 and #833. Closing this PR will be quite confusing in retrospect. So if you are happy with this PR for the most part, I recommend that we merge this PR as a base and discuss the other changes in the other PRs and issues. I would also like to see a decision from you in #782 because it has far reaching meanings on issues like #777. And I'd appreciate any thoughts or reactions from you to #731. |
I would argue that having a clean commit history with the details in commit messages (as in #815) is more important than GitHub issues, as it facilitates git archaeology. Here, we at most get a single big squashed commit without any rationale for the changes, which will be more confusing when running Additionally, the git repo is a more permanent record – it already survived move from the previous hosting provider while the issues did not. You could also the contents of this PR with commits from #815 using the following: That would allow us to merge this PR (#774) with a cleaner and more descriptive commit history. |
|
And here we are again at the point where a decision has to be made. @mblaney, please take over. 🙂 |
|
Thanks @Art4 and @jtojnar for the updates. I appreciate yours thoughts on commit history @jtojnar, but I would prefer authors of a PR completed their work, so will retract my earlier comment and now merge #774. I hope that's ok, if you want to open a new PR or modify #815 for any of the changes you've made it would be good to discuss them separately. |
That would be nice but the quality of the software should not suffer for it. Building software is a collaborative effort anyway, so it should not be much of a problem, as long as authorship is properly credited. Unfortunately, as it is now the history is a complete mess making any debugging using And even worse, there are also correctness errors in this PR (e.g. #774 (comment)). IMO the Edit: I have updated #815. Not much we can do about deeper history but at least blame of relevant changes will end there. |

Hi 👋
At the moment we have
SimplePie\File::__construct(), where a request is handled usingcurl,fsockopenorfile_get_contents. The response is saved in the class properties likebody,headers, and so on.This PR is a first step to allow other HTTP libraries in SimplePie. So this PR splits the File class into 2 parts:
request()method. The client internally calls theFileclass like before.Fileclass now implements aResponseinterface, that allows getting all the data like before through methods.The concept and interfaces are interoperable with PSR-18 and PSR-7. The next step will be an implementation of
SimplePie\HTTP\Clientthat works with a PSR-18 client under the hood.refs #520