Skip to content

Add support for PSR-18 HTTP clients#826

Closed
jtojnar wants to merge 15 commits intosimplepie:masterfrom
jtojnar:add-psr18-http-client-support-rebased
Closed

Add support for PSR-18 HTTP clients#826
jtojnar wants to merge 15 commits intosimplepie:masterfrom
jtojnar:add-psr18-http-client-support-rebased

Conversation

@jtojnar
Copy link
Member

@jtojnar jtojnar commented Mar 20, 2023

This is #777 rebased and commits squashed/reordered for easier reviewing and cleaner Git history.

Similarly to how #815 is rebased #774.

Full diff: https://github.com/Art4/simplepie/compare/c7330e69af805877bf599bf8c48e7082d44d3b84..jtojnar:simplepie:add-psr18-http-client-support-rebased

@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch 2 times, most recently from f8d385f to e5dce13 Compare March 20, 2023 12:37
@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch from e5dce13 to bd54a17 Compare March 20, 2023 12:53
Copy link
Member Author

@jtojnar jtojnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed up to Add SimplePie::set_http_client(), add tests

$permanentUrl = $requestedUrl;
}

$request = $request->withUri($this->uriFactory->createUri($requestedUrl));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-absolute URLs are allowed these days: https://httpwg.org/specs/rfc9110.html#field.location

Copy link
Contributor

@Art4 Art4 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what's the problem. This is part of the UriFactory and Request class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the location header could have a non-absolute url (without host). But this is not a problem.

$this->uriFactory->createUri() creates a new Uri instance from the non-absolute url. This uri is set to $request->withUri($uri). This will take care of the missing host:

This method MUST update the Host header of the returned request by
* default if the URI contains a host component. If the URI does not
* contain a host component, any pre-existing Host header MUST be carried
* over to the returned request.

See https://github.com/php-fig/http-message/blob/2.0/src/RequestInterface.php#L99-L129

$requestedUrl = $response->getHeaderLine('Location');

if ($statusCode === 301) {
$permanentUrl = $requestedUrl;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can do anything about being passed a client that already handles redirects (like selfoss does).

We might not have other choice than adding support for each specific one. For example:

if (class_exists(\GuzzleHttp\RedirectMiddleware::class) && $response instanceof \GuzzleHttp\Psr7\Response && $response->hasHeader(\GuzzleHttp\RedirectMiddleware::HISTORY_HEADER)) {
    $permanentUrl = self::getGuzzleHistoryPermanentUrl($url, $response);
}


/**
 * Get permanent URL of the response.
 * RedirectMiddleware will need to be enabled for this to work.
 *
 * @param string $url requested URL, to use as a fallback
 * @param GuzzleHttp\Psr7\Response $response response to inspect
 *
 * @return string URL we should change the future requests to
 */
public static function getGuzzleHistoryPermanentUrl(string $url, \GuzzleHttp\Psr7\Response $response) {
    // Sequence of fetched URLs
    $urlStack = array_merge([$url], $response->getHeader(\GuzzleHttp\RedirectMiddleware::HISTORY_HEADER));
    // Let’s consider the first URL a permanent redirect since it is our start point.
    $statusStack = array_merge([301], $response->getHeader(\GuzzleHttp\RedirectMiddleware::STATUS_HISTORY_HEADER));

    // Follow the chain util first non-permanent redirect, since non-permanent
    // redirect could invalidate permanent ones later in the chain.
    $index = 0;
    while (($statusStack[$i] === 301 || $statusStack[$i] === 308) && $index < count($urlStack)) {
        $index++;
    }

    return $urlStack[$index - 1];
}

Copy link
Contributor

@Art4 Art4 Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a PSR-18 client handles the redirects they will return a PSR-7 Response with a 200 (or other then 301||308) status code. So we have no chance to get the final requested url. But this is not a problem because File will be deprecated and removed. I would not put any work into this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but we need to get the permanent URL for features like fossar/selfoss#1178.

Unless we want to deprecate SimplePie::$permanent_url and expose the Response in SimplePie so that apps could extract the permanent URL themselves.

Or we could instruct consumers to not pass clients with RedirectMiddleware in the docs but that is error-prone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should deprecate also SimplePie::$permanent_url while deprecating File, and we should expose Psr\Http\Message\ResponseInterface instead. The extraction of the permanent URL would be a detail of the provided PSR-18 client. This has to be discussed in future.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, looking at the Guzzle docs, their redirect middleware will apparently not be used when using Guzzle as PSR-18 client:

This option has no effect when making requests using GuzzleHttp\Client::sendRequest(). In order to stay compliant with PSR-18 any redirect response is returned as is.

That means the PSR-18 client will not be aware that the requests are connected when we handle redirects ourselves and there will be no way to extract the permanent URL from it.

$filepath = dirname(__FILE__, 3) . '/data/this-file-does-not-exist';

$this->expectException(HttpException::class);
$this->expectExceptionMessage('file_get_contents('.$filepath.'): Failed to open stream: No such file or directory');
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should match error messages like this. I do not think they are guaranteed to be stable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm checking the error message to be sure that the test failed for the expected reason. As an alternative we could throw a new FileNotFoundException.

try {
$raw = file_get_contents($path);
} catch (Throwable $th) {
throw new HttpException($th->getMessage(), $th->getCode(), $th);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file_get_contents should not throw an exception by itself. I presume this is to catch it if app throws in the handler passed to set_error_handler? Perhaps add a comment?

Copy link
Contributor

@Art4 Art4 Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AIUI, E_WARNING will not be converted into throwable unless the application defines a custom error handler with set_error_handler, that throws.

That tests catches the throw below:

if ($raw === false) {
    throw new HttpException('file_get_contents() could not read the file', 1);
}

Copy link
Contributor

@Art4 Art4 Jun 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user has a custom error handler we can catch the error and throw it as a HttpException. The tests expect the message from the PHP Error as file_get_contents(/path/to/file): Failed to open stream: No such file or directory.

If the user has no custom error handler, $raw will be false and we throw a HttpException with the message file_get_contents() could not read the file. (Saying this I'm realizing that we have no tests for this second case.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a is_readable() check in #848.

try {
$raw = file_get_contents($path);
} catch (Throwable $th) {
throw new HttpException($th->getMessage(), $th->getCode(), $th);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either way, the error code will probably not be meaningful for HttpException.

@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch 2 times, most recently from 839643d to 1a218b6 Compare March 21, 2023 03:55
private function get_http_client(): Client
{
if ($this->http_client === null) {
$this->http_client = new FileClient(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like SimplePie will no longer share input data. Perhaps we should have separate internal set_http_client method that will continue accepting it from SimplePie and a public one set_http_factories for consumers that want to use Sanitize or Locator outside the context of SimplePie?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sanitize has already set_http_client() for PSR classes and the (deprecated) pass_file_data(). If one has not called set_http_client(), the http client will be created on the deprecated file data.

@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch 2 times, most recently from ea32232 to 223e6ed Compare May 30, 2023 17:13
Copy link
Member Author

@jtojnar jtojnar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have finished reordering the commits and noted some more stuff I have noticed.

UriFactoryInterface $uri_factory
): void {
$this->http_client_injected = true;
$this->http_client = new Psr18Client($http_client, $request_factory, $uri_factory);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have seen some libraries to use Psr18ClientDiscovery. While I dislike this on principle, it is certainly convenient.

}

/**
* Return the string representation as a URI reference.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should just use {@inheritdoc} here for ApiGen.

Weirdly, Symfony started removing it symfony/symfony#47390. Perhaps because they no longer seem to have generated API docs – quick search only found https://symfony.com/blog/new-symfony-api-documentation which returns 404.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of {@inheritdoc} either.

Comment on lines +13 to +14
- New method `SimplePie\Locator::set_http_client()` for providing PSR-18 HTTP client and PSR-17 factories by @Art4 in [#777](https://github.com/simplepie/simplepie/pull/777)
- New method `SimplePie\Sanitize::set_http_client()` for providing PSR-18 HTTP client and PSR-17 factories by @Art4 in [#777](https://github.com/simplepie/simplepie/pull/777)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #815, we marked the last two as internal. Do we want to promote them to API? I think that would only make sense if we expect consumers to instantiate these classes directly, rather than only having SimplePie create them internally.

For 2.0, I would like to minimize the API surface marking everything final and private, and only opening methods that have good use cases. So we should probably decide if we want to allow consumers to create Locator and Sanitize instances, and make set_http_client() methods on them internal, so that we do not need to deprecate them again.

Copy link
Contributor

@Art4 Art4 Jun 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last two was set internal because they accept an also internal SimplePie\HTTP\Client. #777 changes this to PSR-18 client and PSR-17 factories.

I agree with you that we should minimize the API surface but atm it is allowed to create Locator and Sanitize instances and we are referencing Sanitize::set_http_client() as an alternative for the deprecated Sanitize::pass_file_data() and maybe should do the same with Locator::set_http_client().

Alternative we could complete deprecate the instantiation of Locator and Sanitize for users. This could be made and discussed in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deprecating the the instantiation of Locator and Sanitize for consumers in a separate PR sounds good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #849.

* @param File|Response $file Input file
*/
public function __construct(File $file)
public function __construct(/* File */ $file)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is no change for PHP variance checker, extending the type that the argument accepts is still a breaking change since subclasses might not expect it.

And there are apparently some 😿 https://github.com/search?q=%2Fextends%20SimplePie.%2BSniffer%2F&type=code

Edit: I am using File::fromResponse here as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing the type hint is not a breaking change because we have not released 1.9.0 yet. (refs)

Also the type hint in the docblock was wrong, see https://github.com/simplepie/simplepie/blob/1.8.0/src/Content/Type/Sniffer.php#L71-L79

$locate = $this->registry->create(Locator::class, [&$file, $this->timeout, $this->useragent, $this->max_checked_feeds, $this->force_fsockopen, $this->curl_options]);
$locate->set_http_client($this->get_http_client());
$locate = $this->registry->create(Locator::class, [
(! $file instanceof File) ? File::fromResponse($file) : $file,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be nice to move the instanceof check into the method. Edit: Done here.

$requestFactory->method('createRequest')->willReturn($request);

$body = $this->createMock(StreamInterface::class);
$body->method('getContents')->willReturnCallback(function () {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need to be changed to __toString. Edit: Done.

@jtojnar jtojnar requested a review from Art4 May 30, 2023 18:06
@Art4 Art4 mentioned this pull request Jun 12, 2023
48 tasks
@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch from 223e6ed to 7ce4db4 Compare June 12, 2023 18:41
@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch from 7ce4db4 to 1abf153 Compare August 10, 2023 19:04
Copy link
Contributor

@Art4 Art4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could resolve the git conflicts I could review this PR.

I think most of the notes are addressed in other issues/PRs.

@Art4 Art4 mentioned this pull request Sep 13, 2023
7 tasks
@Art4
Copy link
Contributor

Art4 commented Nov 2, 2023

@jtojnar May I take over this PR? I could resolve the git conflicts and will review the changes there.

jtojnar and others added 4 commits November 6, 2023 02:38
…ent-support"

There are correctness issues as well as disorganized commit history.
The fixed changes are re-applied in the commits that follow.
Art4 added 11 commits November 6, 2023 02:38
Providers are meant for data parametrization.
This one basically created a completely different test situation
and what we are going to test next will be even more wild.
Let’s dissolve the provider into individual test cases.
We are going to allow using HTTP Clients that will set this to a different
`Response` subclasses than just `File`.

It was already marked private with PHPDoc, let’s enforce it.
Hopefully, this will produce a more scrutable error message
in case someone wants to access `File` properties on this.
This will allow using PSR-18 HTTP Clients.

Since those return a different `Response` subclass than `File`,
we need to convert `Response` to `File` to pass it to `Locator`, `Sniffer` & co.
Allows passing PSR implementations to Sanitize instead of internal HTTP Client.

TODO: Possibly make the set_http_client method internal again.
Allows passing PSR implementations to Locator instead of internal HTTP Client.

TODO: Possibly make the set_http_client method internal again.
@jtojnar jtojnar force-pushed the add-psr18-http-client-support-rebased branch from 1abf153 to 0427439 Compare November 6, 2023 01:42
@jtojnar
Copy link
Member Author

jtojnar commented Nov 6, 2023

I will try moving this forward during the week.

@Art4 Art4 added this to the 1.9.0 milestone Sep 27, 2024
@Art4
Copy link
Contributor

Art4 commented Aug 29, 2025

I think we can move this PR to the 1.10 milestone.

@Art4 Art4 modified the milestones: 1.9.0, 1.10.0 Sep 12, 2025
@jtojnar
Copy link
Member Author

jtojnar commented Sep 12, 2025

Resolving the merge conflicts was too annoying and it is probably too late for clean history anyway. Extracted the fixes and cleanups into #938.

Sorry about the delay.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants