Skip to content

Use idn_to_ascii function instead of idna_convert library#785

Merged
mblaney merged 1 commit intosimplepie:masterfrom
jtojnar:idn
May 30, 2023
Merged

Use idn_to_ascii function instead of idna_convert library#785
mblaney merged 1 commit intosimplepie:masterfrom
jtojnar:idn

Conversation

@jtojnar
Copy link
Member

@jtojnar jtojnar commented Feb 16, 2023

The idna_convert library is outdated and there is now a native PHP function to convert IDNs to Punycode.
It requires intl extension or a polyfill.

compress_parse_url will append ? and # to the URI even if there is no query string or hash fragment, and insert // even when there is no authority part. To avoid this and the breakage of tests this would cause, let’s call the function only for files having a non-empty authority part of the URI (i.e. non-local files) that contains characters that are not considered printable ASCII (such as bytes of UTF-8 encoded Unicode characters).

In the future, we might leave the IDN support to HTTP library to deal with.

Fixes: #538
Supersedes: #776

@jtojnar jtojnar force-pushed the idn branch 2 times, most recently from 57c0130 to a9f41f1 Compare February 16, 2023 18:14
$parsed = \SimplePie\Misc::parse_url($link);
$this->link = \SimplePie\Misc::compress_parse_url($parsed['scheme'], $idn->encode($parsed['authority']), $parsed['path'], $parsed['query'], $parsed['fragment']);
if ($parsed['authority'] !== '' && !ctype_print($parsed['authority'])) {
$authority = \idn_to_ascii($parsed['authority'], \IDNA_NONTRANSITIONAL_TO_ASCII, \INTL_IDNA_VARIANT_UTS46);
Copy link
Member Author

@jtojnar jtojnar Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the original code was already broken on authority strings containing non-ASCII characters in userinfo part of the URI, e.g. čížek@háčkyčárky.cz:

  • This is what idna_convert gives: xn--ek-mja1kox@xn--hkyrky-ptac70bc.cz
  • This is what Firefox converts: %C4%8D%C3%AD%C5%BEek@xn--hkyrky-ptac70bc.cz

It probably does not make sense to attempt to fix it when it will likely be corrected with switch to PSR-18.

Edit: Looks like they are deprecated anyway https://www.rfc-editor.org/rfc/rfc7540#section-8.1.2.3

@@ -1,969 +0,0 @@
<?php
// {{{ license
Copy link
Contributor

@Art4 Art4 Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this file might be a breaking change. What do you think about keeping this file and simply triggering a deprecation error?

<?php

declare(strict_types=1);

trigger_error('Using class `idna_convert` from `idn/idna_convert.class.php` is deprecated since SimplePie 1.9.0, the native `idn_to_ascii()` function is used instead.', \E_USER_DEPRECATED);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that this is not a part of API – it is not part of SimplePie, it is not listed in the API docs nor was it documented anywhere else.

I have listed it in the Removed section of changelog so that if someone uses it for themselves, they can just copy it from older version.

@Art4
Copy link
Contributor

Art4 commented Feb 17, 2023

I like this approach. idn_to_ascii() should available in the most PHP environments and will work out of the box without requiring any files. If it is not present, one can define it using the Algo26\IdnaConvert library or using the polyfill symfony/polyfill-intl-idn.

This PR fixes #538.

@Art4 Art4 mentioned this pull request Feb 17, 2023
48 tasks
@Art4 Art4 mentioned this pull request Mar 15, 2023
@jtojnar jtojnar mentioned this pull request Mar 18, 2023
The `idna_convert` library is outdated and there is now a native PHP function to convert IDNs to Punycode.
It requires `intl` extension or a [polyfill](https://github.com/symfony/polyfill-intl-idn).

`compress_parse_url` will append `?` and `#` to the URI even if there is no query string or hash fragment, and insert `//` even when there is no authority part. To avoid this and the breakage of tests this would cause, let’s call the function only for files having a non-empty authority part of the URI (i.e. non-local files) that contains characters that are not considered printable ASCII (such as bytes of UTF-8 encoded Unicode characters).

In the future, we might leave the IDN support to HTTP library to deal with.
@Art4
Copy link
Contributor

Art4 commented May 26, 2023

@mblaney This is ready for merge.

@mblaney mblaney merged commit 06b2687 into simplepie:master May 30, 2023
@jtojnar jtojnar deleted the idn branch May 30, 2023 12:14
@Art4 Art4 added this to the 1.9.0 milestone Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bundled idn/idna_convert.class.php can be replaced with mso/idna-convert

3 participants