Skip to content

Permuting site URLs can fail for some domains and environments #4776

@aaemnnosttv

Description

@aaemnnosttv

Bug Description

I recently discovered a few tests that only fail locally with some versions of PHP related to testing Module::permute_site_url.

After digging into it, I found the problem was related to parse_url which was malforming parts of the URL when given certain UTF-8 multi-byte characters as used in some IDN domains. This goes back to an old bug in PHP and also seems to be related to the server's configured locale.

Relevant test output
There were 2 failures:

1) Google\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url with data set "http://éxämplę.test" ('http://éxämplę.test', array('http://éxämplę.test', 'https://éxämplę.test', 'http://www.éxämplę.test', 'https://www.éxämplę.test', 'http://xn--xmpl-loa2a55a.test', 'https://xn--xmpl-loa2a55a.test', 'http://www.xn--xmpl-loa2a55a.test', 'https://www.xn--xmpl-loa2a55a.test'))
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    0 => 'http://www.xn--xmpl-loa2a55a.test'
-    1 => 'http://www.éxämplę.test'
-    2 => 'http://xn--xmpl-loa2a55a.test'
-    3 => 'http://éxämplę.test'
-    4 => 'https://www.xn--xmpl-loa2a55a.test'
-    5 => 'https://www.éxämplę.test'
-    6 => 'https://xn--xmpl-loa2a55a.test'
-    7 => 'https://éxämplę.test'
+    0 => 'http://www.xn--xmpl?_-bua2b.test'
+    1 => 'http://www.éxämpl?_.test'
+    2 => 'http://xn--xmpl?_-bua2b.test'
+    3 => 'http://éxämpl?_.test'
+    4 => 'https://www.xn--xmpl?_-bua2b.test'
+    5 => 'https://www.éxämpl?_.test'
+    6 => 'https://xn--xmpl?_-bua2b.test'
+    7 => 'https://éxämpl?_.test'
 )

/.../wordpress/wp-content/plugins/google-site-kit/vendor/wp-phpunit/wp-phpunit/includes/abstract-testcase.php:814
/.../wordpress/wp-content/plugins/google-site-kit/tests/phpunit/integration/Core/Modules/ModuleTest.php:214
/.../wordpress/wp-content/plugins/google-site-kit/vendor/phpunit/phpunit/phpunit:51

2) Google\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url with data set "http://éxämplę.test/sub-directory" ('http://éxämplę.test/sub-directory', array('http://éxämplę.test/sub-directory', 'https://éxämplę.test/sub-directory', 'http://www.éxämplę.test/sub-directory', 'https://www.éxämplę.test/sub-directory', 'http://xn--xmpl-loa2a55a.test...ectory', 'https://xn--xmpl-loa2a55a.tes...ectory', 'http://www.xn--xmpl-loa2a55a....ectory', 'https://www.xn--xmpl-loa2a55a...ectory'))
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
 Array (
-    0 => 'http://www.xn--xmpl-loa2a55a.test/sub-directory'
-    1 => 'http://www.éxämplę.test/sub-directory'
-    2 => 'http://xn--xmpl-loa2a55a.test/sub-directory'
-    3 => 'http://éxämplę.test/sub-directory'
-    4 => 'https://www.xn--xmpl-loa2a55a.test/sub-directory'
-    5 => 'https://www.éxämplę.test/sub-directory'
-    6 => 'https://xn--xmpl-loa2a55a.test/sub-directory'
-    7 => 'https://éxämplę.test/sub-directory'
+    0 => 'http://www.xn--xmpl?_-bua2b.test/sub-directory'
+    1 => 'http://www.éxämpl?_.test/sub-directory'
+    2 => 'http://xn--xmpl?_-bua2b.test/sub-directory'
+    3 => 'http://éxämpl?_.test/sub-directory'
+    4 => 'https://www.xn--xmpl?_-bua2b.test/sub-directory'
+    5 => 'https://www.éxämpl?_.test/sub-directory'
+    6 => 'https://xn--xmpl?_-bua2b.test/sub-directory'
+    7 => 'https://éxämpl?_.test/sub-directory'
 )

/.../wordpress/wp-content/plugins/google-site-kit/vendor/wp-phpunit/wp-phpunit/includes/abstract-testcase.php:814
/.../wordpress/wp-content/plugins/google-site-kit/tests/phpunit/integration/Core/Modules/ModuleTest.php:214
/.../wordpress/wp-content/plugins/google-site-kit/vendor/phpunit/phpunit/phpunit:51


Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • A new URL class should be created in the Core\Util namespace with a single static parse method for now
    • The signature and behavior should match that of (wp_)parse_url with the main difference being that it is safe to use with multi-byte UTF-8 IDN domains
    • Any given URLs that do not contain problematic characters could continue to use wp_parse_url internally
  • All usage of (wp_)parse_url should be refactored to use the new utility

Implementation Brief

  • Create a new class named URL in the includes/Core/Util folder having the Core\Util namespace as per the AC.
  • Create a static method parse with the same signature as wp_parse_url(), i.e. it should accept $url and $component as parameters.
    • Check if $url contains multibyte chars, i.e. if mb_strlen($url, 'UTF-8') != strlen($url).
    • If this is not the case, simply delegate the call to wp_parse_url() and return its output.
    • If the string does contain multibyte chars, use the same logic within wp_parse_url() to parse the $url. But instead of calling PHP's inbuilt parse_url() method, create and call a function that encodes the URL and decodes the parts later. An example of this function can be found in this comment.
    • Find and replace the usage of wp_parse_url() and parse_url() methods with URL::parse().

Test Coverage

  • Add phpunit tests for URL::parse() with similar URL's used in Google\Site_Kit\Tests\Core\Modules\ModuleTest::test_permute_site_url.

QA Brief

  • This issue modifies the code that handles URLs in the "back-end" of the plugin. As such, smoke testing the plugin would be required with particular attention to the following areas. When doing these tests, use a site URL / domain name that has foreign UTF-8 multi-byte characters such as https://www.éxämplę.com.
    • Plugin setup (redirects from Google Authentication service to the application).
    • User input survey.
    • Entity Search box (verifying the entity URLs on the entity dashboard)
    • AdSense Earnings Report
    • Creating a new Analytics and Analytics 4 property from within the plugin.
    • Creating a TagManager container.

QA:Eng

  • Test this issue on a development machine where the ModuleTest::test_permute_site_url() test fails on the latest release but is fixed when this issue is merged in.

Changelog entry

  • Fix issues in permutation site URLs with multi-byte UTF-8 IDN domains.

Metadata

Metadata

Assignees

Labels

P2Low priorityPHPQA: EngRequires specialized QA by an engineerType: BugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions