Account for more YouTube URL formats#4508
Conversation
The ID should be able to include a - or _, like: https://www.youtube.com/watch?v=xo68-iWaKv8 https://youtu.be/CMrv_D78oxY
|
Steps To Test
|
Testing the filtered content would require adding another 2 mocks to mock_http_request() This isn't ideal, though.
| public function test_get_video_id_from_url( $url, $expected ) { | ||
| $this->assertEquals( | ||
| $expected, | ||
| $this->call_private_method( $this->handler, 'get_video_id_from_url', [ $url ] ) |
There was a problem hiding this comment.
Testing this private method isn't ideal. It'd be better to add test cases to test__conversion()
But that would also require mocking 2 more HTTP responses, and might be overkill.
There was a problem hiding this comment.
For cleaner testing, you would extract the logic to convert an arbitrary URL into a Youtube Video ID into a separate object. This turns this private method into a public method on a separate object to test.
Before, the param tag descriptions were not aligned.
|
Hi @westonruter, Thanks, Weston! |
| */ | ||
| private function get_video_id_from_url( $url ) { | ||
| if ( preg_match( '/(?:watch\?v=|embed\/|youtu.be\/)(?P<id>\w*)/', $url, $match ) ) { | ||
| if ( preg_match( '/(?:watch\?v=|embed\/|youtu.be\/)(?P<id>[\w_-]*)/', $url, $match ) ) { |
There was a problem hiding this comment.
Pattern used in 1.4:
amp-wp/includes/embeds/class-amp-youtube-embed.php
Lines 15 to 16 in dce4fae
Is all of this accounted for?
There was a problem hiding this comment.
I wonder if the complexity of the pattern is part of the problem for why we missed this. YouTube doesn't actually specify the allowed characters in the video ID, so limiting it to a-zA-Z0-9_- may or may not account for everything.
I think we could make it more robust by parsing the URL first:
$parsed_url = wp_parse_url( $url );
$query_vars = [];
if ( isset( $parsed_url['query'] ) ) {
wp_parse_str( $parsed_url['query'], $query_vars );
if ( isset( $query_vars['v'] ) ) {
return $query_vars['v']; // e.g. https://www.youtube.com/watch?v=CMrv_D78oxY
} elseif ( isset( $query_vars['vi'] ) ) {
return $query_vars['vi']; // e.g. http://youtube.com/watch?vi=oTJRivZTMLs&feature=channel
}
}
if ( isset( $parsed_url['path'] ) ) {
$segments = explode( '/', trim( $parsed_url['path'], '/' ) );
if ( 1 === count( $segments ) ) {
return $segments[0]; // e.g. https://youtu.be/XOY3ZUO6P0k
} elseif ( count( $segments ) > 1 ) {
if ( in_array( $segments[0], [ 'embed', 'v', 'e', 'vi'], true ) ) {
return $segments[1]; // e.g. http://www.youtube.com/v/-wtIMTCHWuI?version=3&autohide=1
}
}
}
return false;There was a problem hiding this comment.
Here's a list of a bunch of URLs which could be all accounted for: https://gist.github.com/rodrigoborgesdeoliveira/987683cfbfcc8d800192da1e73adc486
The www.youtube-nocookie.com domain also needs to be accounted for.
Not all of those URLs are valid.
There was a problem hiding this comment.
OK, thanks. Your snippet is committed here verbatim, other than an extra empty line: b1344d5...b8d38fa#diff-23071a193ac2ab7d9be3c9eb4bdebac6R228
There's also a test for www.youtube-nocookie.com.
Weston's suggestion for get_video_id_from_url() parses the URL, instead of only matching. Also, add some tests, drawing from: https://gist.github.com/rodrigoborgesdeoliveira/987683cfbfcc8d800192da1e73adc486
Now, there's a case for one ending in a query param, and for not.
As Weston mentioned, support for this is needed.
| public function test_get_video_id_from_url( $url, $expected ) { | ||
| $this->assertEquals( | ||
| $expected, | ||
| $this->call_private_method( $this->handler, 'get_video_id_from_url', [ $url ] ) |
There was a problem hiding this comment.
For cleaner testing, you would extract the logic to convert an arbitrary URL into a Youtube Video ID into a separate object. This turns this private method into a public method on a separate object to test.
As Alain mentioned, is possible to have a playlist URL like https://www.youtube.com/playlist?list=PLCra4VPr-3frJzAd-lVYo3-34wu0Eax_u
|
This is a cool tool to test this PR. You can put the id of any YT video and it generates a wide variety of variants: https://www.clivemcg.com/ytvg/ |
amedina
left a comment
There was a problem hiding this comment.
Successfully tested a large variety of URL formats.
Those work for me. There is an issue with the last URL however; the video does not start at the 10s mark on the AMP version. According to the docs this can be done by adding Parsing the URL for these type of parameters is going to take much more work and probably out of scope for this PR. |
|
@pierlon You are correct. I changed the comment before because those actually worked. I could reproduce the one with the starting time. |
|
@pierlon |
* Account for hyphens or underscores in YouTube URL The ID should be able to include a - or _, like: https://www.youtube.com/watch?v=xo68-iWaKv8 https://youtu.be/CMrv_D78oxY * Test the method instead of the full filtered content Testing the filtered content would require adding another 2 mocks to mock_http_request() This isn't ideal, though. * Align the param descriptions vertically Before, the param tag descriptions were not aligned. * Change the name of the dataProvider method * Commit Weston's suggestion, add some test cases Weston's suggestion for get_video_id_from_url() parses the URL, instead of only matching. Also, add some tests, drawing from: https://gist.github.com/rodrigoborgesdeoliveira/987683cfbfcc8d800192da1e73adc486 * Add another embed URL test case Now, there's a case for one ending in a query param, and for not. * Add a test for youtube-nocookie As Weston mentioned, support for this is needed. * Ensure that the first segment isn't account or playlist As Alain mentioned, is possible to have a playlist URL like https://www.youtube.com/playlist?list=PLCra4VPr-3frJzAd-lVYo3-34wu0Eax_u * Indicate unused URL_PATTERN constant is deprecated * Include youtube-nocookie.com domains for completeness in oEmbed filtering * Eliminate duplicate YouTube URL parsing in video_override method * Eliminate redundant URL parsing * Harden support for short URLs and use allowlist for root segments * Disrecard subdomains when checking for YouTube domain Co-authored-by: Weston Ruter <westonruter@google.com>
|
QA:
|
* tag '1.5.2': Bump 1.5.2 Bump version to 1.5.1-RC1 Cache response status and headers when fetching external stylesheets (#4509) Fix securing multi-line mustache templates (#4521) Add CSS monitoring time series to Site Health debugging info (#4519) Update hostname used for WordPress TV embeds to fix external HTTP requests (#4524) Fix processing of element child sanitization loop when invalid elements are replaced with children (#4512) Account for more YouTube URL formats (#4508) Update selected featured image ID on select (#4453) Raise default threshold for disabling CSS caching (#4513) Cast i-amphtml-intrinsic-sizer dimensions to integers (#4506) Only move meta tags to the head when required and add processing for meta[http-equiv] (#4505) Fix failing tests (#4507) Bump 1.5.2-alpha




Summary
A YouTube embed URL can now include a hyphen or underscore, like:
https://www.youtube.com/watch?v=xo68-iWaKv8
https://youtu.be/CMrv_D78oxY
Fixes #4504
Checklist