Refactor ImageCDN parsing to rely on HTML API instead of RegExps#32700
Refactor ImageCDN parsing to rely on HTML API instead of RegExps#32700
Conversation
|
Are you an Automattician? Please test your changes on all WordPress.com environments to help mitigate accidental explosions.
Interested in more tips and information?
|
|
Thank you for your PR! When contributing to Jetpack, we have a few suggestions that can help us test and review your patch:
This comment will be updated as you work on your PR and make changes. If you think that some of those checks are not needed for your PR, please explain why you think so. Thanks for cooperation 🤖 The e2e test report can be found here. Please note that it can take a few minutes after the e2e tests checks are complete for the report to be available. Follow this PR Review Process:
Still unsure? Reach out in #jetpack-developers for guidance! |
ba4718b to
564599c
Compare
b318fbb to
dc3a6b1
Compare
6b724e7 to
583e9ca
Compare
jeherve
left a comment
There was a problem hiding this comment.
Thank you for proposing this! This is a nice way to use WP_HTML_Tag_Processor, and something we can do now that we'll be requiring WordPress 6.2 (#31638).
I have not tested your PR, but left a couple of comments below about how things work in the monorepo, if that can help you discover how we usually do things :)
|
@haqadn @Automattic/heart-of-gold can you make sure you keep an eye on this one? |
484deb0 to
ffab63f
Compare
e51843b to
5e23682
Compare
|
One note I should leave here is to ask about pulling in Core's The issue here is that the test output changes but in a semantically neutral way. I could rewrite the test so that it hardcodes the newer ordering of output, but that would leave this fragility in place for the future. |
jeherve
left a comment
There was a problem hiding this comment.
should I try and find someone to review this? are either of you happy with the proposed change?
Maybe @Automattic/heart-of-gold could take a look?
| /** | ||
| * Allow specific images to be skipped by Photon. | ||
| * | ||
| * @TODO: Does this need to pass the full HTML of the image tag? |
There was a problem hiding this comment.
I looked at usage of this filter in other plugins in the WordPress.org plugin directory. While most plugins do not use this third parameter, there are a few that do, sometimes to look for a specific class name in the HTML. So I think we should keep this available if possible.
| * Filter whether an image using an attachment ID in its class has to be uploaded to the local site to go through Photon. | ||
| * | ||
| * @see https://developer.wordpress.com/docs/photon/api/ | ||
| * @TODO: What is the point of passing $images and $index. Are they required? |
There was a problem hiding this comment.
I looked around but couldn't figure out where that filter is being used today. It was used in the past, but may not be used anymore today.
haqadn
left a comment
There was a problem hiding this comment.
The code changes look good to me and no problems found after testing. 👍🏼
P.S.: I only worked on migrating from a Jetpack module to image-cdn package. I don't have prior knowledge of the implementation details on the core functionality.
|
I will reconstruct the existing filter value before merge. |
| remove_image_size( 'jetpack_soft_oversized_after_upload' ); | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
I think these functions are okay here.
Personally, I would put them in the parent class at class-image-cdn-attachment-test-case.php but, it looks like this is the only (current) test that extends this particular class.
|
Since we're late in the beta testing week, I'd say let's merge this next week after the branch cut so we get the longest amount of time to test it in the wild before the next dotorg release |
ab212e3 to
424e68a
Compare
424e68a to
074a797
Compare
074a797 to
c7493a4
Compare
) The introduction of the HTML API into WordPress 6.2 offers a new method of matching and modifying HTML. In this patch we're replacing code that attempts to parse the input HTML and visit all images, then read and modify them based on the values of their attributes and computed Photon properties. In the previous code the `Image_CDN` class scanned the entire HTML document to generate a list of PREG image match objects, then iterated over those matches and performed string-replace operations on them. Now the class does a pass from start to finish, visting each image tag along the way, and making the appropriate modifications. Co-authored-by: Adnan Haque <3737780+haqadn@users.noreply.github.com> Co-authored-by: Brandon Kraft <public@brandonkraft.com> Co-authored-by: Jeremy Herve <jeremy@jeremy.hu> Co-authored-by: Mark George <thingalon@gmail.com> Co-authored-by: Osk <oskosk@users.noreply.github.com>
c7493a4 to
227be97
Compare
No `assertEquivalentMarkup` exists yet, so this gets around that without creating one.
|
Fixed broken test. @dmsnell |
|
@dilirity - thanks for fixing those tests - not sure how I copy/pasted the original typo on as far as I know it's ready to go, and it was ready to go, but I'm not really sure how to test and vet it to the level of quality at which I would have preferred (since I was jumping in to unfamiliar code and proposing a change in a system I'm not working in). if you think this is sound (and according to the tests and a code audit it seems ready) then it should be fine to merge. had I been in a place where I could thoroughly test or watch and react to failures post-merge I would have. |
There was a problem hiding this comment.
My bad. I was using two different images. Ignore this. Though I'm not sure why only one of these images' width/height is 150 when both are set to use it.
So I did some testing and found something odd. Markup for resulted images below.
If you add an image via the Image block and use these settings:
On the production version of Boost, it looks like this:
<img decoding="async" width="150" height="150" src="https://i0.wp.com/honest-shark.jurassic.ninja/wp-content/uploads/2024/08/donnie-rosie-O7L3MrlSAHA-unsplash.jpg?resize=150%2C150&ssl=1" alt="" class="wp-image-12" style="aspect-ratio:4/3;object-fit:cover" srcset="https://i0.wp.com/honest-shark.jurassic.ninja/wp-content/uploads/2024/08/donnie-rosie-O7L3MrlSAHA-unsplash-scaled.jpg?resize=150%2C150&ssl=1 150w, https://i0.wp.com/honest-shark.jurassic.ninja/wp-content/uploads/2024/08/donnie-rosie-O7L3MrlSAHA-unsplash-scaled.jpg?zoom=2&resize=150%2C150&ssl=1 300w, https://i0.wp.com/honest-shark.jurassic.ninja/wp-content/uploads/2024/08/donnie-rosie-O7L3MrlSAHA-unsplash-scaled.jpg?zoom=3&resize=150%2C150&ssl=1 450w" sizes="(max-width: 150px) 100vw, 150px" data-recalc-dims="1">However, this branch running Boost seems to ignore the size:
<img fetchpriority="high" decoding="async" width="1707" height="2560" src="https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=150%2C150&ssl=1" alt="" class="wp-image-14" style="aspect-ratio:4/3;object-fit:cover" srcset="https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?w=1707&ssl=1 1707w, https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=200%2C300&ssl=1 200w, https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=683%2C1024&ssl=1 683w, https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=768%2C1152&ssl=1 768w, https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=1024%2C1536&ssl=1 1024w, https://i0.wp.com/extraordinary-gazelle.jurassic.ninja/wp-content/uploads/2024/08/filip-zrnzevic-QsWG0kjPQRY-unsplash-scaled.jpg?resize=1365%2C2048&ssl=1 1365w" sizes="(max-width: 1000px) 100vw, 1000px">I'm not sure if this is Boost related, but I don't think we've made any changes to Image CDN. Boost is running on the Free version with only Critical CSS and Image CDN enabled.
dilirity
left a comment
There was a problem hiding this comment.
Well, I couldn't find anything broken. Code-wise it is good and tests seem to be happy.
I think we can ![]()
dilirity
left a comment
There was a problem hiding this comment.
Approving after merge with trunk.
) * Refactor ImageCDN parsing to rely on HTML API instead of RegExps (#32700) The introduction of the HTML API into WordPress 6.2 offers a new method of matching and modifying HTML. In this patch we're replacing code that attempts to parse the input HTML and visit all images, then read and modify them based on the values of their attributes and computed Photon properties. In the previous code the `Image_CDN` class scanned the entire HTML document to generate a list of PREG image match objects, then iterated over those matches and performed string-replace operations on them. Now the class does a pass from start to finish, visting each image tag along the way, and making the appropriate modifications. Co-authored-by: Adnan Haque <3737780+haqadn@users.noreply.github.com> Co-authored-by: Brandon Kraft <public@brandonkraft.com> Co-authored-by: Jeremy Herve <jeremy@jeremy.hu> Co-authored-by: Mark George <thingalon@gmail.com> Co-authored-by: Osk <oskosk@users.noreply.github.com> * Rearrange semantically equivalent test output to avoid false negatives. No `assertEquivalentMarkup` exists yet, so this gets around that without creating one. * Fix broken test * Remove unnecessary comment * Fix static analysis issues * Fix static analysis issue * Bump project version to 0.4.7-alpha * Fix project version --------- Co-authored-by: Adnan Haque <3737780+haqadn@users.noreply.github.com> Co-authored-by: Brandon Kraft <public@brandonkraft.com> Co-authored-by: Jeremy Herve <jeremy@jeremy.hu> Co-authored-by: Mark George <thingalon@gmail.com> Co-authored-by: Osk <oskosk@users.noreply.github.com> Co-authored-by: Peter Petrov <peter.petrov89@gmail.com>



Status
This is a work in progress and isn't tested or verified.This has been reviewed, but the filters aren't tested because it's unclear what code might rely on them.Due to the change in indentation the diff view is more associated with the actual changes if ignoring whitespace.
Proposed changes:
The introduction of the HTML API into WordPress 6.2 offers a new method of matching and modifying HTML. In this patch we're replacing code that attempts to parse the input HTML and extract images that are direct children of an anchor ("A" tag), then read and modify them based on the values of their attributes and computed Photon properties.
In the previous code the
Image_CDNclass scanned the entire HTML document to generate a list of PREG image match objects, then iterated over those matches and performed string-replace operations on them.Now the class does a pass from start to finish, visting each image tag along the way, and making the appropriate modifications. Extra care is taken to ensure that only images that are the single child of a link are matched.
In this change the values of the
tagkey in some of the filters has changed from the initial matched HTML snippet to the name of the image tag, which could beIMGorAMP-IMGorAMP-ANIM. An update to the Tag Processor or a custom sub-class thereof could provide the original HTML snippet and match the existing behavior, but that hasn't been done in this patch yet given the author's uncertainty about the use and value of those snippets.Other information:
Jetpack product discussion
This mandates running on WordPress 6.2. If we prefer being able to run on older versions, we could also include a copy of the Tag Processor in the plugin.
Does this pull request change what data or activity we track or use?
No.
Testing instructions:
Test suite should pass. Manual review is necessary though.
I don't even know what all the function I modified is supposed to do fully, so code auditing and review of the modifications is critical.