Sanitize invalid children of amp-story and amp-story-page elements to prevent white story of death #3336
schlessera left a comment
Minor nitpicks only.
| public function sanitize() { | ||
| $nodes = $this->dom->getElementsByTagName( self::$tag ); | ||
| $num_nodes = $nodes->length; | ||
| $this->amp_story_tag_spec = AMP_Allowed_Tags_Generated::get_allowed_tag( 'amp-story' )[0]; |
get_allowed_tag() can potentially return null and this will then throw a notice on PHP 7.4+: https://3v4l.org/pnjnl
However, I assume we fully control the allowed tags here and those we check for here can't be filtered away?
That's correct. If we update the Validator spec and it results in these tag specs being null then we'd catch it in the unit test.
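A defensive variant of that lookup could be sketched as follows. This is only an illustration: `get_allowed_tag_stub()` and `first_rule_spec()` are hypothetical names, and the stub merely assumes that the real `get_allowed_tag()` returns either an array of rule specs or null.

```php
<?php
// Hypothetical stand-in for AMP_Allowed_Tags_Generated::get_allowed_tag().
// Assumption: it returns an array of rule specs, or null for an unknown tag.
function get_allowed_tag_stub( $tag ) {
	$specs = [
		'amp-story' => [
			[ 'tag_spec' => [ 'child_tags' => [ 'child_tag_name_oneof' => [ 'amp-story-page' ] ] ] ],
		],
	];
	return isset( $specs[ $tag ] ) ? $specs[ $tag ] : null;
}

// Hypothetical helper: return the first rule spec, or null if there is none.
function first_rule_spec( $tag ) {
	$rule_specs = get_allowed_tag_stub( $tag );

	// Guard before indexing, since reading offset 0 of null raises a notice on PHP 7.4+.
	if ( ! is_array( $rule_specs ) || ! isset( $rule_specs[0] ) ) {
		return null;
	}
	return $rule_specs[0];
}
```

As noted in the reply, the plugin controls these specs and a unit test would catch a missing one, so a guard like this is optional hardening rather than a required fix.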
| return; | ||
| $amp_story_element = $this->dom->getElementsByTagName( 'amp-story' )->item( 0 ); | ||
| if ( $amp_story_element instanceof DOMElement ) { | ||
| $this->sanitize_story_element( $amp_story_element ); |
The way this flows seems counterintuitive to me: it makes sanitizing the story element look like an edge case.
I would prefer the condition to be inverted with an early return, leaving sanitize_story_element() as the default next step.
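The suggested shape might look roughly like this. It is a standalone sketch, not the plugin's method: `sanitize_story()` is a hypothetical free function, and setting a `data-sanitized` attribute is a placeholder for the call to sanitize_story_element().

```php
<?php
// Hypothetical free function illustrating the early-return layout.
function sanitize_story( DOMDocument $dom ) {
	$story = $dom->getElementsByTagName( 'amp-story' )->item( 0 );

	// Edge case first: no story element, so there is nothing to do.
	if ( ! $story instanceof DOMElement ) {
		return false;
	}

	// Main path. Placeholder for $this->sanitize_story_element( $story ).
	$story->setAttribute( 'data-sanitized', '1' );
	return true;
}
```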
| $node = $element->firstChild; | ||
| while ( $node ) { | ||
| $next_node = $node->nextSibling; | ||
| if ( $node instanceof DOMElement ) { |
Same logic inversion here: I would prefer an early return (continue in this case) instead of making the main logic look like an edge case.
True, but the reason I did it this way is that $node = $next_node needs to run below. Otherwise, I'd have added:
if ( ! $node instanceof DOMElement ) {
	$node = $next_node;
	continue;
}
But that seems worse because the logic is duplicated.
How about something like this:
$node = $element->firstChild;
do {
	$next_node = $node->nextSibling;
	if ( ! $node instanceof DOMElement ) {
		continue;
	}
	if ( 'amp-story-page' === $node->nodeName ) {
		$page_number++;
		$this->sanitize_story_page_element( $node, $page_number );
	} elseif ( ! in_array( $node->nodeName, $this->amp_story_tag_spec['tag_spec']['child_tags']['child_tag_name_oneof'], true ) ) {
		$this->remove_invalid_child( $node );
	}
} while ( $node = $next_node );
Note: This is mostly just preference here. I'll approve the changes and let you decide whether you want to make changes or not.
Thanks, I like that. However, I tried it and it triggers a PHPCS complaint: WordPress.CodeAnalysis.AssignmentInCondition.FoundInWhileCondition. We can revisit later.
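One possible way to keep the early continue without duplicating the advance step, and without an assignment in a while condition, is a for loop whose third expression does the advancing. The sketch below is self-contained and makes several assumptions: `$allowed_children` is a hypothetical allow-list standing in for the child_tag_name_oneof spec, removeChild() stands in for remove_invalid_child(), and the page sanitizing call is reduced to a counter. Whether PHPCS accepts the assignment in the for header has not been checked here.

```php
<?php
// Hypothetical allow-list standing in for
// $this->amp_story_tag_spec['tag_spec']['child_tags']['child_tag_name_oneof'].
$allowed_children = [ 'amp-story-page', 'amp-story-bookend' ];

$dom = new DOMDocument();
$dom->loadXML( '<amp-story><span>3 min read</span><amp-story-page/>stray text<amp-story-bookend/><div/></amp-story>' );
$element = $dom->documentElement;

$page_number = 0;
$removed     = [];

// The third for-expression also runs after `continue`, so the advance is written once.
// The condition is checked before the first pass, so an empty element is handled too.
for ( $node = $element->firstChild; $node; $node = $next_node ) {
	// Capture the sibling first, because removing $node detaches it from the tree.
	$next_node = $node->nextSibling;

	if ( ! $node instanceof DOMElement ) {
		continue;
	}

	if ( 'amp-story-page' === $node->nodeName ) {
		$page_number++; // Placeholder for sanitize_story_page_element().
	} elseif ( ! in_array( $node->nodeName, $allowed_children, true ) ) {
		$removed[] = $node->nodeName;
		$element->removeChild( $node ); // Placeholder for remove_invalid_child().
	}
}

// Collect the element children that survived, for inspection.
$remaining = [];
foreach ( $element->childNodes as $child ) {
	if ( $child instanceof DOMElement ) {
		$remaining[] = $child->nodeName;
	}
}
```

With this input, `$removed` ends up holding the span and div, while the page and bookend elements remain. Text nodes are skipped rather than removed, mirroring the instanceof DOMElement check in the snippets above.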
| $node = $element->firstChild; | ||
| while ( $node ) { | ||
| $next_node = $node->nextSibling; | ||
| if ( $node instanceof DOMElement ) { |
I would also prefer an early return/continue here instead.
See reasoning above.
| '<amp-story-page><p>Before layer</p><amp-story-grid-layer><p>Lorem Ipsum Demet Delorit.</p></amp-story-grid-layer><p>After layer</p></amp-story-page</p>', | ||
| '<amp-story-page><amp-story-grid-layer><p>Lorem Ipsum Demet Delorit.</p></amp-story-grid-layer></amp-story-page>', | ||
| ], | ||
| ]; |
Test for CTA removal is missing...?
That should be covered above by story_with_cta_on_first_page and story_with_multiple_cta_on_second_page.
| if ( ! isset( $rule_specs ) ) { | ||
| continue; | ||
| } | ||
| foreach ( $rule_specs as $rule_spec ) { |
Note that the $rule_specs array only has one item in it.
… prevent white story of death (#3336) * Sanitize invalid children of amp-story and amp-story-page elements * Harden logic for gathering allowed children for AMP Stories
A compatibility issue was discovered in #3321 with the Reading Time WP plugin, but it is likely to happen with other plugins as well. The Reading Time WP plugin filters the_content to inject this at the beginning:

This results in an invalid amp-story, which restricts its children to elements like amp-story-page. When this span is a direct child and the child_tag_name_oneof constraint is violated, the result is the entire amp-story being invalid and a white story of death (where the body has no children). The validation error is not helpful at all:

This problem was actually “prophesied” in #2926:

So this PR fixes the problem by extending the AMP_Story_Sanitizer to preemptively remove AMP Story elements under amp-story and amp-story-page which are invalid. These are the two elements which have the child_tag_name_oneof constraint. This special-case sanitizer is especially important for AMP Stories, since all of the markup for a story is in post_content and is prone to be mutated by the_content filters that add elements like word counts, sharing buttons, and related posts. This PR prevents such elements from being seen by the tag-and-attribute sanitizer, thus preventing the amp-story and amp-story-page as a whole from being removed.

In the case of the span which the Reading Time WP plugin adds to the_content, the validation error now becomes much more helpful:

And no white story of death occurs.
Fixes #3321.