Skip to content

Add sanitizer arg to prevent noscript fallbacks; preemptively remove disallowed attributes from fallback#2131

Merged
westonruter merged 13 commits intodevelopfrom
improve/noscript-fallbacks
Apr 15, 2019
Merged

Add sanitizer arg to prevent noscript fallbacks; preemptively remove disallowed attributes from fallback#2131
westonruter merged 13 commits intodevelopfrom
improve/noscript-fallbacks

Conversation

@westonruter
Copy link
Copy Markdown
Member

@westonruter westonruter commented Apr 14, 2019

When a disallowed attribute appears on an input img, video, audio, or iframe currently the validation error will be duplicated once for the replaced AMP component and again for the original element added inside as a noscript. This PR prevents a duplication of the validation error by removing the disallowed attributes preemptively.

Also, in case the noscript fallback elements are not desired, a add_noscript_fallback arg is introduced for the image, audio, video, and iframe sanitizers to prevent it from being added (this is needed for AMP stories, and these commits were picked from #2107).

Lastly, as reported in ampproject/amphtml#21686 the amp-img allows the HTTP protocol whereas the amp-img > noscript > img does not. So this PR prevents adding the noscript > img fallback if it lacks the HTTPS protocol.

Fixes #2126.

@westonruter westonruter added this to the v1.1 milestone Apr 14, 2019
@westonruter westonruter requested a review from felixarntz April 14, 2019 22:36
@googlebot googlebot added the cla: yes Signed the Google CLA label Apr 14, 2019
Copy link
Copy Markdown
Collaborator

@felixarntz felixarntz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix looks solid.

One thing I'd like to change though is that we now have the same code four times. I think we should introduce a trait to make it more straightforward.

}
foreach ( $disallowed_attributes as $disallowed_attribute ) {
$old_node->removeAttribute( $disallowed_attribute );
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not iterate only once and immediately remove the attributes instead of populating first and then doing another round of removing? If the problem is remove attributes modifying the $old_node->attributes property, then we could assign it to another variable to use.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that the attributes are a live DOM list and if a node is removed while iterating then a node gets skipped. This pattern of gathering up the list of nodes and then removing them on a second iteration can be seen elsewhere in the codebase. It's not ideal.

@westonruter
Copy link
Copy Markdown
Member Author

We could also create an AMP_Media_Element_Sanitizer abstract class that these 4 sanitizers inherit from. Not sure if that is the best name, but I think that might make more sense than a trait in this case.

@felixarntz
Copy link
Copy Markdown
Collaborator

I think the noscript fallback is something rather specific, so I think a trait makes sense. Also it doesn't lock us in as it kind of allows polymorphism in PHP. If we use a base class, we're tied to it, while with a trait we can go more granular.

For example, if other elements than media could benefit from that noscript fallback in the future, we could use the trait, but a media base class wouldn't fit in that case.

The latest commit includes a trait that does it. We could still easily modify it to become a base class if this is the decision.

* @param array $default_args Original defaults.
* @return array Extended defaults.
*/
protected function merge_default_args( array $default_args ) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right name? When we have other traits then they would potentially have their own default args to merge, so then this should be named uniquely for this trait.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this method isn't even needed? Maybe better to just have each class that uses it define the default arg like they do for other args?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, some sanitizers may want different defaults.

*
* @param string $tag Tag name to get allowed attributes for.
*/
protected function initialize_allowed_attributes( $tag ) {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very specific to the noscript fallbacks, as it assumes there is it one tag spec for the given tag, which is true for the noscript fallback tag specs. In reality AMP has multiple tag specs for each tag name. So maybe this should also be renamed to be specific to the noscript fallback case.

return;
}

$this->initialize_noscript_allowed_attributes( self::$tag );
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should only happen if the add_noscript_fallback arg is true, no?

return;
}

$this->initialize_noscript_allowed_attributes( self::$tag );
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

return;
}

$this->initialize_noscript_allowed_attributes( self::$tag );
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

return;
}

$this->initialize_noscript_allowed_attributes( self::$tag );
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

@westonruter westonruter merged commit 9a070af into develop Apr 15, 2019
@westonruter westonruter deleted the improve/noscript-fallbacks branch April 15, 2019 02:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes Signed the Google CLA

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Force amp-img > noscript > img fallbacks to have HTTPS src/srcset

3 participants