Conversation
ba8c9cd to
d266233
Compare
|
Nice! Generally seems reasonable, but want to see if @choumx has any additional thoughts. |
|
ampproject/amphtml#11115 would help but haven't had the time. 😓 |
|
I'm glad that an XHTML-friendly option is being considered, but we'll need to support the existing bracketed syntax as well. |
ThierryA
left a comment
There was a problem hiding this comment.
This is very exciting ❤️ I left my CR below.
| } | ||
|
|
||
| // Save all alternative names in lookup to improve performance. | ||
| if ( isset( $attr_spec[ AMP_Rule_Spec::ALTERNATIVE_NAMES ] ) ) { |
There was a problem hiding this comment.
Any reason not to cache this in the conditional statement above to avoid looping through the $attr_spec[ AMP_Rule_Spec::ALTERNATIVE_NAMES ]?
There was a problem hiding this comment.
I thought about that, but there are other alternative names than just the bind ones. For example, srcset is an alt for src, and xlink:href is an alternate for href. These need to be accounted for as well. So that is why there is a separate conditional.
| // Prepare whitelists. | ||
| $this->allowed_tags = $this->args['amp_allowed_tags']; | ||
| foreach ( AMP_Rule_Spec::$additional_allowed_tags as $tag_name => $tag_rule_spec ) { | ||
| $this->allowed_tags[ $tag_name ][] = $tag_rule_spec; |
There was a problem hiding this comment.
Is $this->allowed_tags[ $tag_name ] always an array, amp-share-tracking doesn't seem to be declared as an array?
There was a problem hiding this comment.
This logic was just moved from the sanitize method to the constructor. Yes, it should always an array. The $additional_allowed_tags is an array of arrays:
There was a problem hiding this comment.
Yes, though it is still looking for $this->allowed_tags[ 'amp-share-tracking' ] which doesn't exists isn't it? To make sure the array is conventionally declared, then we should have an:
if ( ! isset( $this->allowed_tags[ $tag_name ] ) ) {
$this->allowed_tags[ $tag_name ] = array();
}Like you have done here.
There was a problem hiding this comment.
Potentially but it's just moved from existing code and it didn't have any problem. And actually, my use of isset() is actually overkill it seems. It's actually not required in PHP at all. For example:
<?php
error_reporting(E_ALL);
$array = array();
$array['foo']['bar']['baz'] = 'Hello!';
print_r( $array );Results in:
Array
(
[foo] => Array
(
[bar] => Array
(
[baz] => Hello!
)
)
)
There are no notices or warnings. It works all the way back to PHP 4.3.0 (and beyond potentially): https://3v4l.org/b8ZaG
| /** | ||
| * Replace AMP binding attributes with something that libxml can parse (as HTML5 data-* attributes). | ||
| * | ||
| * This is necessary necessary because attributes in square brackets are not understood in PHP and |
There was a problem hiding this comment.
Typo mistake necessary necessary.
There was a problem hiding this comment.
It is very necessary 😉
|
|
||
| $dom = new DOMDocument(); | ||
|
|
||
| $document = self::convert_amp_bind_attributes( $document ); |
There was a problem hiding this comment.
I think at this stage it would be beneficial to look at extending the DOMDocument. For example here is a quick prototype which handles the amp binding conversion:
/**
* Custom AMP DOM Document.
*/
class AMP_Dom_Document extends DOMDocument {
/**
* Class constructor.
*
* @param string $version The version number of the document as part of the XML declaration.
* @param string $encoding The encoding of the document as part of the XML declaration.
*/
public function __construct( $version = null, $encoding = null ) {
parent::__construct( $version, $encoding );
}
/**
* Load HTML.
*
* @param string $source The HTML code
* @param int $options Additional Libxml parameters
* @return boolean True on success or False on failure
*/
public function loadHTML( $source, $options = 0 ) {
$source = AMP_DOM_Utils::convert_amp_bind_attributes( $source );
parent::loadHTML( $source, $options );
}
/**
* Save HTML.
*
* @param DOMNode $node Optional parameter to output a subset of the document.
* @return string The document (or node) HTML code as string
*/
public function saveHTML( DOMNode $node = null ) {
$html = parent::saveHTML( $node );
return AMP_DOM_Utils::restore_amp_bind_attributes( $html );
}
}Then new AMP_Dom_Document would be called instead of new DOMDocument.
Eventually, AMP_Dom_Document could call the entire sanitization logic making the processing workflow easier to understand and more bullet proof.
Also, instead of removing attributes causing Warning: DOMDocument::loadHTML(): error parsing attribute name and restoring them after saveHTML(), we could look at potentially using DOMDocument::registerNodeClass() in our AMP_DOM_Document::__construct to allow AMP attributes and values.
There was a problem hiding this comment.
That is a good idea, but I want to wait to do that until we do the sanitization of the entire doc including the head.
There was a problem hiding this comment.
Anyway, it is a good enhancement but it's not critical for this first merge.
There was a problem hiding this comment.
Yes that is perfectly fine, good to have it as a @todo.
There was a problem hiding this comment.
@ThierryA And this is now done! 😄 There is a shiny new \Amp\AmpWP\Dom\Document courtesy of @schlessera via #3758.
| */ | ||
| private function process_alternate_names( $attr_spec_list ) { | ||
| foreach ( $attr_spec_list as $attr_name => &$attr_spec ) { | ||
| if ( '[' === $attr_name[0] ) { |
There was a problem hiding this comment.
Is $this->args['amp_bind_placeholder_prefix'] prefix used for spec attributes like [src]?
There was a problem hiding this comment.
Yes, since such attributes aren't understood by DOMDocument we add an safe placeholder version of that same attribute (e.g. amp-binding-1234…-src) and add it to the alternative names so it can be matched instead when the nodes are processed.
There was a problem hiding this comment.
Perhaps we could use another term than binding in the future if some of the attributes are not specific to the binding component.
| * @param array $attr_spec_list Attribute spec list. | ||
| * @return array Modified attribute spec list. | ||
| */ | ||
| private function process_alternate_names( $attr_spec_list ) { |
There was a problem hiding this comment.
This function is called for all tags and then for all attributes of each tag which is quite resource heavy. My understanding is that it is used to add the $this->args['amp_bind_placeholder_prefix'] to the AMP_Rule_Spec::ATTR_SPEC_LIST which is later on used for the allowed attrs and flagging scripts.
To avoid having to do all these loops, we could perhaps do the conditional check at the is_amp_allowed_attribute() and process_node() level since we have access to the single node attributes?
There was a problem hiding this comment.
Actually it should just be called once for each tag. It loops over all attributes itself.
By doing it once at the constructor we avoid double-nested foreach loops inside the sanitize_disallowed_attributes_in_node method below. Since the double foreach O(n^2) was being done at the node level, it could be extremely poor for performance. So now is_amp_allowed_attribute is able to use the rev_alternate_attr_name_lookup to do fast lookups.
|
Nice!
…On Thu, Dec 19, 2019 at 6:20 PM Weston Ruter ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In includes/utils/class-amp-dom-utils.php
<#895 (comment)>:
> @@ -56,6 +56,8 @@ public static function get_dom( $document ) {
$dom = new DOMDocument();
+ $document = self::convert_amp_bind_attributes( $document );
@ThierryA <https://github.com/ThierryA> And this is now done! 😄 There is
a shiny new \Amp\AmpWP\Dom\Document courtesy of @schlessera
<https://github.com/schlessera> via #3758
<#3758>.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#895?email_source=notifications&email_token=AA3KLWVMJZR6SIJSNAZMOZDQZOUMPA5CNFSM4ENLZFZKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCP2GO2Q#discussion_r359975012>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AA3KLWRIRWYPNH4PF3STMVDQZOUMPANCNFSM4ENLZFZA>
.
--
[image: Google Logo]
Thierry Muller
Developer Relations Program Manager
Engineering - News & Content Ecosystem
thierrymuller@google.com
|
[x]binding attribute syntax withdata-amp-binding-RANDOMSTRING---xso PHP can parse it.data-amp-binding...---as being bracketed instead.amp-bindcomponent script when it comes across an AMP bound attribute.data-amp-binding-...with the original syntax.Fixes #836.