Skip to content

Mustache templates for tables using script[type=text/plain] are corrupted #4254

@westonruter

Description

@westonruter

Bug Description

As reported on a support forum topic by @spasibych, a workaround for Mustache templates for tables is broken by the DOM parser in the AMP plugin.

Please refer to the amp.dev docs on Mustache templates for tables.

In particular, given this input:

<script type="text/plain" template="amp-mustache">
  <table>
    <tr>
{{#foo}}<td></td>{{/foo}}
    </tr>
  </table>
</script>

The AMP plugin is outputting this as:

<script type="text/plain" template="amp-mustache">
  <table>
    <tr>
{{#foo}}<td>{{/foo}}		
</script>

However, the use of a script to contain the template is one of the workarounds mentioned:

Workarounds include wrapping Mustache sections in HTML comments (e.g. <!-- {{#bar}} -->), using non-table elements like <div> instead or using a <script type="text/plain"> tag to define your templates.

It turns out that that the HTML comment workaround does work. Given this input:

<template type="amp-mustache">
  <table>
    <tr>
<!--{{#foo}}--><td></td><!--{{/foo}}-->
    </tr>
  </table>
</template>

The AMP plugin makes no modifications in the output.

Expected Behaviour

The mustache template for a table using the <script type="text/plain"> workaround should work.

Steps to reproduce

  1. Create a Custom HTML block with the above <script type="text/plain" template="amp-mustache">...</script> element.
  2. View the AMP page.
  3. Notice that the closing tags are dropped.

Screenshots

Additional context

  • WordPress version:
  • Plugin version:
  • Gutenberg plugin version (if applicable):
  • AMP plugin template mode:
  • PHP version:
  • OS:
  • Browser: [e.g. chrome, safari]
  • Device: [e.g. iPhone6]

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

Implementation brief

As noted in ampproject/amphtml#26656 (comment):

We'll have to figure out some other mechanism to workaround libxml failing to parse the HTML. Something that comes to mind is doing a search/replace to convert <script type="text/plain" template="amp-mustache"> into some other element (e.g. tmp-script) during parsing, and then replace that placeholder with the original script once the DOM has been constructed.

This turns out to work: https://3v4l.org/0qq5S

<?php

$script = '
    <script type="text/plain" template="amp-mustache">
      <table>
        <tr>
            {{#foo}}<td></td>{{/foo}}
        </tr>
      </table>
    </script>
';

$dom = new DOMDocument();

$tmp_script = preg_replace( 
    '#<script(\s[^>]*template="amp-mustache"[^>]*)>(.*?)</script>#s', 
    '<tmp-script$1>$2</tmp-script>',
    $script
);

@$dom->loadHTML( '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>' . $tmp_script . '</body></html>' );

$tmp_script_elements = iterator_to_array( $dom->getElementsByTagName( 'tmp-script' ) );
foreach ( $tmp_script_elements as $tmp_script_element ) {
    $script = $dom->createElement( 'script' );
    foreach ( $tmp_script_element->attributes as $attr ) {
        $script->setAttribute( $attr->nodeName, $attr->nodeValue );
    }
    while ( $tmp_script_element->firstChild ) {
        $script->appendChild( $tmp_script_element->firstChild );
    }
    $tmp_script_element->parentNode->replaceChild( $script, $tmp_script_element );
}

echo $dom->saveHTML();

Note the regex for doing the search/replace will need to be hardened, in particular the regex for matching the attributes (in any order).

The logic should go into the Document class and called similarly to the maybe_replace_noscript_elements method:

amp-wp/src/Dom/Document.php

Lines 602 to 639 in ea28148

/**
* Maybe replace noscript elements with placeholders.
*
* This is done because libxml<2.8 might parse them incorrectly.
* When appearing in the head element, a noscript can cause the head to close prematurely
* and the noscript gets moved to the body and anything after it which was in the head.
* See <https://stackoverflow.com/questions/39013102/why-does-noscript-move-into-body-tag-instead-of-head-tag>.
* This is limited to only running in the head element because this is where the problem lies,
* and it is important for the AMP_Script_Sanitizer to be able to access the noscript elements
* in the body otherwise.
*
* @see maybe_restore_noscript_elements() Reciprocal function.
*
* @param string $html HTML string to adapt.
* @return string Adapted HTML string.
*/
private function maybe_replace_noscript_elements( $html ) {
if ( ! version_compare( LIBXML_DOTTED_VERSION, '2.8', '<' ) ) {
return $html;
}
return preg_replace_callback(
'#^.+?(?=<body)#is',
function ( $head_matches ) {
return preg_replace_callback(
'#<noscript[^>]*>.*?</noscript>#si',
function ( $noscript_matches ) {
$placeholder = sprintf( '<!--noscript:%s-->', (string) wp_rand() );
$this->noscript_placeholder_comments[ $placeholder ] = $noscript_matches[0];
return $placeholder;
},
$head_matches[0]
);
},
$html
);
}

Note that this replacement must be done immediately after parsing, since the Document will then need to have the script replaced to successfully go through the sanitizers.

QA testing instructions

Demo

Changelog entry

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugSomething isn't workingQA passedHas passed QA and is done

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions