Meta sanitizer erroneously moves HTML5 microdata meta tags to head #4502

@westonruter

Bug Description

As originally reported in a support topic, the 1.5.x update broke certain Schema.org metadata as output by Yoast.

For example, put this markup in a Custom HTML block:

<span class="td-page-meta" itemprop="author" itemscope itemtype="https://schema.org/Person">
	<meta itemprop="name" content="Siva">
</span>
<meta itemprop="datePublished" content="2020-03-24T18:05:15+05:30">
<meta itemprop="dateModified" content="2020-03-24T18:05:15+05:30">
<meta itemscope itemprop="mainEntityOfPage" itemtype="https://schema.org/WebPage" itemid="https://crictamil.in/mcclenaghan-talks-about-rohith-missing/">
<span class="td-page-meta" itemprop="publisher" itemscope itemtype="https://schema.org/Organization">
	<span class="td-page-meta" itemprop="logo" itemscope itemtype="https://schema.org/ImageObject">
		<meta itemprop="url" content="https://crictamil.in/wp-content/uploads/2018/05/logo.png">
	</span>
	<meta itemprop="name" content="Cric Tamil">
	<meta itemprop="url" content="https://crictamil.in">
</span>
<meta itemprop="headline " content="இவர் மட்டும் இருந்திருந்தா இந்திய அணி நியூஸிலாந்திடம் இவ்வளவு மோசமாக தோற்றிருக்காது - மெக்லனகன் கருத்து">
<span class="td-page-meta" itemprop="image" itemscope itemtype="https://schema.org/ImageObject">
	<meta itemprop="url" content="https://crictamil.in/wp-content/uploads/2020/03/Mcclenaghan-1.jpg">
	<meta itemprop="width" content="1280"><meta itemprop="height" content="720">
</span>

When viewing the AMP page, the Custom HTML block then becomes:

<span class="td-page-meta" itemprop="author" itemscope itemtype="https://schema.org/Person"></span>
<span class="td-page-meta" itemprop="publisher" itemscope itemtype="https://schema.org/Organization">
	<span class="td-page-meta" itemprop="logo" itemscope itemtype="https://schema.org/ImageObject"></span>
</span>
<span class="td-page-meta" itemprop="image" itemscope itemtype="https://schema.org/ImageObject"></span>

All of the meta tags were moved to the head, which detaches them from their itemscope containers and causes structured data errors.

Expected Behaviour

The meta tags used in the body for HTML5 Microdata should not be moved to the head.

Additional context

  • WordPress version: 5.4
  • Plugin version: 1.5.x

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

Implementation brief

A quick fix may be simply to skip processing of meta tags that have any of the HTML5 Microdata attributes:

--- a/includes/sanitizers/class-amp-meta-sanitizer.php
+++ b/includes/sanitizers/class-amp-meta-sanitizer.php
@@ -70,7 +70,7 @@ class AMP_Meta_Sanitizer extends AMP_Base_Sanitizer {
 	 * Sanitize.
 	 */
 	public function sanitize() {
-		$meta_elements = $this->dom->getElementsByTagName( static::$tag );
+		$meta_elements = $this->dom->xpath->query( '//meta[ not( @itemid ) and not( @itemref ) and not( @itemprop ) and not( @itemscope ) and not( @itemtype ) ]' );
 
 		// Remove all nodes for easy reordering later on.
 		$meta_elements = array_map(
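
The XPath filter from the diff above can be tried in isolation. The following is a minimal sketch, assuming PHP's bundled DOM extension rather than the plugin's own document class. The sample markup and the variable names are made up for illustration.

```php
<?php
// Illustrative markup: two ordinary meta tags and two Microdata meta tags.
$html = '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>'
	. '<meta name="robots" content="noindex">'
	. '<span itemprop="author" itemscope itemtype="https://schema.org/Person">'
	. '<meta itemprop="name" content="Siva">'
	. '</span>'
	. '<meta itemprop="datePublished" content="2020-03-24T18:05:15+05:30">'
	. '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors( true ); // Keep parser warnings out of the output.
$dom->loadHTML( $html );
libxml_clear_errors();

$xpath = new DOMXPath( $dom );

// Same expression as in the proposed diff: only meta tags without any
// Microdata attribute are selected for relocation.
$query   = '//meta[ not( @itemid ) and not( @itemref ) and not( @itemprop ) and not( @itemscope ) and not( @itemtype ) ]';
$movable = $xpath->query( $query );

// Every meta tag in the document, for comparison.
$all = $xpath->query( '//meta' );

foreach ( $movable as $meta ) {
	echo $dom->saveHTML( $meta ), PHP_EOL;
}
printf( "%d of %d meta tags would be moved.\n", $movable->length, $all->length );
```

With this sample, only the charset and robots tags match the query, so the two Microdata tags would stay where they are in the body.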

Other meta tags may also need to be excluded, for example those that have the property or scheme attributes.

See the generic meta tag spec for more: https://github.com/ampproject/amphtml/blob/286a6302fcd007eab303b105c161e76d9e880322/validator/validator-main.protoascii#L701-L727

We may need to use a denylist of meta tags rather than an allowlist.
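
To compare the two approaches, here is a rough sketch that builds one XPath query of each kind and runs both against the same sample markup. The attribute lists are assumptions for illustration only (the denylist adds property and scheme to the Microdata attributes, and the allowlist is a guess at which attributes mark head-only meta tags). Neither is taken from the AMP validator spec, and `build_meta_query()` is a hypothetical helper.

```php
<?php
/**
 * Hypothetical helper: build an XPath query for meta tags.
 *
 * @param string[] $attributes Attribute names to test for.
 * @param bool     $deny       True: match tags with none of the attributes.
 *                             False: match tags with at least one of them.
 * @return string XPath expression.
 */
function build_meta_query( array $attributes, $deny ) {
	$tests = array_map(
		function ( $attribute ) use ( $deny ) {
			return $deny ? "not( @$attribute )" : "@$attribute";
		},
		$attributes
	);
	return '//meta[ ' . implode( $deny ? ' and ' : ' or ', $tests ) . ' ]';
}

// Assumed lists, for illustration only.
$denied  = array( 'itemid', 'itemref', 'itemprop', 'itemscope', 'itemtype', 'property', 'scheme' );
$allowed = array( 'charset', 'http-equiv', 'name' );

$html = '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>'
	. '<meta name="viewport" content="width=device-width">'
	. '<meta property="og:title" content="Example">'
	. '<meta itemprop="name" content="Siva">'
	. '<meta name="DC.date" scheme="W3CDTF" content="2020-03-24">'
	. '</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors( true );
$dom->loadHTML( $html );
libxml_clear_errors();
$xpath = new DOMXPath( $dom );

// Meta tags each strategy would move to the head.
$by_denylist  = $xpath->query( build_meta_query( $denied, true ) );
$by_allowlist = $xpath->query( build_meta_query( $allowed, false ) );

printf( "Denylist moves %d, allowlist moves %d.\n", $by_denylist->length, $by_allowlist->length );
```

The two strategies disagree on the tag that has both name and scheme. The denylist leaves it in place, while this allowlist would move it, so the choice of list matters for such mixed cases.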

QA testing instructions

Demo

Changelog entry

Labels

Bug (Something isn't working), P0 (High priority), Sanitizers
