Skip to content

The HTML4 meta[http-equiv=Content-Type] causes validation error to leak and should convert automatically to HTML5 meta[charset] #3469

@westonruter

Description

@westonruter

Bug Description

As reported in a support forum topic, if a theme like Koji declares the Content-Type using the HTML4 method like this:

<meta http-equiv="content-type" content="<?php bloginfo( 'html_type' ); ?>" charset="<?php bloginfo( 'charset' ); ?>" />

Then the result the AMP plugin generating:

<meta content="text/html; charset=UTF-8" charset="UTF-8" http-equiv="Content-Type">

Somehow the validating sanitizer is leaking the http-equiv and content attributes here, and this causes an error in the validator:

image

This is even though the sanitizer is reporting the former as being invalid:

image

Even though it is accepted, it is not getting sanitized. So that's confusing.

The AMP plugin should transparently convert any such meta Content-Type tag into the HTML meta charset that AMP requires:

<meta charset=utf-8>

See also #2970. This would have avoided an error in adding support for Twenty Twenty (#3342 (comment)) which was fixed in Twenty Twenty via WordPress/twentytwenty#640.

Here is a workaround plugin for the issue: https://gist.github.com/westonruter/d18fd37fb9a58a26ae375a0cb9570831

There should not be any need to show a validation error as this should automatically get converted. If the site is using an encoding other than UTF-8, then technically the encoding should be getting converted to UTF-8 before being parsed. See:

// @todo If 'utf-8' is not the blog charset, then we'll need to do some character encoding conversation or "entityification".
if ( 'utf-8' !== strtolower( get_bloginfo( 'charset' ) ) ) {
/* translators: %s: the charset of the current site. */
trigger_error( esc_html( sprintf( __( 'The database has the %s encoding when it needs to be utf-8 to work with AMP.', 'amp' ), get_bloginfo( 'charset' ) ) ), E_USER_WARNING ); // phpcs:ignore WordPress.PHP.DevelopmentFunctions.error_log_trigger_error
}

This would fix #855.

Expected Behaviour

An AMP-valid HTML5 meta charset should only be output.

Steps to reproduce

  1. Install and activate Koji theme.
  2. Activate Standard or Transitional mode.
  3. See the the AMP Validator report a validation error for http-equiv attribute.
  4. See the admin bar report a validation error for the same.

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • The http-equiv attribute should not be leaked to the frontend.
  • Convert HTML4 meta[http-equiv] to HTML5 meta[charset].
  • (Nice to have) Convert encoding of input documents to UTF-8.

Implementation brief

QA testing instructions

Demo

Changelog entry

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugSomething isn't workingQA passedHas passed QA and is done

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions