Skip to content

Include AMP PHP Library#4513

Merged
swissspidy merged 89 commits intomainfrom
add/amp-wp
Oct 14, 2020
Merged

Include AMP PHP Library#4513
swissspidy merged 89 commits intomainfrom
add/amp-wp

Conversation

@swissspidy
Copy link
Copy Markdown
Collaborator

@swissspidy swissspidy commented Sep 16, 2020

Summary

This PR addresses on of our longer standing and more pressing issues: built-in AMP sanitization and optimization for stories. See this doc for reference.

How it works:

  • The relevant parts of the AMP plugin are included as a composer dependency
  • Our main rendered class, HTML.php, calls the new Sanitization and Optimization classes which do the respective jobs.
  • The logic of these classes is mostly copied from AMP_Theme_Support
  • Custom sanitizers are added on top of (or in replacement of) existing sanitizers

Relevant Technical Choices

  • Includes the AMP WordPress plugin as a composer dependency
    • Prevents the plugin's file autoloader from running using mcaskill/composer-exclude-files, to avoid loading any procedural code by the AMP plugin.
      For one because they'd otherwise cause conflicts if functions are already defined. But also because they often cause side-effects like accessing options from the database, or requiring AMP__VERSION to be set.
    • PHP-CSS-Parser patches have been copied for the time being, as there seems to be a need to fix paths to patches in the amp-wp repo. See Local patch could not be downloaded for dependencies cweagans/composer-patches#315 for the actual issue occurring here.
  • Uses classes provided by the plugin to construct a DOMDocument
    • This drastically reduces the size of the HTML class and removes duplicate logic.
  • Copies chunks of AMP_Theme_Support to Sanitization to run markup through sanitization and optimization.

Meta_Sanitizer

A shameless copy of AMP_Meta_Sanitizer, except that it does not call amp_get_boilerplate_stylesheets(). As mentioned above, we're not loading that function.

  • Contribute this upstream

Canonical_Sanitizer (#4193)

Enforces adding a link[rel=canonical] to documents to make sure they are always valid.

The AMP plugin itself does that already in \AMP_Theme_Support::ensure_required_markup(). We just do the same here.

This sanitizer is only used when the AMP plugin is not active. It could be easily contributed to the AMP plugin, too.

  • Contribute this upstream

Story_Sanitizer

This new sanitizer was extracted from the HTML class, and currently deals with setting publisher logos and poster images. Also removes the amp attribute from the document if either is missing.

  • Ensure this also works well when the AMP plugin is used.

Related: #4762

Avoiding Conflicts

See #4543 for how we're avoiding conflicts with the AMP plugin itself (or any other plugin using the same libraries) by prefixing them before shipping.

To-do

Not in the scope of this PR, but tasks to look at soon after.

  • Enable transient caching in AMP_Style_Sanitizer
    Currently disabled because add_poster_images calls AMP_Options_Manager, which in turn requires AMP__VERSION to be defined. The method is private, so can't be overridden in an extended class.
    Aside: CSS used in stories is usually rather small, so not sure transient caching is actually needed
  • Load packages from Packagist once available.
  • Contribute upstream changes where necessary / desirable
  • Find workaround for Local patch could not be downloaded for dependencies cweagans/composer-patches#315

User-facing changes

Web Stories are sanitized and optimized by default, even when the AMP plugin is not active.

Testing Instructions

  1. Check out this branch and run composer install or check out this branch on the QA site
  2. Publish a story, don't set a poster
  3. Add some analytics ID in the settings
  4. Maybe add some plugin that adds some invalid markup
  5. Verify it's still valid AMP

Fixes #4650
Fixes #4193

@swissspidy swissspidy added Group: Integration Integration with other platforms and plugins Group: WordPress Changes related to WordPress or Gutenberg integration AMP Output Issues related to AMP output and validation Pod: WP & Infra labels Sep 16, 2020
@google-cla google-cla bot added the cla: yes label Sep 16, 2020
@github-actions

This comment has been minimized.

@codecov
Copy link
Copy Markdown

codecov bot commented Sep 16, 2020

Codecov Report

Merging #4513 into main will decrease coverage by 7.88%.
The diff coverage is 85.03%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4513      +/-   ##
==========================================
- Coverage   83.85%   75.96%   -7.89%     
==========================================
  Files         891      898       +7     
  Lines       15760    15938     +178     
==========================================
- Hits        13215    12107    -1108     
- Misses       2545     3831    +1286     
Flag Coverage Δ
#karmatests 52.61% <ø> (-20.47%) ⬇️
#unittests 65.77% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
includes/AMP/Integration/AMP_Story_Sanitizer.php 0.00% <0.00%> (ø)
includes/Story_Renderer/HTML.php 72.88% <60.97%> (-5.99%) ⬇️
includes/AMP/Sanitization.php 85.48% <85.48%> (ø)
includes/AMP/Traits/Sanitization_Utils.php 90.69% <90.69%> (ø)
bin/utils/createBuild.js 100.00% <100.00%> (ø)
includes/AMP/Canonical_Sanitizer.php 100.00% <100.00%> (ø)
includes/AMP/Meta_Sanitizer.php 100.00% <100.00%> (ø)
includes/AMP/Optimization.php 100.00% <100.00%> (ø)
includes/AMP/Story_Sanitizer.php 100.00% <100.00%> (ø)
includes/REST_API/Link_Controller.php 93.42% <100.00%> (ø)
... and 151 more

@westonruter

This comment has been minimized.

@swissspidy

This comment has been minimized.

@amedina
Copy link
Copy Markdown
Contributor

amedina commented Sep 17, 2020

This is great!

As discussed, the initial goal can be provide sanitization similar to how the AMP Reader Mode with dev tools turned off works in the AMP plugin: ensure the full validity of stories while silently sanitizing any offending markup.

This work also provides excellent input to the architecture/modularization work on the plugin side.

@swissspidy swissspidy force-pushed the add/amp-wp branch 4 times, most recently from b556f49 to a08f8cd Compare September 18, 2020 14:16
@swissspidy

This comment has been minimized.

@swissspidy swissspidy force-pushed the add/amp-wp branch 2 times, most recently from 1f5f761 to cdb6b26 Compare September 23, 2020 21:15
@swissspidy swissspidy added Type: Enhancement New feature or improvement of an existing feature cla: yes and removed cla: no labels Oct 13, 2020
@google-cla

This comment has been minimized.

@google-cla google-cla bot added cla: no and removed cla: yes labels Oct 13, 2020
@google-cla
Copy link
Copy Markdown

google-cla bot commented Oct 13, 2020

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

ℹ️ Googlers: Go here for more info.

@swissspidy
Copy link
Copy Markdown
Collaborator Author

swissspidy commented Oct 13, 2020

High time for an update on this one!

I know that this change wasn't the easiest to review, with over 4k changed LOC in one massive PR, even when most of the code is just copied from the AMP plugin and not too much our concern.

In any case, it helped get a much better sense of what's needed to contribute all of this back to the AMP plugin. Plus, it's a giant leap forward for built-in AMP sanitization.

Now it's crucial to get some real-world feedback on it, which is why I think it's high time to get this thing merged!

We can then continue to iterate on it in follow-up tickets if anything arises.

/cc @swissspidy @westonruter

Co-authored-by: Weston Ruter <westonruter@google.com>
@google-cla google-cla bot added cla: no and removed cla: yes labels Oct 13, 2020
@google-cla
Copy link
Copy Markdown

google-cla bot commented Oct 14, 2020

A Googler has manually verified that the CLAs look good.

(Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)

ℹ️ Googlers: Go here for more info.

@swissspidy swissspidy merged commit a995d7a into main Oct 14, 2020
@swissspidy swissspidy deleted the add/amp-wp branch October 14, 2020 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AMP Output Issues related to AMP output and validation Group: Integration Integration with other platforms and plugins Group: WordPress Changes related to WordPress or Gutenberg integration Type: Enhancement New feature or improvement of an existing feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Built-in AMP sanitization Enforce adding canonical tag at all times

8 participants