Update/block support settings use tag processor#46625
Merged
adamziel
approved these changes
Dec 22, 2022
Contributor
adamziel
left a comment
I love how much simpler the final code is! Way to go ❤️
7e7b3b8 to
2f40651
Compare
…pper. When we introduced new block supports behavior in #42124, we did so with a PCRE replacement that opened the possibility for a few bugs related to processing HTML attributes. It was noted in that PR that this would be a good candidate for the `WP_HTML_Tag_Processor`. In this patch we're performing that replacement as follow-up work. This should improve the reliability and, hopefully, the readability of what is being done to the HTML as it renders.
2f40651 to
b187df8
Compare
Flaky tests detected in b187df8. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/3897623939
dmsnell
added a commit
to dmsnell/wordpress-develop
that referenced
this pull request
Jan 26, 2023
This commit pulls in the HTML Tag Processor from the Gutenberg repository.
The Tag Processor attempts to be an HTML5-spec-compliant parser that
provides the ability in PHP to find specific HTML tags and then add,
remove, or update attributes on that tag. It provides a safe and reliable
way to modify the attributes on HTML tags.
```php
// Add missing `rel` attribute to links.
$p = new WP_HTML_Tag_Processor( $block_content );
if ( $p->next_tag( 'A' ) && empty( $p->get_attribute( 'rel' ) ) ) {
	// `set_attribute()` takes the attribute name first, then its value.
	$p->set_attribute( 'rel', 'noopener nofollow' );
}
return $p->get_updated_html();
```
Introduced originally in WordPress/gutenberg#42485 and developed within
the Gutenberg repository, this HTML parsing system was built in order
to address a persistent need (properly modifying HTML tag attributes)
and was motivated after a sequence of block editor defects which stemmed
from mismatches between actual HTML code and expectations for HTML
input running through existing naive string-search-based solutions.
The Tag Processor is intended to operate fast enough to avoid being an
obstacle on page render while using as little memory overhead as possible.
It is practically a zero-memory-overhead system, and only allocates memory
as changes to the input HTML document are enqueued, releasing that memory
when flushing those changes to the document, moving on to find the next
tag, or flushing its entire output via `get_updated_html()`.
Care has been taken to ensure that the Tag Processor will not be confused
by unexpected or non-normative HTML input, including issues arising from
quoting, from different syntax rules within `<title>`, `<textarea>`, and
`<script>` tags, from the appearance of rare but legitimate comment and
XML-like regions, and from a variety of syntax abnormalities such as
unbalanced tags, incomplete syntax, and overlapping tags.
The Tag Processor is constrained to parsing an HTML document as a stream
of tokens. It will not build an HTML tree or generate a DOM representation
of a document. It is designed to start at the beginning of an HTML
document and linearly scan through it, potentially modifying that document
as it scans. It has no access to the markup inside or around tags and it
has no ability to determine which tag openers and tag closers belong to each
other, or determine the nesting depth of a given tag.
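As a rough illustration of the linear, token-by-token scan described above, the sketch below collects the tag names the processor visits. It assumes WordPress 6.2+ (or the Gutenberg plugin) is loaded so that `WP_HTML_Tag_Processor` exists; `example_list_tag_names` is a hypothetical helper name, not part of this commit.

```php
<?php
// Sketch only: requires WordPress 6.2+ (or the Gutenberg plugin) to be loaded.
// `example_list_tag_names` is a hypothetical helper, not an API from this commit.
function example_list_tag_names( string $html ): array {
	$processor = new WP_HTML_Tag_Processor( $html );
	$names     = array();

	// Each call advances to the next tag opener in document order.
	// No tree is built, so nesting depth is not available here.
	while ( $processor->next_tag() ) {
		$names[] = $processor->get_tag();
	}

	return $names;
}
```

The result is a flat list in document order; whether one tag sits inside another cannot be recovered from it, which matches the constraint stated above.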
It includes a primitive bookmarking system to remember tags it has previously
visited. These bookmarks refer to specific tags, not to string offsets, and
continue to point to the same place in the document as edits are applied. By
asking the Tag Processor to seek to a given bookmark it's possible to back
up and reprocess content that has already been traversed.
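A minimal sketch of the bookmarking flow described above: remember a tag, scan ahead, then seek back and edit the remembered tag. It assumes WordPress 6.2+ is loaded; the helper name `example_mark_figure_with_image` and the `has-image` class are made up for illustration.

```php
<?php
// Sketch only: requires WordPress 6.2+ (or the Gutenberg plugin) to be loaded.
// `example_mark_figure_with_image` and the `has-image` class are hypothetical.
function example_mark_figure_with_image( string $html ): string {
	$processor = new WP_HTML_Tag_Processor( $html );

	if ( ! $processor->next_tag( 'FIGURE' ) ) {
		return $html;
	}

	// Remember the FIGURE tag before scanning further ahead.
	$processor->set_bookmark( 'figure' );

	if ( $processor->next_tag( 'IMG' ) ) {
		// Back up to the bookmarked tag and modify it.
		$processor->seek( 'figure' );
		$processor->add_class( 'has-image' );
	}

	// Bookmarks hold internal resources, so release them when done.
	$processor->release_bookmark( 'figure' );

	return $processor->get_updated_html();
}
```

Because the processor does not track nesting, this sketch only knows that an `IMG` appears somewhere after the `FIGURE` opener, not that it is inside the figure.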
Attribute values are sanitized with `esc_attr()` and rendered as double-quoted
attributes. On read they are unescaped and unquoted. Authors wishing to rely on
the Tag Processor therefore are free to pass around data as normal strings.
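The round trip described above might look like the following sketch: a plain string goes in through `set_attribute()`, is escaped in the output HTML, and comes back unescaped from `get_attribute()`. It assumes WordPress 6.2+ is loaded; both helper names are hypothetical.

```php
<?php
// Sketch only: requires WordPress 6.2+ (or the Gutenberg plugin) to be loaded.
// Both helper names below are hypothetical, chosen for this illustration.

// Writes a plain, unescaped string as the `title` of the first IMG tag.
function example_set_image_title( string $html, string $title ): string {
	$processor = new WP_HTML_Tag_Processor( $html );
	if ( $processor->next_tag( 'IMG' ) ) {
		$processor->set_attribute( 'title', $title );
	}
	return $processor->get_updated_html();
}

// Reads the `title` of the first IMG tag back as a plain string.
function example_get_image_title( string $html ): ?string {
	$processor = new WP_HTML_Tag_Processor( $html );
	if ( ! $processor->next_tag( 'IMG' ) ) {
		return null;
	}
	// `get_attribute()` may return a string, `true` (boolean attribute), or null.
	$value = $processor->get_attribute( 'title' );
	return is_string( $value ) ? $value : null;
}
```

Calling code never has to escape or decode the value itself; passing `Tom & "Jerry"` in should yield the same string on read.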
Convenience methods for adding and removing CSS class names exist in order to
remove the need to process the `class` attribute.
```php
// Update heading block class names
$p = new WP_HTML_Tag_Processor( $html );
while ( $p->next_tag() ) {
	switch ( $p->get_tag() ) {
		case 'H1':
		case 'H2':
		case 'H3':
		case 'H4':
		case 'H5':
		case 'H6':
			$p->remove_class( 'wp-heading' );
			$p->add_class( 'wp-block-heading' );
			break;
	}
}
return $p->get_updated_html();
```
The Tag Processor is intended to be a reliable low-level library for traversing
HTML documents, and higher-level APIs are to be built upon it. Immediately, in
Core Gutenberg blocks, it is meant to replace HTML modification that currently
relies on RegExp patterns and simpler string replacements.
See the following for examples of such replacement:
WordPress/gutenberg@1315784
https://github.com/WordPress/gutenberg/pull/45469/files#diff-dcd9e1f9b87ca63efe9f1e834b4d3048778d3eca41aa39c636f8b16a5bb452d2L46
WordPress/gutenberg#46625
Co-Authored-By: Adam Zielinski <adam@adamziel.com>
Co-Authored-By: Bernie Reiter <ockham@raz.or.at>
Co-Authored-By: Grzegorz Ziolkowski <grzegorz@gziolo.pl>
dmsnell
added a commit
to dmsnell/wordpress-develop
that referenced
this pull request
Feb 1, 2023
Porting part of WordPress/gutenberg#46625 Replace use of fragile `preg_match` with Tag Processor when adding an element class name to its wrapper.
What?
Use `WP_HTML_Tag_Processor` to add a new class name to wrapping elements when rendering block supports.
Why?
This class was built to quickly and reliably modify HTML tag attributes. It circumvents specific problems, such as matching on the wrong attributes (such as `data-custom-class="some value"`), overlooking matches (such as `class=blue` or `class='wp-block-group'`), writing updates in a way that gets overlooked by the browser (by writing to the end of the tag instead of before any potential duplicate attributes), and writing invalid content to the HTML (such as through a bug in a PCRE pattern greedily matching more than it should).
A side perk here is that application code becomes more focused on the semantic operations it's performing rather than the mechanisms through which it does it.
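To make the failure modes listed above concrete, here is a small, self-contained sketch. The pattern and the `naive_add_class` helper are illustrative assumptions, not the code this PR removes, and `is-layout-flow` is just a sample class name.

```php
<?php
// Hypothetical naive helper, NOT the code removed by this PR: it appends a
// class name by rewriting the first `class="..."` it finds with a PCRE pattern.
function naive_add_class( string $html, string $class_name ): string {
	return preg_replace(
		'/class="([^"]*)"/',
		'class="$1 ' . $class_name . '"',
		$html,
		1
	);
}

// Wrong attribute: the pattern matches the tail of `data-custom-class`.
$wrong_attribute = naive_add_class( '<div data-custom-class="some value">', 'is-layout-flow' );

// Overlooked matches: unquoted and single-quoted values are left untouched.
$unquoted      = naive_add_class( '<div class=blue>', 'is-layout-flow' );
$single_quoted = naive_add_class( "<div class='wp-block-group'>", 'is-layout-flow' );
```

In the first case the custom data attribute is silently corrupted; in the other two the class is never added at all.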
How?
Utilizes the API provided by the Tag Processor to do the heavy lifting for us.
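In rough terms, the replacement could look like the sketch below. This is not the code from this PR; it assumes WordPress 6.2+ (or the Gutenberg plugin) is loaded, and `example_add_wrapper_class` is a hypothetical function name.

```php
<?php
// Sketch only: requires WordPress 6.2+ (or the Gutenberg plugin) to be loaded.
// `example_add_wrapper_class` is a hypothetical name, not a function from this PR.
function example_add_wrapper_class( string $block_content, string $class_name ): string {
	$processor = new WP_HTML_Tag_Processor( $block_content );

	// Treat the first tag in the rendered block as its wrapper element.
	if ( $processor->next_tag() ) {
		// `add_class()` handles a missing, unquoted, or single-quoted `class`.
		$processor->add_class( $class_name );
	}

	return $processor->get_updated_html();
}
```

The calling code states what it wants (add a class to the wrapper) and leaves attribute parsing and quoting to the Tag Processor.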
Testing
Hopefully the unit tests cover this.
Otherwise try to use block supports and make sure that the expected classes appear where they are and not where they shouldn't be.
I'm not that familiar with how this system works so I hope you can help figure out what needs to be tested.