Explore HTML parsing and Adoption Agency Algorithm #1
Closed
…king a tag closer

This commit marks the start of a bookmark one byte before the tag name start for tag openers, and two bytes before the tag name for tag closers.

Setting a bookmark on a tag should set its "start" position before the opening "<", e.g.:

```
<div> Testing a <b>Bookmark</b>
----------------^
```

The current calculation assumes this is always one byte to the left of $tag_name_starts_at. However, in tag closers the byte at that position is a solidus "/":

```
<div> Testing a <b>Bookmark</b>
----------------------------^
```

The bookmark should therefore start two bytes before the tag name:

```
<div> Testing a <b>Bookmark</b>
---------------------------^
```
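The offset rule above can be sketched in a few lines of PHP. `bookmark_start()` is a hypothetical helper, not the actual `WP_HTML_Tag_Processor` code; it only assumes the processor knows `$tag_name_starts_at` and whether the current tag is a closer.

```php
<?php
/**
 * Hypothetical helper (not the actual WP_HTML_Tag_Processor code): returns the
 * byte offset where a bookmark on a tag should start.
 *
 * Tag opener  "<b>":  "<" is one byte before the tag name.
 * Tag closer "</b>": "<" is two bytes before the tag name ("/" sits between).
 */
function bookmark_start( int $tag_name_starts_at, bool $is_closer ): int {
	return $tag_name_starts_at - ( $is_closer ? 2 : 1 );
}

$html = '<div> Testing a <b>Bookmark</b>';

// Byte offsets of the tag name "b" in the opener and in the closer.
$opener_name_at = strpos( $html, '<b>' ) + 1;  // Skip "<".
$closer_name_at = strpos( $html, '</b>' ) + 2; // Skip "</".

// The old calculation (always one byte to the left) lands on "/" for closers.
echo $html[ $closer_name_at - 1 ], "\n"; // Prints "/".

// The corrected calculation lands on "<" for both tags.
echo $html[ bookmark_start( $opener_name_at, false ) ], "\n"; // Prints "<".
echo $html[ bookmark_start( $closer_name_at, true ) ], "\n";  // Prints "<".
```

Subtracting a per-tag-type constant keeps the fix local to the bookmark calculation; nothing else about how the tag name offset is tracked needs to change.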
…closers' into wp_html_processor
This implementation works! I benchmarked it on the HTML parsing spec itself, which is a 12MB HTML document:

I tried parsing the HTML spec page (12MB):

That's pretty terrible! It's also not surprising. This PR builds an actual document tree and uses inefficient operations such as

A text-based version similar to WP_HTML_Tag_Processor should be much faster and more memory-efficient. Let's explore one!
Adoption Agency Algorithm requires a full pass through the HTML document

In the worst-case scenario, the entire document must be parsed to know even the second node. Consider this markup:

```html
<b>
<div>
<div><!-- 100k tags amounting to 2 MB of normative HTML --></div>
</b> <!-- suddenly, a rogue </b> -->
</div>
</b>
```

The correct DOM would be:

The adoption agency algorithm makes the

What if we built an HTML normalizer instead?

Since the entire markup must be processed upfront, this could work just as well:

```php
class WP_HTML_Processor {
	/** @var string Markup that is assumed to be normative. */
	private $html;

	public function __construct( $html, $options = array() ) {
		// Apply HTML parsing rules first, unless explicitly asked not to.
		$is_normative = isset( $options['is_normative'] ) && true === $options['is_normative'];
		if ( ! $is_normative ) {
			$html = WP_HTML_Normalizer::normalize( $html );
		}
		// From now on, we assume normative markup.
		$this->html = $html;
	}

	public function next_by_css( $selector ) {
		// ...
	}

	public function set_inner_html( $html ) {
		// ...
	}

	// ...
}
```
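To illustrate why a normalizer has to see the whole input before it can trust any of it, here is a deliberately simplified, hypothetical check. `find_misnested_end_tag()` is not part of WordPress, and it is neither the adoption agency algorithm nor a real HTML tokenizer: it scans tags left to right with a stack of open element names and reports the first end tag that does not close the most recently opened element. It can only conclude that markup is well-nested after consuming all of it, because a rogue `</b>` may sit arbitrarily deep in the document.

```php
<?php
/**
 * Hypothetical, deliberately simplified nesting check. Returns the byte offset
 * of the first end tag that does not close the most recently opened element,
 * or null when every end tag matches. Not an HTML parser: it skips comments
 * and void elements but ignores every other HTML parsing rule.
 */
function find_misnested_end_tag( string $html ): ?int {
	$void = array( 'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input', 'link', 'meta', 'source', 'track', 'wbr' );

	// Match either a comment or a tag; group 1 is "/" for closers, group 2 the tag name.
	preg_match_all(
		'/<!--.*?-->|<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/s',
		$html,
		$matches,
		PREG_SET_ORDER | PREG_OFFSET_CAPTURE
	);

	$stack = array(); // Names of currently open elements.
	foreach ( $matches as $match ) {
		if ( ! isset( $match[2] ) || '' === $match[2][0] ) {
			continue; // A comment: no tag name was captured.
		}
		$name = strtolower( $match[2][0] );
		if ( in_array( $name, $void, true ) ) {
			continue; // Void elements have no end tag; skip them.
		}
		if ( '/' !== $match[1][0] ) {
			$stack[] = $name; // Tag opener.
			continue;
		}
		if ( end( $stack ) !== $name ) {
			return $match[0][1]; // Byte offset of the misnested end tag.
		}
		array_pop( $stack );
	}

	return null;
}

$html   = "<b>\n<div>\n<div><!-- 100k tags --></div>\n</b>\n</div>\n</b>";
$offset = find_misnested_end_tag( $html );
echo substr( $html, $offset, 4 ), "\n"; // Prints "</b>".

var_dump( find_misnested_end_tag( '<p><b>ok</b></p>' ) ); // Prints "NULL".
```

A `null` result is only possible after the last tag has been read, which is the same constraint that pushes the normalizer proposal toward processing the entire markup upfront.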
This was referenced Feb 23, 2023
adamziel pushed a commit that referenced this pull request on Mar 2, 2023
…air screen.
The table is no longer created by core as of WordPress 3.0, and support for global terms was removed in WordPress 6.1, so `$wpdb->sitecategories` is unset by default.
This commit resolves a "passing null to non-nullable" deprecation notice on PHP 8.1:
{{{
Deprecated: addcslashes(): Passing null to parameter #1 ($string) of type string is deprecated in wp-includes/class-wpdb.php on line 1804
}}}
The `tables_to_repair` filter is available for plugins to re-add the table or include any additional tables to repair.
Follow-up to [14854], [14880], [54240].
Props ipajen, chiragrathod103, SergeyBiryukov.
Fixes #57762.
git-svn-id: https://develop.svn.wordpress.org/trunk@55421 602fd350-edb4-49c9-b593-d223f7449a82
adamziel pushed a commit that referenced this pull request on Oct 13, 2023
…om next_posts().
The `esc_url()` function expects a string for the `$url` parameter. There is no input validation within that function. The function contains an `ltrim()` call, which also expects a string. Passing `null` to this parameter results in a `Deprecated: ltrim(): Passing null to parameter #1 ($string) of type string is deprecated` notice on PHP 8.1+.
Tracing the stack back, a `null` is being passed to it within `next_posts()` when `get_next_posts_page_link()` returns `null` (it can return a string or `null`).
On PHP 7.0 to PHP 8.x, an empty string is returned from `esc_url()` when `null` is passed to it. The change in this changeset avoids the deprecation notice by not invoking `esc_url()` when `get_next_posts_page_link()` returns `null` and instead sets the `$output` to an empty string, thus maintaining the same behavior as before (minus the deprecation notice).
Adds a test to validate an empty string is returned and the absence of the deprecation (when running on PHP 8.1+).
Follow-up to [11383], [9632].
Props codersantosh, nihar007, hellofromTonya, mukesh27, oglekler, rajinsharwar.
Fixes #59154.
git-svn-id: https://develop.svn.wordpress.org/trunk@56740 602fd350-edb4-49c9-b593-d223f7449a82
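The null guard described in this commit can be sketched in plain PHP. `sketch_esc_url()` and `sketch_next_posts()` are hypothetical stand-ins, not the WordPress functions or the actual changeset; the stand-in only mimics the one detail the commit message relies on, that `esc_url()` passes its argument to `ltrim()`.

```php
<?php
/**
 * Hypothetical stand-in for esc_url(): only mimics the ltrim() call, which
 * emits a deprecation notice on PHP 8.1+ when given null.
 */
function sketch_esc_url( $url ) {
	return ltrim( $url );
}

/**
 * Hypothetical stand-in for next_posts(). $link plays the role of the
 * string|null value returned by get_next_posts_page_link().
 */
function sketch_next_posts( $link ) {
	// Only escape an actual string; null maps straight to '', the same
	// result esc_url( null ) produced before, minus the deprecation notice.
	return null === $link ? '' : sketch_esc_url( $link );
}

var_dump( sketch_next_posts( null ) );        // string(0) ""
var_dump( sketch_next_posts( ' /page/2/' ) ); // string(8) "/page/2/"
```

Checking for `null` at the call site, rather than inside the escaping function, keeps the escaping function's string-only contract intact while preserving the caller's observable output.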
adamziel pushed a commit that referenced this pull request on Aug 16, 2024
…Info screen.
This resolves a fatal error if the `strict_types` PHP setting is enabled:
{{{
Argument #1 ($num) must be of type float, string given
}}}
Since the goal of the Site Health Info screen is to display raw values where possible, the `number_format()` call here does not seem to provide any benefit.
Props krishneup, sabernhardt, audrasjb, SergeyBiryukov.
Fixes #60364.
git-svn-id: https://develop.svn.wordpress.org/trunk@58847 602fd350-edb4-49c9-b593-d223f7449a82
adamziel pushed a commit that referenced this pull request on Nov 7, 2025
…ord()`.
This resolves a "passing null to non-nullable" deprecation notice on PHP 8.1+:
{{{
Deprecated: trim(): Passing null to parameter #1 ($string) of type string is deprecated
}}}
Follow-up to [50129], [54477].
Props afragen, peterwilsoncc, SergeyBiryukov.
Fixes #62298.
git-svn-id: https://develop.svn.wordpress.org/trunk@59312 602fd350-edb4-49c9-b593-d223f7449a82
adamziel pushed a commit that referenced this pull request on Mar 20, 2026
…om `previous_posts()`.
The `esc_url()` function expects a string for the `$url` parameter. There is no input validation within that function. The function contains an `ltrim()` call, which also expects a string. Passing `null` to this parameter results in a `Deprecated: ltrim(): Passing null to parameter #1 ($string) of type string is deprecated` notice on PHP 8.1+.
Tracing the stack back, a `null` is being passed to it within `previous_posts()` when `get_previous_posts_page_link()` returns `null` (it can return a string or `null`).
On PHP 7.0 to PHP 8.x, an empty string is returned from `esc_url()` when `null` is passed to it. The change in this changeset avoids the deprecation notice by not invoking `esc_url()` when `get_previous_posts_page_link()` returns `null` and instead sets the `$output` to an empty string, thus maintaining the same behavior as before (minus the deprecation notice).
Adds a test to validate an empty string is returned and the absence of the deprecation (when running on PHP 8.1+).
Follow-up to [9632], [11383], [56740].
Props dd32, alexodiy.
Fixes #64864.
git-svn-id: https://develop.svn.wordpress.org/trunk@62034 602fd350-edb4-49c9-b593-d223f7449a82
Closing in favor of more visible WordPress#4125