HTML API: Add missing subclass methods to the HTML Processor.#6860
HTML API: Add missing subclass methods to the HTML Processor.#6860dmsnell wants to merge 8 commits intoWordPress:trunkfrom
Conversation
Test using WordPress PlaygroundThe changes in this pull request can previewed and tested using a WordPress Playground instance. WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser. Some things to be aware of
For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation. |
There was a problem hiding this comment.
I think this is a very valuable addition to introducing a new private is_virtual() method. The only thing that I'm not entirely sure about is the best default when performing all overridden methods in the case they are called on virtual nodes. For example, has_self_closing_flag always returns false for the virtual node. The question is how well that lands for people interacting with the API. Could a virtual node have a corresponding virtual closing tag that would validate returning true? The other one that I would be happy to hear some clarification on is get_comment_type. Is it even possible to ever interact with a virtual comment? It might not matter after all when covering it as an edge case. However, we could clarify that and cover it with unit tests to seal it with a long-term contract.
|
|
||
| /** | ||
| * Constructor function. | ||
| * |
There was a problem hiding this comment.
It is worth noting that the 3rd parameter was added in WP 6.6.0. Although it's probably enough to put it for the constructor as it seems to be a new class in this release.
There was a problem hiding this comment.
Yes, given this class was added in 6.6.0, I don't think we need to note the change.
| * | ||
| * @return bool Whether the current token is virtual. | ||
| */ | ||
| private function is_virtual() { |
There was a problem hiding this comment.
Interesting. Since it's private there is no new testing code coverage necessary.
However, we probably should add some tests that cover the cases when overridden methods are called on virtual nodes to ensure they work as intended.
Co-authored-by: Greg Ziółkowski <grzegorz@gziolo.pl>
|
This fixes a regression that was currently in 6.6 that has tests in #6765 where That's working correctly as it was in 6.5 with this PR. |
sirreal
left a comment
There was a problem hiding this comment.
I've spent some time with this and it's a nice improvement to push/pop that addresses one of the known issues with comment types. It's important that we add the methods to the subclass.
I would like to see some comments describing what's happening exactly with real/virtual nodes, but in general this looks good.
| function ( WP_HTML_Token $token ) { | ||
| $this->element_queue[] = new WP_HTML_Stack_Event( $token, WP_HTML_Stack_Event::PUSH ); | ||
| $is_virtual = ! isset( $this->state->current_token ) || $this->is_tag_closer(); | ||
| $same_node = isset( $this->state->current_token ) && $token->node_name === $this->state->current_token->node_name; | ||
| $provenance = ( ! $same_node || $is_virtual ) ? 'virtual' : 'real'; |
There was a problem hiding this comment.
In the push and pop handlers I think we need some explanatory comments.
For virtual nodes, it's not obvious why tag closers are related to virtual nodes. I think this is because when when we find a tag closer but we're in the push handler, it must not be the same node. The opposite is true in the pop handler. We expect to push openers and pop closers.
I'm also curious about the same_node check. Is it sufficient to compare the node name? Is it impossible for the virtual node name to coincide with the current node name?
That said, this does seem to work correctly. I added virtual/real information to the debugger (𝑖 and ℝ), here's what I see:
<a>
<a></a>
<h1><h3></h6>
</p>(The P closer generates a virtual P opener at the wrong level, that was an issue before this PR.)
There was a problem hiding this comment.
Love the addition to the debugger.
…n provenance. This patch introduces two related changes: - It adds missing subclass methods on the HTML Processor which needed to be implemented since it started visiting virtual nodes. These methods need to account for the fact that not all tokens truly exist. - It adds a new concept and internal method, `is_virtual()`, indicating if the currently-matched token comes from the raw text in the input HTML document or if it was the byproduct of semantic parsing rules. This internal method and new vocabulary around token provenance considerably simplifies the logic spread throughout the rest of the class and its subclass methods. Developed in #6860 Discussed in https://core.trac.wordpress.org/ticket/61348 Follow-up to [58304]. Props dmsnell, jonsurrell, gziolo. See #61348. git-svn-id: https://develop.svn.wordpress.org/trunk@58558 602fd350-edb4-49c9-b593-d223f7449a82
…n provenance. This patch introduces two related changes: - It adds missing subclass methods on the HTML Processor which needed to be implemented since it started visiting virtual nodes. These methods need to account for the fact that not all tokens truly exist. - It adds a new concept and internal method, `is_virtual()`, indicating if the currently-matched token comes from the raw text in the input HTML document or if it was the byproduct of semantic parsing rules. This internal method and new vocabulary around token provenance considerably simplifies the logic spread throughout the rest of the class and its subclass methods. Developed in WordPress/wordpress-develop#6860 Discussed in https://core.trac.wordpress.org/ticket/61348 Follow-up to [58304]. Props dmsnell, jonsurrell, gziolo. See #61348. Built from https://develop.svn.wordpress.org/trunk@58558 git-svn-id: http://core.svn.wordpress.org/trunk@58006 1a063a9b-81f0-0310-95a4-ce76da25c4cd
…n provenance. This patch introduces two related changes: - It adds missing subclass methods on the HTML Processor which needed to be implemented since it started visiting virtual nodes. These methods need to account for the fact that not all tokens truly exist. - It adds a new concept and internal method, `is_virtual()`, indicating if the currently-matched token comes from the raw text in the input HTML document or if it was the byproduct of semantic parsing rules. This internal method and new vocabulary around token provenance considerably simplifies the logic spread throughout the rest of the class and its subclass methods. Developed in WordPress/wordpress-develop#6860 Discussed in https://core.trac.wordpress.org/ticket/61348 Follow-up to [58304]. Props dmsnell, jonsurrell, gziolo. See #61348. Built from https://develop.svn.wordpress.org/trunk@58558 git-svn-id: https://core.svn.wordpress.org/trunk@58006 1a063a9b-81f0-0310-95a4-ce76da25c4cd
…n provenance. This patch introduces two related changes: - It adds missing subclass methods on the HTML Processor which needed to be implemented since it started visiting virtual nodes. These methods need to account for the fact that not all tokens truly exist. - It adds a new concept and internal method, `is_virtual()`, indicating if the currently-matched token comes from the raw text in the input HTML document or if it was the byproduct of semantic parsing rules. This internal method and new vocabulary around token provenance considerably simplifies the logic spread throughout the rest of the class and its subclass methods. Developed in WordPress#6860 Discussed in https://core.trac.wordpress.org/ticket/61348 Follow-up to [58304]. Props dmsnell, jonsurrell, gziolo. See #61348. git-svn-id: https://develop.svn.wordpress.org/trunk@58558 602fd350-edb4-49c9-b593-d223f7449a82

See Core-61348
Status
Description
Since the introduction of visiting all nodes in the HTML Processor, the internal logic within the class can be confusing. It's also missing a few important subclass methods.
In this fix the missing subclass methods are added, and a helper
is_virtual()provides clearer semantics to leave the logic more direct in what it's measuring. Instead of repeating the same frail conditions everywhere, the methods can ask directly what we care about: "is this node real or virtual."In 6.7 we can open up the method as
publicif it seems worthwhile.