Clarify how PLAINTEXT elements may contain child nodes.#10540
Merged
annevk merged 8 commits into whatwg:main on Aug 14, 2024
Resolves whatwg#8009

All major HTML parsers reconstruct active formatting elements when inserting a new PLAINTEXT element, leaving formatting elements as children of the PLAINTEXT element. However, the spec implies that this should not happen, because it doesn't instruct reconstruction. The implication in the spec is that a PLAINTEXT element may contain no children other than the plaintext content of the remainder of the HTML document.

> Once a start tag with the tag name "plaintext" has been seen, that
> will be the last token ever seen other than character tokens
> (and the end-of-file token), because there is no way to switch out
> of the PLAINTEXT state.

This patch updates the spec to conform to the existing implementations by adding the mention to trigger reconstruction.
Member
See #8009 (comment)
Contributor, Author
Thanks @zcorpan - I have updated the patch and included screenshots of the changed section. I think it would be worthwhile for the note to state explicitly that reconstruction of the active formatting elements may take place and that PLAINTEXT elements may therefore have child nodes.
zcorpan requested changes on Aug 5, 2024
zcorpan requested changes on Aug 6, 2024
Co-authored-by: Simon Pieters <zcorpan@gmail.com>
zcorpan approved these changes on Aug 8, 2024
Contributor, Author
Thanks @zcorpan!
annevk reviewed on Aug 12, 2024
Co-authored-by: Anne van Kesteren <annevk@annevk.nl>
Member
@dmsnell you will also need to make your membership of the "automattic" GitHub organization public to satisfy the IPR bot.
Co-authored-by: Anne van Kesteren <annevk@annevk.nl>
Contributor, Author
Thanks again @annevk. I've lower-cased the
annevk approved these changes on Aug 14, 2024
Resolves #8009
When there are active formatting elements at the point where a start tag named PLAINTEXT is encountered, the character tokens that follow may reconstruct those active formatting elements. The spec implies that this should not happen, because PLAINTEXT effectively disables HTML parsing for the rest of the document.
The reason is that, while the tokenizer remains in the PLAINTEXT state, the tree builder continues to apply the normal rules for its insertion mode, and those rules are where reconstruction of the active formatting elements may be triggered.
Although this seems to contradict the purpose of the PLAINTEXT element, all major browsers behave this way, and a clarified note in the spec could help implementors avoid misreading the behavior (as I did).
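To make the interaction concrete, here is a heavily simplified sketch in Python. It is not the spec's algorithm: it only understands bare `<p>`, `<plaintext>`, and a few formatting tags without attributes, it has no markers or scopes, and it replaces the adoption agency algorithm with a plain pop. The names `Node`, `FORMATTING`, `tokenize`, and `build` are hypothetical and exist only for this illustration. What it does model is the point above: the tokenizer stops producing tags after `<plaintext>`, while the tree builder still reconstructs formatting elements when it handles the remaining character token.

```python
import re

FORMATTING = {"b", "i", "em", "strong"}  # small illustrative subset


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []  # Node instances or plain strings

    def serialize(self):
        inner = "".join(
            c if isinstance(c, str) else c.serialize() for c in self.children
        )
        return f"<{self.name}>{inner}</{self.name}>"


def tokenize(html):
    """Yield ('start', name), ('end', name) or ('chars', text) tokens.

    After a <plaintext> start tag the rest of the input becomes a single
    character token, imitating the tokenizer's PLAINTEXT state.
    """
    tag = re.compile(r"<(/?)([a-zA-Z]+)>")
    pos = 0
    while pos < len(html):
        m = tag.match(html, pos)
        if m:
            name = m.group(2).lower()
            yield ("end" if m.group(1) else "start", name)
            pos = m.end()
            if not m.group(1) and name == "plaintext":
                if pos < len(html):
                    yield ("chars", html[pos:])
                return
        else:
            nxt = html.find("<", pos + 1)
            nxt = len(html) if nxt == -1 else nxt
            yield ("chars", html[pos:nxt])
            pos = nxt


def build(html):
    root = Node("body")
    stack = [root]  # stack of open elements
    active = []     # list of active formatting elements (no markers here)

    def insert(name):
        node = Node(name)
        stack[-1].children.append(node)
        stack.append(node)
        return node

    def reconstruct():
        # Rewind past trailing entries that are no longer open, then
        # reopen each of them in order and replace the list entry.
        i = len(active)
        while i > 0 and not any(active[i - 1] is n for n in stack):
            i -= 1
        for j in range(i, len(active)):
            active[j] = insert(active[j].name)

    for kind, value in tokenize(html):
        if kind == "chars":
            reconstruct()  # this is the step that can fill <plaintext>
            stack[-1].children.append(value)
        elif kind == "start":
            if value in FORMATTING:
                reconstruct()
                active.append(insert(value))
            else:
                insert(value)  # <p> or <plaintext>: no reconstruction here
        elif any(n.name == value for n in stack[1:]):
            # Simplified end tag handling: pop until the element is closed.
            while True:
                node = stack.pop()
                if node.name == value:
                    break
            if value in FORMATTING:
                active[:] = [n for n in active if n is not node]
    return root


print(build("<p><b>hello</p><plaintext><i>text</i>").serialize())
# <body><p><b>hello</b></p><plaintext><b><i>text</i></b></plaintext></body>
```

In this input, `</p>` pops the `b` element off the stack of open elements but leaves it in the list of active formatting elements. The `<plaintext>` start tag itself does not reconstruct anything; the new `b` appears only when the following character token is processed, which is why it ends up as a child of the `plaintext` element.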
Before / After: screenshots of the changed section (images not available).
/parsing.html ( diff )