Clarify how PLAINTEXT elements may contain child nodes. by dmsnell · Pull Request #10540 · whatwg/html

dmsnell · 2024-08-02T01:31:30Z

Resolves #8009

If active formatting elements are open when the parser encounters a start tag named PLAINTEXT, subsequent character tokens may reconstruct them. The spec, however, implies that this should not happen, because PLAINTEXT effectively disables HTML parsing for the rest of the document.

> Once a start tag with the tag name "plaintext" has been seen, that
> will be the last token ever seen other than character tokens
> (and the end-of-file token), because there is no way to switch out
> of the PLAINTEXT state.

This is confusing because, while the tokenizer remains in the PLAINTEXT state, the tree builder continues to apply the normal rules for its insertion mode, and those rules are where reconstruction of the active formatting elements may be triggered.

Although this seems to contradict the purpose of the PLAINTEXT element, all major browsers behave this way, and a clarified note in the spec could help implementers avoid misreading it (as I did).
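The interaction described above can be sketched with a toy model. This is not a conforming HTML parser: `tokenize`, `build`, `dump`, and the `Element` class are hypothetical names made up for illustration, only a handful of tags are handled, and markers, scopes, and the adoption agency algorithm are omitted. It assumes, as in the in-body rules, that a character token first reconstructs the active formatting elements and that a `</p>` end tag pops a still-open `<b>` off the stack without removing it from the list of active formatting elements.

```python
import re
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare elements by identity, not by content
class Element:
    name: str
    children: list = field(default_factory=list)  # Element or str

FORMATTING = {"b", "i"}  # toy subset of the formatting elements

def tokenize(html):
    """Yield ("start", name), ("end", name) or ("text", data) tokens.

    After a <plaintext> start tag the rest of the input becomes a single
    text token, because the PLAINTEXT tokenizer state is never left.
    """
    tag = re.compile(r"<(/?)([a-z]+)>")
    pos = 0
    while pos < len(html):
        m = tag.match(html, pos)
        if m:
            kind = "end" if m.group(1) else "start"
            yield (kind, m.group(2))
            pos = m.end()
            if kind == "start" and m.group(2) == "plaintext":
                if pos < len(html):
                    yield ("text", html[pos:])
                return
        else:
            nxt = html.find("<", pos + 1)
            nxt = len(html) if nxt == -1 else nxt
            yield ("text", html[pos:nxt])
            pos = nxt

def build(tokens):
    body = Element("body")
    open_elements = [body]   # stack of open elements
    active_formatting = []   # list of active formatting elements

    def insert(name):
        el = Element(name)
        open_elements[-1].children.append(el)
        open_elements.append(el)
        return el

    def reconstruct():
        # Rewind past trailing entries that are no longer on the stack,
        # then reopen each of them in order (markers are not modelled).
        start = len(active_formatting)
        while start > 0 and active_formatting[start - 1] not in open_elements:
            start -= 1
        for i in range(start, len(active_formatting)):
            active_formatting[i] = insert(active_formatting[i].name)

    for kind, data in tokens:
        if kind == "text":
            reconstruct()  # the tree builder still applies its normal rules
            open_elements[-1].children.append(data)
        elif kind == "start" and data in FORMATTING:
            reconstruct()
            active_formatting.append(insert(data))
        elif kind == "start":
            insert(data)   # <p> or <plaintext>: no reconstruction here
        elif kind == "end" and data == "p":
            if any(e.name == "p" for e in open_elements):
                # Pops <b> too, but leaves it in active_formatting.
                while open_elements.pop().name != "p":
                    pass
        # Other end tags are ignored in this sketch.
    return body

def dump(node):
    if isinstance(node, str):
        return repr(node)
    return f"{node.name}({', '.join(dump(c) for c in node.children)})"

tree = build(tokenize("<p><b>bold</p><plaintext>rest</b>"))
print(dump(tree))  # body(p(b('bold')), plaintext(b('rest</b>')))
```

In this model the `</b>` after `<plaintext>` is never tokenized as an end tag, yet the first character token still reopens `b`, so the `plaintext` element ends up with an element child rather than only text.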

[Screenshots: before and after. Preview diff: /parsing.html]

Resolves whatwg#8009

All major HTML parsers reconstruct active formatting elements when inserting a new PLAINTEXT element, leaving formatting elements as children of the PLAINTEXT element. However, the spec implies that this should not happen, because it doesn't instruct reconstruction.

The implication in the spec is that a PLAINTEXT element may contain no children other than the plaintext content of the remainder of the HTML document.

> Once a start tag with the tag name "plaintext" has been seen, that
> will be the last token ever seen other than character tokens
> (and the end-of-file token), because there is no way to switch out
> of the PLAINTEXT state.

This patch updates the spec to conform to the existing implementations by adding the mention to trigger reconstruction.

zcorpan · 2024-08-02T08:06:29Z

See #8009 (comment)

dmsnell · 2024-08-05T19:12:18Z

thanks @zcorpan - I have updated the patch and included screenshots of the changed section. I think that explicitly calling out that active format reconstruction may take place, and that PLAINTEXT elements may have child nodes, would be a worthwhile addition to the note.

dmsnell · 2024-08-08T17:12:29Z

Thanks @zcorpan!

annevk · 2024-08-13T08:56:14Z

@dmsnell you will also need to make your membership of the "automattic" GitHub organization public to satisfy the IPR bot.

dmsnell · 2024-08-13T19:26:04Z

Thanks again @annevk. I've lower-cased the plaintext element reference, and marked my membership as public. Before today I didn't realize there were public and private memberships on my profile.

dmsnell mentioned this pull request Aug 2, 2024

HTML API: Use full parser for html5lib tests WordPress/wordpress-develop#7117

Closed

domenic assigned zcorpan Aug 2, 2024

domenic added the topic: parser label Aug 2, 2024

zcorpan mentioned this pull request Aug 2, 2024

Surprising parsing behavior with active formatting elements and PLAINTEXT #8009

Closed

zcorpan added the "needs implementer interest" label (moving the issue forward requires implementers to express interest) Aug 2, 2024

Rephrase note on PLAINTEXT making clear how child nodes can be present.

306bf0d

dmsnell changed the title ~~Reconstruct active formatting elements for PLAINTEXT element.~~ Clarify how PLAINTEXT elements may contain child nodes. Aug 5, 2024

Remove needless comma.

3965b08

zcorpan requested changes Aug 5, 2024

source: outdated, resolved

Wording and formatting nits.

0e76a30

zcorpan requested changes Aug 6, 2024

source: outdated, resolved

dmsnell and others added 2 commits August 6, 2024 15:12

PR Feedback

bb0822c

Co-authored-by: Simon Pieters <zcorpan@gmail.com>

Remove a comma

eb657b0

zcorpan approved these changes Aug 8, 2024

annevk reviewed Aug 12, 2024

source: resolved

source: outdated, resolved

Clarify that child elements may come directly from the parser.

84de4e9

Co-authored-by: Anne van Kesteren <annevk@annevk.nl>

Lowercase "plaintext" element.

609e91f

Co-authored-by: Anne van Kesteren <annevk@annevk.nl>

annevk approved these changes Aug 14, 2024

annevk merged commit caf70fa into whatwg:main Aug 14, 2024

dmsnell deleted the reconstruct-on-plaintext branch August 14, 2024 17:47

annevk merged 8 commits into whatwg:main from dmsnell:reconstruct-on-plaintext
