Fix document.body not including content from DOMParser parsed <body>#108
Conversation
| assert(document.all[3], document.querySelector('body')); | ||
|
|
||
| document = (new DOMParser).parseFromString('<!DOCTYPE html><html><body><div>foo</div></body></html>', 'text/html'); | ||
| assert(document.body.tagName, 'BODY'); |
There was a problem hiding this comment.
Making sure document.body isn't null
|
|
||
| document = (new DOMParser).parseFromString('<!DOCTYPE html><html><body><div>foo</div></body></html>', 'text/html'); | ||
| assert(document.body.tagName, 'BODY'); | ||
| assert(document.body, document.querySelector('body')); |
There was a problem hiding this comment.
Making sure document.body resolves to the same Node as document.querySelector('body')
| document = (new DOMParser).parseFromString('<!DOCTYPE html><html><body><div>foo</div></body></html>', 'text/html'); | ||
| assert(document.body.tagName, 'BODY'); | ||
| assert(document.body, document.querySelector('body')); | ||
| assert(document.body.textContent, 'foo'); |
There was a problem hiding this comment.
Making sure the document.body includes the content we parsed.
esm/shared/parse-from-string.js
Outdated
| // Accessing the document.body property will create the <body> | ||
| node = document.body; | ||
| create = false; | ||
| } |
There was a problem hiding this comment.
The HTML spec says "The body element of a document is the first of the html element's children that is either a body element or a frameset element, or null if there is no such element." -- https://html.spec.whatwg.org/multipage/dom.html#dom-document-body-dev
But in terms of multiple <body> and how it behaves, the browser seems to resolve to the same Node.
new DOMParser().parseFromString('<html><body>foo</body><body>bar</body></html>', "text/html").documentElement.outerHTML
// '<html><head></head><body>foobar</body></html>'For reference, linkedom and the browser both behave the same here ✅
There was a problem hiding this comment.
still it's no goal of this project to care about these shenanigans ... but good to know, thanks
There was a problem hiding this comment.
Understood. In this case, it's the simplest way to support and turns out to behave identically 👌
|
|
068a70d to
8edc4e3
Compare
|
@WebReflection Updated ✅ |
|
Build failing. This MR is ignoring SVG and XML constraints, plus it's not considering that the current logic works already for The |
…parsed <body> Fix WebReflection#106 Before: ```js new DOMParser().parseFromString(`<html><body><div>asdf</div></body></html>`, "text/html").body.outerHTML // '<body></body>' ``` After: ```js new DOMParser().parseFromString(`<html><body><div>asdf</div></body></html>`, "text/html").body.outerHTML // '<body><div>asdf</div></body>' ```
8edc4e3 to
176bec3
Compare
| @@ -59,7 +59,12 @@ const parseFromString = (document, isHTML, markupLanguage) => { | |||
| onopentag(name, attributes) { | |||
There was a problem hiding this comment.
Build failing.
Build fixed.
| @@ -59,7 +59,12 @@ const parseFromString = (document, isHTML, markupLanguage) => { | |||
| onopentag(name, attributes) { | |||
There was a problem hiding this comment.
This MR is ignoring SVG and XML constraints
I'm not sure what you mean (can you explain this more obviously)? The logic was under the isHTML flag and doesn't apply to SVG and XML as far as I know.
| const {documentElement} = this; | ||
| let {firstElementChild} = documentElement; | ||
| if (!firstElementChild) { | ||
| if (!firstElementChild || firstElementChild.tagName !== 'HEAD') { |
There was a problem hiding this comment.
[...] plus it's not considering that the current logic works already for document.body so it's kinda patching it in the wrong place.
The body getter is likely the place to look for a change, not the parse-from-string one, as that file has zero responsibility on how that document or its getters should react .. and bear in mind document.head is another one to eventually fix in the original issue.
Is this more what you mean?
Previously, the code would always assume the that first child was a <head> element but in the case of <!DOCTYPE html><html><body><div>foo</div></body></html>, it's a <body>. This means that when the body getter calls the head getter, it returned the existing <body> and created another <body> for itself after the supposed <head>.
The new code makes sure that the first child is a <head> element otherwise creates one.
This current assumptions about node order here has the downside that if someone defines stuff out of order, it will have 2 <head> when you try to access document.head
// notice <head> after <body>
const document = new DOMParser().parseFromString(`<!DOCTYPE html><html><body></body><head><title>"hello"</title></head></html>`, "text/html").head.outerHTML
// '<head></head>'We could fix this up better in parse-from-string if we enforce <head> as the first child and <body> after when documents are initially parsed and created. But I'd go back to the document.body creation pattern like I had previously to keep things simple.
|
Fixed with latest. |
|
Thanks for incorporating the changes @WebReflection 😀 Looks pretty similar: 535c5bc |
…rowser and linkedom (for SSR) Context: - WebReflection/linkedom#106 - WebReflection/linkedom#108
…er and linkedom (for SSR) Before: ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` ❌ new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body></body>' ``` After (consistent matching output): ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' ``` --- `linkedom` goal is to be close to the current DOM standard, but [not too close](https://github.com/WebReflection/linkedom#faq). Focused on the streamlined cases for server-side rendering (SSR). Here is some context around getting `DOMParser` to interpret things better. The conclusion was to only support the explicit standard cases with a `<html><body></body></html>` specified instead of adding the magic HTML document creation and massaging that the browser does. - WebReflection/linkedom#106 - WebReflection/linkedom#108
…dom`](https://github.com/WebReflection/linkedom) (SSR). - [`linkedom`](https://github.com/WebReflection/linkedom) is being used https://github.com/matrix-org/matrix-public-archive to server-side render (SSR) Hydrogen (`hydrogen-view-sdk`) - This is being fixed by using a explicit HTML wrapper boilerplate with `DOMParser` to get a matching result in the browser and `linkedom`. Currently `parseHTML` is only used for HTML content bodies in events. Events with replies have content bodies that look like `<mx-reply>Hello</mx-reply> What's up` so they're parsed as HTML to strip out the `<mx-reply>` part. Before | After --- | ---  |  Before: ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` ❌ new DOMParser().parseFromString(`<div>foo</div>`, "text/html").body.outerHTML; // '<body></body>' ``` After (consistent matching output): ```js // Browser (Chrome, Firefox) new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' // `linkedom` new DOMParser().parseFromString(`<!DOCTYPE html><html><body><div>foo</div></body></html>`, "text/html").body.outerHTML; // '<body><div>foo</div></body>' ``` `linkedom` goal is to be close to the current DOM standard, but [not too close](https://github.com/WebReflection/linkedom#faq). Focused on the streamlined cases for server-side rendering (SSR). Here is some context around getting `DOMParser` to interpret things better. The conclusion was to only support the explicit standard cases with a `<html><body></body></html>` specified instead of adding the magic HTML document creation and massaging that the browser does. - WebReflection/linkedom#106 - WebReflection/linkedom#108 --- Part of #653 to support server-side rendering Hydrogen for the [`matrix-public-archive`](https://github.com/matrix-org/matrix-public-archive) project.
Fix
document.bodynot including content fromDOMParserparsed<body>Fix #106
Before:
After: