Conversation
|
The problem is parse5's htmlparser2 tree adapter. Removing a node involves an indexOf, which of course scales badly with this many sibling nodes: https://github.com/inikulin/parse5/blob/723d782abee65aaab9c50f92e73ac2029ae853df/lib/tree_adapters/htmlparser2.js#L213 Creating our own tree adapter would probably work better here, as linked list removal via domSymbolTree should be O(1). We've been meaning to do that for a long time. I think last time I talked about it @Sebmaster mentioned that doing it "right" is very hard, but maybe doing it as good as we are doing it now would be easy enough? It's not 100% clear to me why detachNode is being called so much in the first place when parsing this fragment, but I can chalk that up to the vagaries of the HTML parsing algorithm. /cc @inikulin in case he wants to enlighten us. |
|
The "right" mainly has to do with getting script parsing/execution right. We could just parse into a temporary domSymbolTree with our own treeAdapter, then setChild as we do now. |
|
I started trying to just write a tree adapter but it's not working great as parse5 doesn't seem to give us all the context we'd need. For example the createElement hook takes a tagName, namespace, and attrs array, but not a document, so I don't know what document to associate the resulting element with. |
|
Yeah, you'll have to create a custom tree adapter for each document to do that. Wrap it in an anonymous function or something. |
The test has been pushing the limit for a while, but it looks like Travis was having particular problems on that day with some of our other test runs taking 30+ minutes. Still, will be interesting to see if we can improve performance for this case. |
|
I pushed an attempt at a new tree adapter, but it's not quite working. When I run the included test.js, it ends up attempting to call Help appreciated; I think I'm going to take a break for a while. |
|
https://github.com/tmpvar/jsdom/pull/1316/files#diff-d9dea305b1cd56b9d25a28477edfe8f8 is what I came up with the last time, and then I kinda got stuck on properly handling scripts, from what I remember. But the simple use cases worked, I think. |
|
Figured out at least part of the problem. We're not setting namespaceURI correctly on newly-created nodes. The old tree adapter had an |
b3a3dfd to
db915b9
Compare
|
OK, newest revision is stuck on script evaluation. In particular, in the parse5 tree adapter model, the calls for new JSDOM(`<body>
<script>throw new Error();</script>
</body>`, { runScripts: "dangerously", includeNodeLocations: true, virtualConsole: new VirtualConsole() });end up looking like https://gist.github.com/domenic/7068c870281e836757adc6f482393c98. In particular note how it inserts many new text nodes into the script. (I should use @Sebmaster's code to consolidate those into one text node.) Since we're not going to do the wholesale script execution change right now, what we need is some notification that the script tag is "closed" so we can eval it. Maybe we can add that to parse5? That seems like the easiest path right now. Maybe there is something better we can do as well. |
|
Since there's no way to embed another element into a script tag, maybe we should just save the currently open script tag into a global var and when a new createElement call comes in, consider the script closed? |
|
Something like that could work. If you look at the example log in https://gist.github.com/domenic/7068c870281e836757adc6f482393c98 we'd need to detect insertTexts into anything but the script tag as well... probably best to just wait for any mutation? |
|
Ah, I see. Yeah that makes sense to me. I wonder how the call graph looks like when there's reparenting going on due to malformed HTML. We might have to put in tests for that. |
|
It does seem like getting an end tag notification would be simpler. |
|
OK, API tests are now passing. to-upstream WPTs have some interesting failures left. |
|
I think the |
|
I'm not switching to the streaming parser in this PR if I can avoid it... |
This includes reevaluating <style> whenever its content changes, and properly creating new CSSStyleSheet objects each time for the .sheet property.
This didn't work, notably, for `new Document()`, so it wasn't reliable anyway.
|
Down to just 2 failing tests.... |
0be1acb to
0d5f2d8
Compare
|
OK, this is now ready for review, as all tests are passing that can reasonably made to pass. Proposed post-squashing commit message, in case it makes reviewing easier: Overhaul HTML parsing infrastructure This changes the HTML parsing infrastructure, in particular how we interface with parse5. The original motivation for this change was increased performance; in the new benchmark, we see the time for appending 65535 s go from XXX to YYY. Other user-visible changes to script evaluation might be present; in particular, the tests revealed that one document.write-related test that used to somehow pass now fails, while another one succeeds (at least on Node 8). In most cases, however, behavior should be the same. While here, we also changed style sheets to reevaluate their rules and update Behind the scenes details:
|
| } | ||
|
|
||
| createDocumentFragment() { | ||
| return DocumentFragment.createImpl([], { ownerDocument: this._documentImpl }); |
There was a problem hiding this comment.
Is it really a good idea to expose implementation objects to parse5, which I assume expects wrappers?
There was a problem hiding this comment.
parse5 doesn't expect anything. It just gives you back what you give it. You can create objects of any shape, as long as they can be appended to each other in a tree structure.
|
I am a bit weirded out by the fact that we are now passing fewer tests because of this supposedly more spec-complaint approach, but that just shows how bad we were before I guess. |
|
Eh, we lost one, and on Node 8 at least, we gained one. 🤷♂️ |
|
Nice cleanups all around, but particularly with Assuming the tests have us covered for the logic here, LGTM. |
| } | ||
|
|
||
| setDocumentType(document, name, publicId, systemId) { | ||
| // parse5 sometimes gives us these as null. |
There was a problem hiding this comment.
Turns out it's a bug, per spec:
Append a DocumentType node to the Document node, with the name attribute set to the name given in the DOCTYPE token, or the empty string if the name was missing; the publicId attribute set to the public identifier given in the DOCTYPE token, or the empty string if the public identifier was missing; the systemId attribute set to the system identifier given in the DOCTYPE token, or the empty string if the system identifier was missing;
But parse5 doesn't perform such transforms in tree construction stage. Filled inikulin/parse5#236
Inspired by the timeout failure in https://travis-ci.org/tmpvar/jsdom/jobs/323623712.
I wonder if this has regressed recently, or if we've just gotten lucky with that test before?
I tried running this in Chrome but something's not working right with my profiler so I can't get a call graph... maybe others will have better luck.I figured out I had to go back to the old JavaScript profiler... investigating now...