
fix: harden head and body regex in domparser#22

Merged
remarkablemark merged 5 commits into master from fix/domparser on Nov 4, 2019

@remarkablemark (Owner)

Fixes #18

Because the head and body regexes test against the closing tag,
HTML with an unclosed head or body is not parsed correctly.

For example, given the following:

```js
parse('<html><body>');
```

The expected output is:

```
[ { type: 'tag',
    name: 'html',
    attribs: {},
    children:
     [ { type: 'tag',
         name: 'body',
         attribs: {},
         children: [],
         next: null,
         prev: null,
         parent: [Circular] } ],
    next: null,
    prev: null,
    parent: null } ]
```

But the actual output is:

```
[
  {
    "next": null,
    "prev": null,
    "parent": null,
    "name": "html",
    "attribs": {},
    "type": "tag",
    "children": []
  }
]
```

The fix is to update the regex to use the opening tag instead of
the closing tag.

Add test case.
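
To illustrate why matching on the closing tag misses unclosed markup, here is a minimal sketch. The patterns and the `hasBody` helper are hypothetical and written only for this example; they are not copied from the library's source, and the real regexes may differ.

```js
// Hypothetical patterns for illustration; the library's actual regexes may differ.
const CLOSING_BODY = /<\/body>/i; // matches only when a closing </body> is present
const OPENING_BODY = /<body[\s>]/i; // matches as soon as an opening <body> appears

// Hypothetical helper: reports whether the pattern detects a body in the markup.
function hasBody(html, pattern) {
  return pattern.test(html);
}

const unclosed = '<html><body>';
const closed = '<html><body></body></html>';

const results = {
  closingOnUnclosed: hasBody(unclosed, CLOSING_BODY), // false: body goes undetected
  openingOnUnclosed: hasBody(unclosed, OPENING_BODY), // true: body is detected
  closingOnClosed: hasBody(closed, CLOSING_BODY), // true
  openingOnClosed: hasBody(closed, OPENING_BODY), // true
};

console.log(results);
```

Under these assumptions, only the opening-tag pattern detects the body in both the closed and the unclosed input, which matches the behavior change described above.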

@coveralls

Coverage increased (+0.6%) to 90.909% when pulling 3f3e514 on fix/domparser into c2665b4 on master.

@remarkablemark remarkablemark merged commit 871b1e3 into master Nov 4, 2019
@remarkablemark remarkablemark deleted the fix/domparser branch November 4, 2019 05:06

Linked issue:

Parser completely removes body elements if html string has open <html> and <body> tags, but does not have close </body> and </html> tag
