
Support content after HTML end tag #4297

Merged
westonruter merged 13 commits into develop from 4282-support-content-after-html-end-tag
Mar 7, 2020

Conversation

@schlessera (Collaborator) commented on Feb 15, 2020

Summary

The normalize_document_structure() method in Dom\Document did not support content after the HTML end tag. When such content was present, the method failed to remove the end tag, and the subsequent normalization steps resulted in an overwritten tag that lost all of its attributes.
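The failure mode can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the end tag is located with a regular expression; the pattern names and the split_at_html_end helper are hypothetical and are not the plugin's PHP code. A pattern that expects </html> to be the last thing in the string stops matching as soon as a comment follows it, while a pattern that also captures the trailing content still finds the tag.

```python
import re
from typing import Tuple

# Hypothetical naive pattern: assumes </html> is the last thing in the
# document, optionally followed by whitespace. A trailing comment breaks it.
NAIVE_HTML_END = re.compile(r"</html\s*>\s*$", re.IGNORECASE)

# Hypothetical tolerant pattern: finds the end tag and captures whatever
# follows it, so that trailing content is not silently lost.
HTML_END_WITH_TRAILER = re.compile(
    r"</html\s*>(?P<trailer>.*)\Z", re.IGNORECASE | re.DOTALL
)


def split_at_html_end(html: str) -> Tuple[str, str]:
    """Return (document without the </html> end tag, content after it)."""
    match = HTML_END_WITH_TRAILER.search(html)
    if match is None:
        # No end tag at all: nothing to remove and nothing trailing.
        return html, ""
    return html[: match.start()], match.group("trailer")
```

For a document ending in </html> followed by a comment, NAIVE_HTML_END.search() returns None, whereas split_at_html_end() returns the markup up to </body> together with the comment, which a later step can reinsert.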

This PR changes the regular expressions and the reassembly behavior so that comments stay intact and keep their original order across the HTML structure.
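One way to reassemble the document is sketched below in Python, under the assumption (made only for this example) that trailing content belongs at the end of the body. The trailing text is moved as a single unchanged string, so no comment inside it is split or reordered. The function name and patterns are hypothetical; the PR's actual PHP reassembly may place the content differently.

```python
import re

# Hypothetical patterns for this sketch; the plugin's own regexes may differ.
HTML_END = re.compile(r"</html\s*>", re.IGNORECASE)
BODY_END = re.compile(r"</body\s*>", re.IGNORECASE)


def reassemble_trailing_content(html: str) -> str:
    """Move everything after </html> to just before the last </body>.

    The trailing text is moved as one unchanged string, so comments inside
    it stay intact and keep their relative order.
    """
    match = HTML_END.search(html)
    if match is None:
        return html  # No end tag: nothing to normalize.
    document = html[: match.start()]
    trailer = html[match.end():].strip()
    body_ends = list(BODY_END.finditer(document))
    if not body_ends:
        # No </body>: keep the trailer just inside the re-added </html>.
        return document + trailer + "</html>"
    insert_at = body_ends[-1].start()
    return document[:insert_at] + trailer + document[insert_at:] + "</html>"
```

Moving the trailer as a block, rather than extracting and re-emitting comments one by one, is what keeps their ordering trivially correct in this sketch.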

It also changes most regular expressions to use atomic groups where needed, which prevents unnecessary backtracking and improves regex performance.
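The effect of an atomic group can be shown with a small Python sketch. Python's re module only accepts the native (?>...) syntax from Python 3.11 on, so this example uses the equivalent lookahead-plus-backreference idiom, which works on any Python 3. The start-tag patterns and the html_start_attributes helper are hypothetical illustrations, not the regexes used in the PR.

```python
import re

# Hypothetical backtracking pattern: if no ">" follows, the engine may retry
# [^>]* at every shorter length before giving up.
HTML_START_BACKTRACKING = re.compile(r"<html(?P<attrs>[^>]*)>", re.IGNORECASE)

# Hypothetical atomic pattern, emulated with a lookahead plus a named
# backreference (native (?>...) needs Python 3.11+). The lookahead matches
# the attribute run once; (?P=attrs) then consumes exactly that text and
# cannot give characters back.
HTML_START_ATOMIC = re.compile(
    r"<html(?=(?P<attrs>[^>]*))(?P=attrs)>", re.IGNORECASE
)

# Why "where needed" matters: an atomic group changes the result when a later
# token relies on characters the group would otherwise give back.
GREEDY_DEMO = re.compile(r"a*a")  # a* backtracks one "a", so "aaa" matches
ATOMIC_DEMO = re.compile(r"(?=(a*))\1a")  # atomic a* keeps every "a"


def html_start_attributes(html, pattern=HTML_START_ATOMIC):
    """Return the raw attribute text of the first <html> start tag, or ""."""
    match = pattern.search(html)
    return match.group("attrs").strip() if match else ""
```

On valid input both start-tag patterns return the same attributes; the atomic one simply stops exploring once the attribute run is fixed. The a*a pair shows the other side: an atomic group can turn a match into a non-match, so it is only safe where the next token cannot use characters the group would give back.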

Fixes #4282

Checklist

  • My pull request is addressing an open issue (please create one otherwise).
  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (the guidelines are updated often, so check them periodically).


Labels

Bug (Something isn't working), cla: yes (Signed the Google CLA), Sanitizers


Development

Linked issue:

Failure to normalize documents with HTML comments after </body>

4 participants