Fix another edge case with splitting code into lines#85
Merged
Conversation
2275c20 to
2509b7e
Compare
westonruter
reviewed
Dec 21, 2020
9c0d06c to
35a5801
Compare
Collaborator
Author
|
@westonruter I've rewritten this function entirely (again) and added two unit tests for splitting things correctly based on the reported issue. Could you take a look and/or if you have any more snippets to share? Here's what WP looks like with this PR |
And in |
Collaborator
Author
Correct |
|
So ~3X faster. Nice! |
westonruter
reviewed
Dec 22, 2020
Co-authored-by: Weston Ruter <westonruter@google.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

The
.hljs-taghere spans two lines, however, thetext()XPath function will trim the newline so the XPath that was introduced in #79 doesn't work.Original Code
Highlighted Result
I have tried changing the XPath to use.instead oftext(), however, that will match deeply nested newlines like those seen in #79 and multiply the number of "results" this XPath returns relative to the depth of the span that contains it.Running the 9.18.1.5 release splits the above snippet in ~29 seconds for 200000 iterations usingtext()(however, it's incorrect in its splitting); replacingtext()with.results in a time of ~34 seconds in the exact scenario but splits correctly.This PR without XPath and manually searching for newlines inside of spans comes in at ~34 seconds for 200000 iterations as well.I have taken a step back and rewritten this function entirely to no longer parse the HTML into a DOM object. I am now using pure regular expressions to do our best in figuring out which tags span multiple lines.
The solution in this PR functions as such:
\Rand I count how many opening and closing span tags there are on each line.<span>...</span>tags from a copy of the current lineWithout parsing the HTML into a DOM object, I've decreased the speed of 200000 iterations down to ~10 seconds.
See westonruter/syntax-highlighting-code-block#246
/cc @westonruter