Skip to content

Fix another edge case with splitting code into lines#85

Merged
allejo merged 5 commits into
9.18from
hotfix/deeply-nested-spans-split
Dec 22, 2020
Merged

Fix another edge case with splitting code into lines#85
allejo merged 5 commits into
9.18from
hotfix/deeply-nested-spans-split

Conversation

@allejo

@allejo allejo commented Dec 20, 2020

Copy link
Copy Markdown
Collaborator

The .hljs-tag here spans two lines, however, the text() XPath function will trim the newline so the XPath that was introduced in #79 doesn't work.

Original Code

<?xml version="1.0" encoding="utf-8" ?>
               <tag a="t"
                        b="t">
                 </tag>

Highlighted Result

<span class="hljs-meta">&lt;?xml version="1.0" encoding="utf-8" ?&gt;</span>
               <span class="hljs-tag">&lt;<span class="hljs-name">tag</span> <span class="hljs-attr">a</span>=<span class="hljs-string">"t"</span>
                        <span class="hljs-attr">b</span>=<span class="hljs-string">"t"</span>&gt;</span>
                 <span class="hljs-tag">&lt;/<span class="hljs-name">tag</span>&gt;</span>

I have tried changing the XPath to use . instead of text(), however, that will match deeply nested newlines like those seen in #79 and multiply the number of "results" this XPath returns relative to the depth of the span that contains it.

Running the 9.18.1.5 release splits the above snippet in ~29 seconds for 200000 iterations using text() (however, it's incorrect in its splitting); replacing text() with . results in a time of ~34 seconds in the exact scenario but splits correctly.

This PR without XPath and manually searching for newlines inside of spans comes in at ~34 seconds for 200000 iterations as well.


I have taken a step back and rewritten this function entirely to no longer parse the HTML into a DOM object. I am now using pure regular expressions to do our best in figuring out which tags span multiple lines.

The solution in this PR functions as such:

  1. With the highlighted code snippet result above, I split things on \R and I count how many opening and closing span tags there are on each line.
  2. If we find an even amount of closing and opening tags, we move on to the next line
  3. If there are an uneven amount of closing tags with opening tags, that means we found a span tag that spans multiple lines.
  4. I remove all complete <span>...</span> tags from a copy of the current line
  5. I then match all of the opening span tags and save them for our next line
  6. I then count how many open tags there are and put them at the end of the current line
  7. In the next line, if I have anything saved from step 5, then I add them as a prefix to the current line in this iteration

Without parsing the HTML into a DOM object, I've decreased the speed of 200000 iterations down to ~10 seconds.


See westonruter/syntax-highlighting-code-block#246

/cc @westonruter

@allejo allejo added the bug label Dec 20, 2020
@allejo allejo force-pushed the hotfix/deeply-nested-spans-split branch from 2275c20 to 2509b7e Compare December 20, 2020 08:52
Comment thread HighlightUtilities/functions.php Outdated
@allejo allejo force-pushed the hotfix/deeply-nested-spans-split branch from 9c0d06c to 35a5801 Compare December 21, 2020 06:38
@allejo allejo marked this pull request as ready for review December 21, 2020 07:45
@allejo

allejo commented Dec 21, 2020

Copy link
Copy Markdown
Collaborator Author

@westonruter I've rewritten this function entirely (again) and added two unit tests for splitting things correctly based on the reported issue. Could you take a look and/or if you have any more snippets to share?

Here's what WP looks like with this PR

image

@westonruter

Copy link
Copy Markdown

Without parsing the HTML into a DOM object, I've decreased the speed of 200000 iterations down to ~10 seconds.

And in master it is currently 29 seconds?

@allejo

allejo commented Dec 22, 2020

Copy link
Copy Markdown
Collaborator Author

Without parsing the HTML into a DOM object, I've decreased the speed of 200000 iterations down to ~10 seconds.

And in master it is currently 29 seconds?

Correct

@westonruter

Copy link
Copy Markdown

So ~3X faster. Nice!

Comment thread HighlightUtilities/functions.php Outdated
Comment thread HighlightUtilities/functions.php Outdated
Comment thread HighlightUtilities/functions.php Outdated
Co-authored-by: Weston Ruter <westonruter@google.com>
@allejo allejo merged commit db45685 into 9.18 Dec 22, 2020
@allejo allejo deleted the hotfix/deeply-nested-spans-split branch December 22, 2020 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants