Fix another edge case with splitting code into lines by allejo · Pull Request #85 · scrivo/highlight.php

allejo · 2020-12-20T08:41:11Z

The .hljs-tag here spans two lines, however, the text() XPath function will trim the newline so the XPath that was introduced in #79 doesn't work.

Original Code

<?xml version="1.0" encoding="utf-8" ?>
               <tag a="t"
                        b="t">
                 </tag>

Highlighted Result

<span class="hljs-meta">&lt;?xml version="1.0" encoding="utf-8" ?&gt;</span>
               <span class="hljs-tag">&lt;<span class="hljs-name">tag</span> <span class="hljs-attr">a</span>=<span class="hljs-string">"t"</span>
                        <span class="hljs-attr">b</span>=<span class="hljs-string">"t"</span>&gt;</span>
                 <span class="hljs-tag">&lt;/<span class="hljs-name">tag</span>&gt;</span>

I have tried changing the XPath to use . instead of text(), however, that will match deeply nested newlines like those seen in #79 and multiply the number of "results" this XPath returns relative to the depth of the span that contains it.

Running the 9.18.1.5 release splits the above snippet in ~29 seconds for 200000 iterations using text() (however, it's incorrect in its splitting); replacing text() with . results in a time of ~34 seconds in the exact scenario but splits correctly.

~~This PR without XPath and manually searching for newlines inside of spans comes in at ~34 seconds for 200000 iterations as well.~~

I have taken a step back and rewritten this function entirely to no longer parse the HTML into a DOM object. I am now using pure regular expressions to do our best in figuring out which tags span multiple lines.

The solution in this PR functions as such:

With the highlighted code snippet result above, I split things on \R and I count how many opening and closing span tags there are on each line.
If we find an even amount of closing and opening tags, we move on to the next line
If there are an uneven amount of closing tags with opening tags, that means we found a span tag that spans multiple lines.
I remove all complete <span>...</span> tags from a copy of the current line
I then match all of the opening span tags and save them for our next line
I then count how many open tags there are and put them at the end of the current line
In the next line, if I have anything saved from step 5, then I add them as a prefix to the current line in this iteration

Without parsing the HTML into a DOM object, I've decreased the speed of 200000 iterations down to ~10 seconds.

See westonruter/syntax-highlighting-code-block#246

/cc @westonruter

...again

allejo · 2020-12-21T07:51:31Z

@westonruter I've rewritten this function entirely (again) and added two unit tests for splitting things correctly based on the reported issue. Could you take a look and/or if you have any more snippets to share?

Here's what WP looks like with this PR

westonruter · 2020-12-22T01:42:24Z

Without parsing the HTML into a DOM object, I've decreased the speed of 200000 iterations down to ~10 seconds.

And in master it is currently 29 seconds?

allejo · 2020-12-22T02:05:06Z

Without parsing the HTML into a DOM object, I've decreased the speed of 200000 iterations down to ~10 seconds.

And in master it is currently 29 seconds?

Correct

westonruter · 2020-12-22T02:57:28Z

So ~3X faster. Nice!

Co-authored-by: Weston Ruter <westonruter@google.com>

allejo added the bug label Dec 20, 2020

Fix splitCodeIntoArray to better handle newlines

2509b7e

...again

allejo force-pushed the hotfix/deeply-nested-spans-split branch from 2275c20 to 2509b7e Compare December 20, 2020 08:52

westonruter reviewed Dec 21, 2020

View reviewed changes

Comment thread HighlightUtilities/functions.php Outdated

allejo added 2 commits December 20, 2020 22:37

Rewrite splitCodeIntoArray to use ReGex (again)

0ef5ec0

Update documentation to remove thrown exceptions

35a5801

allejo force-pushed the hotfix/deeply-nested-spans-split branch from 9c0d06c to 35a5801 Compare December 21, 2020 06:38

Add unit test for XML attrs on multiple lines

6624c8a

allejo marked this pull request as ready for review December 21, 2020 07:45

westonruter reviewed Dec 22, 2020

View reviewed changes

Comment thread HighlightUtilities/functions.php Outdated

Comment thread HighlightUtilities/functions.php Outdated

Comment thread HighlightUtilities/functions.php Outdated

Allow for matching more span situations

f516caa

Co-authored-by: Weston Ruter <westonruter@google.com>

allejo merged commit db45685 into 9.18 Dec 22, 2020

allejo deleted the hotfix/deeply-nested-spans-split branch December 22, 2020 19:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix another edge case with splitting code into lines#85

Fix another edge case with splitting code into lines#85
allejo merged 5 commits into
9.18from
hotfix/deeply-nested-spans-split

allejo commented Dec 20, 2020 •

edited

Loading

Uh oh!

Uh oh!

allejo commented Dec 21, 2020

Uh oh!

westonruter commented Dec 22, 2020

Uh oh!

allejo commented Dec 22, 2020

Uh oh!

westonruter commented Dec 22, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

allejo commented Dec 20, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

allejo commented Dec 21, 2020

Uh oh!

westonruter commented Dec 22, 2020

Uh oh!

allejo commented Dec 22, 2020

Uh oh!

westonruter commented Dec 22, 2020

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

allejo commented Dec 20, 2020 •

edited

Loading