Skip to content

Reduce unnecessary Regex match attempts for expressions beginning with atomic loops#35824

Merged
danmoseley merged 2 commits intodotnet:masterfrom
stephentoub:atomicfast
May 5, 2020
Merged

Reduce unnecessary Regex match attempts for expressions beginning with atomic loops#35824
danmoseley merged 2 commits intodotnet:masterfrom
stephentoub:atomicfast

Conversation

@stephentoub
Copy link
Member

@stephentoub stephentoub commented May 5, 2020

Optimize Regex expressions that begin with atomic unbounded loops (either as written by the dev or because the system detected the loop could be made atomic and did it implicitly) by updating the starting position in the scan loop for the next iteration to be where the loop ended rather than where it started.

Running the examples from https://github.com/mariomka/regex-benchmark/blob/652d55810691ad88e1c2292a2646d301d3928903/csharp/Benchmark.cs#L20-L26:

Before (RegexOptions.None):

558.089 - 92
380.8639 - 5301
42.2791 - 5

After (RegexOptions.None):

195.73 - 92
97.2725 - 5301
42.3631 - 5

Before (RegexOptions.ECMAScript):

435.6631 - 92
299.1351 - 5301
41.6416 - 5

After (RegexOptions.ECMAScript):

184.59 - 92
89.0295 - 5301
42.6699 - 5

Before (RegexOptions.Compiled):

279.0234 - 92
186.4688 - 5301
16.3892 - 5

After (RegexOptions.Compiled):

137.4005 - 92
57.136 - 5301
15.6309 - 5

Before (RegexOptions.Compiled | RegexOptions.ECMAScript):

204.9634 - 92
113.7158 - 5301
15.7937 - 5

After (RegexOptions.Compiled | RegexOptions.ECMAScript):

127.83 - 92
49.2225 - 5301
15.7834 - 5

@danmosemsft, this is the optimization you and I discussed offline.

cc: @eerhardt, @pgovind

@stephentoub stephentoub added the tenet-performance Performance related issue label May 5, 2020
@stephentoub stephentoub added this to the 5.0 milestone May 5, 2020
@ghost
Copy link

ghost commented May 5, 2020

Tagging subscribers to this area: @eerhardt
Notify danmosemsft if you want to be subscribed.

@danmoseley
Copy link
Member

Gosh that was a good trick! Should we add a perf test to protect this change? I don't have an intuition how commonly it would apply.

BTW this issue jumped out of the debug output when running the benchmark

@danmoseley danmoseley merged commit 1657df9 into dotnet:master May 5, 2020
@danmoseley
Copy link
Member

What is the state of the regex perf tests? Do you run them/ track them, or should we? Is there a test eg that would show if this change regressed?

I recall I added a very ad-hoc test some time ago.

@stephentoub
Copy link
Member Author

What is the state of the regex perf tests?

https://github.com/dotnet/performance/blob/master/src/benchmarks/micro/libraries/System.Text.RegularExpressions/Perf.Regex.Common.cs
https://github.com/dotnet/performance/blob/master/src/benchmarks/micro/libraries/System.Text.RegularExpressions/Perf.Regex.Cache.cs
https://github.com/dotnet/performance/blob/master/src/benchmarks/micro/runtime/BenchmarksGame/regex-redux-5.cs

Do you run them

When they're relevant.

Is there a test eg that would show if this change regressed?

There are two tests that start with a relevant loop. One completes the loop frequently enough that it helps:

Method Job Toolchain Options Mean Error StdDev Median Min Max Ratio
Uri_IsNotMatch Job-WTPBDF \master\corerun.exe None 854.6 ns 8.06 ns 7.15 ns 851.9 ns 847.6 ns 869.4 ns 1.00
Uri_IsNotMatch Job-ZGOHNW \pr\corerun.exe None 337.9 ns 2.74 ns 2.43 ns 337.8 ns 335.1 ns 343.6 ns 0.40
Uri_IsNotMatch Job-WTPBDF \master\corerun.exe Compiled 231.8 ns 2.77 ns 2.45 ns 232.0 ns 228.2 ns 236.1 ns 1.00
Uri_IsNotMatch Job-ZGOHNW \pr\corerun.exe Compiled 103.0 ns 0.65 ns 0.58 ns 102.9 ns 102.3 ns 104.2 ns 0.44
Uri_IsNotMatch Job-WTPBDF \master\corerun.exe IgnoreCase, Compiled 460.4 ns 1.34 ns 1.04 ns 460.3 ns 458.0 ns 461.9 ns 1.00
Uri_IsNotMatch Job-ZGOHNW \pr\corerun.exe IgnoreCase, Compiled 190.6 ns 0.85 ns 0.76 ns 190.9 ns 189.2 ns 191.8 ns 0.41

The other does not:

Method Job Toolchain Options Mean Error StdDev Median Min Max Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
MatchesSet Job-RCSMLO \master\corerun.exe None 179.96 us 2.859 us 2.808 us 179.14 us 177.28 us 186.57 us 1.00 0.00 0.7102 - - 6.54 KB
MatchesSet Job-QUAGIB \pr\corerun.exe None 175.02 us 1.737 us 1.450 us 174.67 us 173.35 us 178.05 us 0.97 0.02 0.6944 - - 6.54 KB
MatchesSet Job-RCSMLO \master\corerun.exe Compiled 47.39 us 0.466 us 0.389 us 47.32 us 46.99 us 48.10 us 1.00 0.00 1.0557 - - 6.54 KB
MatchesSet Job-QUAGIB \pr\corerun.exe Compiled 47.30 us 0.475 us 0.421 us 47.15 us 46.80 us 48.14 us 1.00 0.01 0.9470 - - 6.54 KB
MatchesSet Job-RCSMLO \master\corerun.exe IgnoreCase, Compiled 102.56 us 1.307 us 1.159 us 102.33 us 100.11 us 105.03 us 1.00 0.00 0.8013 - - 6.54 KB
MatchesSet Job-QUAGIB \pr\corerun.exe IgnoreCase, Compiled 103.54 us 0.494 us 0.438 us 103.52 us 102.97 us 104.17 us 1.01 0.01 0.8224 - - 6.54 KB

@stephentoub stephentoub deleted the atomicfast branch May 5, 2020 19:28
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants