Revert "Simplify RegexInterpreter (#124628)" #126221

Merged
danmoseley merged 1 commit into dotnet:main from danmoseley:revert-124628
Mar 28, 2026
Conversation

@danmoseley
Member

Revert "Simplify RegexInterpreter (#124628)"

This reverts commit f194942 from #124628.

Closes #126156

#124628 replaced the char-by-char loop in RegexInterpreter.MatchString with StartsWith/EndsWith. This caused a 7-11% regression on arm64 (AmpereUbuntu) for Perf_Regex_Industry_Leipzig patterns that exercise MatchString heavily via alternation:

  • .{0,2}(Tom|Sawyer|Huckleberry|Finn) None: 5.01s to 5.56s (1.11x)
  • .{2,4}(Tom|Sawyer|Huckleberry|Finn) None: 5.16s to 5.51s (1.07x)

These patterns call MatchString millions of times with short strings (3-11 chars: "Tom", "Finn", "Sawyer", "Huckleberry"). At those lengths, the call overhead of Slice + StartsWith + SequenceEqual exceeds the cost of the original tight loop, and the strings are too short to gain anything from SIMD.
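
To make the two shapes being compared concrete, here is a minimal sketch. It is not the dotnet/runtime source: the class and method names (`LiteralMatchSketch`, `MatchLoop`, `MatchStartsWith`) are hypothetical, it only handles left-to-right matching, and it makes no claim about the measured timings, which would need a benchmark run to reproduce.

```csharp
using System;

// Hypothetical illustration only; names and structure are assumptions,
// not the actual RegexInterpreter.MatchString implementation.
static class LiteralMatchSketch
{
    // Shape 1: tight per-character loop. No slicing and no helper calls,
    // which is cheap when the literal is only a handful of chars.
    public static bool MatchLoop(string literal, ReadOnlySpan<char> input, int pos)
    {
        if (input.Length - pos < literal.Length)
        {
            return false; // not enough input left to hold the literal
        }

        for (int i = 0; i < literal.Length; i++)
        {
            if (input[pos + i] != literal[i])
            {
                return false;
            }
        }

        return true;
    }

    // Shape 2: Slice + StartsWith. The helper can vectorize long comparisons,
    // but each call pays a fixed cost for the slice and the helper calls.
    // Assumes 0 <= pos <= input.Length.
    public static bool MatchStartsWith(string literal, ReadOnlySpan<char> input, int pos) =>
        input.Slice(pos).StartsWith(literal.AsSpan());

    public static void Main()
    {
        ReadOnlySpan<char> input = "Huckleberry Finn";

        // Both shapes must agree on every input; only their cost differs.
        Console.WriteLine(MatchLoop("Finn", input, 12));       // True
        Console.WriteLine(MatchStartsWith("Finn", input, 12)); // True
        Console.WriteLine(MatchLoop("Tom", input, 0));         // False
        Console.WriteLine(MatchStartsWith("Tom", input, 0));   // False
    }
}
```

The two methods are functionally equivalent, so the choice between them is purely a cost question for the literal lengths a pattern actually produces.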

The MihuBot x64 results for the original PR showed the same patterns regressing at 1.03-1.04x, but this was overlooked during review.

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@danmoseley
Member Author

Verified locally that this recovers the regression.

Contributor

Copilot AI left a comment

Pull request overview

Reverts the RegexInterpreter.MatchString change from #124628 that replaced the per-character literal match loop with StartsWith/EndsWith, addressing a measured interpreter performance regression on Linux/arm64 for short literal alternations (e.g., Leipzig benchmarks).

Changes:

  • Restore the original tight per-character comparison loop in RegexInterpreter.MatchString.
  • Avoid Slice + StartsWith/EndsWith overhead in hot paths where literals are short and invoked millions of times.

@stephentoub
Member

> This reverts commit f194942 from #124628.

Just to confirm, it reverts the whole PR, right? Nothing that merged in that PR remains?

@danmoseley
Member Author

Yes, complete git revert.

@danmoseley
Member Author

/ba-g unrelated

@danmoseley danmoseley merged commit fb8b250 into dotnet:main Mar 28, 2026
92 of 101 checks passed
@danmoseley danmoseley deleted the revert-124628 branch March 28, 2026 01:42

Development

Successfully merging this pull request may close these issues.

[Perf] Linux/arm64: 2 Regressions on 3/20/2026 12:54:08 AM +00:00

3 participants