Skip to content

Improve regex optimizer through investigation of regex optimizer passes#125289

Open
danmoseley wants to merge 6 commits intodotnet:mainfrom
danmoseley:regex-rereduce
Open

Improve regex optimizer through investigation of regex optimizer passes#125289
danmoseley wants to merge 6 commits intodotnet:mainfrom
danmoseley:regex-rereduce

Conversation

@danmoseley
Copy link
Member

@danmoseley danmoseley commented Mar 7, 2026

Fixes #66031

Human/copilot collaboration to understand whether the regex optimizer passes can be improved, using the real world patterns corpus as a test bed.

Investigation

We analyzed the issue by running experiments against 18,931 real-world regex patterns extracted from NuGet packages (dotnet/runtime-assets corpus):

  1. Fixed-point convergence: Re-ran Reduce() in a loop until the tree stabilized. 221 of 18,931 patterns (1.2%) benefit from a second round. All converge in exactly 2 rounds -- zero oscillation, zero regressions.

  2. Pass ordering sensitivity: Tried all permutations of the FinalOptimize passes. 0 patterns where ordering matters (beyond the FinalReduce placement).

  3. Minimal fix analysis: Compared re-running just Reduce() vs. re-running the full FinalOptimize passes. Re-reduce alone captures 100% of the improvements. Re-running FindAndMakeLoopsAtomic + EliminateEndingBacktracking adds nothing.

Problem

The regex optimizer runs in two phases: per-node Reduce() calls during parsing, then three global FinalOptimize() passes after parsing (FindAndMakeLoopsAtomic, EliminateEndingBacktracking, UpdateBumpalong). The global passes create new tree structures (Atomic wrappers, restructured alternations) that are themselves eligible for further reduction -- but Reduce() never re-runs on them. This leaves optimization opportunities on the table.

Change

Add a single FinalReduce() call at the end of FinalOptimize(), after EliminateEndingBacktracking and before UpdateBumpalong. It walks the tree bottom-up and re-calls Reduce() on each node, replacing any node that simplifies. This is a 15-line private method with a StackHelper.TryEnsureSufficientExecutionStack() guard.

UpdateBumpalong was also moved to run after FinalReduce, since FinalReduce can restructure alternations into concatenations with a leading loop that UpdateBumpalong needs to see.

Alternatives rejected

Alternative Why discounted
Full fixed-point loop (re-run all passes until convergence) Unnecessary -- single re-reduce pass captures 100% of improvements; all patterns converge in 1 extra round
Reorder existing passes instead of adding a new one Pass ordering has zero effect on the 18,931 patterns
Fix individual Reduce methods to handle FinalOptimize-created structures Would require changes across multiple Reduce methods; fragile and wouldn't catch future cases
Do nothing 221 patterns get suboptimal trees; the fix is small and safe

Distinct improvement categories in real world corpus

Analysis of all 231 changed pattern trees (221 unique patterns, some with multiple option variants) across 35 distinct structural signatures identified ~4 distinct improvement variants:

# Test pattern Reduces before fix Reduces to after fix Mechanism
1 a|ab a(?:) a Empty-in-Concat: prefix extraction leaves trailing Empty; re-reduce strips it
2 \n|\n\r|\r\n (?>\n(?:)|\r\n) (?>\n|\r\n) Empty-in-Alternate: shared prefix creates Empty branch; re-reduce collapses
3 [ab]+c[ab]+|[ab]+ (?>[ab]+c[ab]+|[ab]+) (?>(?>[ab]+)(?:c(?>[ab]+))?) Prefix extraction + Alternate-to-Loop: set loop prefix not extracted until re-reduce
4 ab|a|ac a(?>b?) ab? Redundant Atomic removal: Atomic wrapping non-backtracking child is stripped
5 ab|a|ac|d (?>a(?>b?)|d) (?>ab?|d) Same Atomic removal, within a larger Alternate
6 a?b|a??b (?>a?(?>[b])) (?>a?(?>b)) Set-to-One: greedy/lazy branches merge after atomic promotion; single-char [b] simplified to b
7 [ab]?c|[ab]??c (?>[ab]?(?>[c])) (?>[ab]?(?>c)) Same Set-to-One with set loop prefix

These are captured in new reduction tests. All 7 tests are verified to fail without the FinalReduce change and pass with it.

Performance

  • Parse-time cost is negligible: Measured at 0.3% of total parse time, within noise, across 18,931 patterns.
  • Zero tree regressions: All 18,931 patterns produce trees that are either identical (18,710) or strictly simpler (221). No pattern produces a worse tree.
  • All existing tests pass: The 424 pre-existing PatternsReduceIdentically tests continue to pass unchanged.
  • Convergence is immediate: Every pattern stabilizes in at most 1 extra round, so there's no risk of expensive iteration.
  • The improvements are structurally provable: Fewer nodes (Empty removal, Concat unwrapping), simpler node types (Set to One), and eliminated redundant wrappers (Atomic around non-backtracking children) all reduce work at match time.
  • Existing microbenchmarks unaffected: None of the 38 patterns from the dotnet/performance regex benchmarks are affected by this change; their patterns are not complex enough.

After FinalOptimize's EliminateEndingBacktracking and FindAndMakeLoopsAtomic
create new tree structures (Atomic wrappers, restructured alternations), walk
the tree bottom-up and re-call Reduce() on each node. This cleans up patterns
like Concat(X, Empty), redundant Atomic(Oneloopatomic), and enables further
prefix extraction — improving 221 of 18,931 real-world NuGet patterns (1.2%),
all converging in a single extra round.

Also moves UpdateBumpalong after ReReduceTree so it operates on the final
tree structure.

Fixes dotnet#66031

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR enhances System.Text.RegularExpressions’ post-parse optimization pipeline by adding a final “re-reduce” cleanup pass after FinalOptimize transformations, ensuring any newly introduced structures are simplified by running Reduce() again.

Changes:

  • Add a ReReduceTree() traversal at the end of RegexNode.FinalOptimize() (after EliminateEndingBacktracking and before UpdateBumpalong).
  • Move UpdateBumpalong to run after the re-reduction so it sees the final reshaped tree.
  • Add focused unit tests covering patterns that only fully optimize with the new re-reduce pass.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs Adds ReReduceTree() and updates FinalOptimize() pass ordering so reductions are re-applied after global rewrites.
src/libraries/System.Text.RegularExpressions/tests/UnitTests/RegexReductionTests.cs Adds regression cases asserting the re-reduced trees match the expected simplified forms.

@danmoseley danmoseley changed the title Add ReReduceTree pass to regex FinalOptimize Improve regex optimizer through investigation of regex optimizer passes Mar 7, 2026
- Row 3: Remove spurious capture group from before-fix equivalent
  (?>([ab]+c[ab]+|[ab]+)) -> (?>[ab]+c[ab]+|[ab]+)
- Row 5: Add missing 'a' prefix in before-fix equivalent
  (?>(?>b?)|d) -> (?>a(?>b?)|d)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

ReplaceChild already handles Reduce + re-parenting, so delegate to it
instead of duplicating that logic. Also avoids a double-Reduce that
occurred when the manual code passed the reduced node to ReplaceChild.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 7, 2026 09:45
Better name that pairs with FinalOptimize which calls it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 7, 2026 09:51
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@stephentoub
Copy link
Member

Thanks. Could you explore variations, in particular for overall parse time, like not doing any reduction (other than maybe removing Group) until the end, and then reducing until it stabilizes?

@danmoseley
Copy link
Member Author

danmoseley commented Mar 7, 2026

I explored several variations of deferring reduction to the end. Here's a summary of the experiments and findings.

Setup: 15,817 unique regex patterns from a real-world JSON corpus, Release build, DOTNET_TieredCompilation=0, 15 iterations, 5-round warmup. All timings are average per-pattern cost in microseconds (mean +/- 1 stddev across iterations).

Experiment 1: Baselines

Measured the cost of FinalReduce as implemented in this PR (reduce during parse + one FinalReduce pass at end):

Variant Per-pattern (μs)
Without FinalReduce (upstream behavior) 3.51 +/- 0.06
With FinalReduce (this PR) 4.08 +/- 0.07
FinalReduce overhead +0.57 μs (16%)

Experiment 2: Phase profiling (this PR)

Instrumented each phase to understand where time is spent:

Phase Per-pattern (μs) % of total
Parse + Reduce during parse 2.7 69%
FindAndMakeLoopsAtomic 0.5 13%
EliminateEndingBacktracking 0.2 4%
FinalReduce 0.6 14%
Total 3.9

Experiment 3: Naive deferral -- ReduceMinimal in AddChild

Replaced Reduce() in AddChild/InsertChild with a minimal reducer (Group unwrap + 0/1-child Concatenation/Alternation unwrap + IgnoreCase strip), keeping full Reduce only in ReplaceChild (used by FinalReduce).

Result: 183/431 PatternsReduceIdentically tests fail. The optimization passes (FindAndMakeLoopsAtomic, EliminateEndingBacktracking) produce different results when operating on unreduced trees.

Adding a FinalReduce call before the optimization passes reduced failures to 23. The remaining 23 fail because reduction methods like ReduceAlternation internally create new nodes via AddChild and depend on those nodes being fully reduced. So swapping Reduce for ReduceMinimal in AddChild breaks the reducers themselves.

Experiment 4: Parser-level deferral -- AddChildMinimal for parser only

Created a separate AddChildMinimal method (calls ReduceMinimal) and changed all 13 AddChild calls in RegexParser to use it. AddChild itself still calls full Reduce(), so reduction methods work correctly. FinalReduce runs both before and after the optimization passes. Also moved the initial FinalReduce outside the RTL/NonBacktracking guard since those patterns also need reduction.

Code: danmoseley@75612fc (diff vs this PR)

Result: All 1,043 tests pass, but slower:

Variant Per-pattern (μs)
This PR (reduce during parse + FinalReduce) 4.08 +/- 0.07
Deferred (ReduceMinimal during parse + 2x FinalReduce) 4.29 +/- 0.07
Difference +0.21 μs (5%, t=8.2, p << 0.01)

Deferred phase breakdown:

Phase Per-pattern (μs) % of total
Parse + ReduceMinimal 2.3 54%
Pre-optimization FinalReduce 0.8 18%
FindAndMakeLoopsAtomic 0.5 12%
EliminateEndingBacktracking 0.2 4%
Post-optimization FinalReduce 0.5 12%
Total 4.2

Deferring saves ~0.4 μs/pattern during parse but adds 0.8 μs for the pre-optimization FinalReduce tree walk.

Conclusions

The deferred approach is both more complex and slower:

  1. More code: Needs a new AddChildMinimal method, changes to 13 parser call sites, FinalReduce moved outside the RTL/NonBacktracking guard, and two full tree-walk reduction passes instead of one.
  2. Slower: 4.29 vs 4.08 μs/pattern (5% regression, statistically significant). The parse-time savings from skipping full Reduce are more than offset by the additional tree walk.
  3. Fundamental constraint: Reduction methods (ReduceAlternation, etc.) create new nodes via AddChild and depend on full reduction happening there. You can't simply remove Reduce from AddChild -- you need a two-track system (minimal for parser, full for reducers).

The current PR approach -- reduce during parse + one FinalReduce at end -- appears to be the simplest correct approach. The integrated parse-time reduction is essentially free (no extra tree walk), and only one post-optimization FinalReduce pass is needed.

@danmoseley
Copy link
Member Author

danmoseley commented Mar 7, 2026

Doing some more investigations

  • Recap how many patterns actually benefit from FinalReduce and whether it might be a better tradeoff to not change this at all
  • Is there a way to make FinalReduce cheaper (one possibility, dirty flag to skip patterns that don't benefit)
  • Are there real world patterns where FinalReduce is a significant cost ie., what is the cost distribution tail.

@danmoseley
Copy link
Member Author

Follow-up: Deep analysis of FinalReduce impact and cost distribution

After the initial experiments, I ran three deeper investigations to understand FinalReduce's real-world behavior and whether the overhead can be reduced.

Investigation 1: How many patterns actually benefit from FinalReduce?

Tested all 15,810 parseable patterns from the corpus. FinalReduce produces a different tree for only 88 patterns (0.6%). Those 88 patterns had 104 total nodes changed (avg 1.2 per pattern). The remaining 99.4% get a full tree walk that finds nothing to change.

Investigation 2: Can we skip FinalReduce when it's not needed?

I prototyped a dirty-flag approach: set a ThreadStatic boolean in MakeLoopAtomic and EliminateEndingBacktracking when they create potentially-reducible structures (Atomic wrappers, Empty/Multi node conversions, type changes under existing Atomic parents). Then skip FinalReduce when the flag isn't set.

Results:

  • 79% of patterns skip FinalReduce entirely (flag not set)
  • The flag requires annotations in 6 locations with precise conditions (e.g., "set flag only when parent is Atomic")
  • All 431 PatternsReduceIdentically tests still pass
Variant Per-pattern vs Baseline
Baseline (no FinalReduce) 3.51 +/- 0.06 us --
This PR (unconditional) 4.08 +/- 0.07 us +16%
Dirty-flag early-exit 3.79 +/- 0.17 us +8%

The early-exit saves ~0.3 us/pattern, cutting overhead from +16% to +8%. However, it requires maintaining dirty-flag annotations across 6 code locations in two complex methods, and the conditions are subtle (e.g., only flag when parent is Atomic). Missing a flag causes silent correctness loss. Not worth the complexity for <0.3 us.

Investigation 3: Per-pattern FinalReduce cost distribution

Measured FinalReduce cost individually for each of the 15,810 patterns:

Percentile Cost
Median (p50) 0.3 us
p90 1.3 us
p95 1.9 us
p99 4.5 us
p99.9 15.1 us
Max 106.5 us

Distribution buckets:

Range Count %
< 0.1 us 3,464 21.9%
0.1 - 0.5 us 6,572 41.6%
0.5 - 1.0 us 3,432 21.7%
1.0 - 5.0 us 2,215 14.0%
5.0 - 10 us 94 0.6%
> 10 us 33 0.2%

The top 1% of patterns (158) account for 15.3% of total FinalReduce cost. The expensive tail is dominated by large alternation patterns -- timezone name lists, mobile UA detection strings, SQL keyword lists, and cron expressions. These patterns have deep trees with many nodes to walk.

Conclusions

  1. FinalReduce's overhead is dominated by the tree walk, not actual reduction. 99.4% of patterns walk the tree and find nothing.
  2. An early-exit optimization is possible but the complexity-to-benefit ratio is poor: 6 flag locations for 0.3 us savings.
  3. The per-pattern cost is well-behaved: median 0.3 us, p99 at 4.5 us. The expensive tail (>10 us) affects only 0.2% of patterns and those are inherently large patterns where all parse phases are proportionally more expensive.
  4. The current approach (unconditional FinalReduce) is the right tradeoff: simple, correct, and the 0.57 us/pattern overhead is small relative to the 3.51 us baseline parse cost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Re-examine order / structure of regex optimization passes

3 participants