Optimize regex patterns with both beginning and end anchors for early fail-fast #120916
Merged
stephentoub merged 6 commits into main on Oct 23, 2025
Implement fail-fast optimization for patterns like ^abc\z that have both a leading Beginning anchor and trailing End anchor with fixed length. Instead of checking if input is at least N characters, check if input is exactly N characters AND position is 0.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve fail fast behavior for fixed-length regex patterns" to "Optimize regex patterns with both beginning and end anchors for early fail-fast" on Oct 20, 2025
stephentoub reviewed on Oct 20, 2025
- .../System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexFindOptimizations.cs (outdated, resolved)
- src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs (resolved)
- Remove trailing anchor computation from RegexFindOptimizations.cs
- Compute trailing anchor lazily in RegexGenerator.Emitter.cs only when needed
- Add same optimization to RegexCompiler.cs for consistency

This avoids adding overhead to the interpreter while keeping benefits for compiled/generated code.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
stephentoub reviewed on Oct 22, 2025
- src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs (outdated, resolved)
- ...libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs (outdated, resolved)
- ...libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs (outdated, resolved)
Pull Request Overview
This PR optimizes regex patterns with both beginning (^ or \A) and end (\z) anchors where the pattern has a fixed length. The optimization enables earlier fail-fast by checking if the input length exactly matches the pattern length (instead of just checking minimum length), avoiding unnecessary entry into TryMatchAtCurrentPosition when the length doesn't match.
- Moved trailing anchor computation from RegexFindOptimizations.cs to the emitters (source generator and compiler) to avoid interpreter overhead
- Added exact length check for patterns with both beginning and end anchors with fixed length
- Added comprehensive test cases covering exact match, too long, too short, and offset scenarios
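The difference between the two pre-checks described in the overview can be sketched in a few lines. This is a language-neutral illustration in Python, not the emitted C# or IL from the PR; the function names `min_length_precheck` and `exact_length_precheck` and the sample inputs are hypothetical, chosen only to mirror the four scenarios listed above for a pattern such as `^abc\z` (required length 3).

```python
def min_length_precheck(text_length: int, pos: int, min_required_length: int) -> bool:
    # Weaker check: only require that enough input remains after pos.
    return text_length - pos >= min_required_length


def exact_length_precheck(text_length: int, pos: int, required_length: int) -> bool:
    # Stricter check for a fixed-length pattern anchored at both ends:
    # a match must start at position 0 and consume the whole input.
    return pos == 0 and text_length == required_length


# Hypothetical scenarios for a pattern like ^abc\z (required length 3).
SCENARIOS = {
    "exact match": ("abc", 0),
    "too long": ("abcd", 0),
    "too short": ("ab", 0),
    "offset": ("xabc", 1),
}

if __name__ == "__main__":
    for name, (text, pos) in SCENARIOS.items():
        weak = min_length_precheck(len(text), pos, 3)
        strict = exact_length_precheck(len(text), pos, 3)
        print(f"{name}: min-length={weak}, exact-length={strict}")
```

In this sketch the "too long" and "offset" inputs pass the minimum-length check but are rejected by the exact-length check, which is the case where the stricter test would let generated code return early instead of attempting a match.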
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| Regex.Match.Tests.cs | Added test cases verifying the optimization works for various scenarios with fixed-length anchored patterns |
| RegexCompiler.cs | Added optimization logic to compiled regexes for exact length checking when both anchors are present |
| RegexGenerator.Emitter.cs | Added optimization logic to source-generated regexes for exact length checking when both anchors are present |
- ...libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs (outdated, resolved)
- src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs (outdated, resolved)
- Simplify trailing anchor check to directly compare ComputeMaxLength() with MinRequiredLength
- Remove unnecessary success label in RegexCompiler and use Bne(returnFalse) directly
- Move minRequiredLength declaration after the condition check

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
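Comparing a computed maximum length with the minimum required length is one way to decide that a pattern is fixed-length: if the longest and shortest possible matches have the same length, every match has that length. The following Python sketch illustrates the idea on a toy pattern tree. The node types (`Literal`, `Repeat`, `Concat`, `Alternate`) and the helpers `min_length`, `max_length`, and `is_fixed_length` are hypothetical stand-ins, not the .NET `RegexNode` API, and `None` is used here to mean "no upper bound".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Literal:
    text: str


@dataclass
class Repeat:
    node: object
    min_count: int
    max_count: Optional[int]  # None means unbounded, e.g. "+" or "*"


@dataclass
class Concat:
    parts: List[object]


@dataclass
class Alternate:
    branches: List[object]  # assumed non-empty


def min_length(node) -> int:
    if isinstance(node, Literal):
        return len(node.text)
    if isinstance(node, Repeat):
        return min_length(node.node) * node.min_count
    if isinstance(node, Concat):
        return sum(min_length(part) for part in node.parts)
    if isinstance(node, Alternate):
        return min(min_length(branch) for branch in node.branches)
    raise TypeError(f"unsupported node: {node!r}")


def max_length(node) -> Optional[int]:
    if isinstance(node, Literal):
        return len(node.text)
    if isinstance(node, Repeat):
        inner = max_length(node.node)
        # Conservatively treat any unbounded repeat as having no maximum.
        if inner is None or node.max_count is None:
            return None
        return inner * node.max_count
    if isinstance(node, (Concat, Alternate)):
        children = node.parts if isinstance(node, Concat) else node.branches
        lengths = [max_length(child) for child in children]
        if any(length is None for length in lengths):
            return None
        return sum(lengths) if isinstance(node, Concat) else max(lengths)
    raise TypeError(f"unsupported node: {node!r}")


def is_fixed_length(node) -> bool:
    # Fixed length means the longest and shortest matches coincide.
    return max_length(node) == min_length(node)
```

For example, a tree for `(ab|cd)e` is fixed at length 3, while `ab+` and `a|bc` are not, so only the first would qualify for an exact-length pre-check under this model.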
stephentoub reviewed on Oct 22, 2025
- src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs (resolved)
- Remove unnecessary braces around if statements (no longer needed after removing extra local variables)
- Update comment in RegexCompiler.cs to use 'minRequiredLength' instead of 'exactLength'
- Update comment in RegexGenerator.Emitter.cs to mention both '^' and '\A' anchors (^/\A)

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
stephentoub reviewed on Oct 23, 2025
- src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs (outdated, resolved)
- ...libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs (outdated, resolved)
- src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs (resolved)
…variations

- Update comment in RegexGenerator.Emitter.cs to use 'leads with' wording
- Inline MinRequiredLength directly in Ldc call in RegexCompiler.cs
- Add test variations for edge cases: without starting anchor, with Multiline, with \Z/$ anchors, with content before/after anchors

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
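The edge cases named in this commit are plausibly the ones where an exact-length pre-check would be unsafe, meaning it could reject an input that the pattern really matches. The Python sketch below illustrates this with Python's `re` module rather than .NET, under two stated assumptions: Python's `\Z` (absolute end of input) stands in for .NET's `\z`, and Python's `$` without `re.MULTILINE` behaves like .NET's `$` and `\Z` in that it also matches just before a final newline. The helpers `prefilter_allows` and `is_safe_for` are hypothetical.

```python
import re

REQUIRED_LENGTH = 3  # fixed length of the literal "abc"


def prefilter_allows(text: str) -> bool:
    # Hypothetical exact-length pre-check: input must be exactly 3 characters.
    return len(text) == REQUIRED_LENGTH


def is_safe_for(pattern: str, text: str, flags: int = 0) -> bool:
    # The pre-check is safe for this input if it never rejects a real match.
    really_matches = re.search(pattern, text, flags) is not None
    return prefilter_allows(text) or not really_matches


if __name__ == "__main__":
    cases = [
        (r"\Aabc\Z", "abc\n", 0),            # both strict anchors
        (r"^abc$", "abc\n", 0),              # "$" tolerates a trailing newline
        (r"^abc\Z", "x\nabc", re.MULTILINE), # "^" may match after a newline
        (r"abc\Z", "xabc", 0),               # no leading anchor
    ]
    for pattern, text, flags in cases:
        print(repr(pattern), repr(text), is_safe_for(pattern, text, flags))
```

Only the first case, with strict anchors at both ends, is safe under the pre-check in this sketch; the other three match inputs whose length differs from 3, which is consistent with the optimization being described for `^` or `\A` combined with `\z` only.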
stephentoub approved these changes on Oct 23, 2025
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
tarekgh approved these changes on Oct 23, 2025
Issue Summary
Optimize regex patterns with both leading Beginning anchor (^) and trailing End anchor (\z) with fixed length. Instead of checking if input is at least N characters, check if input is exactly N characters AND position is 0.
Implementation Details (Updated)
Modified three files:
The optimization only applies when:
Changes from Previous Versions
Test Results
✅ All unit tests pass (1,005 tests)
✅ All functional tests pass (30,391 tests - includes 56 new tests for edge cases)
✅ No security vulnerabilities detected
Security Summary
No security vulnerabilities were introduced or discovered.
Original prompt
Fixes #118489