Fix regex compiler/source generator resumeAt handling of conditionals inside loops#126561
Fix regex compiler/source generator resumeAt handling of conditionals inside loops#126561stephentoub merged 3 commits intomainfrom
Conversation
… inside loops Update EmitExpressionConditional to reset resumeAt when inside loops, preventing stale values and incorrect matches.
|
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
Fixes a compiled/source-generated regex correctness bug where EmitExpressionConditional could push a stale resumeAt value when the conditional is inside a loop, leading to incorrect multiple-match results and (per linked issue) possible exceptions.
Changes:
- Update
EmitExpressionConditionalin both the JIT compiler (RegexCompiler) and the source generator emitter to always setresumeAtwhen the conditional is within a loop. - Add functional tests covering expression conditionals with balancing groups and alternation inside loop constructs to prevent regressions.
Show a summary per file
| File | Description |
|---|---|
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs |
Adds regression test cases for expression-conditionals with balancing groups/alternation inside quantified loops across engines (excluding NonBacktracking). |
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs |
Ensures resumeAt is reset for each branch when the conditional is in a loop, preventing stale branch state from being pushed. |
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs |
Mirrors the RegexCompiler fix for source-generated regex code emission. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 0
|
I set a copilot to in background look for missing tests and simpler tests. |
Add 4 more test cases covering:
- Auto-numbered capture groups with dot and literal patterns
- Alternation in no-branch with empty second branch
- Quantified balancing group pop {2}
Move all conditional/balancing group tests outside #if !NETFRAMEWORK
since this bug is specific to the .NET Core regex compiler rewrite
and these patterns work correctly on .NET Framework.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug in the regex compiler and source generator where EmitExpressionConditional could reuse a stale resumeAt value when the conditional occurs inside a loop, leading to incorrect matches (and exceptions) compared to the interpreter.
Changes:
- Update
RegexCompiler.EmitExpressionConditionalto always setresumeAtfor each executed branch when the conditional is inside a loop. - Apply the same
resumeAthandling fix in the source generator emitter. - Add regression coverage for balancing-group conditionals inside loops in the multiple-matches functional tests.
Show a summary per file
| File | Description |
|---|---|
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs |
Adds test cases covering expression conditionals with balancing groups inside loops across engines (excluding non-backtracking). |
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs |
Ensures resumeAt is reset/set when inside loops to prevent stale branch selection during backtracking. |
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs |
Mirrors the compiler fix in generated code so source-generated regex behaves consistently. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 0 new
|
Note This review was generated by Copilot. Regex Conditional-in-Loop Fix — Code Review✅ (1) Fix is correct and complete — all three
|
| Site | EmitBackreferenceConditional (reference) |
EmitExpressionConditional (after fix) |
|---|---|---|
| resumeAt=0 | (!isAtomic && post != orig) || isInLoop (RegexCompiler.cs:2333) |
(!isAtomic && post != orig) || isInLoop (RegexCompiler.cs:2532) ✅ |
| resumeAt=1 | same (RegexCompiler.cs:2356) | same (RegexCompiler.cs:2563) ✅ |
| resumeAt=2 | same (RegexCompiler.cs:2368) | same (RegexCompiler.cs:2575) ✅ |
Both RegexCompiler.cs and RegexGenerator.Emitter.cs are updated in lockstep. ✅
No other sites in EmitExpressionConditional need the isInLoop check. The gate condition at line 2585/2577:
if (isAtomic || (postYesDoneLabel == originalDoneLabel && postNoDoneLabel == originalDoneLabel))
```
correctly skips the backtracking section when neither branch backtracks — in that case there is no switch on `resumeAt`, so no stale value can cause harm, regardless of loop status. The stack push/pop at lines 2602–2640 (compiler) / 2594–2620 (emitter) is already inside the `else` block and gated on `isInLoop`. ✅
---
### ✅ (2) `isInLoop` true while both post-labels equal `originalDoneLabel` — harmless dead code
This can happen when the conditional is inside a loop but neither branch introduces backtracking. In that case:
- The `|| isInLoop` causes `resumeAt` assignments to execute (dead stores).
- The gate condition takes the **if**-branch (line 2585), so no backtracking section, no stack push/pop, and no switch on `resumeAt`.
- Since `isInLoop` is only set when `!isAtomic` (lines 2480–2484), and `resumeAt` is only declared when `!isAtomic`, there's no risk of writing to an uninitialized local or undeclared variable.
Net effect: a harmless extra store that never gets read. Same pattern as `EmitBackreferenceConditional`. ✅
---
### ⚠️ (3) Tests are good for resumeAt=0 and resumeAt=1, but miss the resumeAt=2 (pass-through, no no-branch) case
Every new test pattern uses `(?(condition)|no-branch)+` — the yes branch is always empty and a no-branch is always present. This means:
- **resumeAt=0** (yes branch): exercised via `|| isInLoop` because the empty yes branch doesn't change `doneLabel` (`postYesDoneLabel == originalDoneLabel`), so the old condition would have been false. Good.
- **resumeAt=1** (no branch): exercised both ways — some tests have a backtracking no-branch (e.g., `(.)+`), and `(?((?'-1'))|(?'1'a))+` has a non-backtracking no-branch where only `|| isInLoop` triggers the assignment.
- **resumeAt=2** (pass-through, no no-branch): **NOT tested.** This requires a pattern like `(?(condition)yes-branch)+` (no `|`) where the yes branch backtracks, inside a loop. The fix for this case is structurally identical to the other two, but a regression test would harden it. Consider adding e.g.:
```
@"(?((?'-1'))(?'1'.)+)+(?!(?'-1'))"
```
---
### ✅ (4) NETFRAMEWORK guard / comment change is correct
The new tests are placed **before** the `#if !NETFRAMEWORK` block, so they run on all target frameworks. This is appropriate — the tests exercise balancing groups via `IsNonBacktracking` exclusion, not a framework-specific API.
The comment simplification from:
```
// these tests currently fail on .NET Framework, and we need to check IsDynamicCodeCompiled but that doesn't exist on .NET Framework
```
to:
```
// these tests currently fail on .NET Frameworkis a benign editorial cleanup — the removed clause was about the tests below the guard, not the new ones. ✅
Generated by Code Review for issue #126561 · ◷
|
/ba-g unrelated |
…#126657) ## Description The Code Review workflow on PR #126561 identified that the new tests cover `resumeAt=0` (yes-branch) and `resumeAt=1` (no-branch) paths but miss the `resumeAt=2` pass-through path — expression conditional with only a yes-branch (no no-branch) inside a loop. Adds 3 test cases to `Regex.MultipleMatches.Tests.cs`: - **Yes-only conditional, condition always fails** — `(?((?'-1'))(?'1'.)+)+(?!(?'-1'))` with `"abc"` exercises the pass-through path producing empty matches at each position (guarded with `#if !NETFRAMEWORK` since .NET Framework produces 3 empty matches vs 4 on .NET Core due to different empty-match-at-end-of-string semantics) - **Yes-only conditional with prefix capture, even input** — `((?'1'.)(?((?'-1'))(?'1'.)))+` with `"abcd"` exercises the yes-branch-taken path inside a loop - **Yes-only conditional with prefix capture, odd input** — same pattern with `"abc"` verifying partial match behavior The latter two test cases are the primary coverage for the `|| isInLoop` fix, since their non-backtracking yes-branch means `isInLoop` is the necessary trigger for `resumeAt = 2` (the old condition `postYesDoneLabel != originalDoneLabel` would have been false). These run on both .NET Core and .NET Framework. All 32,100 existing tests continue to pass. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com> Co-authored-by: Dan Moseley <danmose@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update EmitExpressionConditional to reset resumeAt when inside loops, preventing stale values and incorrect matches.
Fixes #126556