Skip to content

Fix regex compiler/source generator resumeAt handling of conditionals inside loops#126561

Merged
stephentoub merged 3 commits intomainfrom
stoub/fix126556
Apr 8, 2026
Merged

Fix regex compiler/source generator resumeAt handling of conditionals inside loops#126561
stephentoub merged 3 commits intomainfrom
stoub/fix126556

Conversation

@stephentoub
Copy link
Copy Markdown
Member

Update EmitExpressionConditional to reset resumeAt when inside loops, preventing stale values and incorrect matches.

Fixes #126556

… inside loops

Update EmitExpressionConditional to reset resumeAt when inside loops, preventing stale values and incorrect matches.
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

@github-actions

This comment has been minimized.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a compiled/source-generated regex correctness bug where EmitExpressionConditional could push a stale resumeAt value when the conditional is inside a loop, leading to incorrect multiple-match results and (per linked issue) possible exceptions.

Changes:

  • Update EmitExpressionConditional in both the JIT compiler (RegexCompiler) and the source generator emitter to always set resumeAt when the conditional is within a loop.
  • Add functional tests covering expression conditionals with balancing groups and alternation inside loop constructs to prevent regressions.
Show a summary per file
File Description
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs Adds regression test cases for expression-conditionals with balancing groups/alternation inside quantified loops across engines (excluding NonBacktracking).
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs Ensures resumeAt is reset for each branch when the conditional is in a loop, preventing stale branch state from being pushed.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs Mirrors the RegexCompiler fix for source-generated regex code emission.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 0

@stephentoub stephentoub enabled auto-merge (squash) April 6, 2026 00:26
@danmoseley
Copy link
Copy Markdown
Member

I set a copilot to in background look for missing tests and simpler tests.

Add 4 more test cases covering:
- Auto-numbered capture groups with dot and literal patterns
- Alternation in no-branch with empty second branch
- Quantified balancing group pop {2}

Move all conditional/balancing group tests outside #if !NETFRAMEWORK
since this bug is specific to the .NET Core regex compiler rewrite
and these patterns work correctly on .NET Framework.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

@danmoseley danmoseley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added more tests. LGTM

@github-actions

This comment has been minimized.

Copilot AI review requested due to automatic review settings April 8, 2026 13:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in the regex compiler and source generator where EmitExpressionConditional could reuse a stale resumeAt value when the conditional occurs inside a loop, leading to incorrect matches (and exceptions) compared to the interpreter.

Changes:

  • Update RegexCompiler.EmitExpressionConditional to always set resumeAt for each executed branch when the conditional is inside a loop.
  • Apply the same resumeAt handling fix in the source generator emitter.
  • Add regression coverage for balancing-group conditionals inside loops in the multiple-matches functional tests.
Show a summary per file
File Description
src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs Adds test cases covering expression conditionals with balancing groups inside loops across engines (excluding non-backtracking).
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCompiler.cs Ensures resumeAt is reset/set when inside loops to prevent stale branch selection during backtracking.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs Mirrors the compiler fix in generated code so source-generated regex behaves consistently.

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 0 new

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

Note

This review was generated by Copilot.

Regex Conditional-in-Loop Fix — Code Review

✅ (1) Fix is correct and complete — all three resumeAt sites are patched symmetrically

The three changed conditions in EmitExpressionConditional (resumeAt = 0, 1, 2) now match the pattern already established in EmitBackreferenceConditional.

Site EmitBackreferenceConditional (reference) EmitExpressionConditional (after fix)
resumeAt=0 (!isAtomic && post != orig) || isInLoop (RegexCompiler.cs:2333) (!isAtomic && post != orig) || isInLoop (RegexCompiler.cs:2532) ✅
resumeAt=1 same (RegexCompiler.cs:2356) same (RegexCompiler.cs:2563) ✅
resumeAt=2 same (RegexCompiler.cs:2368) same (RegexCompiler.cs:2575) ✅

Both RegexCompiler.cs and RegexGenerator.Emitter.cs are updated in lockstep. ✅

No other sites in EmitExpressionConditional need the isInLoop check. The gate condition at line 2585/2577:

if (isAtomic || (postYesDoneLabel == originalDoneLabel && postNoDoneLabel == originalDoneLabel))
```
correctly skips the backtracking section when neither branch backtracks — in that case there is no switch on `resumeAt`, so no stale value can cause harm, regardless of loop status. The stack push/pop at lines 26022640 (compiler) / 25942620 (emitter) is already inside the `else` block and gated on `isInLoop`.---

### ✅ (2) `isInLoop` true while both post-labels equal `originalDoneLabel` — harmless dead code

This can happen when the conditional is inside a loop but neither branch introduces backtracking. In that case:
- The `|| isInLoop` causes `resumeAt` assignments to execute (dead stores).
- The gate condition takes the **if**-branch (line 2585), so no backtracking section, no stack push/pop, and no switch on `resumeAt`.
- Since `isInLoop` is only set when `!isAtomic` (lines 24802484), and `resumeAt` is only declared when `!isAtomic`, there's no risk of writing to an uninitialized local or undeclared variable.

Net effect: a harmless extra store that never gets read. Same pattern as `EmitBackreferenceConditional`.---

### ⚠️ (3) Tests are good for resumeAt=0 and resumeAt=1, but miss the resumeAt=2 (pass-through, no no-branch) case

Every new test pattern uses `(?(condition)|no-branch)+` — the yes branch is always empty and a no-branch is always present. This means:
- **resumeAt=0** (yes branch): exercised via `|| isInLoop` because the empty yes branch doesn't change `doneLabel` (`postYesDoneLabel == originalDoneLabel`), so the old condition would have been false. Good.
- **resumeAt=1** (no branch): exercised both ways — some tests have a backtracking no-branch (e.g., `(.)+`), and `(?((?'-1'))|(?'1'a))+` has a non-backtracking no-branch where only `|| isInLoop` triggers the assignment.
- **resumeAt=2** (pass-through, no no-branch): **NOT tested.** This requires a pattern like `(?(condition)yes-branch)+` (no `|`) where the yes branch backtracks, inside a loop. The fix for this case is structurally identical to the other two, but a regression test would harden it. Consider adding e.g.:
  ```
  @"(?((?'-1'))(?'1'.)+)+(?!(?'-1'))"
  ```

---

### ✅ (4) NETFRAMEWORK guard / comment change is correct

The new tests are placed **before** the `#if !NETFRAMEWORK` block, so they run on all target frameworks. This is appropriate — the tests exercise balancing groups via `IsNonBacktracking` exclusion, not a framework-specific API.

The comment simplification from:
```
// these tests currently fail on .NET Framework, and we need to check IsDynamicCodeCompiled but that doesn't exist on .NET Framework
```
to:
```
// these tests currently fail on .NET Framework

is a benign editorial cleanup — the removed clause was about the tests below the guard, not the new ones. ✅

Generated by Code Review for issue #126561 ·

@danmoseley
Copy link
Copy Markdown
Member

/ba-g unrelated

@stephentoub stephentoub merged commit 4bd0bf4 into main Apr 8, 2026
87 of 98 checks passed
@stephentoub stephentoub deleted the stoub/fix126556 branch April 8, 2026 19:05
danmoseley added a commit that referenced this pull request Apr 9, 2026
…#126657)

## Description

The Code Review workflow on PR #126561 identified that the new tests
cover `resumeAt=0` (yes-branch) and `resumeAt=1` (no-branch) paths but
miss the `resumeAt=2` pass-through path — expression conditional with
only a yes-branch (no no-branch) inside a loop.

Adds 3 test cases to `Regex.MultipleMatches.Tests.cs`:

- **Yes-only conditional, condition always fails** —
`(?((?'-1'))(?'1'.)+)+(?!(?'-1'))` with `"abc"` exercises the
pass-through path producing empty matches at each position (guarded with
`#if !NETFRAMEWORK` since .NET Framework produces 3 empty matches vs 4
on .NET Core due to different empty-match-at-end-of-string semantics)
- **Yes-only conditional with prefix capture, even input** —
`((?'1'.)(?((?'-1'))(?'1'.)))+` with `"abcd"` exercises the
yes-branch-taken path inside a loop
- **Yes-only conditional with prefix capture, odd input** — same pattern
with `"abc"` verifying partial match behavior

The latter two test cases are the primary coverage for the `|| isInLoop`
fix, since their non-backtracking yes-branch means `isInLoop` is the
necessary trigger for `resumeAt = 2` (the old condition
`postYesDoneLabel != originalDoneLabel` would have been false). These
run on both .NET Core and .NET Framework.

All 32,100 existing tests continue to pass.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
Co-authored-by: Dan Moseley <danmose@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compiled conditional regex match different with Interpretor, and it throws exception

3 participants