Rework RegexGenerator incremental source model to support value equality#125438
Rework RegexGenerator incremental source model to support value equality#125438eiriktsarpalis wants to merge 36 commits intomainfrom
Conversation
Add RegexGeneratorIncrementalTests.cs with 18 tests covering: - Same compilation caching (SameInput_DoesNotRegenerate) - Unrelated code changes (UnrelatedCodeChange_DoesNotRegenerate) - Pattern/options/timeout/culture changes triggering regeneration - Declaring type and member name changes - Method add/remove scenarios - Property vs method support - Complex patterns with unrelated changes (Theory with 5 patterns) - Limited support regex (NonBacktracking) caching - Multiple regex methods with partial changes These tests establish the baseline behavior before refactoring the generator's incremental pipeline for proper value equality. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gex generator Add ImmutableEquatableDictionary<TKey, TValue> to Common/src/SourceGenerators/ following the same pattern as the existing ImmutableEquatableArray<T>. Link ImmutableEquatableArray, ImmutableEquatableDictionary, and HashHelpers to the Regex generator project for use in the upcoming equatable model types. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…line Add gen/Model/ directory with immutable, structurally equatable record types: - RegexGenerationSpec: top-level envelope (incremental cache boundary) - RegexMethodSpec: per-method data with declaring type, pattern, options, tree - RegexTreeSpec: equatable regex tree with baked-in analysis results - RegexNodeSpec: equatable tree node with analysis flags per node - FindOptimizationsSpec: equatable find optimizations - FixedDistanceSetSpec: equatable fixed-distance set - RegexTypeSpec: declaring type hierarchy Also add required member polyfills for netstandard2.0 compatibility. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ssion Rewrites the RegexGenerator Initialize() pipeline to: 1. Collect per-method results into a single batch 2. Run a Parser that builds an equatable RegexGenerationSpec model 3. Split into source model (with value equality) and diagnostics providers 4. Register separate source outputs for model and diagnostics The equatable model (RegexGenerationSpec -> RegexMethodSpec -> RegexTreeSpec -> RegexNodeSpec) enables Roslyn to skip the emitter when regex patterns/options haven't changed. During emission, the spec is converted back to the mutable RegexMethod/RegexTree types by re-parsing the regex pattern. This re-parse is fast (~1ms) and only occurs on incremental cache misses, while the expensive emission (10-100ms) is fully skipped on cache hits. Also fixes CultureChange_Regenerates test to check incremental step reason rather than output string equality, since culture changes may not always produce different generated code. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions |
There was a problem hiding this comment.
Pull request overview
This pull request refactors the RegexGenerator incremental source generator pipeline to use deeply equatable model types, enabling Roslyn incremental caching to skip expensive emission when regex inputs haven’t changed.
Changes:
- Replaced the heterogeneous
ImmutableArray<object>incremental model with equatable*Specmodel records (usingImmutableEquatableArray/ newImmutableEquatableDictionary). - Reworked the pipeline to split source emission from diagnostic reporting and re-parse the regex during emission only on cache misses.
- Added a new incremental-caching-focused test suite and wired it into the functional test project.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/System.Text.RegularExpressions.Tests.csproj | Adds the new incremental generator tests to the test project. |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs | New tests validating incremental caching behavior via tracked generator steps. |
| src/libraries/System.Text.RegularExpressions/gen/System.Text.RegularExpressions.Generator.csproj | Links shared equatable collection helpers and required-member polyfills into the generator project. |
| src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.cs | Pipeline rewrite to produce a structurally equatable model and re-parse on emission. |
| src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Conversion.cs | Converts RegexTree / analysis into equatable *Spec snapshots. |
| src/libraries/System.Text.RegularExpressions/gen/Model/RegexGenerationSpec.cs | New top-level equatable incremental cache boundary model. |
| src/libraries/System.Text.RegularExpressions/gen/Model/RegexMethodSpec.cs | New per-member equatable model for generation inputs. |
| src/libraries/System.Text.RegularExpressions/gen/Model/RegexTreeSpec.cs | New equatable representation of regex trees and baked-in analysis flags. |
| src/libraries/System.Text.RegularExpressions/gen/Model/RegexNodeSpec.cs | New equatable representation of regex parse nodes. |
| src/libraries/System.Text.RegularExpressions/gen/Model/RegexTypeSpec.cs | New equatable representation of declaring type hierarchy. |
| src/libraries/System.Text.RegularExpressions/gen/Model/FindOptimizationsSpec.cs | New equatable representation of find-optimization state. |
| src/libraries/System.Text.RegularExpressions/gen/Model/FixedDistanceSetSpec.cs | New equatable representation of fixed-distance set optimization data. |
| src/libraries/Common/src/SourceGenerators/ImmutableEquatableDictionary.cs | New shared equatable dictionary helper for structural equality in generator models. |
You can also share your feedback on Copilot code review. Take the survey.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/Model/RegexNodeSpec.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/Model/RegexMethodSpec.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/SourceGenerators/ImmutableEquatableDictionary.cs
Outdated
Show resolved
Hide resolved
…mitter - F1: Extract LiteralAfterLoopSpec record from nested tuple in FindOptimizationsSpec for clarity. - F2+F3: Rewrite RegexGenerator.Parser.cs with unified parsing flow. Parse() is the outer method accepting GeneratorAttributeSyntaxContext batch; ParseMethod() handles each member end-to-end (attribute extraction + regex parsing + spec conversion). Eliminates the intermediate RegexPatternAndSyntax type and ImmutableArray<object> boxing. - F4: Move Emit() orchestration and ConvertRegexTypeSpecToRegexType to RegexGenerator.Emitter.cs. RegexGenerator.cs now only contains Initialize() pipeline wiring, constants, and CompilationData. - Fix parent chain construction for immutable RegexTypeSpec records (build outside-in instead of mutating). - Remove unused usings from slimmed RegexGenerator.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ToImmutableEquatableDictionary<TKey, TValue>(this Hashtable) extension method following the existing factory pattern, and simplify the ConvertHashtable caller in Conversion.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
You can also share your feedback on Copilot code review. Take the survey.
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs
Show resolved
Hide resolved
src/libraries/Common/src/SourceGenerators/ImmutableEquatableDictionary.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Show resolved
Hide resolved
- Add ImmutableEquatableSet<T> (from PolyType) providing order- independent structural equality via HashSet<T>. Use it for RegexGenerationSpec.RegexMethods to prevent spurious cache misses from non-deterministic ordering of GeneratorAttributeSyntaxContext. - Fix null-safety bug in ImmutableEquatableDictionary.Equals: use EqualityComparer<TValue>.Default.Equals instead of direct call that would NRE on null values. - Remove unused keySelector/valueSelector and IEnumerable<KVP> overloads from ImmutableEquatableDictionary factory. - Inline ConvertHashtable calls in Conversion.cs. - Extract AddDiagnostic helper to reduce noise in Parser.cs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…culture - Replace var with explicit types throughout changed code - Change RegexMethod/RegexType from internal sealed record to private sealed class - Add using var to runnerSw/runnerWriter for proper disposal - Include CultureName in dedup key for emitted expressions - Remove unused using SourceGenerators from Emitter.cs - Add using System.Text to incremental tests - Strengthen SameInput test with cached/unchanged assertion Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
53e38f5 to
93dd4a4
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs:150
- The spec currently doesn’t preserve the
cultureNameargument from[GeneratedRegex].CreateRegexTreeSpeclater relies ontree.Culture?.Name, butRegexParser.ParsesetsRegexTree.Cultureto non-null only for ignore-case backreferences; for most ignore-case regexes the tree culture will be null. That means the emitter may re-parse usingInvariantCultureeven when a culture was specified, potentially generating incorrect code for culture-sensitive case folding. Store the requested/selected culture name directly onRegexMethodSpec(or alongsideTree) and use that for emission and de-duping.
string? pattern = items[0].Value as string;
int? options = null;
int? matchTimeout = null;
string? cultureName = string.Empty;
if (items.Length >= 2)
{
options = items[1].Value as int?;
if (items.Length == 4)
{
matchTimeout = items[2].Value as int?;
cultureName = items[3].Value as string;
}
// If there are 3 parameters, we need to check if the third argument is
// int matchTimeoutMilliseconds, or string cultureName.
else if (items.Length == 3)
{
if (items[2].Type?.SpecialType == SpecialType.System_Int32)
{
matchTimeout = items[2].Value as int?;
}
else
{
cultureName = items[2].Value as string;
}
You can also share your feedback on Copilot code review. Take the survey.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/Common/src/SourceGenerators/ImmutableEquatableSet.cs
Outdated
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs
Outdated
Show resolved
Hide resolved
Sort method specs by fully-qualified type name before ID assignment and emission to ensure deterministic output regardless of HashSet iteration order (which varies with string hash randomization). Add test for CompilationData (AllowUnsafe) change triggering regeneration. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Move CompilationData record struct to gen/Model/CompilationData.cs - Change Parse() accumulators from HashSet/ImmutableArray.Builder to List - Reorder declarations so methods come before diagnostics Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
You can also share your feedback on Copilot code review. Take the survey.
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Conversion.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
...aries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorIncrementalTests.cs
Outdated
Show resolved
Hide resolved
Inline diagnostic accumulation using (diagnostics ??= []).Add(...). Remove sort in Emit() since ImmutableEquatableSet iteration order is deterministic within a process and the emitter only runs on cache misses. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
6421f6a to
4740968
Compare
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs
Show resolved
Hide resolved
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
src/libraries/System.Text.RegularExpressions/gen/Model/RegexMethodSpec.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Emitter.cs
Show resolved
Hide resolved
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
src/libraries/System.Text.RegularExpressions/gen/Model/RegexMethodSpec.Equality.cs
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
src/libraries/Common/src/SourceGenerators/ImmutableEquatableSet.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/Model/RegexMethodSpec.EqualityComparison.cs
Show resolved
Hide resolved
src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs
Outdated
Show resolved
Hide resolved
This comment has been minimized.
This comment has been minimized.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Copilot Code Review — PR #125438Note This review was generated by Copilot (Claude Opus 4.6), with cross-validation from GPT-5.2. Holistic AssessmentMotivation: The PR addresses a real and important problem — the RegexGenerator's incremental pipeline was not properly caching because the model types ( Approach: The approach of wrapping Summary: ✅ LGTM. The change is well-designed, thoroughly tested, and the equality implementation correctly covers all fields that affect code generation. Both independent model reviews converged on the same conclusion. Two minor style and robustness observations are noted below as non-blocking suggestions. Detailed Findings✅ Equality Implementation — All code-gen-relevant fields are coveredVerified the
Hash codes are consistent with [Flagged by both Claude Opus 4.6 and GPT-5.2] ✅ Pipeline Architecture — Correct incremental patternThe pipeline restructuring is sound:
The deterministic sorting in Bug fix included: The dedup key now includes [Flagged by both models] ✅ Test Coverage — Comprehensive incremental verificationThe 891-line test file covers the critical scenarios:
Tests use 💡 ImmutableEquatableSet.cs — Missing blank line before namespace
using System.Collections.Generic;
using System.Diagnostics;
+
namespace SourceGenerators💡 AnalysisEquals — Implicitly relies on RegexTreeEquals having passed first
// In AnalysisEquals, inside the loop body (after line 163):
int childCount = left.ChildCount();
Debug.Assert(childCount == right.ChildCount());
for (int i = childCount - 1; i >= 0; i--)[Flagged by GPT-5.2, confirmed by Claude Opus 4.6]
|
Fixes #125409
Discovered while working on #125404 — the RegexGenerator's incremental source model lacked proper value equality, causing the source output callback to fire on every compilation change regardless of whether regex patterns had actually changed.
Approach
Note
AI-generated description update via Copilot.
Refactors the generator to keep the regular parsed
RegexMethod/RegexTree/AnalysisResultsobjects in the parser step, while wrapping each parsed method in a dedicated publicRegexMethodSpecmodel that provides structural equality for incremental caching.This avoids the earlier mirrored
*Specobject graph and also avoids re-parsing in the emitter. The emitter now consumes the parsed tree and analysis directly, and Roslyn can skip source emission when the structural regex state is unchanged.Focused incremental tests cover deterministic ordering, deep-nesting fallback, and signature-affecting changes.