Add RegexOptions.AnyNewLine via parser lowering #124701
danmoseley wants to merge 26 commits into dotnet:main
Add AnyNewLine = 0x0800 to RegexOptions enum. Update ValidateOptions to bump MaxOptionShift to 12 and reject AnyNewLine | NonBacktracking. ECMAScript already rejects unknown options via allowlist. Update source generator to include AnyNewLine in SupportedOptions mask. Update tests that used 0x800 as an invalid option value to use 0x1000. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
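The range check described above can be pictured with a small sketch. The helper names `HasUnknownBits` and `IsRejected` and the integer constants are hypothetical stand-ins, not the code in Regex.cs. Only the values 0x0800 and 0x1000, the shift of 12, and the AnyNewLine | NonBacktracking rejection come from the commit message.

```csharp
using System;

// Hypothetical stand-ins; the real check lives in ValidateOptions.
const int NonBacktracking = 0x0400; // existing RegexOptions.NonBacktracking value
const int AnyNewLine = 0x0800;      // new value from this commit
const int MaxOptionShift = 12;      // value after this commit

// Any bit at or above MaxOptionShift is treated as an unknown option.
bool HasUnknownBits(int options) => (options >> MaxOptionShift) != 0;

// Assumed shape of the new incompatibility rule: both flags set at once.
bool IsRejected(int options) =>
    HasUnknownBits(options) ||
    ((options & AnyNewLine) != 0 && (options & NonBacktracking) != 0);

Console.WriteLine(IsRejected(AnyNewLine));                   // False
Console.WriteLine(IsRejected(0x1000));                       // True
Console.WriteLine(IsRejected(AnyNewLine | NonBacktracking)); // True
```

This also shows why the tests had to move their "invalid option" value from 0x800 to 0x1000: once bit 11 is a real option, bit 12 is the lowest bit the shift check still rejects.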
When AnyNewLine is set without Multiline, lower $ from EndZ into an equivalent sub-tree:
(?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z)
This matches at end of string, or before \r\n, \r, or \n at end of string, but not between \r and \n. Works across all engines (interpreter, compiled, source generator) since it's pure parser lowering.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
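As a rough check of that lowering, the pattern from the commit message can be written out by hand and run on a runtime that does not have AnyNewLine yet. This is only an emulation with an explicit pattern string; the PR builds the equivalent RegexNode tree in the parser. The helper `EndsLine` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// Lowered form of $ (AnyNewLine, no Multiline), copied from the commit message
// and wrapped in a non-capturing group so it can follow other pattern text.
const string LoweredDollar = @"(?:(?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z))";

// Hypothetical helper: does "abc" sit directly before the lowered anchor?
bool EndsLine(string input) => Regex.IsMatch(input, "abc" + LoweredDollar);

Console.WriteLine(EndsLine("abc"));     // True: end of string
Console.WriteLine(EndsLine("abc\r\n")); // True: before a final \r\n
Console.WriteLine(EndsLine("abc\r"));   // True: before a final \r
Console.WriteLine(EndsLine("abc\n"));   // True: before a final \n
Console.WriteLine(EndsLine("abc\n\n")); // False: only one trailing newline is allowed

// The anchor must not match between the \r and \n of a final \r\n.
Console.WriteLine(Regex.IsMatch("abc\r\n", @"abc\r" + LoweredDollar)); // False
```

The `(?<!\r)` guard on the second branch is what rejects the position between \r and \n; without it, `(?=\n\z)` alone would succeed there.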
When AnyNewLine is set, lower \Z using the same sub-tree as $ without Multiline. \Z is not affected by Multiline, so the same lowering applies regardless. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When both Multiline and AnyNewLine are set, lower $ to:
(?=\r\n|\r|\z)|(?<!\r)(?=\n)
This matches at \r\n, \r, \n boundaries and end-of-string, without matching between \r and \n of a \r\n sequence.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
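A small sketch of where this lowered end-of-line anchor matches, again using the pattern text from the commit message as an ordinary regex rather than the parser's node tree. The sample input is an assumption chosen to contain one line ending of each ASCII kind.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Lowered form of $ under Multiline | AnyNewLine, copied from the commit message.
const string LoweredEol = @"(?=\r\n|\r|\z)|(?<!\r)(?=\n)";

// One line ending of each kind: \r\n, bare \r, bare \n.
const string input = "a\r\nb\rc\nd";

// Collect every position where the zero-width anchor matches.
int[] eolPositions = Regex.Matches(input, LoweredEol).Select(m => m.Index).ToArray();

Console.WriteLine(string.Join(",", eolPositions)); // 1,4,6,8
```

Position 2 (between \r and \n) is absent from the output, while position 8 (end of string) is present.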
When both Multiline and AnyNewLine are set, lower ^ to:
(?<=\A|\r\n|\n)|(?<=\r)(?!\n)
This matches after \r\n, \n, bare \r (not followed by \n), and at start of string. Without Multiline, ^ remains \A unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
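The beginning-of-line lowering can be probed the same way. The sketch below assumes the same sample input as before and lists the positions at which the hand-written pattern matches. It relies on .NET allowing variable-length lookbehind.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Lowered form of ^ under Multiline | AnyNewLine, copied from the commit message.
const string LoweredBol = @"(?<=\A|\r\n|\n)|(?<=\r)(?!\n)";

// One line ending of each kind: \r\n, bare \r, bare \n.
const string input = "a\r\nb\rc\nd";

// Collect every position where a line is considered to start.
int[] bolPositions = Regex.Matches(input, LoweredBol).Select(m => m.Index).ToArray();

Console.WriteLine(string.Join(",", bolPositions)); // 0,3,5,7
```

The `(?!\n)` on the second branch keeps position 2 out of the result, so a \r\n pair counts as one line break.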
When AnyNewLine is set (without Singleline), lower . to [^\n\r] instead of [^\n], so dot does not match \r or \n. Add NotNewLineOrCarriageReturnClass constant to RegexCharClass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
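To illustrate the difference this makes, the sketch below compares the default dot with the lowered class `[^\n\r]` written out explicitly. It does not use the new option itself, and the input string is an assumption.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string input = "ab\rcd\r\nef";

// Default dot is [^\n], so a bare \r is consumed like any other character.
string[] defaultDot = Regex.Matches(input, @".+").Select(m => m.Value).ToArray();

// The lowered class from the commit message also stops at \r.
string[] loweredDot = Regex.Matches(input, @"[^\n\r]+").Select(m => m.Value).ToArray();

Console.WriteLine(defaultDot.Length); // 2: "ab\rcd\r" and "ef"
Console.WriteLine(loweredDot.Length); // 3: "ab", "cd", "ef"
```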
Combined ^/$/. tests, Replace/Split, RightToLeft, mixed newlines, empty lines, \Z with trailing newlines, and edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Integration tests using a ~50 char string with all newline types (\r\n, \r, \n, \u0085, \u2028, \u2029) exercising ^, $, \Z, and . together. Replace/Split tests with MatchEvaluator line numbering. Deduplicated cases moved into per-feature tests (RightToLeft, empty lines). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Expand test coverage across all AnyNewLine-affected constructs:
- Dollar, EndZ, DollarMultiline, CaretMultiline, Dot test data
with adjacent newlines, newlines at string boundaries,
empty segments, RightToLeft, and all Unicode newline types
- Advanced tests: inline options, backreferences, conditionals,
alternation with anchors, lookahead/lookbehind, quantified dot,
lazy quantifiers, named/atomic groups, word boundaries near
newlines, explicit char classes unaffected
- Methods test: IsMatch, Count, EnumerateMatches, Match with
startat, Replace with group ref, Split
- Unicode expansion: \s/\S behavior, \w behavior, \p{Zl}/\p{Zp}
categories, adjacent Unicode+ASCII newlines, baselines without
AnyNewLine
No bugs found — all initial test failures were wrong expectations.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Verify the fixer correctly emits RegexOptions.Multiline | RegexOptions.AnyNewLine in enum value order when upgrading to GeneratedRegex. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Test cases derived from cross-validation with PCRE2 NEWLINE_ANY behavior (BSD-licensed) and analysis of real-world patterns from dotnet/runtime-assets:
- (.+)# greedy where .+ cannot cross newlines (PCRE2 JIT 472)
- (.)(.) requiring consecutive non-newlines (PCRE2 JIT 471)
- (.). with mixed newline types (PCRE2 JIT 469)
- Blank line detection (^ +$) with \n, \r\n, \u0085 separators
All 31,528 tests pass. No bugs found: our implementation is fully consistent with PCRE2 NEWLINE_ANY behavior and handles real-world patterns correctly.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add more RightToLeft + AnyNewLine tests (various newline types, dot, anchors, \Z)
- Add more Singleline | AnyNewLine tests (all newline types, combined with Multiline)
- Replace RegexOptions.AnyNewLine with RegexHelpers.RegexOptionAnyNewLine throughout tests for net481 compilation compatibility
- Wrap Count/EnumerateMatches in #if NET for net481 compat
- Add clarifying comments on Split behavior with/without AnyNewLine
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
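A sketch of the idea behind the RegexHelpers.RegexOptionAnyNewLine replacement mentioned above: test code that must also compile against net481 cannot name the new enum member, so it can refer to the raw value through a cast. The declaration below is an assumption about what such a helper looks like. Only the constant's name and the value 0x0800 come from this PR.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed shape of the test helper constant: the new member's raw value, cast
// so the code compiles even where RegexOptions.AnyNewLine does not exist.
const RegexOptions RegexOptionAnyNewLine = (RegexOptions)0x0800;

// Combining it with existing options works like any other flag.
RegexOptions combined = RegexOptions.Multiline | RegexOptionAnyNewLine;

Console.WriteLine((int)combined); // 2050 = 0x0002 | 0x0800
```

The cast only helps compilation. A runtime that predates the option would still be expected to reject the value when a Regex is constructed with it, which is what the earlier tests relied on when they used 0x800 as an invalid option.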
(Finally got around to having AI finish my lowering branch.)
Pull request overview
This pull request implements RegexOptions.AnyNewLine (value 0x0800 = 2048), a new regex option that makes ^, $, \Z, and . recognize all Unicode line boundaries (\r, \r\n, \n, \u0085 NEL, \u2028 LS, \u2029 PS) instead of only \n. This addresses a major usability issue where users had to manually work around .NET's hardcoded \n-only line ending behavior.
Changes:
- Added `RegexOptions.AnyNewLine = 0x0800` enum value with incompatibility checks for NonBacktracking and ECMAScript modes
- Implemented parser-level lowering of `^`, `$`, `\Z`, and `.` into equivalent lookaround-based RegexNode trees when AnyNewLine is enabled
- Added comprehensive test coverage (~800 new test lines) covering all anchor types, newline combinations, RightToLeft mode, inline options, and edge cases
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexOptions.cs | Added AnyNewLine = 0x0800 enum value with XML documentation |
| src/libraries/System.Text.RegularExpressions/ref/System.Text.RegularExpressions.cs | Updated ref assembly with AnyNewLine = 2048 |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs | Updated MaxOptionShift to 12 and added AnyNewLine to NonBacktracking incompatibility check |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs | Implemented lowering methods (AnyNewLineEndZNode, AnyNewLineEolNode, AnyNewLineBolNode) and integrated into ^, $, \Z, . parsing |
| src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs | Added NotNewLineOrCarriageReturnClass constant for . with AnyNewLine |
| src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs | Added AnyNewLine to source generator's supported options |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs | Added ~800 lines of comprehensive tests for all anchor types, newline combinations, and edge cases |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Tests.Common.cs | Added RegexOptionAnyNewLine constant for test compatibility |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Ctor.Tests.cs | Updated invalid option test from 0x800 to 0x1000; added NonBacktracking+AnyNewLine incompatibility test |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs | Updated invalid option comments and tests from 0x800 to 0x1000 |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.EnumerateMatches.Tests.cs | Updated invalid option tests from 0x800 to 0x1000 |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorParserTests.cs | Updated invalid option tests from 0x800 to 0x1000 |
| src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/UpgradeToGeneratedRegexAnalyzerTests.cs | Updated tests for 0x1000 as invalid option; added AnyNewLine test case for code fixer |
@MihuBot benchmark Regex
See benchmark results at https://gist.github.com/MihuBot/c7399f4f318e4febcfd0018436d5fe53
Mihubot confirms zero perf impact on existing patterns/options.
AnyNewLine Performance Analysis (Release, Compiled, .NET 11.0, BenchmarkDotNet)
Measured impact of converting existing newline-workaround patterns to simplified AnyNewLine equivalents. All scenarios use
Section 1: Real-World Patterns on Windows
| Old Pattern | New Pattern (+ AnyNewLine) | Old (µs) | New (µs) | Ratio |
|---|---|---|---|---|
| `^.+\r?$` (1K lines) | `^.+$` | 46.7 | 48.8 | 1.05x |
| `^.+\r?$` (10K lines) | `^.+$` | 1,694 | 1,760 | 1.04x |
| `\[assembly:...\]\s*$(\r?\n)?` | `\[assembly:...\]\s*$` | 38.3 | 32.4 | 0.85x |
| `^([^\s:]+):\s*(.+?)\r?$` | `^([^\s:]+):\s*(.+?)$` | 105.9 | 105.9 | 1.00x |
| `^# .+\r?$` | `^# .+$` | 11.1 | 9.1 | 0.83x |
| `^.+\r?$` (CSV, 1K rows) | `^.+$` | 44.4 | 49.2 | 1.11x |
| `[^\r\n]+` | `.+` | 44.2 | 43.8 | 0.99x |
| `\w+\r?$` | `\w+$` | 90.8 | 128.7 | 1.42x |
| `(?:^\|\r\n)\w+` | `^\w+` | 208.7 | 214.5 | 1.03x |
Section 2: Unix \n Text (overhead of just enabling the flag)
| Old Pattern | New Pattern (+ AnyNewLine) | Old (µs) | New (µs) | Ratio |
|---|---|---|---|---|
| `^.+$` | `^.+$` | 43.5 | 48.9 | 1.12x |
| `[^\n]+` | `.+` | 39.0 | 44.9 | 1.15x |
Section 3: Mixed \n/\r\n Text
| Old Pattern | New Pattern (+ AnyNewLine) | Old (µs) | New (µs) | Ratio |
|---|---|---|---|---|
| `[^\r\n\u0085\u2028\u2029]+` | `.+` | 45.4 | 44.2 | 0.97x |
| `^.+\r?$` (1K lines) | `^.+$` | 44.1 | 50.1 | 1.14x |
Section 4: Non-anchor/dot Patterns (zero impact expected)
| Old Pattern | New Pattern (+ AnyNewLine) | Old (µs) | New (µs) | Ratio |
|---|---|---|---|---|
| `\r\n\|\r\|\n` | `\r\n\|\r\|\n` | 20.0 | 21.7 | 1.08x |
| `\w+` | `\w+` | 322.4 | 336.4 | 1.04x |
Section 5: Pathological Cases (unlikely in practice)
| Old Pattern | New Pattern (+ AnyNewLine) | Old (µs) | New (µs) | Ratio |
|---|---|---|---|---|
| `$` | `$` | 98.2 | 134.1 | 1.37x |
| `^` | `^` | 145.6 | 131.6 | 0.90x |
| `\w+\r?\Z` (329K chars) | `\w+\Z` | 494.2 | 1,039.3 | 2.10x |
Summary
- Real-world patterns in Compiled mode show 0.83x--1.14x -- essentially zero cost, and sometimes faster because the AnyNewLine pattern is simpler (e.g., `^# .+$` vs `^# .+\r?$` -- removing the `\r?` node saves more than the lowered `$` costs).
- Where small regressions occur (1.1x--1.4x), the cause is the lowered anchor tree: a native `$` (Eol) is a single "is next char `\n`?" check, but AnyNewLine lowers it to a lookahead alternation like `(?=\r\n|\r|\n|\u0085|\u2028|\u2029|\z)`. Even when the input only contains `\r\n`, the engine must evaluate the alternation branches. This overhead is proportionally more visible when the anchor dominates the work (e.g., `\w+$` where the `\w+` match is short), and nearly invisible when `.+` dominates each line's work (e.g., `^.+$` at 1.04x).
- Patterns without anchors or dot are completely unaffected (1.04--1.08x, within noise) -- the flag only changes behavior of `.`, `^`, `$`, `\Z`.
- Only pathological case: `\w+\Z` on very large input (329K chars) at 2.1x -- the lowered `\Z` alternation tree is evaluated during backtracking at many positions. Unlikely in practice.
- In Compiled/source-generated mode, the JIT compiles the lowered alternation branches into efficient single-char comparisons, keeping overhead minimal. Interpreted mode shows larger gaps (2--3x for typical patterns) but AnyNewLine + interpreted + perf-sensitive is an unlikely combination.
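To make the "old workaround vs. simplified pattern" comparison concrete, here is a sketch that runs on a runtime without the new option. The "new" side is a hand-written stand-in assembled from the ASCII-only lowerings quoted in the commit messages (so it ignores \u0085, \u2028, and \u2029). It is not what the parser emits, and it says nothing about performance. The names `Bol` and `Eol` are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string input = "one\r\ntwo\r\n";

// Old workaround: dot still matches \r, so each matched line keeps it.
string[] oldLines = Regex.Matches(input, @"^.+\r?$", RegexOptions.Multiline)
    .Select(m => m.Value).ToArray();

// Stand-in for ^.+$ with Multiline | AnyNewLine, built from the lowered forms.
const string Bol = @"(?:(?<=\A|\r\n|\n)|(?<=\r)(?!\n))";
const string Eol = @"(?:(?=\r\n|\r|\z)|(?<!\r)(?=\n))";
string[] newLines = Regex.Matches(input, Bol + @"[^\n\r]+" + Eol)
    .Select(m => m.Value).ToArray();

Console.WriteLine(string.Join("|", oldLines.Select(s => s.Length))); // 4|4
Console.WriteLine(string.Join("|", newLines));                       // one|two
```

Besides being longer to write, the workaround leaves a trailing \r in each match, whereas the lowered form yields the bare line text.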
Benchmark source code (BenchmarkDotNet)
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Reports;
using BenchmarkDotNet.Toolchains.InProcess.Emit;
BenchmarkRunner.Run<AnyNewLineBenchmarks>(
DefaultConfig.Instance
.WithSummaryStyle(SummaryStyle.Default.WithRatioStyle(RatioStyle.Percentage))
.AddJob(Job.ShortRun.WithToolchain(InProcessEmitToolchain.Instance)));
[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "RatioSD", "Alloc Ratio")]
public class AnyNewLineBenchmarks
{
private const RegexOptions AnyNewLine = (RegexOptions)0x0800;
private static string GenerateText(int lineCount, string[] newlines)
{
var sb = new StringBuilder();
for (int i = 0; i < lineCount; i++)
{
sb.Append("Lorem ipsum dolor sit amet ");
sb.Append(i);
sb.Append(newlines[i % newlines.Length]);
}
return sb.ToString();
}
private static readonly string WinText1K = GenerateText(1000, ["\r\n"]);
private static readonly string WinText10K = GenerateText(10000, ["\r\n"]);
private static readonly string UnixText1K = GenerateText(1000, ["\n"]);
private static readonly string MixedNR1K = GenerateText(1000, ["\n", "\r\n"]);
private static readonly string MixedAll1K = GenerateText(1000,
["\n", "\r\n", "\r", "\u0085", "\u2028", "\u2029"]);
private static readonly string AssemblyInfo;
private static readonly string KvConfig;
private static readonly string Markdown;
private static readonly string CsvData;
static AnyNewLineBenchmarks()
{
var sb = new StringBuilder();
string[] attrs = {
"[assembly: AssemblyTitle(\"MyApp\")]",
"[assembly: AssemblyDescription(\"A sample app\")]",
"[assembly: AssemblyConfiguration(\"\")]",
"[assembly: AssemblyCompany(\"Contoso\")]",
"[assembly: AssemblyProduct(\"MyApp\")]",
"[assembly: AssemblyCopyright(\"Copyright 2024\")]",
"[assembly: AssemblyTrademark(\"\")]",
"[assembly: AssemblyCulture(\"\")]",
"[assembly: AssemblyVersion(\"1.0.0.0\")]",
"[assembly: AssemblyFileVersion(\"1.0.0.0\")]"
};
foreach (var attr in attrs) { sb.Append(attr); sb.Append("\r\n"); }
AssemblyInfo = string.Concat(Enumerable.Repeat(sb.ToString(), 50));
sb.Clear();
string[] keys = { "Server", "Database", "User", "Password", "Timeout",
"MaxPool", "MinPool", "Encrypt", "TrustCert", "AppName" };
for (int i = 0; i < 50; i++)
{
sb.Append(keys[i % keys.Length]); sb.Append(": value_"); sb.Append(i); sb.Append("\r\n");
}
KvConfig = string.Concat(Enumerable.Repeat(sb.ToString(), 20));
sb.Clear();
for (int i = 0; i < 200; i++)
{
sb.Append($"# Heading {i}\r\n");
sb.Append($"Some paragraph text about topic {i}.\r\n");
sb.Append($"Another line of content here.\r\n\r\n");
}
Markdown = sb.ToString();
sb.Clear();
sb.Append("Name,Age,City,Email\r\n");
for (int i = 0; i < 1000; i++)
sb.Append($"User{i},{20 + i % 50},City{i % 100},user{i}@example.com\r\n");
CsvData = sb.ToString();
}
// Section 1: Real-world on Windows \r\n text
private static readonly Regex Old_1a = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_1a = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Baseline = true, Description = "1a_Lines1K_Old")]
public int Lines1K_Old() => Old_1a.Matches(WinText1K).Count;
[Benchmark(Description = "1a_Lines1K_New")]
public int Lines1K_New() => New_1a.Matches(WinText1K).Count;
private static readonly Regex Old_1b = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_1b = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "1b_Lines10K_Old")]
public int Lines10K_Old() => Old_1b.Matches(WinText10K).Count;
[Benchmark(Description = "1b_Lines10K_New")]
public int Lines10K_New() => New_1b.Matches(WinText10K).Count;
private static readonly Regex Old_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$(\r?\n)?",
RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$",
RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "2_Assembly_Old")]
public int Assembly_Old() => Old_2.Matches(AssemblyInfo).Count;
[Benchmark(Description = "2_Assembly_New")]
public int Assembly_New() => New_2.Matches(AssemblyInfo).Count;
private static readonly Regex Old_3 = new(@"^([^\s:]+):\s*(.+?)\r?$",
RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_3 = new(@"^([^\s:]+):\s*(.+?)$",
RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "3_KeyVal_Old")]
public int KeyVal_Old() => Old_3.Matches(KvConfig).Count;
[Benchmark(Description = "3_KeyVal_New")]
public int KeyVal_New() => New_3.Matches(KvConfig).Count;
private static readonly Regex Old_4 = new(@"^# .+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_4 = new(@"^# .+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "4_Markdown_Old")]
public int Markdown_Old() => Old_4.Matches(Markdown).Count;
[Benchmark(Description = "4_Markdown_New")]
public int Markdown_New() => New_4.Matches(Markdown).Count;
private static readonly Regex Old_5 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_5 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "5_CSV_Old")]
public int CSV_Old() => Old_5.Matches(CsvData).Count;
[Benchmark(Description = "5_CSV_New")]
public int CSV_New() => New_5.Matches(CsvData).Count;
private static readonly Regex Old_6 = new(@"[^\r\n]+", RegexOptions.Compiled);
private static readonly Regex New_6 = new(@".+", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "6_DotExcl_Old")]
public int DotExcl_Old() => Old_6.Matches(WinText1K).Count;
[Benchmark(Description = "6_DotExcl_New")]
public int DotExcl_New() => New_6.Matches(WinText1K).Count;
private static readonly Regex Old_7 = new(@"\w+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_7 = new(@"\w+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "7_WordEOL_Old")]
public int WordEOL_Old() => Old_7.Matches(WinText1K).Count;
[Benchmark(Description = "7_WordEOL_New")]
public int WordEOL_New() => New_7.Matches(WinText1K).Count;
private static readonly Regex Old_8 = new(@"(?:^|\r\n)\w+", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_8 = new(@"^\w+", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "8_LineSt_Old")]
public int LineStart_Old() => Old_8.Matches(WinText1K).Count;
[Benchmark(Description = "8_LineSt_New")]
public int LineStart_New() => New_8.Matches(WinText1K).Count;
// Section 2: Unix \n text (control)
private static readonly Regex Old_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "9_UnixLines_Old")]
public int UnixLines_Old() => Old_9.Matches(UnixText1K).Count;
[Benchmark(Description = "9_UnixLines_New")]
public int UnixLines_New() => New_9.Matches(UnixText1K).Count;
private static readonly Regex Old_10 = new(@"[^\n]+", RegexOptions.Compiled);
private static readonly Regex New_10 = new(@".+", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "10_UnixDot_Old")]
public int UnixDot_Old() => Old_10.Matches(UnixText1K).Count;
[Benchmark(Description = "10_UnixDot_New")]
public int UnixDot_New() => New_10.Matches(UnixText1K).Count;
// Section 3: Mixed newline text
private static readonly Regex Old_11 = new(@"[^\r\n\u0085\u2028\u2029]+", RegexOptions.Compiled);
private static readonly Regex New_11 = new(@".+", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "11_MixedDot_Old")]
public int MixedDot_Old() => Old_11.Matches(MixedAll1K).Count;
[Benchmark(Description = "11_MixedDot_New")]
public int MixedDot_New() => New_11.Matches(MixedAll1K).Count;
private static readonly Regex Old_12 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_12 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "12_MixedLines_Old")]
public int MixedLines_Old() => Old_12.Matches(MixedNR1K).Count;
[Benchmark(Description = "12_MixedLines_New")]
public int MixedLines_New() => New_12.Matches(MixedNR1K).Count;
// Section 4: Non-anchor patterns (zero impact)
private static readonly Regex Old_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled);
private static readonly Regex New_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "14_Literal_Old")]
public int Literal_Old() => Old_14.Matches(MixedAll1K).Count;
[Benchmark(Description = "14_Literal_New")]
public int Literal_New() => New_14.Matches(MixedAll1K).Count;
private static readonly Regex Old_15 = new(@"\w+", RegexOptions.Compiled);
private static readonly Regex New_15 = new(@"\w+", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "15_Words_Old")]
public int Words_Old() => Old_15.Matches(WinText1K).Count;
[Benchmark(Description = "15_Words_New")]
public int Words_New() => New_15.Matches(WinText1K).Count;
// Section 5: Pathological
private static readonly Regex Old_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "P1_BareEOL_Old")]
public int BareEOL_Old() => Old_P1.Matches(WinText1K).Count;
[Benchmark(Description = "P1_BareEOL_New")]
public int BareEOL_New() => New_P1.Matches(WinText1K).Count;
private static readonly Regex Old_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "P2_BareBOL_Old")]
public int BareBOL_Old() => Old_P2.Matches(WinText1K).Count;
[Benchmark(Description = "P2_BareBOL_New")]
public int BareBOL_New() => New_P2.Matches(WinText1K).Count;
private static readonly Regex Old_P3 = new(@"\w+\r?\Z", RegexOptions.Compiled);
private static readonly Regex New_P3 = new(@"\w+\Z", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "P3_EndZ_Old")]
public bool EndZ_Old() => Old_P3.IsMatch(WinText10K);
[Benchmark(Description = "P3_EndZ_New")]
public bool EndZ_New() => New_P3.IsMatch(WinText10K);
}
Verified that adding VT/FF made no material perf difference.
Below is analysis of benchmarks that generally compare AnyNewLine to previous workarounds, and adding AnyNewLine to existing (perhaps pathological) patterns. TLDR: sometimes ANL is faster than the workaround, sometimes not (just more functional -- more newline types, e.g.). On some pathological cases, ANL is slower. One principal example is a file that only has
AnyNewLine Performance Benchmark Results
Setup
HEAD: AnyNewLine ("New") vs Workaround Patterns ("Old")
Both columns run on the same HEAD binary. "Old" uses manual workaround patterns (no AnyNewLine). "New" uses simplified AnyNewLine patterns. All patterns use
Baseline (main) vs HEAD: "Old" Patterns Only
These patterns don't use AnyNewLine, so the regex code path is identical on both builds. Differences are machine noise.
Conclusions
Benchmark Code
PerfTest.csproj
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net11.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.14.0" />
  </ItemGroup>
</Project>
Program.cs
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Reports;
using BenchmarkDotNet.Toolchains.InProcess.Emit;
BenchmarkRunner.Run<AnyNewLineBenchmarks>(
DefaultConfig.Instance
.WithSummaryStyle(SummaryStyle.Default.WithRatioStyle(BenchmarkDotNet.Columns.RatioStyle.Percentage))
.AddJob(Job.MediumRun.WithToolchain(InProcessEmitToolchain.Instance)));
[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "RatioSD", "Alloc Ratio")]
public class AnyNewLineBenchmarks
{
private const RegexOptions AnyNewLine = (RegexOptions)0x0800;
// ── Inputs ──────────────────────────────────────────────────────
private static string GenerateText(int lineCount, string[] newlines)
{
var sb = new StringBuilder();
for (int i = 0; i < lineCount; i++)
{
sb.Append("Lorem ipsum dolor sit amet ");
sb.Append(i);
sb.Append(newlines[i % newlines.Length]);
}
return sb.ToString();
}
private static readonly string WinText1K = GenerateText(1000, ["\r\n"]);
private static readonly string WinText10K = GenerateText(10000, ["\r\n"]);
private static readonly string UnixText1K = GenerateText(1000, ["\n"]);
private static readonly string MixedNR1K = GenerateText(1000, ["\n", "\r\n"]);
private static readonly string MixedAll1K = GenerateText(1000, ["\n", "\r\n", "\r", "\u0085", "\u2028", "\u2029"]);
private static readonly string AssemblyInfo;
private static readonly string KvConfig;
private static readonly string Markdown;
private static readonly string CsvData;
static AnyNewLineBenchmarks()
{
var sb = new StringBuilder();
string[] attrs = {
"[assembly: AssemblyTitle(\"MyApp\")]",
"[assembly: AssemblyDescription(\"A sample app\")]",
"[assembly: AssemblyConfiguration(\"\")]",
"[assembly: AssemblyCompany(\"Contoso\")]",
"[assembly: AssemblyProduct(\"MyApp\")]",
"[assembly: AssemblyCopyright(\"Copyright 2024\")]",
"[assembly: AssemblyTrademark(\"\")]",
"[assembly: AssemblyCulture(\"\")]",
"[assembly: AssemblyVersion(\"1.0.0.0\")]",
"[assembly: AssemblyFileVersion(\"1.0.0.0\")]"
};
foreach (var attr in attrs) { sb.Append(attr); sb.Append("\r\n"); }
AssemblyInfo = string.Concat(Enumerable.Repeat(sb.ToString(), 50));
sb.Clear();
string[] keys = { "Server", "Database", "User", "Password", "Timeout",
"MaxPool", "MinPool", "Encrypt", "TrustCert", "AppName" };
for (int i = 0; i < 50; i++)
{
sb.Append(keys[i % keys.Length]); sb.Append(": value_"); sb.Append(i); sb.Append("\r\n");
}
KvConfig = string.Concat(Enumerable.Repeat(sb.ToString(), 20));
sb.Clear();
for (int i = 0; i < 200; i++)
{
sb.Append($"# Heading {i}\r\n");
sb.Append($"Some paragraph text about topic {i}.\r\n");
sb.Append($"Another line of content here.\r\n\r\n");
}
Markdown = sb.ToString();
sb.Clear();
sb.Append("Name,Age,City,Email\r\n");
for (int i = 0; i < 1000; i++)
sb.Append($"User{i},{20 + i % 50},City{i % 100},user{i}@example.com\r\n");
CsvData = sb.ToString();
}
// ── Section 1: Real-world patterns on Windows \r\n text ─────────
// 1a. Line matching 1K
private static readonly Regex Old_1a = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_1a = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Baseline = true, Description = "1a_Lines1K_Old")]
public int Lines1K_Old() => Old_1a.Matches(WinText1K).Count;
[Benchmark(Description = "1a_Lines1K_New")]
public int Lines1K_New() => New_1a.Matches(WinText1K).Count;
// 1b. Line matching 10K
private static readonly Regex Old_1b = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_1b = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "1b_Lines10K_Old")]
public int Lines10K_Old() => Old_1b.Matches(WinText10K).Count;
[Benchmark(Description = "1b_Lines10K_New")]
public int Lines10K_New() => New_1b.Matches(WinText10K).Count;
// 2. Assembly attributes
private static readonly Regex Old_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$(\r?\n)?",
RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$",
RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "2_Assembly_Old")]
public int Assembly_Old() => Old_2.Matches(AssemblyInfo).Count;
[Benchmark(Description = "2_Assembly_New")]
public int Assembly_New() => New_2.Matches(AssemblyInfo).Count;
// 3. Key-value config
private static readonly Regex Old_3 = new(@"^([^\s:]+):\s*(.+?)\r?$",
RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_3 = new(@"^([^\s:]+):\s*(.+?)$",
RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "3_KeyVal_Old")]
public int KeyVal_Old() => Old_3.Matches(KvConfig).Count;
[Benchmark(Description = "3_KeyVal_New")]
public int KeyVal_New() => New_3.Matches(KvConfig).Count;
// 4. Markdown headings
private static readonly Regex Old_4 = new(@"^# .+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_4 = new(@"^# .+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "4_Markdown_Old")]
public int Markdown_Old() => Old_4.Matches(Markdown).Count;
[Benchmark(Description = "4_Markdown_New")]
public int Markdown_New() => New_4.Matches(Markdown).Count;
// 5. CSV line parsing
private static readonly Regex Old_5 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_5 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "5_CSV_Old")]
public int CSV_Old() => Old_5.Matches(CsvData).Count;
[Benchmark(Description = "5_CSV_New")]
public int CSV_New() => New_5.Matches(CsvData).Count;
// 6. [^\r\n]+ vs .+
private static readonly Regex Old_6 = new(@"[^\r\n]+", RegexOptions.Compiled);
private static readonly Regex New_6 = new(@".+", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "6_DotExcl_Old")]
public int DotExcl_Old() => Old_6.Matches(WinText1K).Count;
[Benchmark(Description = "6_DotExcl_New")]
public int DotExcl_New() => New_6.Matches(WinText1K).Count;
// 7. \w+\r?$ vs \w+$
private static readonly Regex Old_7 = new(@"\w+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_7 = new(@"\w+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "7_WordEOL_Old")]
public int WordEOL_Old() => Old_7.Matches(WinText1K).Count;
[Benchmark(Description = "7_WordEOL_New")]
public int WordEOL_New() => New_7.Matches(WinText1K).Count;
// 8. (?:^|\r\n)\w+ vs ^\w+
private static readonly Regex Old_8 = new(@"(?:^|\r\n)\w+", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_8 = new(@"^\w+", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "8_LineSt_Old")]
public int LineStart_Old() => Old_8.Matches(WinText1K).Count;
[Benchmark(Description = "8_LineSt_New")]
public int LineStart_New() => New_8.Matches(WinText1K).Count;
// ── Section 2: Unix \n text (control) ───────────────────────────
// 9. Same pattern, flag only
private static readonly Regex Old_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "9_UnixLines_Old")]
public int UnixLines_Old() => Old_9.Matches(UnixText1K).Count;
[Benchmark(Description = "9_UnixLines_New")]
public int UnixLines_New() => New_9.Matches(UnixText1K).Count;
// 10. [^\n]+ vs .+
private static readonly Regex Old_10 = new(@"[^\n]+", RegexOptions.Compiled);
private static readonly Regex New_10 = new(@".+", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "10_UnixDot_Old")]
public int UnixDot_Old() => Old_10.Matches(UnixText1K).Count;
[Benchmark(Description = "10_UnixDot_New")]
public int UnixDot_New() => New_10.Matches(UnixText1K).Count;
// ── Section 3: Mixed newline text ───────────────────────────────
// 11. Full char class workaround vs .+
private static readonly Regex Old_11 = new(@"[^\r\n\u0085\u2028\u2029]+", RegexOptions.Compiled);
private static readonly Regex New_11 = new(@".+", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "11_MixedDot_Old")]
public int MixedDot_Old() => Old_11.Matches(MixedAll1K).Count;
[Benchmark(Description = "11_MixedDot_New")]
public int MixedDot_New() => New_11.Matches(MixedAll1K).Count;
// 12. Mixed \n/\r\n line matching
private static readonly Regex Old_12 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_12 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "12_MixedLines_Old")]
public int MixedLines_Old() => Old_12.Matches(MixedNR1K).Count;
[Benchmark(Description = "12_MixedLines_New")]
public int MixedLines_New() => New_12.Matches(MixedNR1K).Count;
// ── Section 4: Non-anchor patterns (zero impact) ────────────────
// 14. Literal newlines
private static readonly Regex Old_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled);
private static readonly Regex New_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "14_Literal_Old")]
public int Literal_Old() => Old_14.Matches(MixedAll1K).Count;
[Benchmark(Description = "14_Literal_New")]
public int Literal_New() => New_14.Matches(MixedAll1K).Count;
// 15. Word matching
private static readonly Regex Old_15 = new(@"\w+", RegexOptions.Compiled);
private static readonly Regex New_15 = new(@"\w+", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "15_Words_Old")]
public int Words_Old() => Old_15.Matches(WinText1K).Count;
[Benchmark(Description = "15_Words_New")]
public int Words_New() => New_15.Matches(WinText1K).Count;
// ── Section 5: Pathological ─────────────────────────────────────
// P1. Bare $
private static readonly Regex Old_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "P1_BareEOL_Old")]
public int BareEOL_Old() => Old_P1.Matches(WinText1K).Count;
[Benchmark(Description = "P1_BareEOL_New")]
public int BareEOL_New() => New_P1.Matches(WinText1K).Count;
// P2. Bare ^
private static readonly Regex Old_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline);
private static readonly Regex New_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
[Benchmark(Description = "P2_BareBOL_Old")]
public int BareBOL_Old() => Old_P2.Matches(WinText1K).Count;
[Benchmark(Description = "P2_BareBOL_New")]
public int BareBOL_New() => New_P2.Matches(WinText1K).Count;
// P3. \w+\Z on large input
private static readonly Regex Old_P3 = new(@"\w+\r?\Z", RegexOptions.Compiled);
private static readonly Regex New_P3 = new(@"\w+\Z", RegexOptions.Compiled | AnyNewLine);
[Benchmark(Description = "P3_EndZ_Old")]
public bool EndZ_Old() => Old_P3.IsMatch(WinText10K);
[Benchmark(Description = "P3_EndZ_New")]
public bool EndZ_New() => New_P3.IsMatch(WinText10K);
} |
|
@MihuBot benchmark Regex |
|
@MihuBot regexdiff |
|
0 out of 18857 patterns have generated source code changes.
JIT assembly changes
Sample source code for further analysis
const string JsonPath = "RegexResults-1803.json";
if (!File.Exists(JsonPath))
{
await using var archiveStream = await new HttpClient().GetStreamAsync("https://mihubot.xyz/r/FIQBKc7A");
using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read);
archive.Entries.First(e => e.Name == "Results.json").ExtractToFile(JsonPath);
}
using FileStream jsonFileStream = File.OpenRead(JsonPath);
RegexEntry[] entries = JsonSerializer.Deserialize<RegexEntry[]>(jsonFileStream, new JsonSerializerOptions { IncludeFields = true })!;
Console.WriteLine($"Working with {entries.Length} patterns");
record KnownPattern(string Pattern, RegexOptions Options, int Count);
sealed class RegexEntry
{
public required KnownPattern Regex { get; set; }
public required string MainSource { get; set; }
public required string PrSource { get; set; }
public string? FullDiff { get; set; }
public string? ShortDiff { get; set; }
public (string Name, string Values)[]? SearchValuesOfChar { get; set; }
public (string[] Values, StringComparison ComparisonType)[]? SearchValuesOfString { get; set; }
} |
|
See benchmark results at https://gist.github.com/MihuBot/d906406489feb8a231966adc754246f4 |
MihuBot Benchmark Analysis (Post VT/FF)
Zero regressions. The PR has no impact on existing patterns. Confirmed by two independent signals:
Consistent with the first MihuBot run (pre-VT/FF), which also showed zero impact with an identical regexdiff. The PR adds parser-only lowering gated behind |
|
nuts, didn't mean to close/reopen. |
|
For interest, once we've taken this we can consider |
|
I'm excited to use it. One interesting use case for more powerful Regex functionality is AI models with large context windows. There have been some interesting studies that suggest agents are more effective using grep than RAG pipelines using vector databases, and the inflection point is largely due to large context windows. It seems the main advantage of using a vector database is compliance with GDPR and other privacy laws, as you can mask the data with embeddings using GUIDs and have strong data governance controls over what parts of an ontology graph a given user has rights to. For anything not sensitive, grep with regex wins.
Restructure all three anchor lowering methods (Eol, Bol, EndZ) to replace the 2-branch outer Alternate node with a sequential Concatenate of: primary lookaround + shared CRLF guard.
Key idea: include ALL newline chars (including \n for $, \r for ^) in the primary lookaround's character class, then append (?!(?<=\r)\n) as a guard to block matching at the \r\n boundary.
Before ($ example): (?=[\r\v\f\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)
After: (?=[\n\r\v\f\u0085\u2028\u2029]|\z)(?!(?<=\r)\n)
At non-newline positions (the vast majority during backtracking), the primary lookaround fails immediately and the Concatenate short-circuits, so the CRLF guard is never evaluated. The old structure evaluated both branches of the outer Alternate at every position.
Extract shared AnyNewLineCrLfGuardNode() helper used by all three methods. Replace AnyNewLineExceptLfClass / AnyNewLineExceptCrClass with unified AnyNewLineClass constant.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
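One way to sanity-check the restructuring is to run both forms of the Multiline $ lowering as ordinary patterns and compare where they match. The sketch below does that on a released runtime. The two pattern strings are copied from the commit message above, while the Positions helper and the sample string are illustrative assumptions, not code from the PR.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Old and new lowerings of Multiline `$`, copied from the commit message.
const string OldEol = @"(?=[\r\v\f\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)";
const string NewEol = @"(?=[\n\r\v\f\u0085\u2028\u2029]|\z)(?!(?<=\r)\n)";

// Hypothetical helper: index of every zero-width match in the input.
static int[] Positions(string pattern, string input) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Index).ToArray();

// Sample with \r\n, a bare \n, a bare \r and U+2028 (illustrative input).
string sample = "a\r\nb\nc\rd\u2028e";

int[] oldHits = Positions(OldEol, sample);
int[] newHits = Positions(NewEol, sample);

Console.WriteLine(string.Join(",", oldHits)); // 1,4,6,8,10
Console.WriteLine(string.Join(",", newHits)); // 1,4,6,8,10
Console.WriteLine(oldHits.SequenceEqual(newHits));
```

Both forms report the same positions, and neither matches at index 2, which lies between the \r and \n of the \r\n pair. This only checks one input; it says nothing about the performance difference the commit describes.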
|
Optimized lowering given the observation that newlines are less common than non-newlines.
====
Optimize anchor lowering: eliminate outer alternation
The previous lowerings used a two-branch
The new structure replaces the outer
Before/after lowerings:
The key structural change in each case: This change is entirely within the AnyNewLine lowering code path -- it has no effect on patterns that don't use |
Perf results after anchor lowering optimization
Measured locally with BenchmarkDotNet MediumRun (15 iterations),
Section 1: Real-world patterns on Windows
Section 2: Unix
Section 3: Mixed
Section 4: Non-anchor/dot patterns (zero impact expected)
Section 5: Bare anchors (no simple workaround exists for these)
Summary:
|
- Remove UTF-8 BOM from RegexParser.cs and Regex.Match.Tests.cs
- Remove extra blank line in RegexParser.cs (line 24)
- Add blank line between AnyNewLine_Dollar_TestData and AnyNewLine_EndZ
- Add missing \u2028 (Line Separator) test case for \Z
- Add RegexOptionAnyNewLine assertion in test helpers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation
.NET's Regex class hardcodes \n as the only newline character. With RegexOptions.Multiline, $ matches before \n but not before \r, \r\n, or Unicode line breaks. This is "by far one of the biggest gotchas" with System.Text.RegularExpressions:
Users are forced into fragile workarounds like \r?$ or (\r\n|\n) to handle mixed line endings. Real-world NuGet packages show how common this is -- from the real-world regex patterns dataset:
- (\r\n|\n) (18,474 packages) -- CSV parser manually matching both line endings
- \r?\n in PEM key parsing (1,964 packages) -- \r?\n sprinkled throughout with Multiline
- $(\r?\n)? in assembly attribute matching (2,108 packages) -- using Multiline with manual newline handling
- [\r\n]+ (2,422 packages) -- matching any newline character
These workarounds are error-prone, don't compose well with ^ and $ anchors, and miss Unicode newlines (\u0085, \u2028, \u2029).
Summary
Implements RegexOptions.AnyNewLine (api-approved) which makes $, ^, \Z, and . recognize all Unicode line boundaries: \r, \r\n, \n, \u0085 (NEL), \u2028 (LS), \u2029 (PS) -- consistent with Unicode TR18 RL1.6 and PCRE2's (*ANY) behavior.
With AnyNewLine, the example above just works:
Approach: Parser Lowering
All logic lives in RegexParser.cs -- no changes to the interpreter, compiler, or source generator engines. Each affected construct is lowered into an equivalent RegexNode sub-tree:
- $ (no Multiline) / \Z: (?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z)|(?=[\u0085\u2028\u2029]\z)
- $ (Multiline): (?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)
- ^ (Multiline): (?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n)
- . : [^\r\n\u0085\u2028\u2029] (but Singleline takes precedence)
Key design choices:
- \r\n is atomic: $ never matches between \r and \n. This is enforced with lookbehind/lookahead guards.
- Singleline takes precedence: . with Singleline | AnyNewLine matches everything (including newlines), consistent with Singleline's documented behavior.
- \A and \z are unaffected: absolute start/end anchors don't change.
- NonBacktracking and ECMAScript: throws ArgumentOutOfRangeException (lowered patterns use lookaround).
- The lowering is gated behind the AnyNewLine flag, so patterns that don't use it take the same code paths as before. The only new cost is a flag check ((_options & RegexOptions.AnyNewLine) != 0) in the parser for $, ^, \Z, and ., which is negligible.
Out of scope: \R
Unicode TR18 RL1.6 also recommends a meta-character \R for matching any newline sequence (consuming the characters), equivalent to (?:\r\n|[\n\v\f\r\u0085\u2028\u2029]). This is distinct from what AnyNewLine does: AnyNewLine modifies the behavior of existing zero-width anchors (^, $, \Z) and the character class ., while \R would be a new consuming pattern element. Adding \R could be done independently as a separate feature.
Changes
Production code
- RegexOptions.cs -- add AnyNewLine = 0x0800
- RegexParser.cs -- lowering methods AnyNewLineEndZNode(), AnyNewLineEolNode(), AnyNewLineBolNode(), plus . handling
- RegexCharClass.cs -- add NotNewLineOrCarriageReturnClass constant
- Regex.cs / RegexCompilationInfo.cs -- validation
Tests
$, ^, \Z), RightToLeft, Singleline, Multiline, Replace, Split, Count, EnumerateMatches, NonBacktracking rejection, edge cases (adjacent newlines, empty lines, all-newline strings), and PCRE2-inspired scenarios
Fixes #25598
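The lowered sub-patterns listed under Approach are ordinary lookaround constructs, so their effect can be tried on a released runtime without the new option. This is a minimal sketch, assuming the pieces can simply be concatenated as pattern text (the PR builds RegexNode sub-trees in the parser instead). The Lines helper and the sample text are illustrative, not from the PR.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sub-patterns copied from the lowering list under Approach; joining them
// into one pattern string is an illustration, not how the parser builds nodes.
const string Bol = @"(?:(?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n))";
const string Eol = @"(?:(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n))";
const string Dot = @"[^\r\n\u0085\u2028\u2029]";

// Hypothetical helper: all match values, with \r and \n made visible.
static string[] Lines(string pattern, RegexOptions options, string input) =>
    Regex.Matches(input, pattern, options)
         .Cast<Match>()
         .Select(m => m.Value.Replace("\r", "\\r").Replace("\n", "\\n"))
         .ToArray();

// Mixed line endings: \r\n, bare \n, bare \r (illustrative input).
string text = "one\r\ntwo\nthree\rfour";

// Today's behavior: Multiline only treats \n as a line break.
string[] plain = Lines(@"^.+$", RegexOptions.Multiline, text);

// Hand-lowered stand-in for ^.+$ under Multiline | AnyNewLine.
string[] lowered = Lines(Bol + Dot + "+" + Eol, RegexOptions.None, text);

Console.WriteLine(string.Join(" | ", plain));   // one\r | two | three\rfour
Console.WriteLine(string.Join(" | ", lowered)); // one | two | three | four
```

The first line shows the gotcha from the Motivation section: a trailing \r is captured and a bare \r does not split lines. The second shows the four lines the lowered form yields on the same input.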