Add RegexOptions.AnyNewLine via parser lowering by danmoseley · Pull Request #124701 · dotnet/runtime

danmoseley · 2026-02-21T08:27:44Z

Motivation

.NET's Regex class hardcodes \n as the only newline character. With RegexOptions.Multiline, $ matches before \n but not before \r, \r\n, or Unicode line breaks. This is "by far one of the biggest gotchas" with System.Text.RegularExpressions:

// BUG: on a file with Windows \r\n line endings, .*$ captures trailing \r
var match = Regex.Match("foo\r\nbar", ".*$", RegexOptions.Multiline);
// match.Value == "foo\r" -- not "foo"!

Users are forced into fragile workarounds like \r?$ or (\r\n|\n) to handle mixed line endings. Real-world NuGet packages show how common this is -- from the real-world regex patterns dataset:

(\r\n|\n) (18,474 packages) -- CSV parser manually matching both line endings
\r?\n in PEM key parsing (1,964 packages) -- \r?\n sprinkled throughout with Multiline
$(\r?\n)? in assembly attribute matching (2,108 packages) -- using Multiline with manual newline handling
[\r\n]+ (2,422 packages) -- matching any newline character

These workarounds are error-prone, don't compose well with ^ and $ anchors, and miss Unicode newlines (\u0085, \u2028, \u2029).
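To make the fragility concrete, here is a minimal sketch that runs on current .NET without the new option. The `Workaround` pattern is an illustrative assumption in the spirit of the `\r?$` idiom above, not a pattern taken from the dataset.

```csharp
using System;
using System.Text.RegularExpressions;

string windows = "foo\r\nbar";

// Default Multiline: '.' excludes only '\n', so the capture keeps the '\r'.
string naive = Regex.Match(windows, ".*$", RegexOptions.Multiline).Value;

// Illustrative workaround: keep '\r' out of the capture and tolerate it before '$'.
const string Workaround = @"[^\r\n]*(?=\r?$)";
string patched = Regex.Match(windows, Workaround, RegexOptions.Multiline).Value;

// The same workaround does not treat U+2028 (LS) as a line break,
// so the "line" runs straight through it.
string missed = Regex.Match("foo\u2028bar", Workaround, RegexOptions.Multiline).Value;

Console.WriteLine(naive == "foo\r");         // True
Console.WriteLine(patched == "foo");         // True
Console.WriteLine(missed == "foo\u2028bar"); // True
```

The workaround fixes the `\r\n` case but silently changes meaning on input that uses other line separators.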

Summary

Implements RegexOptions.AnyNewLine (api-approved) which makes $, ^, \Z, and . recognize all Unicode line boundaries: \r, \r\n, \n, \u0085 (NEL), \u2028 (LS), \u2029 (PS) -- consistent with Unicode TR18 RL1.6 and PCRE2's (*ANY) behavior.

With AnyNewLine, the example above just works:

var match = Regex.Match("foo\r\nbar", ".*$", RegexOptions.Multiline | RegexOptions.AnyNewLine);
// match.Value == "foo"

Approach: Parser Lowering

All logic lives in RegexParser.cs -- no changes to the interpreter, compiler, or source generator engines. Each affected construct is lowered into an equivalent RegexNode sub-tree:

Construct	Lowered to
`$` (no Multiline) / `\Z`	`(?=\r\n\z\|\r?\z)\|(?<!\r)(?=\n\z)\|(?=[\u0085\u2028\u2029]\z)`
`$` (Multiline)	`(?=\r\n\|\r\|[\u0085\u2028\u2029]\|\z)\|(?<!\r)(?=\n)`
`^` (Multiline)	`(?<=\A\|\r\n\|\n\|[\u0085\u2028\u2029])\|(?<=\r)(?!\n)`
`.`	`[^\r\n\u0085\u2028\u2029]` (but `Singleline` takes precedence)
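The lowered forms in the table are ordinary regex constructs, so their behavior can be approximated today by writing them out by hand. The sketch below assumes the pattern strings `Eol` and `Dot` (hypothetical names); the PR itself builds RegexNode trees in the parser rather than concatenating strings.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hand-written equivalents of the table rows for `$` (Multiline) and `.`.
// Assumption: expressed as pattern text purely for illustration.
const string Eol = @"(?:(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n))";
const string Dot = @"[^\r\n\u0085\u2028\u2029]";

// Three lines separated by \r\n, U+2028, and a trailing \n.
string input = "foo\r\nbar\u2028baz\n";

// Equivalent in spirit to ".+$" with Multiline | AnyNewLine.
string[] lines = Regex.Matches(input, Dot + "+" + Eol)
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join("|", lines)); // foo|bar|baz
```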

Key design choices:

\r\n is atomic: $ never matches between \r and \n. This is enforced with lookbehind/lookahead guards.
Singleline takes precedence: . with Singleline | AnyNewLine matches everything (including newlines), consistent with Singleline's documented behavior.
\A and \z are unaffected: absolute start/end anchors don't change.
Incompatible with NonBacktracking and ECMAScript: combining AnyNewLine with either throws ArgumentOutOfRangeException. The lowered patterns use lookaround, which NonBacktracking does not support.
Zero perf impact on existing patterns: the lowering is gated on the AnyNewLine flag, so patterns that don't use it take the same code paths as before. The only new cost is a flag check ((_options & RegexOptions.AnyNewLine) != 0) in the parser for $, ^, \Z, and ., which is negligible.
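The atomicity of \r\n can be checked by listing where the hand-written anchors match. This is a sketch under the same assumption as the table: `Eol` and `Bol` are illustrative pattern strings copied from the lowering table, not the parser's actual node construction.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Lowered `$` and `^` (Multiline) from the table, written out as pattern text.
const string Eol = @"(?:(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n))";
const string Bol = @"(?:(?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n))";

// Indices: a=0, \r=1, \n=2, b=3, \r=4, c=5, end=6.
string input = "a\r\nb\rc";

int[] eolAt = Regex.Matches(input, Eol).Select(m => m.Index).ToArray();
int[] bolAt = Regex.Matches(input, Bol).Select(m => m.Index).ToArray();

// Index 2 (between \r and \n) appears in neither list.
Console.WriteLine(string.Join(",", eolAt)); // 1,4,6
Console.WriteLine(string.Join(",", bolAt)); // 0,3,5
```

The bare \r at index 4 still ends a line and starts the next one, because the guards only suppress matches inside a \r\n pair.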

Out of scope: `\R`

Unicode TR18 RL1.6 also recommends a meta-character \R for matching any newline sequence (consuming the characters), equivalent to (?:\r\n|[\n\v\f\r\u0085\u2028\u2029]). This is distinct from what AnyNewLine does: AnyNewLine modifies the behavior of existing zero-width anchors (^, $, \Z) and the character class ., while \R would be a new consuming pattern element. Adding \R could be done independently as a separate feature.
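Until a dedicated \R exists, the consuming form can be spelled out directly. The sketch below uses the equivalent pattern quoted above; the constant name `AnyNewlineSequence` is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// The \R equivalent quoted above: \r\n is tried first so it is consumed as one unit.
const string AnyNewlineSequence = @"(?:\r\n|[\n\v\f\r\u0085\u2028\u2029])";

// Split on any newline sequence.
string[] parts = Regex.Split("a\r\nb\nc\u2028d", AnyNewlineSequence);
Console.WriteLine(string.Join(",", parts)); // a,b,c,d

// Two \r\n pairs count as two newlines, not four characters.
int sequences = Regex.Matches("\r\n\r\n", AnyNewlineSequence).Count;
int characters = Regex.Matches("\r\n\r\n", @"[\r\n]").Count;
Console.WriteLine($"{sequences} vs {characters}"); // 2 vs 4
```

Unlike the anchors changed by AnyNewLine, this pattern consumes the newline characters, which is why it suits Split and Replace rather than line-boundary assertions.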

Changes

Production code

RegexOptions.cs -- add AnyNewLine = 0x0800
RegexParser.cs -- lowering methods AnyNewLineEndZNode(), AnyNewLineEolNode(), AnyNewLineBolNode(), plus . handling
RegexCharClass.cs -- add NotNewLineOrCarriageReturnClass constant
Regex.cs / RegexCompilationInfo.cs -- validation

Tests

~120 new test cases covering dot, anchors ($, ^, \Z), RightToLeft, Singleline, Multiline, Replace, Split, Count, EnumerateMatches, NonBacktracking rejection, edge cases (adjacent newlines, empty lines, all-newline strings), and PCRE2-inspired scenarios

Fixes #25598

Add AnyNewLine = 0x0800 to RegexOptions enum. Update ValidateOptions to bump MaxOptionShift to 12 and reject AnyNewLine | NonBacktracking. ECMAScript already rejects unknown options via allowlist. Update source generator to include AnyNewLine in SupportedOptions mask. Update tests that used 0x800 as an invalid option value to use 0x1000. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
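The validation described in this commit can be sketched roughly as follows. `IsValidCombination` is a hypothetical helper that returns false where the real code would throw, and the shift-based range check is an assumption about how MaxOptionShift is used. The sketch needs .NET 7 or later for RegexOptions.NonBacktracking.

```csharp
using System;
using System.Text.RegularExpressions;

// Value from this PR; cast so the sketch compiles on runtimes without the new member.
const RegexOptions AnyNewLine = (RegexOptions)0x0800;

// With a shift of 12, bits 0..11 are known options and 0x1000 is the first unknown one.
const int MaxOptionShift = 12;

// Hypothetical helper mirroring the two checks named in the commit message.
bool IsValidCombination(RegexOptions options)
{
    // Reject any bit at or above position MaxOptionShift.
    if (((uint)options >> MaxOptionShift) != 0)
        return false;

    // Reject AnyNewLine together with NonBacktracking.
    bool anyNewLine = (options & AnyNewLine) != 0;
    bool nonBacktracking = (options & RegexOptions.NonBacktracking) != 0;
    return !(anyNewLine && nonBacktracking);
}

Console.WriteLine(IsValidCombination(RegexOptions.Multiline | AnyNewLine));       // True
Console.WriteLine(IsValidCombination((RegexOptions)0x1000));                      // False
Console.WriteLine(IsValidCombination(RegexOptions.NonBacktracking | AnyNewLine)); // False
```

This also shows why the tests that used 0x800 as an invalid value had to move to 0x1000: 0x800 is now a defined bit.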

When AnyNewLine is set without Multiline, lower $ from EndZ into an equivalent sub-tree:
(?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z)
This matches at end of string, or before \r\n, \r, or \n at end of string, but not between \r and \n. Works across all engines (interpreter, compiled, source generator) since it's pure parser lowering. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When AnyNewLine is set, lower \Z using the same sub-tree as $ without Multiline. \Z is not affected by Multiline, so the same lowering applies regardless. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
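The sub-tree shared by $ (no Multiline) and \Z can be exercised by hand on current .NET. The sketch assumes the ASCII-only form from the commit message above, written as a pattern string; `EndZ` and `Positions` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Lowered form from the commit message: end of string, or before one final
// \r\n, \r, or \n, but never between \r and \n.
const string EndZ = @"(?:(?=\r\n\z|\r?\z)|(?<!\r)(?=\n\z))";

// Hypothetical helper: every index at which the lowered anchor matches.
int[] Positions(string s) => Regex.Matches(s, EndZ).Select(m => m.Index).ToArray();

Console.WriteLine(string.Join(",", Positions("ab\r\n"))); // 2,4
Console.WriteLine(string.Join(",", Positions("ab\n")));   // 2,3
Console.WriteLine(string.Join(",", Positions("ab\n\n"))); // 3,4
```

In the last case only the final newline is skipped, which matches how \Z already treats a single trailing \n.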

When both Multiline and AnyNewLine are set, lower $ to:
(?=\r\n|\r|\z)|(?<!\r)(?=\n)
This matches at \r\n, \r, \n boundaries and end-of-string, without matching between \r and \n of a \r\n sequence. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When both Multiline and AnyNewLine are set, lower ^ to:
(?<=\A|\r\n|\n)|(?<=\r)(?!\n)
This matches after \r\n, \n, bare \r (not followed by \n), and at start of string. Without Multiline, ^ remains \A unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

When AnyNewLine is set (without Singleline), lower . to [^\n\r] instead of [^\n], so dot does not match \r or \n. Add NotNewLineOrCarriageReturnClass constant to RegexCharClass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Combined ^/$/. tests, Replace/Split, RightToLeft, mixed newlines, empty lines, \Z with trailing newlines, and edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Integration tests using a ~50 char string with all newline types (\r\n, \r, \n, \u0085, \u2028, \u2029) exercising ^, $, \Z, and . together. Replace/Split tests with MatchEvaluator line numbering. Deduplicated cases moved into per-feature tests (RightToLeft, empty lines). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Expand test coverage across all AnyNewLine-affected constructs:
- Dollar, EndZ, DollarMultiline, CaretMultiline, Dot test data with adjacent newlines, newlines at string boundaries, empty segments, RightToLeft, and all Unicode newline types
- Advanced tests: inline options, backreferences, conditionals, alternation with anchors, lookahead/lookbehind, quantified dot, lazy quantifiers, named/atomic groups, word boundaries near newlines, explicit char classes unaffected
- Methods test: IsMatch, Count, EnumerateMatches, Match with startat, Replace with group ref, Split
- Unicode expansion: \s/\S behavior, \w behavior, \p{Zl}/\p{Zp} categories, adjacent Unicode+ASCII newlines, baselines without AnyNewLine
No bugs found; all initial test failures were wrong expectations. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Verify the fixer correctly emits RegexOptions.Multiline | RegexOptions.AnyNewLine in enum value order when upgrading to GeneratedRegex. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Test cases derived from cross-validation with PCRE2 NEWLINE_ANY behavior (BSD-licensed) and analysis of real-world patterns from dotnet/runtime-assets:
- (.+)# greedy where .+ cannot cross newlines (PCRE2 JIT 472)
- (.)(.) requiring consecutive non-newlines (PCRE2 JIT 471)
- (.). with mixed newline types (PCRE2 JIT 469)
- Blank line detection (^ +$) with \n, \r\n, \u0085 separators
All 31,528 tests pass. No bugs found; our implementation is fully consistent with PCRE2 NEWLINE_ANY behavior and handles real-world patterns correctly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Add more RightToLeft + AnyNewLine tests (various newline types, dot, anchors, \Z)
- Add more Singleline | AnyNewLine tests (all newline types, combined with Multiline)
- Replace RegexOptions.AnyNewLine with RegexHelpers.RegexOptionAnyNewLine throughout tests for net481 compilation compatibility
- Wrap Count/EnumerateMatches in #if NET for net481 compat
- Add clarifying comments on Split behavior with/without AnyNewLine
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley · 2026-02-21T08:31:42Z

(Finally got around to having AI finish my lowering branch..)

Copilot

Pull request overview

This pull request implements RegexOptions.AnyNewLine (value 0x0800 = 2048), a new regex option that makes ^, $, \Z, and . recognize all Unicode line boundaries (\r, \r\n, \n, \u0085 NEL, \u2028 LS, \u2029 PS) instead of only \n. This addresses a major usability issue where users had to manually work around .NET's hardcoded \n-only line ending behavior.

Changes:

Added RegexOptions.AnyNewLine = 0x0800 enum value with incompatibility checks for NonBacktracking and ECMAScript modes
Implemented parser-level lowering of ^, $, \Z, and . into equivalent lookaround-based RegexNode trees when AnyNewLine is enabled
Added comprehensive test coverage (~800 new test lines) covering all anchor types, newline combinations, RightToLeft mode, inline options, and edge cases

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.

File	Description
`src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexOptions.cs`	Added `AnyNewLine = 0x0800` enum value with XML documentation
`src/libraries/System.Text.RegularExpressions/ref/System.Text.RegularExpressions.cs`	Updated ref assembly with `AnyNewLine = 2048`
`src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/Regex.cs`	Updated `MaxOptionShift` to 12 and added AnyNewLine to NonBacktracking incompatibility check
`src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs`	Implemented lowering methods (`AnyNewLineEndZNode`, `AnyNewLineEolNode`, `AnyNewLineBolNode`) and integrated into `^`, `$`, `\Z`, `.` parsing
`src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexCharClass.cs`	Added `NotNewLineOrCarriageReturnClass` constant for `.` with AnyNewLine
`src/libraries/System.Text.RegularExpressions/gen/RegexGenerator.Parser.cs`	Added AnyNewLine to source generator's supported options
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs`	Added ~800 lines of comprehensive tests for all anchor types, newline combinations, and edge cases
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Tests.Common.cs`	Added `RegexOptionAnyNewLine` constant for test compatibility
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Ctor.Tests.cs`	Updated invalid option test from 0x800 to 0x1000; added NonBacktracking+AnyNewLine incompatibility test
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.MultipleMatches.Tests.cs`	Updated invalid option comments and tests from 0x800 to 0x1000
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.EnumerateMatches.Tests.cs`	Updated invalid option tests from 0x800 to 0x1000
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/RegexGeneratorParserTests.cs`	Updated invalid option tests from 0x800 to 0x1000
`src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/UpgradeToGeneratedRegexAnalyzerTests.cs`	Updated tests for 0x1000 as invalid option; added AnyNewLine test case for code fixer

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

danmoseley · 2026-02-21T08:40:57Z

@MihuBot benchmark Regex

MihuBot · 2026-02-21T10:17:19Z

See benchmark results at https://gist.github.com/MihuBot/c7399f4f318e4febcfd0018436d5fe53

danmoseley · 2026-02-21T16:02:25Z

MihuBot confirms zero perf impact on existing patterns/options.

danmoseley · 2026-02-21T19:14:12Z

AnyNewLine Performance Analysis (Release, Compiled, .NET 11.0, BenchmarkDotNet)

Measured impact of converting existing newline-workaround patterns to simplified AnyNewLine equivalents. All scenarios use RegexOptions.Compiled (representative of source-generated too). Measured with BenchmarkDotNet (InProcess, ShortRun). All match counts verified identical between old and new patterns.
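The "match counts verified identical" check can be reproduced in spirit on current .NET by comparing a workaround pattern against hand-lowered anchors. This is a sketch under assumptions: `Bol`, `Dot`, and `Eol` are illustrative pattern strings taken from the lowering table, and the input mimics the benchmark's generated text at a smaller size.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Build 100 lines of Windows-style text, like the benchmark's GenerateText.
var sb = new StringBuilder();
for (int i = 0; i < 100; i++)
    sb.Append("Lorem ipsum dolor sit amet ").Append(i).Append("\r\n");
string text = sb.ToString();

// Hand-written stand-ins for ^, ., and $ under Multiline | AnyNewLine.
const string Bol = @"(?:(?<=\A|\r\n|\n|[\u0085\u2028\u2029])|(?<=\r)(?!\n))";
const string Dot = @"[^\r\n\u0085\u2028\u2029]";
const string Eol = @"(?:(?=\r\n|\r|[\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n))";

// "Old" workaround pattern vs. the lowered equivalent of ^.+$.
int oldCount = Regex.Matches(text, @"^.+\r?$", RegexOptions.Multiline).Count;
int newCount = Regex.Matches(text, Bol + Dot + "+" + Eol).Count;

Console.WriteLine($"{oldCount} {newCount}"); // 100 100
```

The counts agree, but the captured values differ: the workaround's matches still end in \r, while the lowered form's do not.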

Section 1: Real-World Patterns on Windows `\r\n` Text

Old Pattern	New Pattern (+ AnyNewLine)	Old (us)	New (us)	Ratio
`^.+\r?$` (1K lines)	`^.+$`	46.7	48.8	1.05x
`^.+\r?$` (10K lines)	`^.+$`	1,694	1,760	1.04x
`\[assembly:...\]\s*$(\r?\n)?`	`\[assembly:...\]\s*$`	38.3	32.4	0.85x
`^([^\s:]+):\s*(.+?)\r?$`	`^([^\s:]+):\s*(.+?)$`	105.9	105.9	1.00x
`^# .+\r?$`	`^# .+$`	11.1	9.1	0.83x
`^.+\r?$` (CSV, 1K rows)	`^.+$`	44.4	49.2	1.11x
`[^\r\n]+`	`.+`	44.2	43.8	0.99x
`\w+\r?$`	`\w+$`	90.8	128.7	1.42x
`(?:^\|\r\n)\w+`	`^\w+`	208.7	214.5	1.03x

Section 2: Unix `\n` Text (overhead of just enabling the flag)

Old Pattern	New Pattern (+ AnyNewLine)	Old (us)	New (us)	Ratio
`^.+$`	`^.+$`	43.5	48.9	1.12x
`[^\n]+`	`.+`	39.0	44.9	1.15x

Section 3: Mixed `\n`/`\r\n` Text

Old Pattern	New Pattern (+ AnyNewLine)	Old (us)	New (us)	Ratio
`[^\r\n\u0085\u2028\u2029]+`	`.+`	45.4	44.2	0.97x
`^.+\r?$` (1K lines)	`^.+$`	44.1	50.1	1.14x

Section 4: Non-anchor/dot Patterns (zero impact expected)

Old Pattern	New Pattern (+ AnyNewLine)	Old (us)	New (us)	Ratio
`\r\n\|\r\|\n`	`\r\n\|\r\|\n`	20.0	21.7	1.08x
`\w+`	`\w+`	322.4	336.4	1.04x

Section 5: Pathological Cases (unlikely in practice)

Old Pattern	New Pattern (+ AnyNewLine)	Old (us)	New (us)	Ratio
`$`	`$`	98.2	134.1	1.37x
`^`	`^`	145.6	131.6	0.90x
`\w+\r?\Z` (329K chars)	`\w+\Z`	494.2	1,039.3	2.10x

Summary

Most real-world patterns in Compiled mode show 0.83x--1.14x -- essentially zero cost, and sometimes faster because the AnyNewLine pattern is simpler (e.g., ^# .+$ vs ^# .+\r?$ -- removing the \r? node saves more than the lowered $ costs). The one exception in Section 1 is \w+$ at 1.42x.
Where small regressions occur (1.1x--1.4x), the cause is the lowered anchor tree: a native $ (Eol) is a single "is next char \n?" check, but AnyNewLine lowers it to a lookahead alternation like (?=\r\n|\r|\n|\u0085|\u2028|\u2029|\z). Even when the input only contains \r\n, the engine must evaluate the alternation branches. This overhead is proportionally more visible when the anchor dominates the work (e.g., \w+$ where the \w+ match is short), and nearly invisible when .+ dominates each line's work (e.g., ^.+$ at 1.04x).
Patterns without anchors or dot are completely unaffected (1.04--1.08x, within noise) -- the flag only changes behavior of ., ^, $, \Z.
Only pathological case: \w+\Z on very large input (329K chars) at 2.1x -- the lowered \Z alternation tree is evaluated during backtracking at many positions. Unlikely in practice.
In Compiled/source-generated mode, the JIT compiles the lowered alternation branches into efficient single-char comparisons, keeping overhead minimal. Interpreted mode shows larger gaps (2--3x for typical patterns) but AnyNewLine + interpreted + perf-sensitive is an unlikely combination.

Benchmark source code (BenchmarkDotNet)

using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Reports;
using BenchmarkDotNet.Toolchains.InProcess.Emit;

BenchmarkRunner.Run<AnyNewLineBenchmarks>(
    DefaultConfig.Instance
        .WithSummaryStyle(SummaryStyle.Default.WithRatioStyle(RatioStyle.Percentage))
        .AddJob(Job.ShortRun.WithToolchain(InProcessEmitToolchain.Instance)));

[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "RatioSD", "Alloc Ratio")]
public class AnyNewLineBenchmarks
{
    private const RegexOptions AnyNewLine = (RegexOptions)0x0800;

    private static string GenerateText(int lineCount, string[] newlines)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
        {
            sb.Append("Lorem ipsum dolor sit amet ");
            sb.Append(i);
            sb.Append(newlines[i % newlines.Length]);
        }
        return sb.ToString();
    }

    private static readonly string WinText1K = GenerateText(1000, ["\r\n"]);
    private static readonly string WinText10K = GenerateText(10000, ["\r\n"]);
    private static readonly string UnixText1K = GenerateText(1000, ["\n"]);
    private static readonly string MixedNR1K = GenerateText(1000, ["\n", "\r\n"]);
    private static readonly string MixedAll1K = GenerateText(1000,
        ["\n", "\r\n", "\r", "\u0085", "\u2028", "\u2029"]);

    private static readonly string AssemblyInfo;
    private static readonly string KvConfig;
    private static readonly string Markdown;
    private static readonly string CsvData;

    static AnyNewLineBenchmarks()
    {
        var sb = new StringBuilder();
        string[] attrs = {
            "[assembly: AssemblyTitle(\"MyApp\")]",
            "[assembly: AssemblyDescription(\"A sample app\")]",
            "[assembly: AssemblyConfiguration(\"\")]",
            "[assembly: AssemblyCompany(\"Contoso\")]",
            "[assembly: AssemblyProduct(\"MyApp\")]",
            "[assembly: AssemblyCopyright(\"Copyright 2024\")]",
            "[assembly: AssemblyTrademark(\"\")]",
            "[assembly: AssemblyCulture(\"\")]",
            "[assembly: AssemblyVersion(\"1.0.0.0\")]",
            "[assembly: AssemblyFileVersion(\"1.0.0.0\")]"
        };
        foreach (var attr in attrs) { sb.Append(attr); sb.Append("\r\n"); }
        AssemblyInfo = string.Concat(Enumerable.Repeat(sb.ToString(), 50));

        sb.Clear();
        string[] keys = { "Server", "Database", "User", "Password", "Timeout",
                          "MaxPool", "MinPool", "Encrypt", "TrustCert", "AppName" };
        for (int i = 0; i < 50; i++)
        {
            sb.Append(keys[i % keys.Length]); sb.Append(": value_"); sb.Append(i); sb.Append("\r\n");
        }
        KvConfig = string.Concat(Enumerable.Repeat(sb.ToString(), 20));

        sb.Clear();
        for (int i = 0; i < 200; i++)
        {
            sb.Append($"# Heading {i}\r\n");
            sb.Append($"Some paragraph text about topic {i}.\r\n");
            sb.Append($"Another line of content here.\r\n\r\n");
        }
        Markdown = sb.ToString();

        sb.Clear();
        sb.Append("Name,Age,City,Email\r\n");
        for (int i = 0; i < 1000; i++)
            sb.Append($"User{i},{20 + i % 50},City{i % 100},user{i}@example.com\r\n");
        CsvData = sb.ToString();
    }

    // Section 1: Real-world on Windows \r\n text
    private static readonly Regex Old_1a = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1a = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Baseline = true, Description = "1a_Lines1K_Old")]
    public int Lines1K_Old() => Old_1a.Matches(WinText1K).Count;
    [Benchmark(Description = "1a_Lines1K_New")]
    public int Lines1K_New() => New_1a.Matches(WinText1K).Count;

    private static readonly Regex Old_1b = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1b = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "1b_Lines10K_Old")]
    public int Lines10K_Old() => Old_1b.Matches(WinText10K).Count;
    [Benchmark(Description = "1b_Lines10K_New")]
    public int Lines10K_New() => New_1b.Matches(WinText10K).Count;

    private static readonly Regex Old_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$(\r?\n)?",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "2_Assembly_Old")]
    public int Assembly_Old() => Old_2.Matches(AssemblyInfo).Count;
    [Benchmark(Description = "2_Assembly_New")]
    public int Assembly_New() => New_2.Matches(AssemblyInfo).Count;

    private static readonly Regex Old_3 = new(@"^([^\s:]+):\s*(.+?)\r?$",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_3 = new(@"^([^\s:]+):\s*(.+?)$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "3_KeyVal_Old")]
    public int KeyVal_Old() => Old_3.Matches(KvConfig).Count;
    [Benchmark(Description = "3_KeyVal_New")]
    public int KeyVal_New() => New_3.Matches(KvConfig).Count;

    private static readonly Regex Old_4 = new(@"^# .+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_4 = new(@"^# .+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "4_Markdown_Old")]
    public int Markdown_Old() => Old_4.Matches(Markdown).Count;
    [Benchmark(Description = "4_Markdown_New")]
    public int Markdown_New() => New_4.Matches(Markdown).Count;

    private static readonly Regex Old_5 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_5 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "5_CSV_Old")]
    public int CSV_Old() => Old_5.Matches(CsvData).Count;
    [Benchmark(Description = "5_CSV_New")]
    public int CSV_New() => New_5.Matches(CsvData).Count;

    private static readonly Regex Old_6 = new(@"[^\r\n]+", RegexOptions.Compiled);
    private static readonly Regex New_6 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "6_DotExcl_Old")]
    public int DotExcl_Old() => Old_6.Matches(WinText1K).Count;
    [Benchmark(Description = "6_DotExcl_New")]
    public int DotExcl_New() => New_6.Matches(WinText1K).Count;

    private static readonly Regex Old_7 = new(@"\w+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_7 = new(@"\w+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "7_WordEOL_Old")]
    public int WordEOL_Old() => Old_7.Matches(WinText1K).Count;
    [Benchmark(Description = "7_WordEOL_New")]
    public int WordEOL_New() => New_7.Matches(WinText1K).Count;

    private static readonly Regex Old_8 = new(@"(?:^|\r\n)\w+", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_8 = new(@"^\w+", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "8_LineSt_Old")]
    public int LineStart_Old() => Old_8.Matches(WinText1K).Count;
    [Benchmark(Description = "8_LineSt_New")]
    public int LineStart_New() => New_8.Matches(WinText1K).Count;

    // Section 2: Unix \n text (control)
    private static readonly Regex Old_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "9_UnixLines_Old")]
    public int UnixLines_Old() => Old_9.Matches(UnixText1K).Count;
    [Benchmark(Description = "9_UnixLines_New")]
    public int UnixLines_New() => New_9.Matches(UnixText1K).Count;

    private static readonly Regex Old_10 = new(@"[^\n]+", RegexOptions.Compiled);
    private static readonly Regex New_10 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "10_UnixDot_Old")]
    public int UnixDot_Old() => Old_10.Matches(UnixText1K).Count;
    [Benchmark(Description = "10_UnixDot_New")]
    public int UnixDot_New() => New_10.Matches(UnixText1K).Count;

    // Section 3: Mixed newline text
    private static readonly Regex Old_11 = new(@"[^\r\n\u0085\u2028\u2029]+", RegexOptions.Compiled);
    private static readonly Regex New_11 = new(@".+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "11_MixedDot_Old")]
    public int MixedDot_Old() => Old_11.Matches(MixedAll1K).Count;
    [Benchmark(Description = "11_MixedDot_New")]
    public int MixedDot_New() => New_11.Matches(MixedAll1K).Count;

    private static readonly Regex Old_12 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_12 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "12_MixedLines_Old")]
    public int MixedLines_Old() => Old_12.Matches(MixedNR1K).Count;
    [Benchmark(Description = "12_MixedLines_New")]
    public int MixedLines_New() => New_12.Matches(MixedNR1K).Count;

    // Section 4: Non-anchor patterns (zero impact)
    private static readonly Regex Old_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled);
    private static readonly Regex New_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "14_Literal_Old")]
    public int Literal_Old() => Old_14.Matches(MixedAll1K).Count;
    [Benchmark(Description = "14_Literal_New")]
    public int Literal_New() => New_14.Matches(MixedAll1K).Count;

    private static readonly Regex Old_15 = new(@"\w+", RegexOptions.Compiled);
    private static readonly Regex New_15 = new(@"\w+", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "15_Words_Old")]
    public int Words_Old() => Old_15.Matches(WinText1K).Count;
    [Benchmark(Description = "15_Words_New")]
    public int Words_New() => New_15.Matches(WinText1K).Count;

    // Section 5: Pathological
    private static readonly Regex Old_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "P1_BareEOL_Old")]
    public int BareEOL_Old() => Old_P1.Matches(WinText1K).Count;
    [Benchmark(Description = "P1_BareEOL_New")]
    public int BareEOL_New() => New_P1.Matches(WinText1K).Count;

    private static readonly Regex Old_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);
    [Benchmark(Description = "P2_BareBOL_Old")]
    public int BareBOL_Old() => Old_P2.Matches(WinText1K).Count;
    [Benchmark(Description = "P2_BareBOL_New")]
    public int BareBOL_New() => New_P2.Matches(WinText1K).Count;

    private static readonly Regex Old_P3 = new(@"\w+\r?\Z", RegexOptions.Compiled);
    private static readonly Regex New_P3 = new(@"\w+\Z", RegexOptions.Compiled | AnyNewLine);
    [Benchmark(Description = "P3_EndZ_Old")]
    public bool EndZ_Old() => Old_P3.IsMatch(WinText10K);
    [Benchmark(Description = "P3_EndZ_New")]
    public bool EndZ_New() => New_P3.IsMatch(WinText10K);
}

danmoseley · 2026-03-03T07:38:53Z

Verified that adding VT/FF made no material perf difference.

danmoseley · 2026-03-03T08:02:06Z

Below is an analysis of benchmarks that compare AnyNewLine to the previous workarounds, and that add AnyNewLine to existing (perhaps pathological) patterns. TL;DR: sometimes ANL is faster than the workaround, sometimes not (it is just more functional, e.g. it handles more newline types). On some pathological cases ANL is slower. One principal example is a file that contains only \n newlines, where one option is simply not to use ANL. No surprises here.

AnyNewLine Performance Benchmark Results

Setup

Branch: anynewline-lower-v2 (PR #124701)
Baseline: PR base commit 213a41d3d95b (main)
HEAD: f08383ab77d (with VT/FF support)
Tool: BenchmarkDotNet 0.14.0, InProcessEmitToolchain, MediumRun
Runtime: locally-built .NET 11 via testhost
Machine: Developer workstation (Windows), not a controlled benchmark environment

HEAD: AnyNewLine ("New") vs Workaround Patterns ("Old")

Both columns run on the same HEAD binary. "Old" uses manual workaround patterns (no AnyNewLine). "New" uses simplified AnyNewLine patterns. All patterns use Compiled | Multiline unless noted.

Benchmark	Old Pattern	New Pattern	Old (μs)	New (μs)	Ratio	Notes
1a_Lines1K	`^.+\r?$`	`^.+$`	48.1	48.7	1.01x
1b_Lines10K	`^.+\r?$`	`^.+$`	1,791	1,738	0.97x
2_Assembly	`\[assembly:...\]\s*$(\r?\n)?`	`\[assembly:...\]\s*$`	39.4	32.8	0.83x	AnyNewLine faster
3_KeyVal	`^([^\s:]+):\s*(.+?)\r?$`	`^([^\s:]+):\s*(.+?)$`	109.2	106.3	0.97x
4_Markdown	`^# .+\r?$`	`^# .+$`	11.2	9.6	0.86x	AnyNewLine faster
5_CSV	`^.+\r?$`	`^.+$`	45.9	48.2	1.05x
6_DotExcl	`[^\r\n]+`	`.+`	42.9	46.1	1.07x	No Multiline
7_WordEOL	`\w+\r?$`	`\w+$`	89.6	127.6	1.42x	Lookaround cost
8_LineSt	`(?:^\|\r\n)\w+`	`^\w+`	201.6	203.9	1.01x
9_UnixLines	`^.+$`	`^.+$`	42.5	47.8	1.13x	`\n`-only input
10_UnixDot	`[^\n]+`	`.+`	38.7	47.5	1.23x	No Multiline, `\n`-only
11_MixedDot	`[^\r\n\u0085\u2028\u2029]+`	`.+`	45.4	48.9	1.08x	No Multiline, all newlines
12_MixedLines	`^.+\r?$`	`^.+$`	44.0	48.6	1.10x	`\n`/`\r\n` input
14_Literal	`\r\n\|\r\|\n`	`\r\n\|\r\|\n`	21.7	20.6	0.95x	No anchors
15_Words	`\w+`	`\w+`	310.5	305.7	0.98x	No anchors
P1_BareEOL	`$`	`$`	93.3	124.3	1.33x	`$` lowering cost
P2_BareBOL	`^`	`^`	139.7	116.1	0.83x	AnyNewLine faster
P3_EndZ	`\w+\r?\Z`	`\w+\Z`	489.1	895.6	1.83x	`\Z` lowering cost, no Multiline

Baseline (main) vs HEAD: "Old" Patterns Only

These patterns don't use AnyNewLine, so the regex code path is identical on both builds. Differences are machine noise.

Benchmark	Pattern	Baseline (μs)	HEAD (μs)	Δ
1a_Lines1K	`^.+\r?$`	45.0	48.1	+7%
1b_Lines10K	`^.+\r?$`	1,311	1,791	+37%
2_Assembly	`\[assembly:...\]\s*$(\r?\n)?`	37.1	39.4	+6%
3_KeyVal	`^([^\s:]+):\s*(.+?)\r?$`	108.6	109.2	+1%
4_Markdown	`^# .+\r?$`	11.1	11.2	+1%
5_CSV	`^.+\r?$`	45.8	45.9	+0%
6_DotExcl	`[^\r\n]+`	43.2	42.9	−1%
7_WordEOL	`\w+\r?$`	89.2	89.6	+0%
8_LineSt	`(?:^\|\r\n)\w+`	203.4	201.6	−1%
9_UnixLines	`^.+$`	44.7	42.5	−5%
10_UnixDot	`[^\n]+`	37.2	38.7	+4%
11_MixedDot	`[^\r\n\u0085\u2028\u2029]+`	43.7	45.4	+4%
12_MixedLines	`^.+\r?$`	42.6	44.0	+3%
14_Literal	`\r\n\|\r\|\n`	19.0	21.7	+14%
15_Words	`\w+`	293.6	310.5	+6%
P1_BareEOL	`$`	95.9	93.3	−3%
P2_BareBOL	`^`	138.2	139.7	+1%
P3_EndZ	`\w+\r?\Z`	481.9	489.1	+1%

Conclusions

No regression to existing patterns. The baseline-vs-HEAD comparison for "Old" patterns shows only machine noise: there is no consistent direction, and the outliers (1b_Lines10K at +37%, 14_Literal at +14%) are attributed to system load.
AnyNewLine perf profile unchanged after VT/FF. Adding VT (U+000B) and FF (U+000C) merged adjacent ranges in character classes, producing no measurable performance difference from the previous run.
Expected costs are in line with design. The $, ^, and \Z lowerings trade a lookaround cost for correct Unicode newline handling. Bare $ (P1, +33%) and \Z (P3, +83%) show the highest overhead. Real-world patterns that combine anchors with other matching (benchmarks 1–12) mostly show minimal impact (0.83x–1.13x); the exceptions are 7_WordEOL (1.42x) and 10_UnixDot (1.23x).
Some AnyNewLine patterns are faster (2_Assembly, 4_Markdown, P2_BareBOL) because the simplified patterns allow better optimization by the regex engine.

Benchmark Code

PerfTest.csproj

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net11.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.14.0" />
  </ItemGroup>
</Project>

Program.cs

using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Columns;
using BenchmarkDotNet.Reports;
using BenchmarkDotNet.Toolchains.InProcess.Emit;

BenchmarkRunner.Run<AnyNewLineBenchmarks>(
    DefaultConfig.Instance
        .WithSummaryStyle(SummaryStyle.Default.WithRatioStyle(BenchmarkDotNet.Columns.RatioStyle.Percentage))
        .AddJob(Job.MediumRun.WithToolchain(InProcessEmitToolchain.Instance)));

[MemoryDiagnoser(false)]
[HideColumns("Job", "Error", "StdDev", "RatioSD", "Alloc Ratio")]
public class AnyNewLineBenchmarks
{
    private const RegexOptions AnyNewLine = (RegexOptions)0x0800;

    // ── Inputs ──────────────────────────────────────────────────────
    private static string GenerateText(int lineCount, string[] newlines)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++)
        {
            sb.Append("Lorem ipsum dolor sit amet ");
            sb.Append(i);
            sb.Append(newlines[i % newlines.Length]);
        }
        return sb.ToString();
    }

    private static readonly string WinText1K = GenerateText(1000, ["\r\n"]);
    private static readonly string WinText10K = GenerateText(10000, ["\r\n"]);
    private static readonly string UnixText1K = GenerateText(1000, ["\n"]);
    private static readonly string MixedNR1K = GenerateText(1000, ["\n", "\r\n"]);
    private static readonly string MixedAll1K = GenerateText(1000, ["\n", "\r\n", "\r", "\u0085", "\u2028", "\u2029"]);

    private static readonly string AssemblyInfo;
    private static readonly string KvConfig;
    private static readonly string Markdown;
    private static readonly string CsvData;

    static AnyNewLineBenchmarks()
    {
        var sb = new StringBuilder();
        string[] attrs = {
            "[assembly: AssemblyTitle(\"MyApp\")]",
            "[assembly: AssemblyDescription(\"A sample app\")]",
            "[assembly: AssemblyConfiguration(\"\")]",
            "[assembly: AssemblyCompany(\"Contoso\")]",
            "[assembly: AssemblyProduct(\"MyApp\")]",
            "[assembly: AssemblyCopyright(\"Copyright 2024\")]",
            "[assembly: AssemblyTrademark(\"\")]",
            "[assembly: AssemblyCulture(\"\")]",
            "[assembly: AssemblyVersion(\"1.0.0.0\")]",
            "[assembly: AssemblyFileVersion(\"1.0.0.0\")]"
        };
        foreach (var attr in attrs) { sb.Append(attr); sb.Append("\r\n"); }
        AssemblyInfo = string.Concat(Enumerable.Repeat(sb.ToString(), 50));

        sb.Clear();
        string[] keys = { "Server", "Database", "User", "Password", "Timeout",
                          "MaxPool", "MinPool", "Encrypt", "TrustCert", "AppName" };
        for (int i = 0; i < 50; i++)
        {
            sb.Append(keys[i % keys.Length]); sb.Append(": value_"); sb.Append(i); sb.Append("\r\n");
        }
        KvConfig = string.Concat(Enumerable.Repeat(sb.ToString(), 20));

        sb.Clear();
        for (int i = 0; i < 200; i++)
        {
            sb.Append($"# Heading {i}\r\n");
            sb.Append($"Some paragraph text about topic {i}.\r\n");
            sb.Append($"Another line of content here.\r\n\r\n");
        }
        Markdown = sb.ToString();

        sb.Clear();
        sb.Append("Name,Age,City,Email\r\n");
        for (int i = 0; i < 1000; i++)
            sb.Append($"User{i},{20 + i % 50},City{i % 100},user{i}@example.com\r\n");
        CsvData = sb.ToString();
    }

    // ── Section 1: Real-world patterns on Windows \r\n text ─────────

    // 1a. Line matching 1K
    private static readonly Regex Old_1a = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1a = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Baseline = true, Description = "1a_Lines1K_Old")]
    public int Lines1K_Old() => Old_1a.Matches(WinText1K).Count;
    [Benchmark(Description = "1a_Lines1K_New")]
    public int Lines1K_New() => New_1a.Matches(WinText1K).Count;

    // 1b. Line matching 10K
    private static readonly Regex Old_1b = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_1b = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "1b_Lines10K_Old")]
    public int Lines10K_Old() => Old_1b.Matches(WinText10K).Count;
    [Benchmark(Description = "1b_Lines10K_New")]
    public int Lines10K_New() => New_1b.Matches(WinText10K).Count;

    // 2. Assembly attributes
    private static readonly Regex Old_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$(\r?\n)?",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_2 = new(@"\[assembly:\s*\w+\(.*?\)\]\s*$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "2_Assembly_Old")]
    public int Assembly_Old() => Old_2.Matches(AssemblyInfo).Count;
    [Benchmark(Description = "2_Assembly_New")]
    public int Assembly_New() => New_2.Matches(AssemblyInfo).Count;

    // 3. Key-value config
    private static readonly Regex Old_3 = new(@"^([^\s:]+):\s*(.+?)\r?$",
        RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_3 = new(@"^([^\s:]+):\s*(.+?)$",
        RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "3_KeyVal_Old")]
    public int KeyVal_Old() => Old_3.Matches(KvConfig).Count;
    [Benchmark(Description = "3_KeyVal_New")]
    public int KeyVal_New() => New_3.Matches(KvConfig).Count;

    // 4. Markdown headings
    private static readonly Regex Old_4 = new(@"^# .+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_4 = new(@"^# .+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "4_Markdown_Old")]
    public int Markdown_Old() => Old_4.Matches(Markdown).Count;
    [Benchmark(Description = "4_Markdown_New")]
    public int Markdown_New() => New_4.Matches(Markdown).Count;

    // 5. CSV line parsing
    private static readonly Regex Old_5 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_5 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "5_CSV_Old")]
    public int CSV_Old() => Old_5.Matches(CsvData).Count;
    [Benchmark(Description = "5_CSV_New")]
    public int CSV_New() => New_5.Matches(CsvData).Count;

    // 6. [^\r\n]+ vs .+
    private static readonly Regex Old_6 = new(@"[^\r\n]+", RegexOptions.Compiled);
    private static readonly Regex New_6 = new(@".+", RegexOptions.Compiled | AnyNewLine);

    [Benchmark(Description = "6_DotExcl_Old")]
    public int DotExcl_Old() => Old_6.Matches(WinText1K).Count;
    [Benchmark(Description = "6_DotExcl_New")]
    public int DotExcl_New() => New_6.Matches(WinText1K).Count;

    // 7. \w+\r?$ vs \w+$
    private static readonly Regex Old_7 = new(@"\w+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_7 = new(@"\w+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "7_WordEOL_Old")]
    public int WordEOL_Old() => Old_7.Matches(WinText1K).Count;
    [Benchmark(Description = "7_WordEOL_New")]
    public int WordEOL_New() => New_7.Matches(WinText1K).Count;

    // 8. (?:^|\r\n)\w+ vs ^\w+
    private static readonly Regex Old_8 = new(@"(?:^|\r\n)\w+", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_8 = new(@"^\w+", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "8_LineSt_Old")]
    public int LineStart_Old() => Old_8.Matches(WinText1K).Count;
    [Benchmark(Description = "8_LineSt_New")]
    public int LineStart_New() => New_8.Matches(WinText1K).Count;

    // ── Section 2: Unix \n text (control) ───────────────────────────

    // 9. Same pattern, flag only
    private static readonly Regex Old_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_9 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "9_UnixLines_Old")]
    public int UnixLines_Old() => Old_9.Matches(UnixText1K).Count;
    [Benchmark(Description = "9_UnixLines_New")]
    public int UnixLines_New() => New_9.Matches(UnixText1K).Count;

    // 10. [^\n]+ vs .+
    private static readonly Regex Old_10 = new(@"[^\n]+", RegexOptions.Compiled);
    private static readonly Regex New_10 = new(@".+", RegexOptions.Compiled | AnyNewLine);

    [Benchmark(Description = "10_UnixDot_Old")]
    public int UnixDot_Old() => Old_10.Matches(UnixText1K).Count;
    [Benchmark(Description = "10_UnixDot_New")]
    public int UnixDot_New() => New_10.Matches(UnixText1K).Count;

    // ── Section 3: Mixed newline text ───────────────────────────────

    // 11. Full char class workaround vs .+
    private static readonly Regex Old_11 = new(@"[^\r\n\u0085\u2028\u2029]+", RegexOptions.Compiled);
    private static readonly Regex New_11 = new(@".+", RegexOptions.Compiled | AnyNewLine);

    [Benchmark(Description = "11_MixedDot_Old")]
    public int MixedDot_Old() => Old_11.Matches(MixedAll1K).Count;
    [Benchmark(Description = "11_MixedDot_New")]
    public int MixedDot_New() => New_11.Matches(MixedAll1K).Count;

    // 12. Mixed \n/\r\n line matching
    private static readonly Regex Old_12 = new(@"^.+\r?$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_12 = new(@"^.+$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "12_MixedLines_Old")]
    public int MixedLines_Old() => Old_12.Matches(MixedNR1K).Count;
    [Benchmark(Description = "12_MixedLines_New")]
    public int MixedLines_New() => New_12.Matches(MixedNR1K).Count;

    // ── Section 4: Non-anchor patterns (zero impact) ────────────────

    // 14. Literal newlines
    private static readonly Regex Old_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled);
    private static readonly Regex New_14 = new(@"\r\n|\r|\n", RegexOptions.Compiled | AnyNewLine);

    [Benchmark(Description = "14_Literal_Old")]
    public int Literal_Old() => Old_14.Matches(MixedAll1K).Count;
    [Benchmark(Description = "14_Literal_New")]
    public int Literal_New() => New_14.Matches(MixedAll1K).Count;

    // 15. Word matching
    private static readonly Regex Old_15 = new(@"\w+", RegexOptions.Compiled);
    private static readonly Regex New_15 = new(@"\w+", RegexOptions.Compiled | AnyNewLine);

    [Benchmark(Description = "15_Words_Old")]
    public int Words_Old() => Old_15.Matches(WinText1K).Count;
    [Benchmark(Description = "15_Words_New")]
    public int Words_New() => New_15.Matches(WinText1K).Count;

    // ── Section 5: Pathological ─────────────────────────────────────

    // P1. Bare $
    private static readonly Regex Old_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P1 = new(@"$", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "P1_BareEOL_Old")]
    public int BareEOL_Old() => Old_P1.Matches(WinText1K).Count;
    [Benchmark(Description = "P1_BareEOL_New")]
    public int BareEOL_New() => New_P1.Matches(WinText1K).Count;

    // P2. Bare ^
    private static readonly Regex Old_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline);
    private static readonly Regex New_P2 = new(@"^", RegexOptions.Compiled | RegexOptions.Multiline | AnyNewLine);

    [Benchmark(Description = "P2_BareBOL_Old")]
    public int BareBOL_Old() => Old_P2.Matches(WinText1K).Count;
    [Benchmark(Description = "P2_BareBOL_New")]
    public int BareBOL_New() => New_P2.Matches(WinText1K).Count;

    // P3. \w+\Z on large input
    private static readonly Regex Old_P3 = new(@"\w+\r?\Z", RegexOptions.Compiled);
    private static readonly Regex New_P3 = new(@"\w+\Z", RegexOptions.Compiled | AnyNewLine);

    [Benchmark(Description = "P3_EndZ_Old")]
    public bool EndZ_Old() => Old_P3.IsMatch(WinText10K);
    [Benchmark(Description = "P3_EndZ_New")]
    public bool EndZ_New() => New_P3.IsMatch(WinText10K);
}

danmoseley · 2026-03-03T08:10:06Z

@MihuBot benchmark Regex

danmoseley · 2026-03-03T08:10:56Z

@MihuBot regexdiff

MihuBot · 2026-03-03T08:30:48Z

0 out of 18857 patterns have generated source code changes.

JIT assembly changes

Total bytes of base: 55915821
Total bytes of diff: 55915821
Total bytes of delta: 0 (0.00 % of base)

Sample source code for further analysis

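// Assumes a console project with implicit usings enabled (System, System.IO, System.Linq, System.Net.Http).
using System.IO.Compression;
using System.Text.Json;
using System.Text.RegularExpressions;

const string JsonPath = "RegexResults-1803.json";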
if (!File.Exists(JsonPath))
{
    await using var archiveStream = await new HttpClient().GetStreamAsync("https://mihubot.xyz/r/FIQBKc7A");
    using var archive = new ZipArchive(archiveStream, ZipArchiveMode.Read);
    archive.Entries.First(e => e.Name == "Results.json").ExtractToFile(JsonPath);
}

using FileStream jsonFileStream = File.OpenRead(JsonPath);
RegexEntry[] entries = JsonSerializer.Deserialize<RegexEntry[]>(jsonFileStream, new JsonSerializerOptions { IncludeFields = true })!;
Console.WriteLine($"Working with {entries.Length} patterns");



record KnownPattern(string Pattern, RegexOptions Options, int Count);

sealed class RegexEntry
{
    public required KnownPattern Regex { get; set; }
    public required string MainSource { get; set; }
    public required string PrSource { get; set; }
    public string? FullDiff { get; set; }
    public string? ShortDiff { get; set; }
    public (string Name, string Values)[]? SearchValuesOfChar { get; set; }
    public (string[] Values, StringComparison ComparisonType)[]? SearchValuesOfString { get; set; }
}

MihuBot · 2026-03-03T09:50:05Z

See benchmark results at https://gist.github.com/MihuBot/d906406489feb8a231966adc754246f4

danmoseley · 2026-03-03T17:16:02Z

MihuBot Benchmark Analysis (Post VT/FF)

Zero regressions. The PR has no impact on existing patterns.

Confirmed by two independent signals:

Regexdiff: 0 out of 18,857 patterns changed. JIT assembly is byte-for-byte identical (Total bytes of delta: 0 (0.00% of base)). Since no code paths are altered for patterns without AnyNewLine, any benchmark differences are definitionally noise.
Benchmark ratios are centered on 1.00x across hundreds of benchmarks. The few apparent outliers are explained by noise:

Benchmark	Ratio	Why noise
SliceSlice `IgnoreCase, None`	1.07x	Interpreted mode; IgnoreCase path untouched by PR
BoostDocs Id 10 `Compiled`	1.41x	22→32 ns absolute; sub-10ns jitter at nanosecond scale
BoostDocs Id 13 `Compiled`	1.34x	Same — 26→35 ns, nanosecond-scale noise
Common `ReplaceWords` `IgnoreCase,Compiled`	1.25x	Same config shows `SplitWords` at 0.77x and `MatchesWords` at 0.82x — contradictory = noise
Common `ReplaceWords` `Compiled`	1.12x	1149→1290 ns; within typical BDN jitter
Leipzig `(?i)Tom\|...\|Finn` `Compiled`	1.13x	Error bars ~865 μs on ~3000 μs mean — meaningless

Consistent with the first MihuBot run (pre-VT/FF), which also showed zero impact with an identical regexdiff.

The PR adds parser-only lowering gated behind RegexOptions.AnyNewLine (0x0800). When the flag isn't set, no code path is touched. MihuBot confirms this with zero assembly diff and noise-level benchmark variation across all suites (Sherlock, Leipzig, BoostDocs, Common, Cache, RegexRedux, Mariomkas, SliceSlice, Russian, Chinese).

danmoseley · 2026-03-03T17:16:19Z

nuts, didn't mean to close/reopen.

danmoseley · 2026-03-05T02:04:20Z

For interest, once we've taken this we can consider \R. We'd need to decide we actually want it as a feature first (there are good reasons, including parity with other major engines). But here's what the code looks like -- it's a small change, non-breaking, and pay-for-play: danmoseley#35

jzabroski · 2026-03-05T02:40:05Z

I'm excited to use it.

One interesting use case for more powerful Regex functionality is AI models with large context windows. There have been some interesting studies suggesting that agents are more effective using grep than RAG pipelines built on vector databases, and that the inflection point is largely due to large context windows. It seems the main advantage of using a vector database is compliance with GDPR and other privacy laws, as you can mask the data with embeddings using GUIDs and have strong data governance controls over which parts of an ontology graph a given user has rights to. For anything not sensitive, grep with regex wins.

Restructure all three anchor lowering methods (Eol, Bol, EndZ) to replace the 2-branch outer Alternate node with a sequential Concatenate of: primary lookaround + shared CRLF guard.

Key idea: include ALL newline chars (including \n for $, \r for ^) in the primary lookaround's character class, then append (?!(?<=\r)\n) as a guard to block matching at the \r\n boundary.

Before ($ example): (?=[\r\v\f\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)
After: (?=[\n\r\v\f\u0085\u2028\u2029]|\z)(?!(?<=\r)\n)

At non-newline positions (the vast majority during backtracking), the primary lookaround fails immediately and the Concatenate short-circuits, so the CRLF guard is never evaluated. The old structure evaluated both branches of the outer Alternate at every position.

Extract shared AnyNewLineCrLfGuardNode() helper used by all three methods. Replace AnyNewLineExceptLfClass / AnyNewLineExceptCrClass with unified AnyNewLineClass constant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley · 2026-03-06T18:22:20Z

Optimized the lowering, given the observation that newlines are less common than non-newlines. `.` can't be improved without engine changes, which we're avoiding. -- Dan

====

Optimize anchor lowering: eliminate outer alternation

The previous lowerings used a two-branch Alternate at the top level. The problem: at non-newline positions, both branches must be evaluated and fail. Since most characters in typical input are non-newline, this doubles the per-character rejection cost for anchors inside loops like \w+$.

The new structure replaces the outer Alternate with a sequential Concatenate: a single primary lookaround that matches all newline characters (including \n and \r), followed by a shared CRLF guard (?!(?<=\r)\n) that rejects the position between the \r and the \n of a \r\n pair. At non-newline positions the primary lookaround fails immediately and the guard is never evaluated.

Before/after lowerings:

Construct	Before	After
`$` (multiline)	`(?=[\r\u0085\u2028\u2029]\|\z)\|(?<!\r)(?=\n)`	`(?=[\n\r\v\f\u0085\u2028\u2029]\|\z)(?!(?<=\r)\n)`
`$` (non-multiline) / `\Z`	`(?=\r\n\z\|[\r\u0085\u2028\u2029]?\z)\|(?<!\r)(?=\n\z)`	`(?=\r\n\z\|[\n\r\v\f\u0085\u2028\u2029]?\z)(?!(?<=\r)\n)`
`^` (multiline)	`(?<=[\n\u0085\u2028\u2029]\|\A)\|(?<=\r)(?!\n)`	`(?<=[\n\r\v\f\u0085\u2028\u2029]\|\A)(?!(?<=\r)\n)`

The key structural change in each case: branch1 \| branch2 becomes unified_lookaround + guard. The AnyNewLineExceptLfClass / AnyNewLineExceptCrClass constants are replaced by a single AnyNewLineClass constant since the CRLF guard handles the split.

This change is entirely within the AnyNewLine lowering code path -- it has no effect on patterns that don't use RegexOptions.AnyNewLine.
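
As a rough illustration of the restructuring described above, the multiline `$` lowerings from the table can be written by hand as ordinary patterns and run on a released .NET runtime, with no need for RegexOptions.AnyNewLine. This is only a sketch: the class and member names (`LoweredEolSketch`, `Positions`) are hypothetical and not part of the PR, and the real lowering builds RegexNode trees inside RegexParser.cs rather than pattern strings.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the PR: hand-written equivalents of the
// multiline `$` lowerings, copied from the Before/After table.
public static class LoweredEolSketch
{
    // Two-branch form: both branches are tried at every non-newline position.
    public const string Before = @"(?=[\r\u0085\u2028\u2029]|\z)|(?<!\r)(?=\n)";

    // Primary lookahead followed by the shared CRLF guard.
    public const string After = @"(?=[\n\r\v\f\u0085\u2028\u2029]|\z)(?!(?<=\r)\n)";

    // Returns every index at which the zero-width pattern matches.
    public static int[] Positions(string pattern, string input) =>
        Regex.Matches(input, pattern).Select(m => m.Index).ToArray();
}
```

For the input "foo\r\nbar", both forms should report positions 3 (before the \r) and 8 (end of input), and neither should match at position 4 between the \r and the \n. The two forms only differ on input containing \v or \f, which the Before column of the table does not list.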

danmoseley · 2026-03-06T18:29:06Z

Perf results after anchor lowering optimization

Measured locally with BenchmarkDotNet MediumRun (15 iterations), RegexOptions.Compiled, Release build, .NET 11.0. Same methodology as the PR's "AnyNewLine vs Workaround Patterns" table -- ratio is New(AnyNewLine) / Old(manual workaround). All match counts verified identical.

Section 1: Real-world patterns on Windows \r\n text

#	Workaround pattern	AnyNewLine pattern	Workaround time	AnyNewLine time	Ratio	Notes
1a	`^.+\r?$` (1K lines)	`^.+$`	47.5 us	50.8 us	1.07x	`.` overhead (Set vs Notone)
1b	`^.+\r?$` (10K lines)	`^.+$`	1717 us	1766 us	1.03x	Same, amortized over longer input
2	`\[assembly:...\]$(\r?\n)?`	`\[assembly:...\]$`	39.4 us	35.4 us	0.90x	Simpler pattern wins
3	`^([^\s:]+):\s*(.+?)\r?$`	`^([^\s:]+):\s*(.+?)$`	111.2 us	105.5 us	0.95x	Simpler pattern wins
4	`^# .+\r?$`	`^# .+$`	11.8 us	11.0 us	0.93x	Faster: literal `#` prefix optimizes well
5	`^.+\r?$` (CSV)	`^.+$`	48.4 us	52.3 us	1.08x	`.` overhead
6	`[^\r\n]+`	`.+`	46.3 us	47.3 us	1.02x	`.` overhead, minimal
7	`\w+\r?$`	`\w+$`	91.6 us	92.0 us	1.00x	Was 1.33x before this optimization
8	`(?:^\|\r\n)\w+`	`^\w+`	200.8 us	189.3 us	0.94x	Simpler pattern wins

Section 2: Unix \n text (overhead of just enabling the flag)

#	Workaround pattern	AnyNewLine pattern	Workaround time	AnyNewLine time	Ratio	Notes
9	`^.+$`	`^.+$`	48.0 us	51.0 us	1.06x	`.` overhead
10	`[^\n]+`	`.+`	41.2 us	47.9 us	1.16x	`.` overhead (Notone vs Set)

Section 3: Mixed \n/\r\n text

#	Workaround pattern	AnyNewLine pattern	Workaround time	AnyNewLine time	Ratio	Notes
11	`[^\r\n\u0085\u2028\u2029]+`	`.+`	47.1 us	52.4 us	1.11x	`.` overhead
12	`^.+\r?$` (mixed 1K)	`^.+$`	46.6 us	51.7 us	1.11x	`.` + anchor overhead

Section 4: Non-anchor/dot patterns (zero impact expected)

#	Workaround pattern	AnyNewLine pattern	Workaround time	AnyNewLine time	Ratio	Notes
14	`\r\n\|\r\|\n`	`\r\n\|\r\|\n`	41.1 us	42.8 us	1.04x	No lowering, within noise
15	`\w+`	`\w+`	314.3 us	310.7 us	0.99x	No lowering, within noise

Section 5: Bare anchors (no simple workaround exists for these)

#	Pattern (no workaround)	Pattern (+ AnyNewLine)	Time without AnyNewLine	Time with AnyNewLine	Ratio	Notes
P1	`$` (multiline, `\n`-only)	`$` (all newlines)	106.0 us	122.9 us	1.16x	Now correct; was 1.37x before optimization
P2	`^` (multiline, `\n`-only)	`^` (all newlines)	152.9 us	119.4 us	0.78x	Now correct; faster here due to a curiosity (issue)
P3	`\w+\r?\Z` (partial)	`\w+\Z`	113.4 us	114.9 us	1.01x	Was ~1.9x before optimization

Summary:

The remaining overhead in dot-heavy patterns (1.02x--1.16x) comes entirely from . being lowered to a Set node ([^\n\r\v\f\u0085\u2028\u2029]) instead of the engine's native Notone node -- this is inherent to the lowering approach and would require engine changes to address.
The anchor optimization eliminated the worst regressions: \w+$ from 1.33x to 1.00x, bare $ from 1.37x to 1.16x, and \w+\Z from ~1.9x to 1.01x.
Patterns where AnyNewLine simplifies the regex (removing \r?, (\r?\n)?, (?:^|\r\n)) are often faster than the workaround (0.90x--0.95x).
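
To make the `.` lowering mentioned in the summary concrete, the following sketch compares the default dot with a hand-written copy of the Set class quoted above. It runs on a released .NET runtime and does not use RegexOptions.AnyNewLine. The names `DotLoweringSketch` and `Runs` are hypothetical, chosen only for this illustration.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the PR: contrasts the default `.`
// (everything except \n) with the character class that `.` is described
// as being lowered to under AnyNewLine.
public static class DotLoweringSketch
{
    public const string DefaultDot = @".+";
    public const string LoweredDot = @"[^\n\r\v\f\u0085\u2028\u2029]+";

    // Returns the text of every match of the pattern in the input.
    public static string[] Runs(string pattern, string input) =>
        Regex.Matches(input, pattern).Select(m => m.Value).ToArray();
}
```

On the input "a\r\nb\u2028c\nd", `DefaultDot` should yield three runs ("a\r", "b\u2028c", "d") because only \n stops it, while `LoweredDot` should yield four ("a", "b", "c", "d").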

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs

src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs

src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexParser.cs

src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Match.Tests.cs

src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Tests.Common.cs

- Remove UTF-8 BOM from RegexParser.cs and Regex.Match.Tests.cs
- Remove extra blank line in RegexParser.cs (line 24)
- Add blank line between AnyNewLine_Dollar_TestData and AnyNewLine_EndZ
- Add missing \u2028 (Line Separator) test case for \Z
- Add RegexOptionAnyNewLine assertion in test helpers

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley and others added 14 commits February 20, 2026 21:59

Lower \Z with AnyNewLine

37317bb

When AnyNewLine is set, lower \Z using the same sub-tree as $ without Multiline. \Z is not affected by Multiline, so the same lowering applies regardless. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Lower . with AnyNewLine

1141050

When AnyNewLine is set (without Singleline), lower . to [^\n\r] instead of [^\n], so dot does not match \r or \n. Add NotNewLineOrCarriageReturnClass constant to RegexCharClass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add AnyNewLine integration tests

283d49d

Combined ^/$/. tests, Replace/Split, RightToLeft, mixed newlines, empty lines, \Z with trailing newlines, and edge cases. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add AnyNewLine test for UpgradeToGeneratedRegex analyzer

a959964

Verify the fixer correctly emits RegexOptions.Multiline | RegexOptions.AnyNewLine in enum value order when upgrading to GeneratedRegex. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Test NonBacktracking|AnyNewLine is rejected

681a3ba

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Style: single-line ternaries for $ and \Z lowering

c96689e

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings February 21, 2026 08:27

github-actions bot added the area-System.Text.RegularExpressions label Feb 21, 2026

dotnet-policy-service bot assigned danmoseley Feb 21, 2026

Copilot started reviewing on behalf of danmoseley February 21, 2026 08:28

Copilot AI reviewed Feb 21, 2026

src/libraries/System.Text.RegularExpressions/tests/FunctionalTests/Regex.Ctor.Tests.cs

Add ECMAScript+AnyNewLine rejection test

e8c8fcb

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

danmoseley requested a review from Copilot February 21, 2026 08:36

Copilot started reviewing on behalf of danmoseley February 21, 2026 08:37

Copilot AI reviewed Feb 21, 2026

MihuBot mentioned this pull request Feb 21, 2026

[Benchmark X64] [danmoseley] Add RegexOptions.AnyNewLine via parser lowering MihuBot/runtime-utils#1775

Open

build-analysis bot mentioned this pull request Feb 21, 2026

[android][clr] No peer certificates when executing System.Net.Http.Functional.Tests on Android emulator #124526

Open

MihuBot mentioned this pull request Mar 3, 2026

[Benchmark X64] [danmoseley] Add RegexOptions.AnyNewLine via parser lowering MihuBot/runtime-utils#1802

Open

MihuBot mentioned this pull request Mar 3, 2026

[RegexDiff X64] [danmoseley] Add RegexOptions.AnyNewLine via parser lowering MihuBot/runtime-utils#1803

Open

danmoseley closed this Mar 3, 2026

danmoseley reopened this Mar 3, 2026

danmoseley mentioned this pull request Mar 4, 2026

Prototype: Implement \R (Unicode line ending escape, TR18 RL1.6) danmoseley/runtime#35

Open

Copilot AI review requested due to automatic review settings March 6, 2026 18:20

Copilot started reviewing on behalf of danmoseley March 6, 2026 18:22

Copilot AI reviewed Mar 6, 2026

danmoseley mentioned this pull request Mar 6, 2026

Regex pattern '^' is slower than manually lowered equivalent #125277

Closed

Copilot AI mentioned this pull request Mar 6, 2026

Fix BOL anchor not writing back updated position in TryFindNextPossibleStartingPosition #125280

Merged
