Skip to content

Fix StackOverflowException from deeply nested character class subtractions#32

Closed
danmoseley wants to merge 2 commits intomainfrom
fix/regex-parser-stackoverflow
Closed

Fix StackOverflowException from deeply nested character class subtractions#32
danmoseley wants to merge 2 commits intomainfrom
fix/regex-parser-stackoverflow

Conversation

@danmoseley
Copy link
Owner

@danmoseley danmoseley commented Feb 24, 2026

Fix StackOverflowException from deeply nested character class subtractions

Summary

RegexParser.ScanCharClass and several methods in RegexCharClass recursively process character class subtractions ([a-[b-[c-[...]]]]). A pattern with ~10,000 nesting levels (~50KB) exhausts the thread stack, causing an uncatchable StackOverflowException that terminates the process. This is a DoS vector for any service accepting user-supplied regex patterns.

All regex modes are affected (None, Compiled, NonBacktracking) because the crash occurs during parsing, before engine-specific logic.

Changes

Converted all recursive subtraction processing to iterative:

  • RegexParser.ScanCharClass: Uses an explicit List<RegexCharClass?> parent stack. When a subtraction [ is encountered, the current char class is pushed and a new one started. When ] closes a level, the child is set as the parent's subtraction.
  • RegexCharClass.ToStringClass: Tail-call converted to a do/while loop over _subtractor.
  • RegexCharClass.CharInClassRecursive: Converted to a toggle-based while loop — each consecutive level where the character matches toggles the result.
  • RegexCharClass.ParseRecursive: Converted to forward iterative construction of the subtraction chain.

Character class subtraction was the only recursive parsing path in RegexParser. Other nestable constructs such as groups ((((...)))) are already parsed iteratively via an explicit stack (PushGroup/PopGroup) and are not susceptible to stack overflow regardless of nesting depth.

Tests

  • CharClassSubtraction_DeepNesting_DoesNotStackOverflow: 10,000 depth (1,000 for SourceGenerated to avoid overwhelming Roslyn). Verifies parsing + matching succeeds.
  • CharClassSubtraction_Correctness: Validates [a-z-[d-w]] and [a-z-[d-w-[m]]] produce correct match/non-match results across all engines.

danmoseley and others added 2 commits February 23, 2026 22:15
…tions

Convert recursive methods to iterative to prevent stack overflow when
parsing and evaluating regex patterns with deeply nested character class
subtractions (e.g., [a-[a-[a-[...[a]...]]]]).

The following methods were converted from recursive to iterative:

- RegexParser.ScanCharClass: uses an explicit parent stack to track
  nested character class levels during parsing
- RegexCharClass.ToStringClass: tail-call converted to do/while loop
- RegexCharClass.CharInClassRecursive: converted to toggle-based loop
  that counts consecutive matching levels
- RegexCharClass.ParseRecursive: converted to forward iterative
  construction of the subtraction chain

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Rename CharInClassRecursive -> CharInClassIterative and ParseRecursive -> ParseIterative since implementations are now iterative
- Inline ParseIterative into Parse and ToStringClass(ref) into ToStringClass()
- Inline StressTestNestingDepth constant into its only usage
- Add tests: case-insensitive subtraction, negated outer class, 4-level nesting

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danmoseley danmoseley closed this Feb 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant