Regex optimizations #1086

Merged
colinodell merged 6 commits into 2.7 from regex-optimizations on Jul 20, 2025

colinodell (Member) commented on Jul 20, 2025

Optimize several regular expressions in RegexHelper.php for better performance:

  • REGEX_PUNCTUATION: Simplified from an explicit character list to the Unicode character classes [\p{P}\p{S}], removing redundancy
  • REGEX_THEMATIC_BREAK: Consolidated anchors and restructured alternation for more efficient matching
  • PARTIAL_ENTITY: Added atomic group (?>#...) to prevent catastrophic backtracking during entity parsing
  • HTML patterns: Applied possessive quantifiers (*+, ++) to PARTIAL_OPENTAG, PARTIAL_CLOSETAG, PARTIAL_OPENBLOCKTAG, PARTIAL_CLOSEBLOCKTAG, and PARTIAL_LINK_TITLE to eliminate unnecessary backtracking
  • HTML block regexes: Updated TYPE_6_BLOCK_ELEMENT with possessive quantifiers for better performance on malformed HTML
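
The possessive quantifiers and atomic group described in the list above can be illustrated with a minimal PHP sketch. The patterns here are hypothetical, simplified stand-ins and not the actual constants from RegexHelper.php; the `matches()` helper and all constant names are introduced only for this example. The last pair of patterns shows why a possessive quantifier is only a safe substitution when the token that follows cannot match what the quantifier consumed.

```php
<?php
declare(strict_types=1);

// Hypothetical, simplified patterns for illustration only.
// They are NOT the actual constants from RegexHelper.php.

// Greedy quantifier: if the closing ">" is missing, a backtracking engine
// may retry shorter and shorter runs of letters before giving up.
const GREEDY_TAG = '/^<[a-z]+>/';

// Possessive quantifier: "++" never gives back what it consumed, so a
// missing ">" fails without retries. Safe here because ">" is not in [a-z].
const POSSESSIVE_TAG = '/^<[a-z]++>/';

// Atomic group: once an alternative inside (?>...) has matched, the engine
// does not re-enter the group to try shorter matches or other alternatives.
const ATOMIC_ENTITY = '/^&(?>#[0-9]{1,7}|[a-z][a-z0-9]{1,31});/i';

// Possessive quantifiers change the result when the next token overlaps
// with what was consumed: "a*+" takes every "a", leaving none for the last "a".
const GREEDY_OVERLAP = '/^a*a$/';
const POSSESSIVE_OVERLAP = '/^a*+a$/';

function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(matches(GREEDY_TAG, '<div>'));          // bool(true)
var_dump(matches(POSSESSIVE_TAG, '<div>'));      // bool(true)
var_dump(matches(POSSESSIVE_TAG, '<div'));       // bool(false)
var_dump(matches(ATOMIC_ENTITY, '&amp;'));       // bool(true)
var_dump(matches(ATOMIC_ENTITY, '&#39;'));       // bool(true)
var_dump(matches(GREEDY_OVERLAP, 'aaa'));        // bool(true)
var_dump(matches(POSSESSIVE_OVERLAP, 'aaa'));    // bool(false)
```

In the tag and entity patterns the possessive and atomic forms accept the same input as their greedy counterparts, because the character that must follow (`>` or `;`) cannot be matched by the preceding repetition.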

Fixes #674


(Note that we don't consider changes to regex constants to be BC breaks, so long as they continue to correctly parse what they're supposed to.)
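
One way to check that a rewritten pattern still parses the same input is to compare the two versions over a fixed set of subjects. The sketch below does this for a hypothetical ASCII-only punctuation list against the `[\p{P}\p{S}]` classes named in the description. The explicit list, the constant names, and `asciiDisagreements()` are assumptions made for illustration; the explicit list is not the previous value of REGEX_PUNCTUATION.

```php
<?php
declare(strict_types=1);

// Hypothetical ASCII-only punctuation list, written as explicit ranges.
// This is an assumption for illustration, not the previous REGEX_PUNCTUATION.
const EXPLICIT_ASCII_PUNCT = '/[!-\/:-@\[-`{-~]/u';

// Unicode property classes named in the PR description:
// punctuation (P) and symbols (S).
const UNICODE_PUNCT = '/[\p{P}\p{S}]/u';

/**
 * Returns the ASCII code points on which the two patterns disagree.
 *
 * @return int[]
 */
function asciiDisagreements(): array
{
    $diff = [];
    for ($code = 0; $code < 128; $code++) {
        $char = chr($code);
        $explicit = preg_match(EXPLICIT_ASCII_PUNCT, $char) === 1;
        $unicode = preg_match(UNICODE_PUNCT, $char) === 1;
        if ($explicit !== $unicode) {
            $diff[] = $code;
        }
    }

    return $diff;
}

// The two patterns agree on every ASCII character.
var_dump(asciiDisagreements() === []);                        // bool(true)

// Outside ASCII only the Unicode classes match, e.g. the euro sign (U+20AC).
var_dump(preg_match(UNICODE_PUNCT, "\u{20AC}") === 1);        // bool(true)
var_dump(preg_match(EXPLICIT_ASCII_PUNCT, "\u{20AC}") === 1); // bool(false)
```

A check like this only covers the inputs it enumerates, so it complements the parser's own test suite and does not replace it.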

colinodell merged commit 00f2f51 into 2.7 on Jul 20, 2025
32 of 33 checks passed
colinodell deleted the regex-optimizations branch on July 20, 2025 12:25

Development

Successfully merging this pull request may close these issues.

Optimize regular expressions
