Closed
Description
Problem
Pattern `(?m)^/.*[\w-]+\.php` (multiline anchored with wildcard) is 24% slower than Go stdlib.
Benchmark Results (regex-bench, 6MB input)
| Engine | Time | Matches |
|---|---|---|
| Go stdlib | 124 ms | 1966 |
| Go coregex | 153 ms | 1966 |
Regression: coregex is 24% slower (153ms vs 124ms)
Root Cause
The `UseAnchoredLiteral` strategy is optimized for single-string matching (`^prefix.*suffix$`), not multiline `(?m)` mode.
With the `(?m)` flag:
- `^` matches at position 0 AND after every newline
- Need to scan entire input for newline positions
- Current strategy doesn't optimize this case
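To make the extra anchor positions concrete, here is a minimal sketch using only Go's standard library. `lineStarts` is a hypothetical helper for illustration, not part of coregex; it lists every offset where a multiline `^` may match, which is the set a `(?m)` strategy has to consider.

```go
package main

import (
	"fmt"
	"regexp"
)

// lineStarts is a hypothetical helper (not coregex API). It returns every
// offset where a multiline `^` can match: offset 0 plus the byte after
// each '\n'.
func lineStarts(data []byte) []int {
	starts := []int{0}
	for i, b := range data {
		if b == '\n' {
			starts = append(starts, i+1)
		}
	}
	return starts
}

func main() {
	data := []byte("/a/x.php\nGET /b\n/c/y.php")

	// Offsets computed by hand.
	fmt.Println(lineStarts(data))

	// Offsets where the stdlib engine reports an empty `(?m)^` match.
	re := regexp.MustCompile(`(?m)^`)
	for _, loc := range re.FindAllIndex(data, -1) {
		fmt.Println(loc[0])
	}
}
```

Without `(?m)`, only offset 0 would qualify, which is why a strategy built for single-string anchoring does not carry over directly.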
Expected Behavior
Multiline anchored patterns should be at least as fast as stdlib, ideally faster using:
- Suffix prefilter to find `.php` candidates
- Backward verification to line start (find preceding `\n`)
- Forward match from line start
Proposed Solution
New strategy UseMultilineAnchoredLiteral:
- Extract suffix literal (`.php`)
- Use memmem to find all `.php` occurrences
- For each candidate, scan backward to find `\n` or start of input
- Verify prefix from line start
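The steps above could be sketched roughly as follows, using only the Go standard library. This is an illustration under stated assumptions, not the coregex implementation: `findMultilineAnchored`, `findPHP`, and `phpLineRe` are hypothetical names, `bytes.Index` stands in for memmem, and verification is delegated to stdlib `regexp` on a single line. The sketch relies on two properties of this pattern: `.` never matches `\n`, and every match must begin at a line start, so each line yields at most one match.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// phpLineRe is the issue's pattern without (?m). It is only ever run
// against a single line, so a plain ^ anchor is sufficient.
var phpLineRe = regexp.MustCompile(`^/.*[\w-]+\.php`)

// findMultilineAnchored is a hypothetical sketch (not coregex API).
// suffix must be a literal that appears in every match; lineRe is the
// pattern anchored with a non-multiline ^.
func findMultilineAnchored(data, suffix []byte, lineRe *regexp.Regexp) [][]int {
	var out [][]int
	pos := 0
	for pos < len(data) {
		// memmem-style search for the next suffix candidate.
		i := bytes.Index(data[pos:], suffix)
		if i < 0 {
			break
		}
		hit := pos + i

		// Scan backward to the preceding '\n' (or start of input).
		start := bytes.LastIndexByte(data[:hit], '\n') + 1

		// Bound the line so verification never crosses a newline.
		end := len(data)
		if j := bytes.IndexByte(data[hit:], '\n'); j >= 0 {
			end = hit + j
		}

		// Verify the full pattern forward from the line start.
		if loc := lineRe.FindIndex(data[start:end]); loc != nil {
			out = append(out, []int{start + loc[0], start + loc[1]})
		}

		// At most one match per line, so resume on the next line.
		pos = end + 1
	}
	return out
}

// findPHP wires the sketch to the pattern from this issue.
func findPHP(data []byte) [][]int {
	return findMultilineAnchored(data, []byte(".php"), phpLineRe)
}

func main() {
	data := []byte("/a/x.php\nGET /b.php\n/c/y-z.php?q=1\nnope\n/d.php.php")
	full := regexp.MustCompile(`(?m)^/.*[\w-]+\.php`)

	fmt.Println(findPHP(data))               // prefilter-based sketch
	fmt.Println(full.FindAllIndex(data, -1)) // stdlib reference
}
```

Lines without `.php` are never visited beyond the suffix search, which is where the expected speedup would come from. A real strategy would also need to decide when the pattern qualifies (for example, rejecting patterns where `.` can match newlines).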
Related
- Issue noticeably slower than core regexp #79: UseAnchoredLiteral for single-string matching (32-133x speedup) ✅
- `http_methods` pattern `(?m)^(GET|POST|...)` is 50x faster (uses Teddy prefilter for literal alternation)
Test Case
re := coregex.MustCompile(`(?m)^/.*[\w-]+\.php`)
// Input: 6MB text with ~2000 lines starting with /path/to/file.php
matches := re.FindAll(data, -1) // Should find 1966 matches
Benchmark Command
cd regex-bench && make build && make run
# Look for http_methods and anchored_php patterns
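For a quick local check without the regex-bench corpus, a synthetic input can stand in. The sketch below is an assumption-laden stand-in, not the benchmark itself: `buildInput` and `countMatches` are hypothetical helpers, the line contents are invented, and only the stdlib baseline is timed, so the numbers will not be comparable to the table above.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"time"
)

// anchoredPHP is the pattern from this issue, compiled with stdlib regexp.
var anchoredPHP = regexp.MustCompile(`(?m)^/.*[\w-]+\.php`)

// buildInput is a hypothetical generator: it writes `lines` lines, and
// every `every`-th line is a path ending in .php. All other lines are
// filler that does not start with '/'.
func buildInput(lines, every int) []byte {
	var buf bytes.Buffer
	for i := 0; i < lines; i++ {
		if i%every == 0 {
			fmt.Fprintf(&buf, "/path/to/file-%d.php\n", i)
		} else {
			fmt.Fprintf(&buf, "GET /static/asset-%d.css HTTP/1.1\n", i)
		}
	}
	return buf.Bytes()
}

// countMatches runs the stdlib baseline and returns the match count.
func countMatches(data []byte) int {
	return len(anchoredPHP.FindAll(data, -1))
}

func main() {
	data := buildInput(200000, 100) // 2000 matching lines

	start := time.Now()
	n := countMatches(data)
	fmt.Printf("stdlib: %d matches over %d bytes in %v\n", n, len(data), time.Since(start))
}
```

Swapping `regexp` for coregex in `anchoredPHP` should give the other half of the comparison, assuming the two expose the same `MustCompile` and `FindAll` calls as the test case suggests.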