Releases: coregx/coregex
v0.11.4: FindAll multiline optimization
Fixed
- FindAll/FindAllIndex now use UseMultilineReverseSuffix strategy (Issue #102)
  - `FindIndicesAt()` was missing dispatch for `UseMultilineReverseSuffix`
  - `IsMatch`/`Find` were fast (1µs), but `FindAll` was slow (78ms), a 100x gap vs Rust
  - After fix: `FindAll` on 6MB with 2000 matches: ~1ms (was 78ms)
Performance
| Operation | Before | After | Improvement |
|---|---|---|---|
| FindAll (6MB, 2000 matches) | 78ms | ~1ms | 78x faster |
| vs Rust gap | 100x slower | ~1.3x slower | Near parity! |
Changed
- Updated `golang.org/x/sys` v0.39.0 → v0.40.0
Full Changelog: v0.11.3...v0.11.4
v0.11.3: Prefix fast path 319-552x speedup
Performance
Pattern (?m)^/.*\.php now 319-552x faster than stdlib (was 3.5-5.7x in v0.11.1)
| Operation | coregex | stdlib | Speedup |
|---|---|---|---|
| IsMatch | 182 ns | 100 µs | 552x |
| Find | 240 ns | 81 µs | 338x |
| CountAll | 58 µs | 18.7 ms | 319x |
Algorithm
- Suffix prefilter finds `.php` candidates (SIMD memmem)
- SIMD backward scan to find line start (`bytes.LastIndexByte`)
- O(1) prefix byte check (`/` at line start)
- Skip-to-next-line on mismatch (avoids O(n²) worst case)
- DFA fallback for complex patterns without extractable prefix
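
The steps above can be sketched with the standard library alone. This is a minimal illustration, not coregex's implementation: `findPrefixedSuffixLine` is a hypothetical helper hard-coded to `(?m)^/.*\.php`, and `bytes.Index` / `bytes.LastIndexByte` stand in for the SIMD routines the release mentions. The DFA fallback is not shown.

```go
package main

import "bytes"

// findPrefixedSuffixLine is a hypothetical helper (not part of coregex).
// It returns the [start, end) offsets of the first match of
// (?m)^/.*\.php in hay, or (-1, -1) if there is none.
func findPrefixedSuffixLine(hay []byte) (start, end int) {
	suffix := []byte(".php")
	pos := 0
	for pos < len(hay) {
		// 1. Suffix prefilter: locate the next ".php" candidate.
		i := bytes.Index(hay[pos:], suffix)
		if i < 0 {
			return -1, -1
		}
		cand := pos + i

		// 2. Backward scan to the start of the candidate's line.
		lineStart := bytes.LastIndexByte(hay[:cand], '\n') + 1

		// Locate the end of the same line (needed for steps 3 and 4).
		lineEnd := len(hay)
		if j := bytes.IndexByte(hay[cand:], '\n'); j >= 0 {
			lineEnd = cand + j
		}

		// 3. O(1) prefix check: the line must begin with '/'.
		if hay[lineStart] == '/' {
			// Greedy .* extends to the last ".php" on the line.
			last := bytes.LastIndex(hay[lineStart:lineEnd], suffix)
			return lineStart, lineStart + last + len(suffix)
		}

		// 4. Mismatch: skip the rest of this line so that further
		// ".php" candidates on it are never re-examined.
		pos = lineEnd + 1
	}
	return -1, -1
}
```

For the input `"x.php\n/a/b.php?q\n"` the first line is rejected by the prefix check and skipped in one step, and the second line yields the match `[6, 14)`.
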
Changes
- `MultilineReverseSuffixSearcher.prefixBytes` for O(1) verification
- `SetPrefixLiterals()` extracts prefix from pattern
- `findLineStart()` uses SIMD `bytes.LastIndexByte`
- Skip-to-next-line: on prefix mismatch, jump to next `\n` position
Fixes #99
v0.11.2: DFA verification for UseMultilineReverseSuffix
Performance Improvement
Replace O(n*m) PikeVM verification with O(n) DFA verification for multiline suffix patterns.
Issue: #99 (Rust regex 84x faster on (?m)^/.*\.php)
Benchmark Results
| Case | Before | After | Speedup |
|---|---|---|---|
| No-match (2KB) | 1136 ns | 108 ns | 10.5x |
| Long no-match | 25937 ns | 197 ns | 131x |
| Large input (6MB) | 66 ms | ~5-10 ms | 10-30x (expected) |
Changes
- `MultilineReverseSuffixSearcher.forwardDFA` replaces `pikevm` field
- Uses `lazy.DFA.SearchAtAnchored()` for linear-time anchored matching
- `lazy.CompileWithConfig()` creates forward DFA with proper config
Research Insight
Analysis of Rust regex-automata revealed that the hybrid (lazy) DFA does NOT use per-state acceleration — only the dense (pre-compiled) DFA does. The real performance difference comes from using DFA vs NFA/PikeVM for verification.
coregex already has partial state acceleration in dfa/lazy/. The main fix was switching from PikeVM to DFA verification.
Full Changelog: v0.11.1...v0.11.2
v0.11.1: UseMultilineReverseSuffix 3.5-5.7x speedup
What's New
New 18th strategy UseMultilineReverseSuffix for multiline suffix patterns like (?m)^/.*\.php.
Performance (Issue #97)
Before: coregex was 24% slower than stdlib
After: coregex is 3.5-5.7x faster than stdlib
| Operation | coregex | stdlib | Speedup |
|---|---|---|---|
| IsMatch (0.5MB) | 20.6 µs | 72.2 µs | 3.5x |
| Find (0.5MB) | 15.3 µs | 68.7 µs | 4.5x |
| CountAll (200 matches) | 2.56 ms | 14.6 ms | 5.7x |
| No-match (small) | 90 ns | 1.1 µs | 12x |
| No-match (2KB) | 184 ns | 24 µs | 130x |
Algorithm
- Suffix prefilter finds `.php` candidates
- Backward scan to line start (`\n` or pos 0)
- Forward PikeVM verification
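
A rough sketch of this three-step shape, under stated assumptions: `findWithVerifier` is a hypothetical name, and the standard library's `regexp` (itself NFA-based) stands in for the PikeVM verifier. It shows the control flow only and says nothing about how coregex structures its searcher.

```go
package main

import (
	"bytes"
	"regexp"
)

// verifier stands in for the forward verification engine. It is anchored,
// so it only ever tries to match at the start of the slice it is given.
var verifier = regexp.MustCompile(`^/.*\.php`)

// findWithVerifier is a hypothetical helper (not part of coregex).
// It returns the [start, end] offsets of the first match of
// (?m)^/.*\.php in hay, or nil if there is none.
func findWithVerifier(hay []byte) []int {
	suffix := []byte(".php")
	pos := 0
	for {
		// Step 1: suffix prefilter.
		i := bytes.Index(hay[pos:], suffix)
		if i < 0 {
			return nil
		}
		cand := pos + i

		// Step 2: backward scan to '\n' or position 0.
		lineStart := bytes.LastIndexByte(hay[:cand], '\n') + 1

		// Step 3: forward verification from the line start.
		if loc := verifier.FindIndex(hay[lineStart:]); loc != nil {
			return []int{lineStart + loc[0], lineStart + loc[1]}
		}

		// Try the next candidate, which may sit on the same line.
		pos = cand + len(suffix)
	}
}
```

In this sketch a failed candidate does not skip the rest of its line, so a line with many `.php` occurrences and no leading `/` is verified repeatedly.
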
Files
- `meta/reverse_suffix_multiline.go` (NEW)
- `meta/reverse_suffix_multiline_test.go` (NEW)
Full Changelog: v0.11.0...v0.11.1
v0.11.0: UseAnchoredLiteral 32-133x speedup
Highlights
Issue #79 RESOLVED! Pattern ^/.*[\w-]+\.php$ now 32-133x faster than stdlib (was 5.3x slower).
New Features
- UseAnchoredLiteral Strategy - O(1) specialized matching for `^prefix.*suffix$` patterns
  - Algorithm: O(1) length check → O(k) prefix match → O(k) suffix match → O(m) charclass bridge
  - 17th strategy in meta-engine
- V11-002 ASCII Runtime Detection - SIMD-accelerated input classification
  - Dual NFA compilation: UTF-8 NFA (28 states) + ASCII NFA (2 states) for patterns with `.`
  - Up to 1.6x faster on ASCII input
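
To make the UseAnchoredLiteral pipeline concrete, here is a minimal sketch specialized by hand to `^/.*[\w-]+\.php$`. The names `matchAnchoredLiteral` and `isBridgeByte` are hypothetical, and the final newline check is an addition of this sketch (needed because `.` does not match `\n`). How coregex generalizes these steps to arbitrary prefixes, suffixes, and character classes is not shown in the release notes.

```go
package main

import "strings"

// isBridgeByte reports whether c belongs to the class [\w-].
func isBridgeByte(c byte) bool {
	return c == '_' || c == '-' ||
		(c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'z') ||
		(c >= 'A' && c <= 'Z')
}

// matchAnchoredLiteral is a hypothetical helper (not part of coregex)
// that decides ^/.*[\w-]+\.php$ without running a regex engine.
func matchAnchoredLiteral(s string) bool {
	const prefix, suffix = "/", ".php"

	// Length check: prefix + at least one bridge byte + suffix.
	if len(s) < len(prefix)+1+len(suffix) {
		return false
	}
	// Prefix and suffix literal checks.
	if !strings.HasPrefix(s, prefix) || !strings.HasSuffix(s, suffix) {
		return false
	}
	// Charclass bridge: [\w-]+ needs at least one class byte
	// immediately before the suffix.
	bridge := len(s) - len(suffix) - 1
	if !isBridgeByte(s[bridge]) {
		return false
	}
	// Assumption of this sketch: '.' must not cross a newline.
	return !strings.Contains(s[len(prefix):bridge], "\n")
}
```

With this helper, `/index.php` and `/a/b-c.php` match, while `/.php`, `index.php`, and `/a .php` are rejected after at most a few byte comparisons.
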
Bug Fixes
- OnePass DFA handles StateLook anchors (`^`, `$`, `\A`, `\z`)
- Suffix extraction skips trailing anchors for O(1) rejection
Internal
- meta.go refactored from 2821 lines into 6 focused files (no API changes)
Performance (Issue #79 pattern ^/.*[\w-]+\.php$)
| Input | coregex | stdlib | Speedup |
|---|---|---|---|
| Short (24B) | 7.6 ns | 241 ns | 32x |
| Medium (45B) | 7.8 ns | 347 ns | 44x |
| Long (78B) | 7.9 ns | 516 ns | 65x |
| No match | 4.4 ns | 590 ns | 133x |
Installation
go get github.com/coregx/coregex@v0.11.0
Full Changelog: v0.10.10...v0.11.0
v0.10.10: ReverseSuffix CharClass Plus fix
Fixed
- ReverseSuffix whitelist includes CharClass Plus - Performance regression fix
  - Bug: `[^\s]+\.txt` pattern caused extreme slowdown (266ms/MB instead of µs)
  - Root cause: `isSafeForReverseSuffix` only recognized `.*` and `.+` wildcards
  - Fix: CharClass Plus patterns (`[^\s]+`, `[\w]+`) now qualify for reverse suffix optimization
  - Result: `suffix_find` pattern now completes in 398µs (was timing out)
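
As an illustration of what such a whitelist check could look like, the sketch below inspects a pattern's syntax tree with the standard `regexp/syntax` package. `qualifiesForReverseSuffix` is a hypothetical stand-in and not the body of `isSafeForReverseSuffix`; it only recognizes the simple shape "repeated wildcard or class, then a literal suffix".

```go
package main

import "regexp/syntax"

// qualifiesForReverseSuffix is a hypothetical helper (not coregex code).
// It accepts patterns of the form <head><literal>, where <head> is
// .* or .+ or a character class followed by +.
func qualifiesForReverseSuffix(pattern string) bool {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return false
	}
	if re.Op != syntax.OpConcat || len(re.Sub) != 2 {
		return false
	}
	head, tail := re.Sub[0], re.Sub[1]
	if tail.Op != syntax.OpLiteral {
		return false
	}
	if head.Op != syntax.OpStar && head.Op != syntax.OpPlus {
		return false
	}
	switch head.Sub[0].Op {
	case syntax.OpAnyChar, syntax.OpAnyCharNotNL:
		return true // .* and .+ (the previously recognized wildcards)
	case syntax.OpCharClass:
		return head.Op == syntax.OpPlus // [^\s]+, [\w]+ (CharClass Plus)
	}
	return false
}
```

Under this sketch `.*\.txt`, `[^\s]+\.txt`, and `[\w]+\.txt` qualify, while a pattern with an optional element such as `a?\.txt` does not.
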
Upgrade
go get github.com/coregx/coregex@v0.10.10
Full Changelog: https://github.com/coregx/coregex/blob/main/CHANGELOG.md
v0.10.9: UTF-8 optimization + fuzz bug fixes
Added
- UTF-8 suffix sharing for dot NFA - Performance optimization (#79)
  - Dot metacharacter NFA states reduced from 39 to 28
  - Based on Rust regex-automata approach
- Anchored suffix prefilter - O(1) rejection for suffix patterns (#79)
Fixed
- CharClassSearcher now excludes `*` patterns - Zero-width match bug fix
- Invalid UTF-8 handling for negated char classes - stdlib compatibility
- ReverseInner whitelist - Strategy safety for patterns with Star of Literal
- ReverseSuffix whitelist - Strategy safety for patterns with optional elements
Full Changelog: https://github.com/coregx/coregex/blob/main/CHANGELOG.md
v0.10.8: FindAll 600x faster for anchored patterns
Performance Fix
FindAll 600x faster for anchored patterns on large inputs (#92)
Problem
FindAll("^HTTP/[12]\.[01]", 6MB_input) took 346µs instead of <1µs
Root Cause
Allocation heuristic make([][2]int, 0, len(haystack)/100+1) created ~1MB buffer for a pattern that matches at most once.
Fix
- Anchored patterns (`^...`) use `cap=1` (max 1 match possible)
- Non-anchored patterns capped at 256 (was unbounded)
- Added `Engine.IsStartAnchored()` method
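
The capacity rule can be sketched as a small pure function. `initialMatchCap` is a hypothetical name, and keeping `len(haystack)/100+1` as the base estimate for non-anchored patterns is an assumption carried over from the old heuristic. The release only states the two caps.

```go
package main

// initialMatchCap is a hypothetical helper (not part of coregex) that
// picks the initial capacity for the FindAll result slice.
func initialMatchCap(haystackLen int, startAnchored bool) int {
	if startAnchored {
		// A start-anchored pattern can match at most once.
		return 1
	}
	// Assumed base estimate: one match per 100 bytes, capped at 256
	// so that large inputs no longer trigger a large allocation.
	c := haystackLen/100 + 1
	if c > 256 {
		c = 256
	}
	return c
}
```

For a 6MB haystack this gives a capacity of 1 for an anchored pattern and 256 otherwise, instead of roughly 60,000 slots (about 1MB at 16 bytes per `[2]int`).
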
Benchmarks (6MB input, 1 match)
| | Before | After |
|---|---|---|
| coregex | 346µs | 567ns |
| stdlib | ~0µs | 566ns |
| Ratio | 600x slower | Equal |
Installation
go get github.com/coregx/coregex@v0.10.8
Full Changelog: v0.10.7...v0.10.8
v0.10.7: UTF-8 fixes + 100% stdlib API compatibility
Highlights
🎯 100% stdlib regexp API compatibility - coregex is now a true drop-in replacement for Go's regexp package!
🔧 Critical UTF-8/Unicode fixes - All edge cases now match stdlib behavior.
What's New
100% stdlib API Compatibility
All stdlib regexp methods are now implemented:
- `CompilePOSIX`, `MustCompilePOSIX` - POSIX ERE semantics
- `Match`, `MatchString`, `MatchReader` - package-level functions
- `SubexpIndex(name)` - named capture group lookup
- `LiteralPrefix()` - literal prefix extraction
- `Expand`, `ExpandString` - template substitution
- `Copy()`, `MarshalText`, `UnmarshalText` - utility methods
- `MatchReader`, `FindReaderIndex`, `FindReaderSubmatchIndex` - io.RuneReader methods
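
For readers unfamiliar with some of these methods, the sketch below exercises three of them against Go's standard `regexp` package, whose behavior is the reference. The function name `stdlibAPITour` and the sample pattern are illustrative only. Per the release's claim, the same code should work with the import swapped to coregex, though that is not verified here.

```go
package main

import "regexp"

// stdlibAPITour is an illustrative helper that calls SubexpIndex,
// LiteralPrefix, and ExpandString on a small sample pattern.
func stdlibAPITour() (idx int, prefix string, complete bool, expanded string) {
	re := regexp.MustCompile(`/(?P<name>\w+)\.php`)

	// Named capture group lookup: "name" is the first group.
	idx = re.SubexpIndex("name")

	// Literal prefix that every match must start with.
	prefix, complete = re.LiteralPrefix()

	// Template substitution using the submatch offsets.
	src := "/index.php"
	m := re.FindStringSubmatchIndex(src)
	expanded = string(re.ExpandString(nil, "page=$name", src, m))
	return idx, prefix, complete, expanded
}
```
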
Bug Fixes
- #85: Dot metacharacter now matches UTF-8 codepoints (not bytes)
- #87: Case-insensitive patterns skip literal prefilters correctly
- #88: Empty character classes (`[^\S\s]`) no longer match empty strings
- #90: Empty pattern Split matches stdlib behavior
- #91: Negated Unicode property classes (`\P{Han}`) use proper UTF-8 automata
Migration
Simply replace your imports:
// Before
import "regexp"
// After
import regexp "github.com/coregx/coregex"
Full Changelog
See CHANGELOG.md
v0.10.6: CompositeSequenceDFA 25x speedup
What's New
CompositeSequenceDFA: 25x speedup for overlapping char classes
Patterns like \w+[0-9]+ where character classes overlap now run 25x faster than stdlib.
| Engine | Throughput | vs Stdlib |
|---|---|---|
| CompositeSearcher (old) | 56 MB/s | 4.7x |
| CompositeSequenceDFA (new) | 300 MB/s | 25x |
Implementation:
- NFA subset construction for correct overlap handling
- Byte class reduction: 256 bytes → 3-8 equivalence classes
- First-part skip optimization
- Loop unrolling: 4 bytes per iteration (Rust-inspired)
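
A hand-built miniature of the first two ideas, for `\w+[0-9]+` on ASCII input: bytes are reduced to three equivalence classes, and the transition table is what subset construction yields when digits belong to both parts of the pattern. All names here are hypothetical, and the first-part skip and loop unrolling are omitted. coregex's generated tables will differ.

```go
package main

// Byte equivalence classes: 256 byte values collapse to 3 classes.
const (
	clsOther = iota // anything outside \w
	clsWord         // [A-Za-z_]: in \w but not in [0-9]
	clsDigit        // [0-9]: in both \w and [0-9] (the overlap)
	numClasses
)

// DFA states obtained by subset construction over the two-part NFA.
const (
	dead     = iota
	start    // nothing consumed yet
	inWord   // inside \w+, no trailing digit available for [0-9]+
	inDigits // \w+ satisfied and last byte was a digit: accepting
	numStates
)

var byteClass [256]uint8

func init() {
	for b := 0; b < 256; b++ {
		c := byte(b)
		if c >= '0' && c <= '9' {
			byteClass[b] = clsDigit
		} else if c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') {
			byteClass[b] = clsWord
		}
	}
}

// trans[state][class]; columns are clsOther, clsWord, clsDigit.
var trans = [numStates][numClasses]uint8{
	dead:     {dead, dead, dead},
	start:    {dead, inWord, inWord}, // a leading digit can only serve \w+
	inWord:   {dead, inWord, inDigits},
	inDigits: {dead, inWord, inDigits},
}

// longestMatchEnd returns the end offset of the longest match of
// \w+[0-9]+ anchored at s[0], or -1 if there is none.
func longestMatchEnd(s []byte) int {
	state, end := uint8(start), -1
	for i, b := range s {
		state = trans[state][byteClass[b]]
		if state == dead {
			break
		}
		if state == inDigits {
			end = i + 1
		}
	}
	return end
}
```

On `ab12cd34!` the scan stays alive through the letters after `12`, because a digit may still belong to `\w+`, and reports the match end at offset 8.
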
FindAllIndexCompact: Zero per-match allocations
New API for high-performance match iteration:
// Before: 60974 allocations
matches := re.FindAllIndex(data, -1)
// After: 9 allocations (single slice)
compact := re.FindAllIndexCompact(data, -1, nil)
// Zero allocations with buffer reuse
buf := make([][2]int, 0, 1000)
compact = re.FindAllIndexCompact(data, -1, buf)
Full Changelog
https://github.com/coregx/coregex/blob/main/CHANGELOG.md#0106---2026-01-14
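
The buffer-reuse pattern behind `FindAllIndexCompact` in v0.10.6 can be illustrated without a regex engine. In this sketch `appendDigitRuns` is a hypothetical stand-in matcher that finds runs of ASCII digits. Its name and parameter order are not coregex's API. The point is the result type: a `[][2]int` stores each pair inline, so a reused buffer with enough capacity needs no per-match allocation, whereas `[][]int` needs a separate inner slice per match.

```go
package main

// appendDigitRuns is a hypothetical stand-in matcher (not coregex code).
// It resets buf to length 0, then appends the [start, end) offsets of
// every run of ASCII digits in data, reusing buf's backing array.
func appendDigitRuns(buf [][2]int, data []byte) [][2]int {
	buf = buf[:0]
	start := -1
	for i, b := range data {
		isDigit := b >= '0' && b <= '9'
		switch {
		case isDigit && start < 0:
			start = i // a run begins
		case !isDigit && start >= 0:
			buf = append(buf, [2]int{start, i}) // a run ends
			start = -1
		}
	}
	if start >= 0 {
		buf = append(buf, [2]int{start, len(data)})
	}
	return buf
}
```

A caller allocates the buffer once, for example `buf := make([][2]int, 0, 1000)`, and passes the returned slice back in on the next call.
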