@kolkov (Contributor) commented Jan 15, 2026

Summary

  • Fix a 600x performance regression in FindAll for anchored patterns on large inputs
  • Anchored patterns (^...) now allocate the result slice with cap=1 (at most 1 match is possible)
  • Non-anchored patterns now cap the initial capacity at 256 (append grows the slice if needed); it was previously unbounded, which caused a ~1MB allocation for a 6MB input
  • Added an Engine.IsStartAnchored() method to support this allocation optimization

Problem

FindAll("^HTTP/[12]\.[01]", 6MB_input) took 346µs instead of <1µs

Root cause: The allocation heuristic make([][2]int, 0, len(haystack)/100+1) created a ~1MB buffer (capacity 62,915) for a pattern that matches at most once.

Benchmarks (6MB input, 1 match)

|         | Before      | After |
|---------|-------------|-------|
| coregex | 346µs       | 567ns |
| stdlib  | ~0µs        | 566ns |
| Ratio   | 600x slower | Equal |

Changes

  • meta/meta.go: Smart allocation in findAllIndicesLoop, new IsStartAnchored() method
  • regex.go: Smart allocation in FindAll
  • CHANGELOG.md: Document fix

Test plan

  • All tests pass (go test ./...)
  • Linter passes (golangci-lint run)
  • Benchmarks verified locally
  • CI passes

🤖 Generated with Claude Code

Problem: FindAll("^HTTP/[12]\.[01]", 6MB) took 346µs instead of <1µs
Root cause: Allocation heuristic make([][2]int, 0, len(haystack)/100+1)
created ~1MB buffer (62k capacity) for a pattern that matches at most once.

Fix:
- Anchored patterns (^...) use cap=1 (max 1 match possible)
- Non-anchored patterns capped at 256 (append grows if needed)
- Added Engine.IsStartAnchored() method for allocation optimization

Result: Now matches stdlib performance (~500ns for anchored patterns)

Benchmarks (6MB input, 1 match):
- Before: coregex 346µs vs stdlib ~0µs (600x slower)
- After:  coregex 567ns vs stdlib 566ns (equal performance)
@kolkov kolkov merged commit 9cbc2ef into main Jan 15, 2026
15 checks passed
@github-actions

Benchmark Comparison

Comparing main → PR #92

Summary: geomean 252.0n 251.8n -0.08%

⚠️ Potential regressions detected:

Benchmark                                        main            PR #92          Delta
geomean                                          32.49n          32.56n          +0.23%
geomean (4 further groups)                       ³               ³               +0.00%
AnchoredAlt_ManyBranches_Stdlib/NoMatch-4        85.51n ± ∞ ¹    86.28n ± ∞ ¹    +0.90% (p=0.008 n=5)
AnchoredAlt_ManyBranches_Coregex/GET-4           69.50n ± ∞ ¹    70.73n ± ∞ ¹    +1.77% (p=0.008 n=5)
ConcurrentIsMatch-4                              55.39n ± ∞ ¹    56.08n ± ∞ ¹    +1.25% (p=0.008 n=5)
FatTeddyFallback/small_haystack_37B-4            84.15n ± ∞ ¹    91.36n ± ∞ ¹    +8.57% (p=0.008 n=5)
IPRegex_Find/stdlib_1KB_sparse-4                 16.13µ ± ∞ ¹    43.88µ ± ∞ ¹    +172.10% (p=0.008 n=5)

Full results available in workflow artifacts. CI runners have ~10-20% variance.
For accurate benchmarks, run locally: ./scripts/bench.sh --compare
