Note
This is an AI-generated research report. All text and code in this report was created by an LLM (Large Language Model). For more information on how these reports are created, see the main research repository.
Python bindings for the memchr Rust library, providing high-performance byte and substring search using SIMD optimizations.
This package wraps the high-performance memchr Rust crate by Andrew Gallant (BurntSushi), bringing its SIMD-optimized search algorithms to Python. The memchr library is widely used in the Rust ecosystem (including in ripgrep) and provides heavily optimized routines for:
- Single byte search (finding first/last occurrence of 1, 2, or 3 bytes)
- Substring search (finding occurrences of a byte pattern)
Build from source with maturin:
git clone <repository>
cd pymemchr
pip install maturin
maturin developimport pymemchr
# Find first occurrence of a byte
haystack = b"hello world"
idx = pymemchr.memchr(ord('o'), haystack) # Returns 4
# Find first of two bytes
idx = pymemchr.memchr2(ord('x'), ord('o'), haystack) # Returns 4
# Find first of three bytes
idx = pymemchr.memchr3(ord('x'), ord('y'), ord('o'), haystack) # Returns 4# Find last occurrence of a byte
idx = pymemchr.memrchr(ord('o'), haystack) # Returns 7
# Find last of two or three bytes
idx = pymemchr.memrchr2(ord('x'), ord('o'), haystack) # Returns 7
idx = pymemchr.memrchr3(ord('x'), ord('y'), ord('o'), haystack) # Returns 7# Find all occurrences of a byte
haystack = b"abracadabra"
indices = pymemchr.memchr_iter(ord('a'), haystack) # Returns [0, 3, 5, 7, 10]
# Reverse iteration (indices from last to first)
indices = pymemchr.memrchr_iter(ord('a'), haystack) # Returns [10, 7, 5, 3, 0]
# Find all occurrences of either of two bytes
indices = pymemchr.memchr2_iter(ord('a'), ord('b'), haystack)
# Find all occurrences of any of three bytes
indices = pymemchr.memchr3_iter(ord('a'), ord('b'), ord('c'), haystack)# Find first substring
haystack = b"foo bar foo baz foo"
idx = pymemchr.memmem_find(b"foo", haystack) # Returns 0
# Find last substring
idx = pymemchr.memmem_rfind(b"foo", haystack) # Returns 16
# Find all substrings
indices = pymemchr.memmem_find_iter(b"foo", haystack) # Returns [0, 8, 16]For repeated searches with the same needle, use precompiled finders for better performance:
# Create a reusable finder
finder = pymemchr.Finder(b"needle")
# Search multiple haystacks efficiently
idx1 = finder.find(b"haystack with needle here")
idx2 = finder.find(b"another needle in haystack")
# Find all occurrences
indices = finder.find_iter(b"needle needle needle")
# Reverse finder for finding last occurrence
rfinder = pymemchr.FinderRev(b"needle")
idx = rfinder.rfind(b"needle needle needle") # Returns last occurrence| Function | Description |
|---|---|
memchr(needle, haystack) |
Find first occurrence of byte |
memchr2(n1, n2, haystack) |
Find first occurrence of either byte |
memchr3(n1, n2, n3, haystack) |
Find first occurrence of any of three bytes |
memrchr(needle, haystack) |
Find last occurrence of byte |
memrchr2(n1, n2, haystack) |
Find last occurrence of either byte |
memrchr3(n1, n2, n3, haystack) |
Find last occurrence of any of three bytes |
memchr_iter(needle, haystack) |
Find all occurrences of byte |
memchr2_iter(n1, n2, haystack) |
Find all occurrences of either byte |
memchr3_iter(n1, n2, n3, haystack) |
Find all occurrences of any of three bytes |
memrchr_iter(needle, haystack) |
Find all occurrences (reverse order) |
memrchr2_iter(n1, n2, haystack) |
Find all occurrences of either (reverse) |
memrchr3_iter(n1, n2, n3, haystack) |
Find all of any of three (reverse) |
memmem_find(needle, haystack) |
Find first substring occurrence |
memmem_rfind(needle, haystack) |
Find last substring occurrence |
memmem_find_iter(needle, haystack) |
Find all substring occurrences |
| Class | Description |
|---|---|
Finder(needle) |
Precompiled finder for repeated forward searches |
FinderRev(needle) |
Precompiled finder for repeated reverse searches |
The following benchmarks compare pymemchr against native Python bytes methods. Tests were run on various data sizes from 1KB to 1MB.
| Operation | Speedup |
|---|---|
| memchr (single byte) | 1.84x faster |
| memrchr (reverse) | 1.63x faster |
| memchr2 (two bytes) | 4.27x faster |
| memchr3 (three bytes) | 5.88x faster |
| memmem (short needle, 3 bytes) | 19.55x faster |
| memmem (medium needle, 11 bytes) | 9.43x faster |
| memmem (long needle, 25 bytes) | 5.33x faster |
| memchr_iter (find all bytes) | 4.49x faster |
| memmem_find_iter (find all substrings) | 19.36x faster |
The substring search shows the most dramatic improvements, with up to 20x speedup for short needle searches in large haystacks.
Use pymemchr when:
- Searching through large data (>10KB)
- Performing many repeated searches
- Need to find all occurrences of a pattern
- Searching for one of multiple possible bytes
- Performance is critical
Native Python may be sufficient when:
- Working with small strings (<1KB)
- Performing single, one-off searches
- Code simplicity is more important than performance
This package uses:
- PyO3 (v0.22) for Rust-Python bindings
- memchr (v2.7) for SIMD-optimized search
- maturin for building Python wheels
The underlying memchr library uses:
- SSE2/AVX2 on x86_64
- NEON on aarch64
- SIMD128 on wasm32
- Fallback scalar implementation on other platforms
# Clone and setup
git clone <repository>
cd pymemchr
# Install dependencies and build
uv sync
# Run tests
uv run pytest
# Run benchmarks
uv run python benchmark.py
# Build wheel
uv buildMIT License
- BurntSushi/memchr - The underlying Rust library by Andrew Gallant
- PyO3 - Rust bindings for Python





