memchr-python-wrapper

pymemchr

Note

This is an AI-generated research report. All text and code in this report was created by an LLM (Large Language Model). For more information on how these reports are created, see the main research repository.

Python bindings for the memchr Rust library, providing high-performance byte and substring search using SIMD optimizations.

Overview

This package wraps the high-performance memchr Rust crate by Andrew Gallant (BurntSushi), bringing its SIMD-optimized search algorithms to Python. The memchr library is widely used in the Rust ecosystem (including in ripgrep) and provides heavily optimized routines for:

Single byte search (finding first/last occurrence of 1, 2, or 3 bytes)
Substring search (finding occurrences of a byte pattern)

Installation

Build from source with maturin:

git clone <repository>
cd pymemchr
pip install maturin
maturin develop

Usage

Single Byte Search

import pymemchr

# Find first occurrence of a byte
haystack = b"hello world"
idx = pymemchr.memchr(ord('o'), haystack)  # Returns 4

# Find first of two bytes
idx = pymemchr.memchr2(ord('x'), ord('o'), haystack)  # Returns 4

# Find first of three bytes
idx = pymemchr.memchr3(ord('x'), ord('y'), ord('o'), haystack)  # Returns 4

Reverse Byte Search

# Find last occurrence of a byte
idx = pymemchr.memrchr(ord('o'), haystack)  # Returns 7

# Find last of two or three bytes
idx = pymemchr.memrchr2(ord('x'), ord('o'), haystack)  # Returns 7
idx = pymemchr.memrchr3(ord('x'), ord('y'), ord('o'), haystack)  # Returns 7

Find All Occurrences

# Find all occurrences of a byte
haystack = b"abracadabra"
indices = pymemchr.memchr_iter(ord('a'), haystack)  # Returns [0, 3, 5, 7, 10]

# Reverse iteration (indices from last to first)
indices = pymemchr.memrchr_iter(ord('a'), haystack)  # Returns [10, 7, 5, 3, 0]

# Find all occurrences of either of two bytes
indices = pymemchr.memchr2_iter(ord('a'), ord('b'), haystack)

# Find all occurrences of any of three bytes
indices = pymemchr.memchr3_iter(ord('a'), ord('b'), ord('c'), haystack)

Substring Search

# Find first substring
haystack = b"foo bar foo baz foo"
idx = pymemchr.memmem_find(b"foo", haystack)  # Returns 0

# Find last substring
idx = pymemchr.memmem_rfind(b"foo", haystack)  # Returns 16

# Find all substrings
indices = pymemchr.memmem_find_iter(b"foo", haystack)  # Returns [0, 8, 16]

Precompiled Finders

For repeated searches with the same needle, use precompiled finders for better performance:

# Create a reusable finder
finder = pymemchr.Finder(b"needle")

# Search multiple haystacks efficiently
idx1 = finder.find(b"haystack with needle here")
idx2 = finder.find(b"another needle in haystack")

# Find all occurrences
indices = finder.find_iter(b"needle needle needle")

# Reverse finder for finding last occurrence
rfinder = pymemchr.FinderRev(b"needle")
idx = rfinder.rfind(b"needle needle needle")  # Returns last occurrence

API Reference

Functions

Function	Description
`memchr(needle, haystack)`	Find first occurrence of byte
`memchr2(n1, n2, haystack)`	Find first occurrence of either byte
`memchr3(n1, n2, n3, haystack)`	Find first occurrence of any of three bytes
`memrchr(needle, haystack)`	Find last occurrence of byte
`memrchr2(n1, n2, haystack)`	Find last occurrence of either byte
`memrchr3(n1, n2, n3, haystack)`	Find last occurrence of any of three bytes
`memchr_iter(needle, haystack)`	Find all occurrences of byte
`memchr2_iter(n1, n2, haystack)`	Find all occurrences of either byte
`memchr3_iter(n1, n2, n3, haystack)`	Find all occurrences of any of three bytes
`memrchr_iter(needle, haystack)`	Find all occurrences (reverse order)
`memrchr2_iter(n1, n2, haystack)`	Find all occurrences of either (reverse)
`memrchr3_iter(n1, n2, n3, haystack)`	Find all of any of three (reverse)
`memmem_find(needle, haystack)`	Find first substring occurrence
`memmem_rfind(needle, haystack)`	Find last substring occurrence
`memmem_find_iter(needle, haystack)`	Find all substring occurrences

Classes

Class	Description
`Finder(needle)`	Precompiled finder for repeated forward searches
`FinderRev(needle)`	Precompiled finder for repeated reverse searches

Performance Benchmarks

The following benchmarks compare pymemchr against native Python bytes methods. Tests were run on various data sizes from 1KB to 1MB.

Speedup Summary (1MB data)

Operation	Speedup
memchr (single byte)	1.84x faster
memrchr (reverse)	1.63x faster
memchr2 (two bytes)	4.27x faster
memchr3 (three bytes)	5.88x faster
memmem (short needle, 3 bytes)	19.55x faster
memmem (medium needle, 11 bytes)	9.43x faster
memmem (long needle, 25 bytes)	5.33x faster
memchr_iter (find all bytes)	4.49x faster
memmem_find_iter (find all substrings)	19.36x faster

Single Byte Search

Multi-Byte Search (memchr2, memchr3)

Substring Search

The substring search shows the most dramatic improvements, with up to 20x speedup for short needle searches in large haystacks.

Find All Occurrences (Iterator)

Repeated Searches with Precompiled Finder

When to Use pymemchr

Use pymemchr when:

Searching through large data (>10KB)
Performing many repeated searches
Need to find all occurrences of a pattern
Searching for one of multiple possible bytes
Performance is critical

Native Python may be sufficient when:

Working with small strings (<1KB)
Performing single, one-off searches
Code simplicity is more important than performance

Implementation Details

This package uses:

PyO3 (v0.22) for Rust-Python bindings
memchr (v2.7) for SIMD-optimized search
maturin for building Python wheels

The underlying memchr library uses:

SSE2/AVX2 on x86_64
NEON on aarch64
SIMD128 on wasm32
Fallback scalar implementation on other platforms

Development

# Clone and setup
git clone <repository>
cd pymemchr

# Install dependencies and build
uv sync

# Run tests
uv run pytest

# Run benchmarks
uv run python benchmark.py

# Build wheel
uv build

License

MIT License

Credits

BurntSushi/memchr - The underlying Rust library by Andrew Gallant
PyO3 - Rust bindings for Python

Name		Name	Last commit message	Last commit date
parent directory ..
python/pymemchr		python/pymemchr
src		src
tests		tests
.gitignore		.gitignore
Cargo.toml		Cargo.toml
README.md		README.md
_summary.md		_summary.md
benchmark.py		benchmark.py
benchmark_finder.png		benchmark_finder.png
benchmark_iter.png		benchmark_iter.png
benchmark_multi_byte.png		benchmark_multi_byte.png
benchmark_single_byte.png		benchmark_single_byte.png
benchmark_speedup.png		benchmark_speedup.png
benchmark_substring.png		benchmark_substring.png
notes.md		notes.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

README.md

pymemchr

Overview

Installation

Usage

Single Byte Search

Reverse Byte Search

Find All Occurrences

Substring Search

Precompiled Finders

API Reference

Functions

Classes

Performance Benchmarks

Speedup Summary (1MB data)

Single Byte Search

Multi-Byte Search (memchr2, memchr3)

Substring Search

Find All Occurrences (Iterator)

Repeated Searches with Precompiled Finder

When to Use pymemchr

Implementation Details

Development

License

Credits

Uh oh!

FilesExpand file tree

memchr-python-wrapper

Directory actions

More options

Directory actions

More options

Latest commit

History

memchr-python-wrapper

Folders and files

parent directory

README.md

pymemchr

Overview

Installation

Usage

Single Byte Search

Reverse Byte Search

Find All Occurrences

Substring Search

Precompiled Finders

API Reference

Functions

Classes

Performance Benchmarks

Speedup Summary (1MB data)

Single Byte Search

Multi-Byte Search (memchr2, memchr3)

Substring Search

Find All Occurrences (Iterator)

Repeated Searches with Precompiled Finder

When to Use pymemchr

Implementation Details

Development

License

Credits