Skip to content

recsplit: sharded FuseFilter#20644

Merged
JkLondon merged 52 commits into
mainfrom
alex/sharded_fuse2_35
May 12, 2026
Merged

recsplit: sharded FuseFilter#20644
JkLondon merged 52 commits into
mainfrom
alex/sharded_fuse2_35

Conversation

@AskAlexSharov

@AskAlexSharov AskAlexSharov commented Apr 18, 2026

Copy link
Copy Markdown
Collaborator

Fixes #20560.

Problem:
bloatnet 3B keys .efi file building: FuseFIlter used t2hash=29GB and reverseOrder=26GB
we can't mmap FuseFilter without forking it. But maybe it's okey. Maybe it's better to shard FuseFilter rather than mmap large thing. Maybe sharding will improve data-locality.

--
Solution:
shard fusefilter by first byte of keyHash. it will reduce building allocation buffers 256x times. And it will not add reading overhead. (keyHash>>56 is not same as byte(keyHash). it's hi/lo bytes.)

  • Writer: re-using 1 xorfilter.BuildBinaryFuse to build all shards
  • ShardedReader is a [256]Reader. Lookup: shards[keyHash>>56].ContainsHash(hash)

Depends on: #20722

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for a new RecSplit index format (v2) that uses a sharded fusefilter existence filter to reduce false positives and potentially improve lookup efficiency, while extending test coverage to exercise the new version.

Changes:

  • Introduce sharded fusefilter writer/reader (256 shards by top byte of hash) and wire it into RecSplit/Index for dataStructureVersion ≥ 2.
  • Refactor fusefilter blob header handling and add regression tests around header field offsets and sharded round-trips.
  • Extend RecSplit tests to run the existing two-layer index test suite against v2.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
db/recsplit/recsplit_test.go Runs the two-layer index test for RecSplit version 2.
db/recsplit/recsplit.go Creates/uses a sharded existence filter writer for v2 and closes it on cleanup.
db/recsplit/index.go Adds sharded existence filter reader initialization and lookup gating for v2+.
db/datastruct/fusefilter/fusefilter_writer_test.go Adds round-trip, truncation, and regression tests for sharded writer/reader and header parsing.
db/datastruct/fusefilter/fusefilter_writer.go Adds sharded writer implementation and refactors header serialization helpers.
db/datastruct/fusefilter/fusefilter_reader.go Adds sharded reader implementation and refactors header parsing.
Comments suppressed due to low confidence (1)

db/datastruct/fusefilter/fusefilter_reader.go:101

  • NewReaderOnBytes slices m into header/data without checking len(m) >= filterBlobHeaderSize, and then slices data[:fingerprintsLen] without validating fingerprintsLen against len(data). With truncated/corrupt input this will panic instead of returning an error; please add explicit bounds checks and return a descriptive error.
func NewReaderOnBytes(m []byte, fName string) (*Reader, int, error) {
	filter := &xorfilter.BinaryFuse[uint8]{}

	const headerSize = filterBlobHeaderSize
	header, data := m[:headerSize], m[headerSize:]

	v, features, err := parseHeaderFeatures(header, fName)
	if err != nil {
		return nil, 0, err
	}

	filter.SegmentCount = binary.BigEndian.Uint32(header[4:])
	filter.SegmentCountLength = binary.BigEndian.Uint32(header[4+4:])
	filter.Seed = binary.BigEndian.Uint64(header[4+4+4:])
	filter.SegmentLength = binary.BigEndian.Uint32(header[4+4+4+8:])
	filter.SegmentLengthMask = binary.BigEndian.Uint32(header[4+4+4+8+4:])
	fingerprintsLen := int(binary.BigEndian.Uint64(header[4+4+4+8+4+4:]))

	filter.Fingerprints = data[:fingerprintsLen]
	return &Reader{inner: filter, version: v, features: features, m: m}, headerSize + fingerprintsLen, nil
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/recsplit/index.go
Comment thread db/datastruct/fusefilter/fusefilter_reader.go

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends RecSplit’s “less false positives” existence filtering by adding a sharded fusefilter implementation and wiring it into RecSplit/Index for data structure version 2.

Changes:

  • Add v2 existence filter plumbing in recsplit (writer side) and index (reader side) using a new sharded fusefilter format.
  • Implement WriterSharded / ReaderSharded for fusefilter blobs and refactor common header/feature parsing.
  • Expand test coverage to exercise RecSplit v2 and the new sharded fusefilter round-trips.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
db/recsplit/recsplit_test.go Adds TestTwoLayerIndex coverage for version 2.
db/recsplit/recsplit.go Adds v2 existence filter writer and writes it during build.
db/recsplit/index.go Adds v2 existence filter reader and uses it during Lookup/ForceInMem.
db/datastruct/fusefilter/fusefilter_writer_test.go Adds new sharded writer/reader tests and a regression test for segment count fields.
db/datastruct/fusefilter/fusefilter_writer.go Refactors header feature init; implements WriterSharded and shared serialization helper.
db/datastruct/fusefilter/fusefilter_reader.go Refactors header parsing; implements ReaderSharded.
Comments suppressed due to low confidence (1)

db/datastruct/fusefilter/fusefilter_reader.go:100

  • NewReaderOnBytes slices m[:headerSize] and later data[:fingerprintsLen] without any length checks. A truncated/corrupted fusefilter blob (or a shard blob inside NewReaderShardedOnBytes) will panic with an out-of-bounds slice instead of returning an error. Please add upfront bounds checks (len(m) >= headerSize, and headerSize+fingerprintsLen <= len(m)) and return a descriptive error when the blob is too small or claims an impossible fingerprints length.
func NewReaderOnBytes(m []byte, fName string) (*Reader, int, error) {
	filter := &xorfilter.BinaryFuse[uint8]{}

	const headerSize = filterBlobHeaderSize
	header, data := m[:headerSize], m[headerSize:]

	v, features, err := parseHeaderFeatures(header, fName)
	if err != nil {
		return nil, 0, err
	}

	filter.SegmentCount = binary.BigEndian.Uint32(header[4:])
	filter.SegmentCountLength = binary.BigEndian.Uint32(header[4+4:])
	filter.Seed = binary.BigEndian.Uint64(header[4+4+4:])
	filter.SegmentLength = binary.BigEndian.Uint32(header[4+4+4+8:])
	filter.SegmentLengthMask = binary.BigEndian.Uint32(header[4+4+4+8+4:])
	fingerprintsLen := int(binary.BigEndian.Uint64(header[4+4+4+8+4+4:]))

	filter.Fingerprints = data[:fingerprintsLen]
	return &Reader{inner: filter, version: v, features: features, m: m}, headerSize + fingerprintsLen, nil

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread db/datastruct/fusefilter/fusefilter_reader.go Outdated
Comment thread db/datastruct/fusefilter/fusefilter_reader.go Outdated
@JkLondon JkLondon changed the title [wip] recsplit: sharded FuseFilter recsplit: sharded FuseFilter May 12, 2026
@JkLondon JkLondon enabled auto-merge May 12, 2026 09:26
@JkLondon JkLondon added this pull request to the merge queue May 12, 2026
@JkLondon JkLondon self-assigned this May 12, 2026
Merged via the queue into main with commit be36d4f May 12, 2026
37 checks passed
@JkLondon JkLondon deleted the alex/sharded_fuse2_35 branch May 12, 2026 10:43
JkLondon pushed a commit that referenced this pull request May 13, 2026
Fixes #20560.

Problem: 
bloatnet 3B keys .efi file building: FuseFIlter used `t2hash=29GB` and
`reverseOrder=26GB`
we can't mmap FuseFilter without forking it. But maybe it's okey. Maybe
it's better to shard FuseFilter rather than mmap large thing. Maybe
sharding will improve data-locality.

--
Solution: 
shard `fusefilter` by first byte of `keyHash`. it will reduce building
allocation buffers 256x times. And it will not add reading overhead.
(`keyHash>>56` is not same as `byte(keyHash)`. it's `hi/lo` bytes.)

- Writer: re-using 1 `xorfilter.BuildBinaryFuse` to build all shards
- `ShardedReader` is a `[256]Reader`. Lookup:
`shards[keyHash>>56].ContainsHash(hash)`

Depends on: #20722

---------

Co-authored-by: JkLondon <me@ilyamikheev.com>
AskAlexSharov added a commit that referenced this pull request May 13, 2026
Cherry-pick of
[be36d4f](be36d4f)
(original PR #20644) onto `performance`.

## Summary
- Shards `fusefilter` by `keyHash>>56` (256 shards) to cut FuseFilter
build-time RAM ~256× on large `.efi` files (3B-key bloatnet case).
- Writer reuses one `xorfilter.BuildBinaryFuse` to build all shards;
`ShardedReader` is a `[256]Reader` with lookup
`shards[keyHash>>56].ContainsHash(hash)`.
- Adds `ExistenceFilterVersion = 2` (sharded format), keeps v1
(monolithic) and v0 (legacy) intact. `recsplit.RecSplitArgs.Version` is
now `version.DataStructureVersion` (was `uint8`).
- Wires `snaptype.BuildIndex` to pick `recsplit.ExistenceFilterVersion`
whenever the index version is not `v1.0` (v1.0 keeps inner v=0 for
compat).
- Brings the full new test set: sharded writer/reader tests, corruption
tests, fuzz tests, real-file tests, and benchmarks +
`scripts/fusefilter-bench/`.

## Cherry-pick status
Clean cherry-pick. No manual conflict resolutions required —
`performance` already has all the prerequisites the original commit
assumed:
- `xorfilter v0.5.1` (provides `MakeBinaryFuseBuilder` /
`BuildBinaryFuse`)
- Package-scope `headerSize` in
`db/datastruct/fusefilter/fusefilter_reader.go`
- `offsetFile *os.File` / `offsetWriter *bufio.Writer` in
`recsplit.RecSplit` (from PR #20722)

Auto-merge handled minor textual deltas in `.gitignore`,
`fusefilter_reader.go`, `recsplit/index.go`, and `db/state/domain.go`
without conflicts.

## Test plan
- [ ] `go build` of affected packages: passes locally
(`db/recsplit/...`, `db/datastruct/fusefilter/...`, `db/snaptype/...`)
- [ ] `go test ./db/datastruct/fusefilter/... ./db/recsplit/...`
- [ ] CI on the branch

> Note: `./db/state/statecfg/...` fails to build with `undefined:
InitSchemas` on `origin/performance` independently of this PR —
pre-existing issue on the base branch, unrelated to the cherry-pick.

## References
- Original PR: #20644
- Original issue: #20560

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Alex Sharov <AskAlexSharov@gmail.com>
Co-authored-by: JkLondon <me@ilyamikheev.com>
pull Bot pushed a commit to Dustin4444/erigon that referenced this pull request Jun 13, 2026
## Summary

Re-enable the sharded FuseFilter existence-filter format by setting
`ExistenceFilterVersion` back from `1` to `2` in
`db/recsplit/recsplit.go`. This is the switch that erigontech#21164 turned off
after erigontech#20644 introduced the format. New index files (`.kvi`/`.efi`) will
again embed the sharded BinaryFuse filter (256 shards by `keyHash>>56`),
which cuts existence-filter build-time RAM ~256× (the motivation was
bloatnet 3B-key `.efi` builds).

## Why it's safe

- **Read path is version-aware.** `db/recsplit/index.go` reads the
embedded `dataStructureVersion` per file and dispatches to v0/v1/v2
readers, so existing v1 files stay readable and new v2 files are read by
the matching `case 2` path.
- **Accessor/existence files are built locally, not seeded.**
`SeedableV3Extensions()` publishes only `.kv/.v/.ef/.ap`;
`.kvi/.efi/.kvei` are rebuilt locally via `BuildMissedAccessors`, so
this does not affect the shared snapshot set.

## Test plan
- [ ] CI green

Co-authored-by: JkLondon <me@ilyamikheev.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[performance] OOM on storage .ef indexing

4 participants