recsplit: sharded FuseFilter (cherry-pick #20644) #21144
Merged
Collaborator
@wmitsuda base branch must be
Member
Fixes #20560.

Problem: when building the `.efi` file for bloatnet's 3B keys, FuseFilter used `t2hash=29GB` and `reverseOrder=26GB`. We can't mmap FuseFilter without forking it. That may be acceptable, but sharding FuseFilter is probably better than mmapping one large structure, and sharding may also improve data locality.

Solution: shard `fusefilter` by the first (high) byte of `keyHash`. This reduces build-time allocation buffers 256× and adds no read overhead. (`keyHash>>56` is not the same as `byte(keyHash)`: the former is the high byte, the latter the low byte.)

- Writer: reuses one `xorfilter.BuildBinaryFuse` to build all shards.
- `ShardedReader` is a `[256]Reader`. Lookup: `shards[keyHash>>56].ContainsHash(hash)`.

Depends on: #20722

---------

Co-authored-by: JkLondon <me@ilyamikheev.com>
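A minimal Go sketch of the sharding scheme described above, for illustration only. `exactSet`, `fuseBuilder`, `shardOf` and `buildSharded` are hypothetical names, not erigon's or xorfilter's API, and `exactSet` is an exact sorted-slice stand-in for a per-shard binary fuse filter (a real fuse filter is probabilistic and far smaller). It shows the two points from the description: the shard index is the high byte (`keyHash>>56`), not `byte(keyHash)`, and one builder with one scratch buffer is reused for all 256 shards, so build-time scratch is bounded by the largest shard rather than the whole key set. As rough arithmetic, assuming hashes are uniform over the high byte and build buffers grow linearly with key count (the premise behind the 256× figure), 3B keys average about 11.7M keys per shard, and the 29 GB `t2hash` buffer shrinks to about 0.11 GB per shard build.

```go
package main

import (
	"fmt"
	"sort"
)

// shardOf picks one of 256 shards from the HIGH byte of the key hash,
// i.e. keyHash>>56. byte(keyHash) would take the LOW byte instead.
func shardOf(keyHash uint64) uint8 { return uint8(keyHash >> 56) }

// exactSet is a hypothetical stand-in for one per-shard filter. It stores
// hashes exactly in a sorted slice; a real binary fuse filter is
// probabilistic and much smaller. Only the ContainsHash shape is mimicked.
type exactSet struct{ hashes []uint64 }

// ContainsHash reports whether h was added, via binary search.
func (s *exactSet) ContainsHash(h uint64) bool {
	i := sort.Search(len(s.hashes), func(i int) bool { return s.hashes[i] >= h })
	return i < len(s.hashes) && s.hashes[i] == h
}

// ShardedReader mirrors the [256]Reader shape: one filter per high byte.
type ShardedReader struct{ shards [256]exactSet }

// ContainsHash routes the lookup to shards[keyHash>>56].
func (r *ShardedReader) ContainsHash(keyHash uint64) bool {
	return r.shards[shardOf(keyHash)].ContainsHash(keyHash)
}

// fuseBuilder is a hypothetical reusable builder. Its scratch buffer grows
// to the largest shard once and is then reused for every shard.
type fuseBuilder struct{ scratch []uint64 }

// build sorts and deduplicates keys in the shared scratch buffer, then
// copies the result into a right-sized slice owned by the new shard.
func (b *fuseBuilder) build(keys []uint64) exactSet {
	b.scratch = append(b.scratch[:0], keys...)
	sort.Slice(b.scratch, func(i, j int) bool { return b.scratch[i] < b.scratch[j] })
	uniq := b.scratch[:0]
	for _, h := range b.scratch {
		if len(uniq) == 0 || h != uniq[len(uniq)-1] {
			uniq = append(uniq, h)
		}
	}
	out := make([]uint64, len(uniq))
	copy(out, uniq)
	return exactSet{hashes: out}
}

// buildSharded buckets hashes by high byte, then builds the 256 shards one
// after another with a single shared builder.
func buildSharded(keyHashes []uint64) *ShardedReader {
	var buckets [256][]uint64
	for _, h := range keyHashes {
		s := shardOf(h)
		buckets[s] = append(buckets[s], h)
	}
	r := &ShardedReader{}
	b := &fuseBuilder{}
	for s := range buckets {
		r.shards[s] = b.build(buckets[s])
	}
	return r
}

func main() {
	h := uint64(0xAB000000000000CD)
	// Prints: keyHash>>56 = 0xab, byte(keyHash) = 0xcd
	fmt.Printf("keyHash>>56 = %#x, byte(keyHash) = %#x\n", h>>56, byte(h))

	r := buildSharded([]uint64{h, 0x0100000000000001, 0xFF00000000000002, h})
	// Prints: true false
	fmt.Println(r.ContainsHash(h), r.ContainsHash(0x0200000000000001))
}
```

In this sketch a lookup costs one shift and one array index before the per-shard probe, which is in line with the description's claim that sharding adds no meaningful read overhead.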
Branch updated from 8972630 to 7f4f2ab.
AskAlexSharov approved these changes on May 13, 2026.
Member
@AskAlexSharov @JkLondon is it safe to roll this out to 3.4 bloatnet? I was planning to wait until the 3.6 snapshotter is ready before merging this one.
Collaborator
The PR is good; it will prevent OOM during
Member
OK, but I'm not rolling this out to 3.4 because I'm finishing testing with the previous commit.
Cherry-pick of be36d4fc16 (original PR #20644) onto `performance`.

Summary
- Shard `fusefilter` by `keyHash>>56` (256 shards) to cut FuseFilter build-time RAM ~256× on large `.efi` files (3B-key bloatnet case).
- Reuse one `xorfilter.BuildBinaryFuse` to build all shards; `ShardedReader` is a `[256]Reader` with lookup `shards[keyHash>>56].ContainsHash(hash)`.
- `ExistenceFilterVersion = 2` (sharded format); v1 (monolithic) and v0 (legacy) are kept intact.
- `recsplit.RecSplitArgs.Version` is now `version.DataStructureVersion` (was `uint8`).
- `snaptype.BuildIndex` picks `recsplit.ExistenceFilterVersion` whenever the index version is not `v1.0` (`v1.0` keeps inner `v=0` for compatibility).
- Adds `scripts/fusefilter-bench/`.

Cherry-pick status
Clean cherry-pick with no manual conflict resolutions required, since `performance` already has all the prerequisites the original commit assumed:

- `xorfilter v0.5.1` (provides `MakeBinaryFuseBuilder`/`BuildBinaryFuse`)
- `headerSize` in `db/datastruct/fusefilter/fusefilter_reader.go`
- `offsetFile *os.File`/`offsetWriter *bufio.Writer` in `recsplit.RecSplit` (from PR #20722, "db/version: enforce upper-bound file version check")

Auto-merge handled minor textual deltas in `.gitignore`, `fusefilter_reader.go`, `recsplit/index.go`, and `db/state/domain.go` without conflicts.

Test plan
- `go build` of affected packages: passes locally (`db/recsplit/...`, `db/datastruct/fusefilter/...`, `db/snaptype/...`)
- `go test ./db/datastruct/fusefilter/... ./db/recsplit/...`

References
🤖 Generated with Claude Code