recsplit: sharded FuseFilter (cherry-pick #20644) #21144

Merged
AskAlexSharov merged 1 commit into performance from cherry-pick/sharded-fusefilter
May 13, 2026

@JkLondon (Member) commented May 12, 2026

Cherry-pick of be36d4fc16 (original PR #20644) onto performance.

Summary

  • Shards fusefilter by keyHash>>56 (256 shards) to cut FuseFilter build-time RAM ~256× on large .efi files (3B-key bloatnet case).
  • Writer reuses one xorfilter.BuildBinaryFuse to build all shards; ShardedReader is a [256]Reader with lookup shards[keyHash>>56].ContainsHash(hash).
  • Adds ExistenceFilterVersion = 2 (sharded format), keeps v1 (monolithic) and v0 (legacy) intact. recsplit.RecSplitArgs.Version is now version.DataStructureVersion (was uint8).
  • Wires snaptype.BuildIndex to pick recsplit.ExistenceFilterVersion whenever the index version is not v1.0 (v1.0 keeps inner v=0 for compat).
  • Brings the full new test set: sharded writer/reader tests, corruption tests, fuzz tests, real-file tests, and benchmarks + scripts/fusefilter-bench/.
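The sharding and lookup described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `hashSet`, `shardedReader`, `shardIndex`, and `buildSharded` are hypothetical names, and an exact in-memory set stands in for a real binary fuse filter shard so the example stays self-contained.

```go
package main

import "fmt"

// shardCount mirrors the 256 shards from the summary: one per value of keyHash>>56.
const shardCount = 256

// hashSet is a hypothetical stand-in for a single filter shard.
// A real shard would be a binary fuse filter; an exact set has no false positives.
type hashSet map[uint64]struct{}

// ContainsHash reports whether h was added to this shard.
func (s hashSet) ContainsHash(h uint64) bool {
	_, ok := s[h]
	return ok
}

// shardedReader illustrates the "[256]Reader" layout.
type shardedReader struct {
	shards [shardCount]hashSet
}

// shardIndex selects a shard by the high byte of the 64-bit key hash.
func shardIndex(keyHash uint64) uint8 {
	return uint8(keyHash >> 56)
}

// buildSharded buckets the hashes by shard and then builds the shards one at a
// time, so each build step only touches the keys of a single shard.
func buildSharded(hashes []uint64) *shardedReader {
	var buckets [shardCount][]uint64
	for _, h := range hashes {
		i := shardIndex(h)
		buckets[i] = append(buckets[i], h)
	}
	r := &shardedReader{}
	for i := range buckets {
		set := make(hashSet, len(buckets[i]))
		for _, h := range buckets[i] {
			set[h] = struct{}{}
		}
		r.shards[i] = set
	}
	return r
}

// ContainsHash routes the lookup to exactly one shard.
func (r *shardedReader) ContainsHash(h uint64) bool {
	return r.shards[shardIndex(h)].ContainsHash(h)
}

func main() {
	r := buildSharded([]uint64{0x0100000000000001, 0xff00000000000002})
	fmt.Println(r.ContainsHash(0x0100000000000001)) // true
	fmt.Println(r.ContainsHash(0x0100000000000002)) // false
}
```

A lookup costs one shift and one array index on top of the per-shard check, which is consistent with the claim that sharding does not add read overhead.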

Cherry-pick status

Clean cherry-pick. No manual conflict resolutions required — performance already has all the prerequisites the original commit assumed:

Auto-merge handled minor textual deltas in .gitignore, fusefilter_reader.go, recsplit/index.go, and db/state/domain.go without conflicts.

Test plan

  • go build of affected packages: passes locally (db/recsplit/..., db/datastruct/fusefilter/..., db/snaptype/...)
  • go test ./db/datastruct/fusefilter/... ./db/recsplit/...
  • CI on the branch

Note: ./db/state/statecfg/... fails to build with undefined: InitSchemas on origin/performance independently of this PR — pre-existing issue on the base branch, unrelated to the cherry-pick.

References

🤖 Generated with Claude Code

@JkLondon JkLondon self-assigned this May 12, 2026
@JkLondon JkLondon requested a review from wmitsuda May 12, 2026 14:50
@AskAlexSharov (Collaborator)

@wmitsuda base branch must be performance or wmitsuda/performance?

@wmitsuda (Member)

> @wmitsuda base branch must be performance or wmitsuda/performance?

performance pls... wmitsuda/performance might be one of my test branches on some machine

@JkLondon JkLondon changed the base branch from wmitsuda/performance to performance May 13, 2026 09:26
@JkLondon JkLondon requested review from awskii and taratorio as code owners May 13, 2026 09:26
Fixes #20560.

Problem:
building the .efi file for bloatnet (3B keys): FuseFilter used
`t2hash=29GB` and `reverseOrder=26GB`.
We can't mmap FuseFilter without forking it. But maybe that's okay: it
may be better to shard FuseFilter than to mmap one large structure, and
sharding may also improve data locality.
--
Solution: 
shard `fusefilter` by the first byte of `keyHash`. This reduces the
build-time allocation buffers 256x and adds no read overhead.
(`keyHash>>56` is not the same as `byte(keyHash)`: the former is the
high byte, the latter is the low byte.)

- Writer: re-using 1 `xorfilter.BuildBinaryFuse` to build all shards
- `ShardedReader` is a `[256]Reader`. Lookup:
`shards[keyHash>>56].ContainsHash(hash)`
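
The difference between `keyHash>>56` and `byte(keyHash)` noted above can be checked with a small sketch. The helper names `highByte` and `lowByte` are hypothetical and exist only for illustration. Assuming the hash spreads keys evenly and shards are built one at a time, a buffer that needed 29GB for the whole key set would need on the order of 29GB / 256, roughly 0.11GB, per shard.

```go
package main

import "fmt"

// highByte returns the most significant byte of the hash: the shard selector.
func highByte(keyHash uint64) uint8 {
	return uint8(keyHash >> 56)
}

// lowByte returns the least significant byte: converting a uint64 to byte
// in Go keeps only the low 8 bits.
func lowByte(keyHash uint64) uint8 {
	return byte(keyHash)
}

func main() {
	h := uint64(0xAB00000000000012)
	fmt.Printf("%#x %#x\n", highByte(h), lowByte(h)) // 0xab 0x12
}
```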

Depends on: #20722

---------

Co-authored-by: JkLondon <me@ilyamikheev.com>
@JkLondon JkLondon force-pushed the cherry-pick/sharded-fusefilter branch from 8972630 to 7f4f2ab Compare May 13, 2026 09:33
@AskAlexSharov AskAlexSharov enabled auto-merge (squash) May 13, 2026 09:38
@AskAlexSharov AskAlexSharov merged commit 0c607d9 into performance May 13, 2026
36 checks passed
@AskAlexSharov AskAlexSharov deleted the cherry-pick/sharded-fusefilter branch May 13, 2026 10:16
@wmitsuda (Member)

@AskAlexSharov @JkLondon is it safe to rollout to 3.4 bloatnet? I was planning to wait to merge this one once the 3.6 snapshotter is ready

@AskAlexSharov (Collaborator)

pr is good - will prevent oom during .efi merge - doesn't require new files
3.6 snapshotter is ready - artem/me will gen new .efi there (after deref)

@wmitsuda (Member)

ok, but I'm not rolling out this to 3.4 bc I'm finishing testing with the previous commit.
