-
Notifications
You must be signed in to change notification settings - Fork 4.1k
storage,kvserver: Improve CheckSSTCollisions for unsorted imports #80980
Description
CheckSSTCollisions is currently optimized for cases where an ingested SST is narrow relative to the store it is being ingested into. This works okay for sorted imports, as each individual ingested SST will be narrow and consisting of adjacent-ish keys. However, for SSTs produced by the batch sorter if the row source is unsorted, this is no longer true. In those imports, CheckSSTCollisions runtime ends up dominating the runtime of the total import. In the past, we've tried to resolve this issue by using SeekPrefixGE for all CheckSSTCollisions calls (#73514) to take advantage of bloom filters, but that ended up regressing for sorted imports and so got reverted.
We do not need to exclusively pick between prefix seeks and regular seeks. That determination can be made based on the MVCC stats of the ingested SST, as well as the estimated bytes in the engine that overlap with it. We can use the pebble EstimateDiskUsage function to check for the number of bytes in the engine in an arbitrary key range, which would be useful here. If the overlap is low or if the ingested SST is tiny, we fall back to the old/current behaviour. Using usePrefixSeek = estimatedOverlapBytes > 500*uint64(len(sst)) || args.MVCCStats.KeyCount < 100 in a prototype was sufficient to preserve the runtime of sorted imports while significantly speeding up unsorted imports.
As part of this issue, we'd need to theorize and/or prototype a better heuristic to pick out the sorted/unsorted cases apart. In addition to that , more of the below changes need to happen to significantly speed up CheckSSTCollisions with unsorted imports:
- Bloom filters need to be written to all L6 sstables. Currently
SeekPrefixGEis unable to take advantage of bloom filters in large L6 sstables due to the lack of filter blocks. As filter blocks in this level can get relatively large, and we wouldn't want to load L6 filter blocks unless we're doing a lot of repeatedSeekPrefixGEsthat are also likely to not find a key, it would also be helpful to add an IterOption to toggle loading/ignoring of L6 filter blocks. This option could default to ignoring L6 filter blocks even if present, whileCheckSSTCollisionscould opt into reading those as it's an exceptional use where L6 filter blocks would really help.
a) To ease backports to stable releases, we could add a pebble optionParseHookthat merely allows us to enable the writing of L6 filter blocks through a command line--store=...,pebble_options=[Level 6]filter_options=argument. We can advise any users that expect to do unsorted imports to enable this first. This has the benefit of not changing behaviour in a patch release. Currently the same command line option wouldn't work becauseoptions.Parsedoesn't setFilterPolicyunless a relevantParseHookis defined. intentInterleavingIterneeds to avoid seeking the intent/engine key iterator if the MVCC iterator did a prefix seek and returned nothing. We're guaranteed that we won't see an intent if there is no MVCC value, so we can safely skip that step here and save one seek in the most common case of seeks inCheckSSTCollisions.
A prototype with the above changes brought down the import time of a 12.5h import to ~6.5-7h, a nearly 50% reduction. After that 50% reduction, CheckSSTCollisions is a small fraction of total import runtime and compactions / SST batching-and-sorting become the bigger bottlenecks.
Jira issue: CRDB-15411
Epic CRDB-16237