storage,kvserver: Improve CheckSSTCollisions for unsorted imports

`CheckSSTCollisions` is currently optimized for cases where an ingested SST is narrow relative to the store it is being ingested into. This works okay for sorted imports, as each individual ingested SST will be narrow and consisting of adjacent-ish keys. However, for SSTs produced by the batch sorter if the row source is unsorted, this is no longer true. In those imports, `CheckSSTCollisions` runtime ends up dominating the runtime of the total import. In the past, we've tried to resolve this issue by using SeekPrefixGE for all `CheckSSTCollisions` calls (#73514) to take advantage of bloom filters, but that ended up regressing for sorted imports and so got reverted.

We do not need to exclusively pick between prefix seeks and regular seeks. That determination can be made based on the MVCC stats of the ingested SST, as well as the estimated bytes in the engine that overlap with it. We can use the pebble `EstimateDiskUsage` function to check for the number of bytes in the engine in an arbitrary key range, which would be useful here. If the overlap is low or if the ingested SST is tiny, we fall back to the old/current behaviour. Using `usePrefixSeek = estimatedOverlapBytes > 500*uint64(len(sst)) || args.MVCCStats.KeyCount < 100` in a prototype was sufficient to preserve the runtime of sorted imports while significantly speeding up unsorted imports.

As part of this issue, we'd need to theorize and/or prototype a better heuristic to pick out the sorted/unsorted cases apart. In addition to that , more of the below changes need to happen to significantly speed up `CheckSSTCollisions` with unsorted imports:

1) Bloom filters need to be written to all L6 sstables. Currently `SeekPrefixGE` is unable to take advantage of bloom filters in large L6 sstables due to the lack of filter blocks. As filter blocks in this level can get relatively large, and we wouldn't want to load L6 filter blocks unless we're doing a lot of repeated `SeekPrefixGEs` that are also likely to not find a key, it would also be helpful to add an IterOption to toggle loading/ignoring of L6 filter blocks. This option could default to ignoring L6 filter blocks even if present, while `CheckSSTCollisions` could opt into reading those as it's an exceptional use where L6 filter blocks would really help.
  a) To ease backports to stable releases, we could add a pebble option `ParseHook` that merely allows us to enable the writing of L6 filter blocks through a command line `--store=...,pebble_options=[Level 6]filter_options=` argument. We can advise any users that expect to do unsorted imports to enable this first. This has the benefit of not changing behaviour in a patch release. Currently the same command line option wouldn't work because `options.Parse` doesn't set `FilterPolicy` unless a relevant `ParseHook` is defined.
2) `intentInterleavingIter` needs to avoid seeking the intent/engine key iterator if the MVCC iterator did a prefix seek and returned nothing. We're guaranteed that we won't see an intent if there is no MVCC value, so we can safely skip that step here and save one seek in the most common case of seeks in `CheckSSTCollisions`.

A prototype with the above changes brought down the import time of a 12.5h import to ~6.5-7h, a nearly 50% reduction. After that 50% reduction, `CheckSSTCollisions` is a small fraction of total import runtime and compactions / SST batching-and-sorting become the bigger bottlenecks.

Jira issue: CRDB-15411

Epic CRDB-16237

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage,kvserver: Improve CheckSSTCollisions for unsorted imports #80980

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

storage,kvserver: Improve CheckSSTCollisions for unsorted imports #80980

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions