bulk: adaptively probe into existing data in AddSSTableRequests.checkForKeyCollisions #66410
Description
This is an idea that came from @andreimatei during a conversation about recent large-scale IMPORTs.
During IMPORTs, we repeatedly ingest SSTs through AddSSTableRequests. These requests set the DisallowShadowing flag to true, so they must check that none of the keys in their SST overlap with any of the keys in the destination range:
cockroach/pkg/kv/kvserver/batcheval/cmd_add_sstable.go, lines 49 to 57 in 9335ade:

```go
// IMPORT INTO should not proceed if any KVs from the SST shadow existing data
// entries - #38044.
var skippedKVStats enginepb.MVCCStats
var err error
if args.DisallowShadowing {
	if skippedKVStats, err = checkForKeyCollisions(ctx, readWriter, mvccStartKey, mvccEndKey, args.Data); err != nil {
		return result.Result{}, errors.Wrap(err, "checking for key collisions")
	}
}
```
For IMPORTs where the source data is sorted similarly to the destination index, this behavior is fine: the destination keyspace ahead of each ingestion is empty, so the call to checkForKeyCollisions is essentially free. However, for portions of the IMPORT where the source is not sorted like the destination index, as is almost always the case for tables with secondary indexes, this repeated conflict checking on each AddSSTableRequest is quadratic and can dominate the cost of the IMPORT. This is especially true when the IMPORT creates high read amplification in the destination range, further slowing down the scan performed by each AddSSTableRequest.
As an IMPORT continues, the ratio of existing to new data in the destination range grows larger and larger. Eventually, all AddSSTableRequests would begin to set their IngestAsWrites flag. But even apart from that, the "efficiency" of each checkForKeyCollisions call would decrease, since ever more existing data must be scanned per new key.
It's possible that once we cross some threshold, it would be more efficient to probe into the range to check for conflicts, rather than scanning the new and existing data in parallel using two iterators. The useful analogy here is performing a "hash join" instead of a "merge join" once the ratio of existing to new data crosses some threshold. This optimization may be negated to some degree by the fact that checkForKeyCollisionsGo already seeks each iterator instead of calling next on it (a "zig-zag join", to continue the analogy). But what seeking still doesn't give us is the ability to use SST bloom filters to skip most of the SSTs holding existing data, which becomes particularly important once read amplification grows.
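To make the join analogy concrete, here is a minimal sketch over sorted in-memory string keys (not the real MVCC iterators or function signatures): a merge-style scan whose cost tracks the size of the existing data, versus a probe-style check whose cost tracks only the size of the new SST.

```go
package main

import (
	"fmt"
	"sort"
)

// collideMerge detects collisions by advancing two sorted key lists in
// lockstep, analogous to the two-iterator scan in checkForKeyCollisions.
// Cost is roughly O(len(existing) + len(sst)).
func collideMerge(existing, sst []string) bool {
	i, j := 0, 0
	for i < len(existing) && j < len(sst) {
		switch {
		case existing[i] < sst[j]:
			i++
		case existing[i] > sst[j]:
			j++
		default:
			return true // shared key: collision
		}
	}
	return false
}

// collideProbe detects collisions by point-probing the existing data for
// each new key, analogous to seeking per key (where bloom filters could
// prune most SSTs). Cost is roughly O(len(sst) * log(len(existing))),
// largely independent of how much existing data the range holds.
func collideProbe(existing, sst []string) bool {
	for _, k := range sst {
		idx := sort.SearchStrings(existing, k)
		if idx < len(existing) && existing[idx] == k {
			return true
		}
	}
	return false
}

func main() {
	existing := []string{"a", "c", "e", "g"}
	fmt.Println(collideMerge(existing, []string{"b", "d"})) // false: no overlap
	fmt.Println(collideProbe(existing, []string{"b", "e"})) // true: "e" collides
}
```

The crossover is visible in the cost terms: once `len(existing)` dwarfs `len(sst)`, the merge scan pays for data the probe never has to touch.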
So the proposal here is to take the ratio of existing to new keys, and perhaps the read amplification, into account in checkForKeyCollisions, and to switch to a probing mode once we determine that doing so is profitable.
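A hedged sketch of that decision point, with entirely hypothetical names and thresholds (the real cutoffs would need benchmarking against actual ingest workloads):

```go
package main

import "fmt"

// Illustrative tuning knobs; not real cluster settings.
const (
	existingToNewRatioCutoff = 8.0 // favor probing once existing data dwarfs the SST
	readAmpCutoff            = 20  // probing benefits more as read amp grows
)

// shouldProbe sketches the proposed adaptive choice in checkForKeyCollisions:
// fall back from the two-iterator merge scan to per-key probing once the
// ratio of existing to new keys, or the read amplification, crosses a
// threshold.
func shouldProbe(existingKeys, sstKeys int64, readAmp int) bool {
	if sstKeys == 0 {
		return false
	}
	ratio := float64(existingKeys) / float64(sstKeys)
	return ratio > existingToNewRatioCutoff || readAmp > readAmpCutoff
}

func main() {
	fmt.Println(shouldProbe(1_000_000, 10_000, 5)) // true: 100x more existing data
	fmt.Println(shouldProbe(10_000, 10_000, 5))    // false: early in the IMPORT
}
```

One design question such a heuristic would have to answer is hysteresis: flip-flopping between modes across consecutive AddSSTableRequests is probably worse than committing to either one.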
@adityamaru is this something we have considered?
Epic: CRDB-2556