Skip to content

storage: consider removing readahead in MVCCIterate #66828

@erikgrinaker

Description

@erikgrinaker

In #66682 we saw significant performance problems in Raft causing constant elections and stuck ranges. It turned out this was caused by the 1000 key readahead in MVCCIterate:

cockroach/pkg/storage/mvcc.go

Lines 2662 to 2666 in 8f5231d

const maxKeysPerScan = 1000
opts := opts
opts.MaxKeys = maxKeysPerScan
res, err := mvccScanToKvs(
ctx, iter, key, endKey, timestamp, opts)

The caller had passed an iterator function that was supposed to terminate early (in that case after the first entry), but this is only called after the 1000 entries have been fetched:

cockroach/pkg/storage/mvcc.go

Lines 2679 to 2686 in 8f5231d

for i := range res.KVs {
if err := f(res.KVs[i]); err != nil {
if iterutil.Done(err) {
return intents, nil
}
return nil, err
}
}

In #66816 @nvanbenschoten notes:

I considered removing the readahead behavior in MVCCIterate entirely. I think it might have been added to avoid repeat CGo hops back when we had RocksDB. But now that we have Pebble and can manipulate an iterator cheaply, it can maybe be rejected. However, this would then require us to lift some logic from pebbleMVCCScanner into MVCCIterate, which I wasn't keen on doing.

@petermattis adds:

+1 to doing this. Or to removing MVCCIterate entirely. Now that we don't have to minimize cgo crossings I don't see the point of MVCCIterate vs using an MVCCIterator.

Jira issue: CRDB-8248

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-storageRelating to our storage engine (Pebble) on-disk storage.C-performancePerf of queries or internals. Solution not expected to change functional behavior.T-kvKV Team

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions