-
Notifications
You must be signed in to change notification settings - Fork 4.1k
storage: consider removing readahead in MVCCIterate #66828
Description
In #66682 we saw significant performance problems in Raft causing constant elections and stuck ranges. It turned out this was caused by the 1000 key readahead in MVCCIterate:
Lines 2662 to 2666 in 8f5231d
| const maxKeysPerScan = 1000 | |
| opts := opts | |
| opts.MaxKeys = maxKeysPerScan | |
| res, err := mvccScanToKvs( | |
| ctx, iter, key, endKey, timestamp, opts) |
The caller had passed an iterator function that was supposed to terminate early (in that case after the first entry), but this is only called after the 1000 entries have been fetched:
Lines 2679 to 2686 in 8f5231d
| for i := range res.KVs { | |
| if err := f(res.KVs[i]); err != nil { | |
| if iterutil.Done(err) { | |
| return intents, nil | |
| } | |
| return nil, err | |
| } | |
| } |
In #66816 @nvanbenschoten notes:
I considered removing the readahead behavior in MVCCIterate entirely. I think it might have been added to avoid repeat CGo hops back when we had RocksDB. But now that we have Pebble and can manipulate an iterator cheaply, it can maybe be rejected. However, this would then require us to lift some logic from
pebbleMVCCScannerintoMVCCIterate, which I wasn't keen on doing.
@petermattis adds:
+1 to doing this. Or to removing
MVCCIterateentirely. Now that we don't have to minimize cgo crossings I don't see the point ofMVCCIteratevs using anMVCCIterator.
Jira issue: CRDB-8248