db: add NextPrefix method to Iterator #1387
db: add NextPrefix method to Iterator #1387jbowens wants to merge 1 commit intocockroachdb:masterfrom
Conversation
jbowens
left a comment
There was a problem hiding this comment.
I need to think through testing. The Pebble metamorphic tests don't use suffixed keys and Pebble's behavior will increasingly be dependent on suffixes.
Reviewable status: 0 of 15 files reviewed, all discussions resolved
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 5 of 15 files at r1.
Reviewable status: 4 of 15 files reviewed, 1 unresolved discussion (waiting on @jbowens and @sumeerbhola)
sstable/block.go, line 855 at r1 (raw file):
// number of bytes shared is greater than currentPrefixLen, then some of // the suffix is shared and this key must have the same prefix as the // key that the user last saw. In that case, we continue in the loop.
I think we still have a problem. The continuing issue is that the encoding of prefix@suffix can have information about the positioning and length of the suffix anywhere in the userkey, and we have no visibility into that without calling Split or Compare on the assembled key.
Say Compare was sorting in increasing lexicographic order of the prefix, and the part after the + is the suffix (byte concatenation):
b + \0
b\0 + \0\0...
currentPrefixLen will be 1 after the first key, and since the next key shares a byte prefix >= currentPrefixLen we will skip it, even though it is the NextPrefix.
jbowens
left a comment
There was a problem hiding this comment.
Reviewable status: 4 of 15 files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)
sstable/block.go, line 855 at r1 (raw file):
Previously, sumeerbhola wrote…
I think we still have a problem. The continuing issue is that the encoding of prefix@suffix can have information about the positioning and length of the suffix anywhere in the userkey, and we have no visibility into that without calling Split or Compare on the assembled key.
Say Compare was sorting in increasing lexicographic order of the prefix, and the part after the + is the suffix (byte concatenation):
b + \0
b\0 + \0\0...currentPrefixLen will be 1 after the first key, and since the next key shares a byte prefix >= currentPrefixLen we will skip it, even though it is the NextPrefix.
Yeah, I think we still need the fix I mentioned in #1385: I think we need to Split(i.ikey.UserKey) and only skip over the key if Split(i.ikey.UserKey) == currentPrefixLen. I'll work it in and re-run the benchmarks this flex friday
nicktrav
left a comment
There was a problem hiding this comment.
Reviewed 15 of 15 files at r1, 2 of 2 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)
a1b639f to
dee4b20
Compare
jbowens
left a comment
There was a problem hiding this comment.
Reviewable status: 0 of 19 files reviewed, 1 unresolved discussion (waiting on @nicktrav and @sumeerbhola)
sstable/block.go, line 855 at r1 (raw file):
Previously, jbowens (Jackson Owens) wrote…
Yeah, I think we still need the fix I mentioned in #1385: I think we need to
Split(i.ikey.UserKey)and only skip over the key ifSplit(i.ikey.UserKey) == currentPrefixLen. I'll work it in and re-run the benchmarks this flex friday
I updated this to split the user key before skipping to ensure that the key has the same prefix.
This comment has been minimized.
This comment has been minimized.
e8ec22e to
4101090
Compare
Expose a NextPrefix method on *pebble.Iterator that may be called when the user wishes to move to the next key prefix (as determined by Comparer.Split). NextPrefix guarantees that it will move to the next key prefix. Here's a benchmark comparison of two versions of the benchmark. On the 'old' benchmark, there is no NextPrefix optimization and the benchmark manually checks for a new unique prefix. In the 'new' benchmark, the benchmark uses NextPrefix and omits the manual checks since NextPrefix guarantees the next returned key will be a new prefix: ``` name old time/op new time/op delta NextPrefix/versions=1-24 135µs ± 0% 130µs ± 1% -4.25% (p=0.000 n=10+10) NextPrefix/versions=2-24 313µs ± 1% 154µs ± 0% -50.97% (p=0.000 n=10+10) NextPrefix/versions=10-24 1.71ms ± 0% 0.35ms ± 1% -79.70% (p=0.000 n=10+10) NextPrefix/versions=100-24 12.0ms ± 3% 2.6ms ± 2% -78.37% (p=0.000 n=10+9) NextPrefix/versions=1000-24 142ms ± 1% 53ms ± 1% -62.72% (p=0.000 n=10+10) name old alloc/op new alloc/op delta NextPrefix/versions=1-24 10.0B ± 0% 10.0B ± 0% ~ (all equal) NextPrefix/versions=2-24 13.0B ± 0% 10.0B ± 0% -23.08% (p=0.002 n=8+10) NextPrefix/versions=10-24 35.4B ± 2% 13.0B ± 0% -63.28% (p=0.000 n=10+10) NextPrefix/versions=100-24 137B ±29% 36B ± 0% -73.66% (p=0.000 n=10+8) NextPrefix/versions=1000-24 3.26kB ± 2% 1.24kB ± 4% -61.92% (p=0.000 n=9+8) name old allocs/op new allocs/op delta NextPrefix/versions=1-24 1.00 ± 0% 1.00 ± 0% ~ (all equal) NextPrefix/versions=2-24 1.00 ± 0% 1.00 ± 0% ~ (all equal) NextPrefix/versions=10-24 1.00 ± 0% 1.00 ± 0% ~ (all equal) NextPrefix/versions=100-24 1.40 ±43% 1.00 ± 0% ~ (p=0.087 n=10+10) NextPrefix/versions=1000-24 28.4 ± 5% 11.0 ± 0% -61.23% (p=0.000 n=8+8) ```
Expose a NextPrefix method on *pebble.Iterator that may be called when the user wishes to move to the next key prefix (as determined by Comparer.Split). NextPrefix guarantees that it will move to the next key prefix. I'd like to revisit the optimization from cockroachdb#1387, but for now this commit introduces the new positioning method and relies on the slight optimization of only needing to compare prefixes, instead of finding each new user key using user key comparisons and comparing their prefixes.
Expose a NextPrefix method on *pebble.Iterator that may be called when the user wishes to move to the next key prefix (as determined by Comparer.Split). NextPrefix guarantees that it will move to the next key prefix. I'd like to revisit the optimization from cockroachdb#1387, but for now this commit introduces the new positioning method and relies on the slight optimization of only needing to compare prefixes, instead of finding each new user key using user key comparisons and comparing their prefixes.
Expose a NextPrefix method on *pebble.Iterator that may be called when the user wishes to move to the next key prefix (as determined by Comparer.Split). NextPrefix guarantees that it will move to the next key prefix. I'd like to revisit the optimization from cockroachdb#1387, but for now this commit does not include it. In the case of a single key per-prefix, it relies on the slight optimization of only needing to compare prefixes, instead of finding each new user key using user key comparisons and comparing their prefixes. When there are many keys per-prefix, the performance of a new NextPrefix is substantial. Previously, callers would need to SeekGE past the previous key. This would force repositioning all the levels of the iterator. Now the dedicated NextPrefix method is able to use the current iterator position to avoid a seek from scratch, and avoid repositioning any levels that are already positioned at the next prefix. Additionally, during combined iteration over a keyspace covered by a range key, NextPrefix does not need to interleave range keys at the seek key and instead will only interleave range keys if the iterator steps onto a new one. There may be future optimizations that we can make by comparising suffixes first in order to elide prefix comparisons. We don't have suitable benchmarks for measuring those optimizations yet, so they're left for future investigations. ``` name old time/op new time/op delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 3.90µs ± 2% 3.95µs ± 2% ~ (p=0.113 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 3.99µs ± 0% 4.01µs ± 1% +0.56% (p=0.000 n=10+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 4.78µs ± 2% 4.73µs ± 5% ~ (p=0.190 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 8.57µs ± 2% 7.87µs ± 2% -8.13% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 36.4µs ± 2% 31.6µs ± 2% -12.99% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 22.2µs ± 0% 21.5µs ± 0% -3.13% (p=0.000 n=7+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 27.7µs ± 0% 25.8µs ± 0% -6.68% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 64.9µs ± 1% 52.2µs ± 0% -19.58% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 177µs ± 1% 119µs ± 1% -32.57% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 1.21ms ± 3% 0.62ms ± 4% -49.04% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 1.35ms ± 1% 1.35ms ± 0% -0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 1.85ms ± 2% 1.71ms ± 1% -7.56% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 5.70ms ± 4% 4.18ms ± 2% -26.62% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 17.6ms ± 5% 10.7ms ± 4% -38.96% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 145ms ±30% 59ms ±21% -59.36% (p=0.000 n=10+10) name old speed new speed delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 16.4MB/s ± 2% 16.2MB/s ± 2% ~ (p=0.108 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 16.0MB/s ± 0% 15.9MB/s ± 1% -0.52% (p=0.000 n=9+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 13.4MB/s ± 2% 13.5MB/s ± 5% ~ (p=0.184 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 7.47MB/s ± 2% 8.13MB/s ± 2% +8.82% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 1.76MB/s ± 2% 2.02MB/s ± 2% +14.84% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 289MB/s ± 0% 298MB/s ± 0% +3.27% (p=0.000 n=8+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 231MB/s ± 0% 248MB/s ± 0% +7.15% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 98.6MB/s ± 1% 122.7MB/s ± 0% +24.34% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 36.1MB/s ± 1% 53.6MB/s ± 1% +48.31% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 5.29MB/s ± 3% 10.39MB/s ± 4% +96.28% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 474MB/s ± 1% 476MB/s ± 0% +0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 345MB/s ± 2% 373MB/s ± 1% +8.17% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 112MB/s ± 5% 153MB/s ± 2% +36.23% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 36.4MB/s ± 5% 59.7MB/s ± 4% +63.93% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 4.33MB/s ±21% 11.04MB/s ±22% +155.03% (p=0.000 n=9+10) ```
Expose a NextPrefix method on *pebble.Iterator that may be called when the user wishes to move to the next key prefix (as determined by Comparer.Split). NextPrefix guarantees that it will move to the next key prefix. I'd like to revisit the optimization from cockroachdb#1387, but for now this commit does not include it. In the case of a single key per-prefix, it relies on the slight optimization of only needing to compare prefixes, instead of finding each new user key using user key comparisons and comparing their prefixes. When there are many keys per-prefix, the performance of a new NextPrefix is substantial. Previously, callers would need to SeekGE past the previous key. This would force repositioning all the levels of the iterator. Now the dedicated NextPrefix method is able to use the current iterator position to avoid a seek from scratch, and avoid repositioning any levels that are already positioned at the next prefix. Additionally, during combined iteration over a keyspace covered by a range key, NextPrefix does not need to interleave range keys at the seek key and instead will only interleave range keys if the iterator steps onto a new one. There may be future optimizations that we can make by comparising suffixes first in order to elide prefix comparisons. We don't have suitable benchmarks for measuring those optimizations yet, so they're left for future investigations. ``` name old time/op new time/op delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 3.90µs ± 2% 3.95µs ± 2% ~ (p=0.113 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 3.99µs ± 0% 4.01µs ± 1% +0.56% (p=0.000 n=10+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 4.78µs ± 2% 4.73µs ± 5% ~ (p=0.190 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 8.57µs ± 2% 7.87µs ± 2% -8.13% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 36.4µs ± 2% 31.6µs ± 2% -12.99% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 22.2µs ± 0% 21.5µs ± 0% -3.13% (p=0.000 n=7+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 27.7µs ± 0% 25.8µs ± 0% -6.68% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 64.9µs ± 1% 52.2µs ± 0% -19.58% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 177µs ± 1% 119µs ± 1% -32.57% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 1.21ms ± 3% 0.62ms ± 4% -49.04% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 1.35ms ± 1% 1.35ms ± 0% -0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 1.85ms ± 2% 1.71ms ± 1% -7.56% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 5.70ms ± 4% 4.18ms ± 2% -26.62% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 17.6ms ± 5% 10.7ms ± 4% -38.96% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 145ms ±30% 59ms ±21% -59.36% (p=0.000 n=10+10) name old speed new speed delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 16.4MB/s ± 2% 16.2MB/s ± 2% ~ (p=0.108 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 16.0MB/s ± 0% 15.9MB/s ± 1% -0.52% (p=0.000 n=9+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 13.4MB/s ± 2% 13.5MB/s ± 5% ~ (p=0.184 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 7.47MB/s ± 2% 8.13MB/s ± 2% +8.82% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 1.76MB/s ± 2% 2.02MB/s ± 2% +14.84% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 289MB/s ± 0% 298MB/s ± 0% +3.27% (p=0.000 n=8+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 231MB/s ± 0% 248MB/s ± 0% +7.15% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 98.6MB/s ± 1% 122.7MB/s ± 0% +24.34% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 36.1MB/s ± 1% 53.6MB/s ± 1% +48.31% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 5.29MB/s ± 3% 10.39MB/s ± 4% +96.28% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 474MB/s ± 1% 476MB/s ± 0% +0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 345MB/s ± 2% 373MB/s ± 1% +8.17% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 112MB/s ± 5% 153MB/s ± 2% +36.23% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 36.4MB/s ± 5% 59.7MB/s ± 4% +63.93% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 4.33MB/s ±21% 11.04MB/s ±22% +155.03% (p=0.000 n=9+10) ```
Expose a NextPrefix method on *pebble.Iterator that may be called when the user wishes to move to the next key prefix (as determined by Comparer.Split). NextPrefix guarantees that it will move to the next key prefix. I'd like to revisit the optimization from cockroachdb#1387, but for now this commit does not include it. In the case of a single key per-prefix, it relies on the slight optimization of only needing to compare prefixes, instead of finding each new user key using user key comparisons and comparing their prefixes. When there are many keys per-prefix, the performance of a new NextPrefix is substantial. Previously, callers would need to SeekGE past the previous key. This would force repositioning all the levels of the iterator. Now the dedicated NextPrefix method is able to use the current iterator position to avoid a seek from scratch, and avoid repositioning any levels that are already positioned at the next prefix. Additionally, during combined iteration over a keyspace covered by a range key, NextPrefix does not need to interleave range keys at the seek key and instead will only interleave range keys if the iterator steps onto a new one. There may be future optimizations that we can make by comparising suffixes first in order to elide prefix comparisons. We don't have suitable benchmarks for measuring those optimizations yet, so they're left for future investigations. ``` name old time/op new time/op delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 3.90µs ± 2% 3.95µs ± 2% ~ (p=0.113 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 3.99µs ± 0% 4.01µs ± 1% +0.56% (p=0.000 n=10+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 4.78µs ± 2% 4.73µs ± 5% ~ (p=0.190 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 8.57µs ± 2% 7.87µs ± 2% -8.13% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 36.4µs ± 2% 31.6µs ± 2% -12.99% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 22.2µs ± 0% 21.5µs ± 0% -3.13% (p=0.000 n=7+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 27.7µs ± 0% 25.8µs ± 0% -6.68% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 64.9µs ± 1% 52.2µs ± 0% -19.58% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 177µs ± 1% 119µs ± 1% -32.57% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 1.21ms ± 3% 0.62ms ± 4% -49.04% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 1.35ms ± 1% 1.35ms ± 0% -0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 1.85ms ± 2% 1.71ms ± 1% -7.56% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 5.70ms ± 4% 4.18ms ± 2% -26.62% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 17.6ms ± 5% 10.7ms ± 4% -38.96% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 145ms ±30% 59ms ±21% -59.36% (p=0.000 n=10+10) name old speed new speed delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 16.4MB/s ± 2% 16.2MB/s ± 2% ~ (p=0.108 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 16.0MB/s ± 0% 15.9MB/s ± 1% -0.52% (p=0.000 n=9+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 13.4MB/s ± 2% 13.5MB/s ± 5% ~ (p=0.184 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 7.47MB/s ± 2% 8.13MB/s ± 2% +8.82% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 1.76MB/s ± 2% 2.02MB/s ± 2% +14.84% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 289MB/s ± 0% 298MB/s ± 0% +3.27% (p=0.000 n=8+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 231MB/s ± 0% 248MB/s ± 0% +7.15% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 98.6MB/s ± 1% 122.7MB/s ± 0% +24.34% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 36.1MB/s ± 1% 53.6MB/s ± 1% +48.31% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 5.29MB/s ± 3% 10.39MB/s ± 4% +96.28% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 474MB/s ± 1% 476MB/s ± 0% +0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 345MB/s ± 2% 373MB/s ± 1% +8.17% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 112MB/s ± 5% 153MB/s ± 2% +36.23% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 36.4MB/s ± 5% 59.7MB/s ± 4% +63.93% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 4.33MB/s ±21% 11.04MB/s ±22% +155.03% (p=0.000 n=9+10) ```
Expose a NextPrefix method on *pebble.Iterator that may be called when the user wishes to move to the next key prefix (as determined by Comparer.Split). NextPrefix guarantees that it will move to the next key prefix. I'd like to revisit the optimization from cockroachdb#1387, but for now this commit does not include it. In the case of a single key per-prefix, it relies on the slight optimization of only needing to compare prefixes, instead of finding each new user key using user key comparisons and comparing their prefixes. When there are many keys per-prefix, the performance of a new NextPrefix is substantial. Previously, callers would need to SeekGE past the previous key. This would force repositioning all the levels of the iterator. Now the dedicated NextPrefix method is able to use the current iterator position to avoid a seek from scratch, and avoid repositioning any levels that are already positioned at the next prefix. Additionally, during combined iteration over a keyspace covered by a range key, NextPrefix does not need to interleave range keys at the seek key and instead will only interleave range keys if the iterator steps onto a new one. There may be future optimizations that we can make by comparising suffixes first in order to elide prefix comparisons. We don't have suitable benchmarks for measuring those optimizations yet, so they're left for future investigations. ``` name old time/op new time/op delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 3.90µs ± 2% 3.95µs ± 2% ~ (p=0.113 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 3.99µs ± 0% 4.01µs ± 1% +0.56% (p=0.000 n=10+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 4.78µs ± 2% 4.73µs ± 5% ~ (p=0.190 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 8.57µs ± 2% 7.87µs ± 2% -8.13% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 36.4µs ± 2% 31.6µs ± 2% -12.99% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 22.2µs ± 0% 21.5µs ± 0% -3.13% (p=0.000 n=7+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 27.7µs ± 0% 25.8µs ± 0% -6.68% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 64.9µs ± 1% 52.2µs ± 0% -19.58% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 177µs ± 1% 119µs ± 1% -32.57% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 1.21ms ± 3% 0.62ms ± 4% -49.04% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 1.35ms ± 1% 1.35ms ± 0% -0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 1.85ms ± 2% 1.71ms ± 1% -7.56% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 5.70ms ± 4% 4.18ms ± 2% -26.62% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 17.6ms ± 5% 10.7ms ± 4% -38.96% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 145ms ±30% 59ms ±21% -59.36% (p=0.000 n=10+10) name old speed new speed delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 16.4MB/s ± 2% 16.2MB/s ± 2% ~ (p=0.108 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 16.0MB/s ± 0% 15.9MB/s ± 1% -0.52% (p=0.000 n=9+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 13.4MB/s ± 2% 13.5MB/s ± 5% ~ (p=0.184 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 7.47MB/s ± 2% 8.13MB/s ± 2% +8.82% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 1.76MB/s ± 2% 2.02MB/s ± 2% +14.84% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 289MB/s ± 0% 298MB/s ± 0% +3.27% (p=0.000 n=8+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 231MB/s ± 0% 248MB/s ± 0% +7.15% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 98.6MB/s ± 1% 122.7MB/s ± 0% +24.34% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 36.1MB/s ± 1% 53.6MB/s ± 1% +48.31% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 5.29MB/s ± 3% 10.39MB/s ± 4% +96.28% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 474MB/s ± 1% 476MB/s ± 0% +0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 345MB/s ± 2% 373MB/s ± 1% +8.17% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 112MB/s ± 5% 153MB/s ± 2% +36.23% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 36.4MB/s ± 5% 59.7MB/s ± 4% +63.93% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 4.33MB/s ±21% 11.04MB/s ±22% +155.03% (p=0.000 n=9+10) ```
Expose a NextPrefix method on *pebble.Iterator that may be called when the user wishes to move to the next key prefix (as determined by Comparer.Split). NextPrefix guarantees that it will move to the next key prefix. I'd like to revisit the optimization from cockroachdb#1387, but for now this commit does not include it. In the case of a single key per-prefix, it relies on the slight optimization of only needing to compare prefixes, instead of finding each new user key using user key comparisons and comparing their prefixes. When there are many keys per-prefix, the performance of a new NextPrefix is substantial. Previously, callers would need to SeekGE past the previous key. This would force repositioning all the levels of the iterator. Now the dedicated NextPrefix method is able to use the current iterator position to avoid a seek from scratch, and avoid repositioning any levels that are already positioned at the next prefix. Additionally, during combined iteration over a keyspace covered by a range key, NextPrefix does not need to interleave range keys at the seek key and instead will only interleave range keys if the iterator steps onto a new one. There may be future optimizations that we can make by comparising suffixes first in order to elide prefix comparisons. We don't have suitable benchmarks for measuring those optimizations yet, so they're left for future investigations. ``` name old time/op new time/op delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 3.90µs ± 2% 3.95µs ± 2% ~ (p=0.113 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 3.99µs ± 0% 4.01µs ± 1% +0.56% (p=0.000 n=10+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 4.78µs ± 2% 4.73µs ± 5% ~ (p=0.190 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 8.57µs ± 2% 7.87µs ± 2% -8.13% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 36.4µs ± 2% 31.6µs ± 2% -12.99% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 22.2µs ± 0% 21.5µs ± 0% -3.13% (p=0.000 n=7+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 27.7µs ± 0% 25.8µs ± 0% -6.68% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 64.9µs ± 1% 52.2µs ± 0% -19.58% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 177µs ± 1% 119µs ± 1% -32.57% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 1.21ms ± 3% 0.62ms ± 4% -49.04% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 1.35ms ± 1% 1.35ms ± 0% -0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 1.85ms ± 2% 1.71ms ± 1% -7.56% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 5.70ms ± 4% 4.18ms ± 2% -26.62% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 17.6ms ± 5% 10.7ms ± 4% -38.96% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 145ms ±30% 59ms ±21% -59.36% (p=0.000 n=10+10) name old speed new speed delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 16.4MB/s ± 2% 16.2MB/s ± 2% ~ (p=0.108 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 16.0MB/s ± 0% 15.9MB/s ± 1% -0.52% (p=0.000 n=9+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 13.4MB/s ± 2% 13.5MB/s ± 5% ~ (p=0.184 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 7.47MB/s ± 2% 8.13MB/s ± 2% +8.82% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 1.76MB/s ± 2% 2.02MB/s ± 2% +14.84% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 289MB/s ± 0% 298MB/s ± 0% +3.27% (p=0.000 n=8+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 231MB/s ± 0% 248MB/s ± 0% +7.15% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 98.6MB/s ± 1% 122.7MB/s ± 0% +24.34% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 36.1MB/s ± 1% 53.6MB/s ± 1% +48.31% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 5.29MB/s ± 3% 10.39MB/s ± 4% +96.28% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 474MB/s ± 1% 476MB/s ± 0% +0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 345MB/s ± 2% 373MB/s ± 1% +8.17% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 112MB/s ± 5% 153MB/s ± 2% +36.23% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 36.4MB/s ± 5% 59.7MB/s ± 4% +63.93% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 4.33MB/s ±21% 11.04MB/s ±22% +155.03% (p=0.000 n=9+10) ```
Expose a NextPrefix method on *pebble.Iterator that may be called when the user wishes to move to the next key prefix (as determined by Comparer.Split). NextPrefix guarantees that it will move to the next key prefix. I'd like to revisit the optimization from cockroachdb#1387, but for now this commit does not include it. In the case of a single key per-prefix, it relies on the slight optimization of only needing to compare prefixes, instead of finding each new user key using user key comparisons and comparing their prefixes. When there are many keys per-prefix, the performance of a new NextPrefix is substantial. Previously, callers would need to SeekGE past the previous key. This would force repositioning all the levels of the iterator. Now the dedicated NextPrefix method is able to use the current iterator position to avoid a seek from scratch, and avoid repositioning any levels that are already positioned at the next prefix. Additionally, during combined iteration over a keyspace covered by a range key, NextPrefix does not need to interleave range keys at the seek key and instead will only interleave range keys if the iterator steps onto a new one. There may be future optimizations that we can make by comparising suffixes first in order to elide prefix comparisons. We don't have suitable benchmarks for measuring those optimizations yet, so they're left for future investigations. ``` name old time/op new time/op delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 3.90µs ± 2% 3.95µs ± 2% ~ (p=0.113 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 3.99µs ± 0% 4.01µs ± 1% +0.56% (p=0.000 n=10+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 4.78µs ± 2% 4.73µs ± 5% ~ (p=0.190 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 8.57µs ± 2% 7.87µs ± 2% -8.13% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 36.4µs ± 2% 31.6µs ± 2% -12.99% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 22.2µs ± 0% 21.5µs ± 0% -3.13% (p=0.000 n=7+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 27.7µs ± 0% 25.8µs ± 0% -6.68% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 64.9µs ± 1% 52.2µs ± 0% -19.58% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 177µs ± 1% 119µs ± 1% -32.57% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 1.21ms ± 3% 0.62ms ± 4% -49.04% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 1.35ms ± 1% 1.35ms ± 0% -0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 1.85ms ± 2% 1.71ms ± 1% -7.56% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 5.70ms ± 4% 4.18ms ± 2% -26.62% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 17.6ms ± 5% 10.7ms ± 4% -38.96% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 145ms ±30% 59ms ±21% -59.36% (p=0.000 n=10+10) name old speed new speed delta MVCCScan_Pebble/rows=1/versions=1/valueSize=64/numRangeKeys=0-10 16.4MB/s ± 2% 16.2MB/s ± 2% ~ (p=0.108 n=9+10) MVCCScan_Pebble/rows=1/versions=2/valueSize=64/numRangeKeys=0-10 16.0MB/s ± 0% 15.9MB/s ± 1% -0.52% (p=0.000 n=9+9) MVCCScan_Pebble/rows=1/versions=10/valueSize=64/numRangeKeys=0-10 13.4MB/s ± 2% 13.5MB/s ± 5% ~ (p=0.184 n=10+10) MVCCScan_Pebble/rows=1/versions=100/valueSize=64/numRangeKeys=0-10 7.47MB/s ± 2% 8.13MB/s ± 2% +8.82% (p=0.000 n=10+10) MVCCScan_Pebble/rows=1/versions=1000/valueSize=64/numRangeKeys=0-10 1.76MB/s ± 2% 2.02MB/s ± 2% +14.84% (p=0.000 n=10+9) MVCCScan_Pebble/rows=100/versions=1/valueSize=64/numRangeKeys=0-10 289MB/s ± 0% 298MB/s ± 0% +3.27% (p=0.000 n=8+10) MVCCScan_Pebble/rows=100/versions=2/valueSize=64/numRangeKeys=0-10 231MB/s ± 0% 248MB/s ± 0% +7.15% (p=0.000 n=9+8) MVCCScan_Pebble/rows=100/versions=10/valueSize=64/numRangeKeys=0-10 98.6MB/s ± 1% 122.7MB/s ± 0% +24.34% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=100/valueSize=64/numRangeKeys=0-10 36.1MB/s ± 1% 53.6MB/s ± 1% +48.31% (p=0.000 n=10+10) MVCCScan_Pebble/rows=100/versions=1000/valueSize=64/numRangeKeys=0-10 5.29MB/s ± 3% 10.39MB/s ± 4% +96.28% (p=0.000 n=10+8) MVCCScan_Pebble/rows=10000/versions=1/valueSize=64/numRangeKeys=0-10 474MB/s ± 1% 476MB/s ± 0% +0.36% (p=0.022 n=10+9) MVCCScan_Pebble/rows=10000/versions=2/valueSize=64/numRangeKeys=0-10 345MB/s ± 2% 373MB/s ± 1% +8.17% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=10/valueSize=64/numRangeKeys=0-10 112MB/s ± 5% 153MB/s ± 2% +36.23% (p=0.000 n=9+10) MVCCScan_Pebble/rows=10000/versions=100/valueSize=64/numRangeKeys=0-10 36.4MB/s ± 5% 59.7MB/s ± 4% +63.93% (p=0.000 n=10+10) MVCCScan_Pebble/rows=10000/versions=1000/valueSize=64/numRangeKeys=0-10 4.33MB/s ±21% 11.04MB/s ±22% +155.03% (p=0.000 n=9+10) ```
Expose a NextPrefix method on *pebble.Iterator that may be called when the user
wishes to move to the next key prefix (as determined by Comparer.Split).
NextPrefix guarantees that it will move to the next key prefix.
Here's a benchmark comparison of two versions of the benchmark. On the 'old'
benchmark, there is no NextPrefix optimization and the benchmark manually
checks for a new unique prefix. In the 'new' benchmark, the benchmark uses
NextPrefix and omits the manual checks since NextPrefix guarantees the next
returned key will be a new prefix: