Skip to content

storage: optimize MVCCGarbageCollect for large number of undeleted version#51462

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
ajwerner:ajwerner/optimize-MVCCGarbageCollect-take-2
Jul 16, 2020
Merged

storage: optimize MVCCGarbageCollect for large number of undeleted version#51462
craig[bot] merged 3 commits intocockroachdb:masterfrom
ajwerner:ajwerner/optimize-MVCCGarbageCollect-take-2

Conversation

@ajwerner
Copy link
Copy Markdown
Contributor

This is a better-tested take-2 of #51184.

The win is now only pronounced on pebble, as might be expected.

Andrew Werner added 3 commits July 15, 2020 10:00
Not all implementations of Iterator actually support reverse iteration. In
particular, the rocksdb batch does not. This commit augments the interface by
asking implementers to report whether they do indeed support reverse iteration.

Release note: None
…hmark

We don't use this in the real code, why use it in the benchmark?

Release note: None
Prior to this change, MVCCGarbageCollect performed a linear scan of all
versions of a key, not just the versions being garbage collected. Given
the pagination of deleting versions above this call, the linear behavior
can result in quadratic runtime of GC when the number of versions vastly
exceeds the page size. The benchmark results demonstrate the change's
effectiveness.

It's worth noting that for a single key with a single version, the change
has a negative performance impact. I suspect this is due to the allocation
of a key in order to construct the iterator. In cases involving more keys,
I theorize the positive change is due to the fact that now the iterator
is never seeked backwards due to the sorting of the keys. It's worth
noting that since 20.1, the GC queue has been sending keys in the GC
request in reverse order. I anticipate that this sorting is likely a good
thing in that case too.

The stepping optimization seemed important in the microbenchmarks for cases
where most of the data was garbage. Without it, the change had small negative
impact on performance.

```
name                                                                                                     old time/op  new time/op  delta
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1/numVersions=2/deleteVersions=1-24           4.69µs ± 0%  5.12µs ± 1%   +9.12%  (p=0.004 n=5+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1-24         342µs ± 1%   337µs ± 0%   -1.35%  (p=0.004 n=6+5)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=16-24        348µs ± 2%   347µs ± 1%     ~     (p=0.699 n=6+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=32-24        359µs ± 4%   359µs ± 4%     ~     (p=0.699 n=6+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=512-24       572µs ± 2%   580µs ± 3%     ~     (p=0.132 n=6+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1015-24      799µs ± 1%   811µs ± 3%     ~     (p=0.126 n=5+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1023-24      793µs ± 0%   819µs ± 2%   +3.19%  (p=0.004 n=5+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1024/numVersions=2/deleteVersions=1-24        3.38ms ± 2%  3.11ms ± 2%   -8.07%  (p=0.002 n=6+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1-24      433ms ± 2%   374ms ± 2%  -13.70%  (p=0.002 n=6+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=16-24     443ms ± 1%   380ms ± 1%  -14.16%  (p=0.004 n=6+5)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=32-24     449ms ± 0%   386ms ± 1%  -14.22%  (p=0.002 n=6+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=512-24    727ms ± 2%   653ms ± 2%  -10.17%  (p=0.002 n=6+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1015-24   1.00s ± 1%   0.95s ± 3%   -5.84%  (p=0.004 n=5+6)
MVCCGarbageCollect/rocksdb/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1023-24   1.02s ± 2%   0.98s ±10%     ~     (p=0.180 n=6+6)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=2/deleteVersions=1-24            2.09µs ± 3%  2.42µs ± 1%  +15.56%  (p=0.004 n=6+5)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1-24          140µs ± 1%     5µs ± 1%  -96.78%  (p=0.010 n=6+4)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=16-24         142µs ± 1%     9µs ± 6%  -93.59%  (p=0.002 n=6+6)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=32-24         144µs ± 0%    14µs ± 2%  -90.50%  (p=0.004 n=5+6)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=512-24        216µs ± 1%   156µs ± 1%  -27.80%  (p=0.004 n=5+6)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1015-24       290µs ± 1%   303µs ± 1%   +4.52%  (p=0.002 n=6+6)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1/numVersions=1024/deleteVersions=1023-24       294µs ± 1%   304µs ± 2%   +3.24%  (p=0.002 n=6+6)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=2/deleteVersions=1-24         1.93ms ± 2%  1.70ms ± 2%  -11.80%  (p=0.002 n=6+6)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1-24       390ms ± 1%     7ms ±25%  -98.26%  (p=0.008 n=5+5)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=16-24      394ms ± 2%    20ms ±22%  -95.00%  (p=0.002 n=6+6)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=32-24      391ms ± 1%    38ms ± 7%  -90.38%  (p=0.004 n=6+5)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=512-24     782ms ± 1%   394ms ± 0%  -49.59%  (p=0.008 n=5+5)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1015-24    1.59s ± 4%   0.80s ± 4%  -49.65%  (p=0.002 n=6+6)
MVCCGarbageCollect/pebble/keySize=128/valSize=128/numKeys=1024/numVersions=1024/deleteVersions=1023-24    1.58s ± 2%   0.80s ± 2%  -49.23%  (p=0.002 n=6+6)
```

Release note (performance improvement): Improved the efficiency of garbage
collection when there are a large number of versions of a single key,
commonly found when utilizing sequences.
@ajwerner ajwerner requested a review from itsbilal July 15, 2020 15:04
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: really like the test in particular! And a nice optimization overall.

It might also be a good idea running any relevant roachtests with the COCKROACH_STORAGE_ENGINE=rocksdb env variable set in the environment calling the roachtest command, just to double-sanity-check the behaviour with rocksdb. But it looks good to me.

Reviewed 6 of 6 files at r1, 1 of 1 files at r2, 2 of 2 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/storage/mvcc.go, line 3270 at r3 (raw file):

				// importantly, this optimization mitigated the overhead of the Seek
				// approach when almost all of the versions are garbage.
				const nextsBeforeSeekLT = 4

Some food for thought in the future: This iterate-a-couple-times-through-versions-then-seek idiom has started to show up a bit more regularly. pebbleMVCCScanner uses a dynamic value for this parameter that gets adjusted between [1, 10] depending on how many versions it encounters in practice. Maybe if we abstracted away some of this logic, we could 1) use it in more places, and 2) be able to take advantage of dynamically adjusting this variable in all those places too.

Copy link
Copy Markdown
Contributor Author

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran YCSB with a short GC TTL. GC happened and there were no problems. Thanks again for the review!

bors r=itsbilal

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/storage/mvcc.go, line 3270 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Some food for thought in the future: This iterate-a-couple-times-through-versions-then-seek idiom has started to show up a bit more regularly. pebbleMVCCScanner uses a dynamic value for this parameter that gets adjusted between [1, 10] depending on how many versions it encounters in practice. Maybe if we abstracted away some of this logic, we could 1) use it in more places, and 2) be able to take advantage of dynamically adjusting this variable in all those places too.

Interesting. It's not totally obvious to me how to extract this logic correctly.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 16, 2020

Build succeeded

@craig craig bot merged commit 3fba3b9 into cockroachdb:master Jul 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants