gc: use clear range operations to remove multiple garbage keys across keys and versions#90830
Conversation
cac19b4 to
42e5b13
Compare
549ddf1 to
98b1a41
Compare
92bc9b3 to
ed74dcc
Compare
dhartunian
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)
-- commits line 57 at r7:
can you include a release note for the new metric?
erikgrinaker
left a comment
There was a problem hiding this comment.
Flushing some comments after an initial high-level pass. I need to do a more thorough pass later, in particular of the GC batching code.
ed74dcc to
7df3f66
Compare
ffb73af to
f80f540
Compare
erikgrinaker
left a comment
There was a problem hiding this comment.
I still need to do another pass over the batch logic, but I've re-reviewed the rest.
There was a problem hiding this comment.
Let's add a couple of variants that check a live value and a point tombstone that are exactly on the GC threshold too (or do we have them already?).
There was a problem hiding this comment.
Not much sense for the full range deletion. I add one more test but overall we bail if there's anything not masked by range tombstone even if technically it should be collectable. E.g. point tombstone followed by range tombstone is not enough for fast path to succeed.
But I'll check if we can add more for non optimized case.
There was a problem hiding this comment.
The motivation is that I want to make sure we have test coverage for this branch, and the corresponding branch in the full-range deletion:
I don't believe we currently have any test cases that hit it?
There was a problem hiding this comment.
Actually I think this condition is not needed at all. If key is garbage then it must be covered by newer value (its timestamp must be below coveredBy). And coveredBy timestamp is guaranteed to be at or below gcThreshold so we must be good.
pkg/kv/kvserver/gc/gc.go
Outdated
There was a problem hiding this comment.
Yeah, I think we discussed not GCing range keys if point key GC failed.
pkg/kv/kvserver/gc/gc.go
Outdated
There was a problem hiding this comment.
Can this be equal to the last key? Doesn't it have to be greater than it in order to cover it?
Also, this should be "no keys" rather than "no garbage keys", right? Because any live keys would be removed.
There was a problem hiding this comment.
Yeah, that is correct. It could be last when we start iterating, but we will always tighten it to the batch[0][0].Next() before sending.
erikgrinaker
left a comment
There was a problem hiding this comment.
The GC batcher LGTM, feel free to merge once comments have been addressed.
|
PS: this can close out #61455 too. |
f226f81 to
1da6d73
Compare
This commit adds timestamp field to clear_range_key parameters of GC request. New timestamp parameter is used by GC queue to remove chunks of consecutive keys where no new data was written above the GC threshold and storage can optimize deletions with range tombstones. Release note: None
Deleting multiple versions of same key or continuous ranges of point keys with pebble tombstones could be inefficient when range has lots of garbage. This pr adds a storage gc operation that uses pebble range tombstones to remove data. Release note: None
IterateMVCCReplicaKeySpans helper to complement IterateplicaKeySpans Existing tests moved from ReplicaMVCCDataIterator to IterateMVCCReplicaKeySpans as former is being sunsetted and only a single call site exists. Release note: None
1da6d73 to
c3e941f
Compare
When deleting large chunks of keys usually produced by range tombstones GC will use point deletion for all versions found within replica key range. That kind of operations would generate unnecessary load on the storage because each individual version must be deleted independently. This patch adds an ability for GC to find ranges of consecutive keys which are not interleaved with any newer data and create clear_range_key GC requests to remove them. Release note: None
This commit adds metrics to GC ClearRange requests in GC. Metrics are shared with with existing GC ClearRangeKey full range deletions since they are peforming similar operation and exposed as queue.gc.info.clearrangesuccess/queue.gc.info.clearrangefailed. Release note (ops change): Metrics queue.gc.info.clearrangesuccess queue.gc.info.clearrangefailed updated to include statistics about GC operations that perform ClearRange on parts of range keyspace. Previously those metrics only included requests to remove range data completely when performing a schema change.
ClearRange fields of GCRequests are ignored by earlier versions, this feature needs to be version gated to let GC work when talking to a previous version if GC is run from a different node. Release note: None
Handle ClearRange GC request fields to use GCClearRange storage functions for ranges of keys that contain no visible data instead of removing each key individually. This PR only enables processing of request field, while actual GC and storage functionality is added in previous PRs. Release note: None
c3e941f to
9efbf4f
Compare
|
bors r=erikgrinaker |
|
Build failed (retrying...): |
|
Build succeeded: |
Summary:
GC Will track consecutive keys and issue GCClearRange requests to remove multiple keys at once if it can find more than configured number of keys. This covers multiple versions of the same key as well as multiple consecutive keys.
On the GC side it will count consecutive keys while building "points batches" e.g. batches containing MVCC keys to delete. Those batches are capped to fit into resulting raft entries once deletion is executed. GC could accumulate more than a single batch in memory in that case. Once necessary number of consecutive keys is found, GCClearRange is issued and pending batches are discarded. If non gc-able data is found, pending batches are GC'd and clear range tracking is restarted.
To protect GC from consuming too much memory for example if it collects a range with very large keys, it uses a configrable memory limit on how many key bytes could be held in pending batches. If this limit is reached before GC reaches minimum desired number of keys, oldest batch is GC'd and starting point for clear range is adjusted.
On the evaluation side, GC requests for ranges will obtain write latches removed range to prevent other requests writing in the middle of the range. However, if only a single key is touched i.e. multiple versions are deleted, then no latches are obtained as it is done when deleting individual versions as all the removed data is below the GC threshold.
GC exposes a metric
queue.gc.info.numclearconsecutivekeysthat shows how many times GCClearRange was used.GC adds new system setting
kv.gc.clear_range_min_keysthat sets minimum number of keys eligible for GCClearRange.batcheval: handle ClearSubRange request fields in GC requests
Handle ClearSubRange GC request fields to use GCClearRange storage
functions for ranges of keys that contain no visible data instead
of removing each key individually.
This PR only enables processing of request field, while actual GC
and storage functionality is added in previous PRs.
Release note: None
gc: add version gate to GC clear range requests
ClearRange fields of GCRequests are ignored by earlier versions,
this feature needs to be version gated to let GC work when talking
to a previous version if GC is run from a different node.
Release note: None
gc: add metrics for clear range requests in GC
This commit adds two metrics for clear range functionality in GC:
queue.gc.info.numclearconsecutivekeys is incremented every time GC sends
a ClearRange request for a bunch of consecutive keys to a replica
Release note (ops change): Metric queue.gc.info.numclearconsecutivekeys
added. It exposes number of clear range requests to remove consecutive
keys issued by mvcc garbage collection.
gc: GC use delete_range_keys for consecutive ranges of keys
When deleting large chunks of keys usually produced by range tombstones
GC will use point deletion for all versions found within replica key
range.
That kind of operations would generate unnecessary load on the storage
because each individual version must be deleted independently.
This patch adds an ability for GC to find ranges of consecutive keys
which are not interleaved with any newer data and create
delete_range_key GC requests to remove them.
Release note: None
rditer: helper to visit multiple replica key spans
IterateMVCCReplicaKeySpans helper to complement IterateplicaKeySpans
Existing tests moved from ReplicaMVCCDataIterator to
IterateMVCCReplicaKeySpans as former is being sunsetted and only a
single call site exists.
Release note: None
storage: add gc operation that uses clear range
Deleting multiple versions of same key or continuous ranges of
point keys with pebble tombstones could be inefficient when range
has lots of garbage.
This pr adds a storage gc operation that uses pebble range tombstones
to remove data.
Release note: None
gc: add delete range parameters to gc request
This commit adds clear_sub_range_key parameters to GC request.
clear_sub_range_keys is used by GC queue to remove chunks of
consecutive keys where no new data was written above the GC
threshold and storage can optimize deletions with range
tombstones.
To support new types of keys, GCer interface is also updated
to pass provided keys down to request.
Release note: None
Fixes: #84560, #61455