kvserver: export remaining snapshot bytes #85528
Description
Summary
Because snapshots vary in size, it would be beneficial to track the total remaining snapshot bytes that are queued and in progress on a store's receiver snapshot semaphore, as well as the remaining bytes that are queued on a store's sender snapshot semaphore.
Note that we currently track reservations in bytes via capacity.reserved, which is the total size of the snapshot(s) currently being processed on a store.
Solution
Add four exported metrics. The last two are optional, pending an assessment of how useful they are:
1. range.snapshots.queued-rcvd: a gauge tracking the sum of all snapshot bytes that are currently queued on a store's receive queue but have not yet acquired a reservation (begun processing).
2. range.snapshots.queued-send: a gauge tracking the sum of all snapshot bytes that are currently queued on a store's send queue but have not yet acquired a reservation (begun processing).
3. range.snapshots.pending-rcvd: a gauge tracking the sum of all snapshot bytes that remain on a store's receiving side, for snapshots that have acquired a reservation. This could be updated more frequently, to track the "remaining bytes", i.e. reservation - processed.
4. range.snapshots.pending-send: a gauge tracking the sum of all snapshot bytes that remain on a store's sending side, for snapshots that have acquired a reservation. As above, this tracks the remaining bytes to be sent.
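The intended accounting for one direction (send or receive) can be sketched as a small model. This is an illustration only, written in Python for brevity, and is not CockroachDB code: the class name `SnapshotByteGauges` and its methods are hypothetical, and the byte sizes are made up.

```python
from dataclasses import dataclass


@dataclass
class SnapshotByteGauges:
    """Hypothetical byte gauges for one direction on one store."""

    queued: int = 0   # bytes queued, no reservation yet (queued-* metrics)
    pending: int = 0  # bytes reserved but not yet processed (pending-* metrics)

    def enqueue(self, size: int) -> None:
        # A snapshot joins the queue and waits for a reservation.
        self.queued += size

    def reserve(self, size: int) -> None:
        # The snapshot acquires a reservation and begins processing,
        # so its bytes move from queued to pending.
        self.queued -= size
        self.pending += size

    def progress(self, processed: int) -> None:
        # Bytes were processed; pending now reflects reservation - processed.
        self.pending -= processed


# Example: two snapshots arrive, one gets a reservation and is partly processed.
rcvd = SnapshotByteGauges()
rcvd.enqueue(512)
rcvd.enqueue(128)
rcvd.reserve(512)
rcvd.progress(200)
print(rcvd.queued, rcvd.pending)
```

In this model the queued gauge changes only on enqueue and reservation, while the pending gauge needs an update on every progress report, which is why the issue treats the pending metrics as the costlier, optional pair.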
Context
(3) and (4) may not provide much material benefit: snapshots should in most cases be processed in under 16 seconds (512 MB at 32 MB/s), while the default metric update interval is 10 seconds, so a typical snapshot would be sampled only once or twice. In cases where the snapshot rate is set lower, they may provide utility. However, the existing capacity.reserved metric, which tracks the total (unprocessed + processed) in-progress snapshot bytes, may be more appropriate. This issue leaves them as optional.
Related PR, for count rather than bytes: #84947
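The arithmetic behind this estimate can be checked with a short sketch. The 512 MB size, 32 MB/s rate, and 10 second interval come from the text above; the 8 MB/s "lowered rate" is an assumed value chosen purely for illustration, and the function names are hypothetical.

```python
import math


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    # Time to process a snapshot of the given size at a fixed rate.
    return size_mb / rate_mb_per_s


def full_intervals(duration_s: float, interval_s: float = 10.0) -> int:
    # Number of complete metric update intervals that fit in the duration.
    return math.floor(duration_s / interval_s)


default_s = transfer_seconds(512, 32)  # rate from the text
slow_s = transfer_seconds(512, 8)      # assumed lower rate, for illustration

print(default_s, full_intervals(default_s))
print(slow_s, full_intervals(slow_s))
```

At the default rate only one full update interval fits inside a snapshot's lifetime, so a pending gauge would rarely show intermediate values. At the assumed lower rate several intervals fit, which is the case where the pending metrics might be informative.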
cc @AlexTalks
Jira issue: CRDB-18293