kvserver: add metrics to track snapshot queueing and application#84947
kvserver: add metrics to track snapshot queueing and application#84947craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
3f1c924 to
8c7789d
Compare
8c7789d to
9d995b1
Compare
aayushshah15
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @KnightAsterial)
pkg/kv/kvserver/store_snapshot.go line 596 at r1 (raw file):
} inProgressSnapshotMetric.Inc(1)
The way this is written, we won't be recording empty snapshots ingestion as "in progress". I don't have a strong opinion but curious what you think.
I think it would be better to have this increment (and the following decrement in the returned closure) outside of this if-block, what do you think? I think there might be instances where we might find it useful to observe the impact of a wave of empty snapshot ingestion.
AlexTalks
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @KnightAsterial)
pkg/kv/kvserver/store_snapshot.go line 596 at r1 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
The way this is written, we won't be recording empty snapshots ingestion as "in progress". I don't have a strong opinion but curious what you think.
I think it would be better to have this increment (and the following decrement in the returned closure) outside of this if-block, what do you think? I think there might be instances where we might find it useful to observe the impact of a wave of empty snapshot ingestion.
This was intentional - since the empty snapshots don't wait for the semaphore (on either sender or receiver side), I thought it wouldn't be extremely useful to see how many are in progress at a given time, especially when in the same metric as non-empty snapshots. Do you think we should add a separate metric for this? Or incorporate the non-empty snapshots as well?
AlexTalks
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @KnightAsterial)
pkg/kv/kvserver/store_snapshot.go line 596 at r1 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
This was intentional - since the empty snapshots don't wait for the semaphore (on either sender or receiver side), I thought it wouldn't be extremely useful to see how many are in progress at a given time, especially when in the same metric as non-empty snapshots. Do you think we should add a separate metric for this? Or incorporate the non-empty snapshots as well?
Incorporate the non- empty snapshots, I mean.
|
Related but not a comment on this patch. Is it possible (straightforward) to instrument tracking for the number of current snapshot bytes queued and the remaining bytes in processing? edit: Tracing is added for timing in @KnightAsterial's patch #83003, however not a metric for latency or bytes. |
aayushshah15
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @KnightAsterial)
pkg/kv/kvserver/store_snapshot.go line 596 at r1 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
Incorporate the
non-empty snapshots, I mean.
My suggestion would be to incorporate empty snapshots in this metric and add a separate metric that tracks just the set of non-empty snapshots. So we'd have one metric corresponding to all snapshots and one corresponding to just non-empty snapshots.
9d995b1 to
79e7df4
Compare
AlexTalks
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @KnightAsterial)
pkg/kv/kvserver/store_snapshot.go line 596 at r1 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
My suggestion would be to incorporate empty snapshots in this metric and add a separate metric that tracks just the set of non-empty snapshots. So we'd have one metric corresponding to all snapshots and one corresponding to just non-empty snapshots.
Added range.snapshots.(send|recv)-total-in-progress metrics to track empty snapshots as well.
AlexTalks
left a comment
There was a problem hiding this comment.
As for @kvoli's suggestion, I assume you mean per-snapshot, tracking on the sender side for remaining, and on the receiver side for the queued bytes? I think this is definitely feasible but I'd put it in a separate PR.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @KnightAsterial)
aayushshah15
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @AlexTalks, and @KnightAsterial)
pkg/kv/kvserver/store_snapshot.go line 569 at r2 (raw file):
) (_cleanup func(), _err error) { tBegin := timeutil.Now() totalInProgressSnapshotMetric.Inc(1)
This increment should be after the logic that waits to acquire the snapshot reservation, right?
Currently, we have this asymmetry where the total metric includes the snapshots waiting on the semaphore whereas the non-empty in-progress metric doesn't.
79e7df4 to
c48b1e6
Compare
AlexTalks
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @KnightAsterial)
pkg/kv/kvserver/store_snapshot.go line 569 at r2 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
This increment should be after the logic that waits to acquire the snapshot reservation, right?
Currently, we have this asymmetry where the
totalmetric includes the snapshots waiting on the semaphore whereas the non-empty in-progress metric doesn't.
Thanks for catching - should be fixed now.
aayushshah15
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @KnightAsterial)
|
bors r+ |
|
Merge conflict. |
While we previously had one metric to track "in-progress" snapshots, `replicas.reserved`, this metric is used by both senders and receivers, and does not track the number of snapshots queued on its semaphore. This change adds the following new metrics, separated by sender and receiver: ``` range.snapshots.send-queue range.snapshots.recv-queue range.snapshots.send-in-progress range.snapshots.recv-in-progress range.snapshots.send-total-in-progress range.snapshots.recv-total-in-progress ``` These metrics specifically track the number of queued non-empty snapshots on a sender or receiver store, as well as the number of snapshots currently being sent or received by a given store. The first four metrics specifically only track non-empty snapshots, because empty snapshots are not throttled at all, and thus do not compete for the semaphore. The final two metrics track all snapshots, including empty ones. Release note (ops change): Added new metrics `range.snapshots.(send|recv)-queue` and `range.snapshots.(send|recv)-in-progress` to track the number of queued and in-progress snapshots being sent or received on a store.
c48b1e6 to
fc85918
Compare
|
bors r+ |
|
Build failed (retrying...): |
|
Build succeeded: |
While we previously had one metric to track "in-progress" snapshots,
replicas.reserved, this metric is used by both senders and receivers,and does not track the number of snapshots queued on its semaphore. This
change adds the following new metrics, separated by sender and receiver:
These metrics specifically track the number of queued non-empty
snapshots on a sender or receiver store, as well as the number of
snapshots currently being sent or received by a given store. The
first four metrics specifically only track non-empty snapshots, because
empty snapshots are not throttled at all, and thus do not compete for the
semaphore. The final two metrics track all snapshots, including empty
ones.
Release note (ops change): Added new metrics
range.snapshots.(send|recv)-queueandrange.snapshots.(send|recv)-in-progressto track the number of queuedand in-progress snapshots being sent or received on a store.