kv: don't hold latches while rate limiting ExportRequests#66338
kv: don't hold latches while rate limiting ExportRequests#66338craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
2d32435 to
f2b537b
Compare
andreimatei
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)
pkg/util/limit/limiter.go, line 28 at r1 (raw file):
} // Reservation is a allocation from a limiter which should be released once the
"an allocation"
pkg/util/limit/limiter.go, line 30 at r1 (raw file):
// Reservation is a allocation from a limiter which should be released once the // limited task has been completed. type Reservation interface {
Is this new interface necessary? Can't clients work on quota pool allocs?
aayushshah15
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/store_send.go, line 295 at r1 (raw file):
// Limit the number of concurrent Export requests, as these often scan and // entire Range at a time and place significant read load on a Store. return s.limiters.ConcurrentExportRequests.Begin(ctx)
Do you think its worth emitting a trace event or a log message (like above) of some sort?
This commit moves ExportRequest rate limiting up above evaluation and outside of latching. This ensures that if an ExportRequest is rate-limited, it does not inadvertently block others while waiting. Interestingly, we were already careful about this for `AddSSTableRequests` because it is easier for them to block others while holding latches. In the case of read-only requests like `ExportRequest`, blocking others is less common. However, it is still possible. Notably, MVCC write requests with lower timestamps than the read will still block. Additionally, non-MVCC requests like range splits will block. In one customer investigation, we found that an export request was holding latches and blocking a non-MVCC request (a range split) which was transitively blocking all write traffic to the range and all read traffic to the RHS of the range. We believe that the stall lasted for so long (seconds to minutes) because the ExportRequest was throttled while holding its latches. I did notice that some of this code was just touched by cockroachdb#66092, so any potential backport here may be a little involved. Release note (bug fix): Backups no longer risk the possibility of blocking conflicting writes while being rate limited by the kv.bulk_io_write.concurrent_export_requests concurrency limit.
f2b537b to
cbc0810
Compare
nvb
left a comment
There was a problem hiding this comment.
TFTRs!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15 and @andreimatei)
pkg/kv/kvserver/store_send.go, line 295 at r1 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
Do you think its worth emitting a trace event or a log message (like above) of some sort?
Yes, done.
pkg/util/limit/limiter.go, line 28 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
"an allocation"
Done.
pkg/util/limit/limiter.go, line 30 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Is this new interface necessary? Can't clients work on quota pool allocs?
We don't need it, but I found it strange that the limit package would be returning a *quotapool.IntAlloc. Hiding this behind an interface is free and seemed like a good idea.
|
bors r+ |
|
Build succeeded: |
This commit moves ExportRequest rate limiting up above evaluation and
outside of latching. This ensures that if an ExportRequest is
rate-limited, it does not inadvertently block others while waiting.
Interestingly, we were already careful about this for
AddSSTableRequestsbecause it is easier for them to block others while holding latches. In
the case of read-only requests like
ExportRequest, blocking others isless common. However, it is still possible. Notably, MVCC write requests
with lower timestamps than the read will still block. Additionally,
non-MVCC requests like range splits will block. In one customer
investigation, we found that an export request was holding latches and
blocking a non-MVCC request (a range split) which was transitively
blocking all write traffic to the range and all read traffic to the RHS
of the range. We believe that the stall lasted for so long (seconds to
minutes) because the ExportRequest was throttled while holding its
latches.
I did notice that some of this code was just touched by #66092, so any
potential backport here may be a little involved.
Release note (bug fix): Backups no longer risk the possibility of
blocking conflicting writes while being rate limited by the
kv.bulk_io_write.concurrent_export_requests concurrency limit.
/cc. @cockroachdb/kv