Skip to content

util/limit: replace unfair semaphore with fair quotapool#66092

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
ajwerner:ajwerner/replace-semaphore-with-quotapool-in-limiter
Jun 5, 2021
Merged

util/limit: replace unfair semaphore with fair quotapool#66092
craig[bot] merged 1 commit intocockroachdb:masterfrom
ajwerner:ajwerner/replace-semaphore-with-quotapool-in-limiter

Conversation

@ajwerner
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner commented Jun 4, 2021

Release note: None

@ajwerner ajwerner requested review from a team and dt June 4, 2021 17:45
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@ajwerner
Copy link
Copy Markdown
Contributor Author

ajwerner commented Jun 4, 2021

@dt want a backport on this?

@dt
Copy link
Copy Markdown
Contributor

dt commented Jun 4, 2021

maybe after a week or two of bake-time on master?

@ajwerner ajwerner force-pushed the ajwerner/replace-semaphore-with-quotapool-in-limiter branch from 5085fcf to b419c90 Compare June 4, 2021 20:22
@ajwerner
Copy link
Copy Markdown
Contributor Author

ajwerner commented Jun 4, 2021

TFTR!

bors r=dt

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 5, 2021

Build succeeded:

@craig craig bot merged commit 9cef2e9 into cockroachdb:master Jun 5, 2021
nvb added a commit to nvb/cockroach that referenced this pull request Jun 10, 2021
This commit moves ExportRequest rate limiting up above evaluation and
outside of latching. This ensures that if an ExportRequest is
rate-limited, it does not inadvertently block others while waiting.

Interestingly, we were already careful about this for `AddSSTableRequests`
because it is easier for them to block others while holding latches. In
the case of read-only requests like `ExportRequest`, blocking others is
less common. However, it is still possible. Notably, MVCC write requests
with lower timestamps than the read will still block. Additionally,
non-MVCC requests like range splits will block. In one customer
investigation, we found that an export request was holding latches and
blocking a non-MVCC request (a range split) which was transitively
blocking all write traffic to the range and all read traffic to the RHS
of the range. We believe that the stall lasted for so long (seconds to
minutes) because the ExportRequest was throttled while holding its
latches.

I did notice that some of this code was just touched by cockroachdb#66092, so any
potential backport here may be a little involved.

Release note (bug fix): Backups no longer risk the possibility of
blocking conflicting writes while being rate limited by the
kv.bulk_io_write.concurrent_export_requests concurrency limit.
nvb added a commit to nvb/cockroach that referenced this pull request Jun 11, 2021
This commit moves ExportRequest rate limiting up above evaluation and
outside of latching. This ensures that if an ExportRequest is
rate-limited, it does not inadvertently block others while waiting.

Interestingly, we were already careful about this for `AddSSTableRequests`
because it is easier for them to block others while holding latches. In
the case of read-only requests like `ExportRequest`, blocking others is
less common. However, it is still possible. Notably, MVCC write requests
with lower timestamps than the read will still block. Additionally,
non-MVCC requests like range splits will block. In one customer
investigation, we found that an export request was holding latches and
blocking a non-MVCC request (a range split) which was transitively
blocking all write traffic to the range and all read traffic to the RHS
of the range. We believe that the stall lasted for so long (seconds to
minutes) because the ExportRequest was throttled while holding its
latches.

I did notice that some of this code was just touched by cockroachdb#66092, so any
potential backport here may be a little involved.

Release note (bug fix): Backups no longer risk the possibility of
blocking conflicting writes while being rate limited by the
kv.bulk_io_write.concurrent_export_requests concurrency limit.
craig bot pushed a commit that referenced this pull request Jun 11, 2021
66338: kv: don't hold latches while rate limiting ExportRequests r=nvanbenschoten a=nvanbenschoten

This commit moves ExportRequest rate limiting up above evaluation and
outside of latching. This ensures that if an ExportRequest is
rate-limited, it does not inadvertently block others while waiting.

Interestingly, we were already careful about this for `AddSSTableRequests`
because it is easier for them to block others while holding latches. In
the case of read-only requests like `ExportRequest`, blocking others is
less common. However, it is still possible. Notably, MVCC write requests
with lower timestamps than the read will still block. Additionally,
non-MVCC requests like range splits will block. In one customer
investigation, we found that an export request was holding latches and
blocking a non-MVCC request (a range split) which was transitively
blocking all write traffic to the range and all read traffic to the RHS
of the range. We believe that the stall lasted for so long (seconds to
minutes) because the ExportRequest was throttled while holding its
latches.

I did notice that some of this code was just touched by #66092, so any
potential backport here may be a little involved.

Release note (bug fix): Backups no longer risk the possibility of
blocking conflicting writes while being rate limited by the
kv.bulk_io_write.concurrent_export_requests concurrency limit.

/cc. @cockroachdb/kv 

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@nvb
Copy link
Copy Markdown
Contributor

nvb commented Jun 11, 2021

Do you two feel ok with me backporting this to 20.2 and 21.1 as part of the backport of #66338? Has it baked for long enough?

@ajwerner
Copy link
Copy Markdown
Contributor Author

@nvanbenschoten I feel good about it.

nvb added a commit to nvb/cockroach that referenced this pull request Jun 14, 2021
This commit moves ExportRequest rate limiting up above evaluation and
outside of latching. This ensures that if an ExportRequest is
rate-limited, it does not inadvertently block others while waiting.

Interestingly, we were already careful about this for `AddSSTableRequests`
because it is easier for them to block others while holding latches. In
the case of read-only requests like `ExportRequest`, blocking others is
less common. However, it is still possible. Notably, MVCC write requests
with lower timestamps than the read will still block. Additionally,
non-MVCC requests like range splits will block. In one customer
investigation, we found that an export request was holding latches and
blocking a non-MVCC request (a range split) which was transitively
blocking all write traffic to the range and all read traffic to the RHS
of the range. We believe that the stall lasted for so long (seconds to
minutes) because the ExportRequest was throttled while holding its
latches.

I did notice that some of this code was just touched by cockroachdb#66092, so any
potential backport here may be a little involved.

Release note (bug fix): Backups no longer risk the possibility of
blocking conflicting writes while being rate limited by the
kv.bulk_io_write.concurrent_export_requests concurrency limit.
nvb added a commit to nvb/cockroach that referenced this pull request Jun 14, 2021
This commit moves ExportRequest rate limiting up above evaluation and
outside of latching. This ensures that if an ExportRequest is
rate-limited, it does not inadvertently block others while waiting.

Interestingly, we were already careful about this for `AddSSTableRequests`
because it is easier for them to block others while holding latches. In
the case of read-only requests like `ExportRequest`, blocking others is
less common. However, it is still possible. Notably, MVCC write requests
with lower timestamps than the read will still block. Additionally,
non-MVCC requests like range splits will block. In one customer
investigation, we found that an export request was holding latches and
blocking a non-MVCC request (a range split) which was transitively
blocking all write traffic to the range and all read traffic to the RHS
of the range. We believe that the stall lasted for so long (seconds to
minutes) because the ExportRequest was throttled while holding its
latches.

I did notice that some of this code was just touched by cockroachdb#66092, so any
potential backport here may be a little involved.

Release note (bug fix): Backups no longer risk the possibility of
blocking conflicting writes while being rate limited by the
kv.bulk_io_write.concurrent_export_requests concurrency limit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants