backport-2.0: storage: proactively renew expiration-based leases #28931
craig[bot] merged 3 commits into cockroachdb:release-2.0 from
Conversation
Run a store-level background worker that proactively renews the lease for r1. It could easily be extended to also renew other expiration-based leases if we so desire in the future.

Fixes cockroachdb#24753

Release note: None

This builds off the previous commit which proactively renewed only the first range's lease.

Release note: None
petermattis
left a comment
Reviewed 2 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/storage/replica_proposal.go, line 272 at r1 (raw file):
// Notify the lease renewer worker that we want it to renew the first
// range's expiration-based lease.
r.store.expirationBasedLeaseChan <- r
This can block if the channel isn't drained promptly. Is that a problem? Can we ever get into a situation where range 1 is blocked below Raft because the lease renewer goroutine is acquiring a lease which depends on range 1 processing requests? Since the lease renewer contains a map[*Replica]struct{}, perhaps we should elevate that map to a member of Store, protect it with a lock, and make expirationBasedLeaseChan a signal.
Apologies if this was already discussed on the original PR.
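The map-plus-signal pattern suggested here can be sketched in Go. This is an illustrative toy, not the CockroachDB code: the `leaseRenewer` and `Replica` types below are stand-ins, and the real change landed separately in #28975. The key property is that a send can neither block nor be lost: the set absorbs the request, and a 1-buffered channel is only a wakeup.

```go
package main

import (
	"fmt"
	"sync"
)

// Replica is a stand-in for the real storage.Replica.
type Replica struct{ rangeID int }

// leaseRenewer records replicas whose expiration-based leases should be
// renewed. Callers add to a locked set and nudge the worker via a
// capacity-1 signal channel, so notify never blocks and never drops work.
type leaseRenewer struct {
	mu     sync.Mutex
	repls  map[*Replica]struct{}
	signal chan struct{} // capacity 1: a pending wakeup absorbs extra sends
}

func newLeaseRenewer() *leaseRenewer {
	return &leaseRenewer{
		repls:  make(map[*Replica]struct{}),
		signal: make(chan struct{}, 1),
	}
}

// notify records the replica and wakes the renewal worker.
func (lr *leaseRenewer) notify(r *Replica) {
	lr.mu.Lock()
	lr.repls[r] = struct{}{}
	lr.mu.Unlock()
	select {
	case lr.signal <- struct{}{}:
	default: // a wakeup is already pending; the worker will see our entry
	}
}

// drain atomically takes the current set of replicas to renew.
func (lr *leaseRenewer) drain() []*Replica {
	lr.mu.Lock()
	defer lr.mu.Unlock()
	out := make([]*Replica, 0, len(lr.repls))
	for r := range lr.repls {
		out = append(out, r)
		delete(lr.repls, r)
	}
	return out
}

func main() {
	lr := newLeaseRenewer()
	r1 := &Replica{rangeID: 1}
	lr.notify(r1)
	lr.notify(r1) // duplicate notifications collapse in the set
	<-lr.signal   // the worker wakes up once
	fmt.Println(len(lr.drain()))
}
```

Duplicate notifications collapse into one map entry, which is exactly what the channel-based approach could not guarantee once the buffer filled.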
pkg/storage/replica_proposal.go, line 278 at r2 (raw file):
// It's not worth blocking on a full channel here since the worst that can
// happen is the lease times out and has to be reacquired when needed, but
// log an error since it's pretty unexpected.
This feels like more evidence for a Store.renewLeases map.
pkg/storage/store.go, line 1541 at r2 (raw file):
// holding a bunch of meta2 ranges just failed), so keep pulling until
// we can't get any more.
continuepulling:
Thinking about failure scenarios here. Refreshing of the range 1 lease is critical to avoid gossip partitions. Are any of the other expiration-based leases as important? What happens if lease acquisition of a non-range-1 lease blocks, preventing acquisition of the range 1 lease? Could that lead to a gossip partition and cascading badness?
Perhaps we should leave off the second commit for this backport. Perhaps I am being overly paranoid right now.
a-robinson
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/storage/replica_proposal.go, line 272 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This can block if the channel isn't drained promptly. Is that a problem? Can we ever get into a situation where range 1 is blocked below Raft because the lease renewer goroutine is acquiring a lease which depends on range 1 processing requests? Since the lease renewer contains a map[*Replica]struct{}, perhaps we should elevate that map to a member of Store and protect it with a lock and make expirationBasedLeaseChan a signal.

Apologies if this was already discussed on the original PR.
Huh? Isn't that what the select statement with a default case is for? It could get into a state where we can't add new replicas because the channel is full, but it would never block.
Using a shared map with a signal would be reasonable, though.
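The two send styles being debated differ only in a select wrapper. A minimal, self-contained illustration (not the actual replica_proposal.go code): a bare `ch <- v` blocks once the buffer is full, while a select with a default case makes the send best-effort, silently dropping the notification instead.

```go
package main

import "fmt"

// tryNotify performs a non-blocking send: it returns true if the value
// was accepted and false if the channel's buffer was full. This is the
// select-with-default pattern from the second commit; a plain `ch <- v`
// (the first commit's version) would block forever in the full case.
func tryNotify(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // channel full; caller proceeds without blocking
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(tryNotify(ch, 1)) // buffer had room
	fmt.Println(tryNotify(ch, 2)) // buffer full, notification dropped
}
```

The trade-off is exactly the one discussed above: never blocking, at the cost of possibly never notifying the worker while the channel stays full.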
pkg/storage/replica_proposal.go, line 278 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This feels like more evidence for a Store.renewLeases map.
Agreed. I'll make the change on master and cherry pick here once we've agreed on the other open question.
pkg/storage/store.go, line 1541 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Thinking about failure scenarios here. Refreshing of the range 1 lease is critical to avoid gossip partitions. Are any of the other expiration-based leases as important? What happens if lease acquisition of a non-range 1 lease blocks prevent acquisition of the range 1 lease? Could that lead to a gossip partition and cascading badness?
Perhaps we should leave off the second commit for this backport. Perhaps I am being overly paranoid right now.
No other leases are as important. The liveness range is close, but it's getting pretty much constantly refreshed anyways by liveness heartbeats.
We'd have to fail to auto-refresh the r1 lease a lot of times in a row on all of its replicas in order to hit a gossip partition. It might be possible, though, if one of the other replicas' calls to redirectOnOrAcquireLease blocks forever on each of the nodes. In the common case it doesn't block, because when the lease is already valid it just kicks off an async renewal and immediately returns. But it does look possible for it to block forever if the lease is expired.
Putting a timeout on the context may be enough to get such bad replicas out of the queue (since they'd be deleted from repls and wouldn't be re-added unless their lease gets reacquired, meaning they're healthy again). Using multiple goroutines for the redirectOnOrAcquireLease calls would make such blockages less likely, but wouldn't inherently prevent them unless we allowed for unbounded goroutines (1 per replica).
I'd rather use the same code for 2.0 as for 2.1, particularly given that the two commits together have seen infinitely more testing than just the first commit by itself. If that means we have to add a little more logic, so be it.
petermattis
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/storage/replica_proposal.go, line 272 at r1 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
Huh? Isn't that what the select statement with a default case is for? It could get into a state where we can't add new replicas because the channel is full, but it would never block.

Using a shared map with a signal would be reasonable, though.
I made this comment on the first commit where the code is r.store.expirationBasedLeaseChan <- r which might block forever. The next commit uses a select with a default case which avoids the blocking, but has the downside that we might never notify the lease renewer to renew the lease if the channel is full for some reason.
pkg/storage/replica_proposal.go, line 278 at r2 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
Agreed. I'll make the change on master and cherry pick here once we've agreed on the other open question.
Ack.
pkg/storage/store.go, line 1541 at r2 (raw file):
Previously, a-robinson (Alex Robinson) wrote…
No other leases are as important. The liveness range is close, but it's getting pretty much constantly refreshed anyways by liveness heartbeats.
We'd have to fail to auto-refresh the r1 lease a lot of times in a row on all of its replicas in order to hit a gossip partition. It might be possible, though, if one of the other replicas' calls to redirectOnOrAcquireLease blocks forever on each of the nodes. In the common case it doesn't block, because when the lease is already valid it just kicks off an async renewal and immediately returns. But it does look possible for it to block forever if the lease is expired.

Putting a timeout on the context may be enough to get such bad replicas out of the queue (since they'd be deleted from repls and wouldn't be re-added unless their lease gets reacquired, meaning they're healthy again). Using multiple goroutines for the redirectOnOrAcquireLease calls would make such blockages less likely, but wouldn't inherently prevent them unless we allowed for unbounded goroutines (1 per replica).

I'd rather use the same code for 2.0 as for 2.1, particularly given that the two commits together have seen infinitely more testing than just the first commit by itself. If that means we have to add a little more logic, so be it.
You've convinced me that we should use the same code for 2.0 and 2.1. I think the auto-refresh loop can remain as-is for now, modulo switching to using a map with lock + channel for signaling.
a-robinson
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/storage/replica_proposal.go, line 272 at r1 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I made this comment on the first commit where the code is r.store.expirationBasedLeaseChan <- r, which might block forever. The next commit uses a select with a default case which avoids the blocking, but has the downside that we might never notify the lease renewer to renew the lease if the channel is full for some reason.
Ack, I didn't realize that.
pkg/storage/replica_proposal.go, line 278 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Ack.
Done in #28975. I'll backport after you've reviewed it there. Tested locally on a fresh cluster to confirm expiration-based leases stay in place.
pkg/storage/store.go, line 1541 at r2 (raw file):
Previously, petermattis (Peter Mattis) wrote…
You've convinced me that we should use the same code for 2.0 and 2.1. I think the auto-refresh loop can remain as-is for now, modulo switching to using a map with lock + channel for signaling.
Ack, made the change to a map.
As discussed on cockroachdb#28931 Release note: None
Added the commit from #28975
petermattis
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
bors r+

Build failed
28912: kv: teach DistSender about RangeFeeds, use for changefeeds r=nvanbenschoten a=nvanbenschoten

This change adds a RangeFeed method to DistSender and teaches it how to split rangefeeds across ranges and how to manage their individual lifecycle events. It then adjusts the changefeed poller to use RangeFeed requests when the corresponding cluster setting is enabled.

28975: storage: Track renewable expiration-based leases in map instead of chan r=a-robinson a=a-robinson

As discussed on #28931

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
bors r+

Build failed
It looks like a retry should work judging by the build error. I've confirmed that the download works now. I'll try one more time, but it's weird to see two consecutive errors just building the test binary and verifying the archive...

bors r+
28931: backport-2.0: storage: proactively renew expiration-based leases r=a-robinson a=a-robinson

Backport 2/2 commits from #25322.

/cc @cockroachdb/release

---

This mechanism won't scale incredibly well to proactively renewing all expiration-based leases since it'd require a goroutine per lease. It should be fine if we only auto-renew r1's lease, though.

Fixes #24753

Release note: None

----------------

To fix #27731 in v2.0.

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Build succeeded
29296: backport-2.0: Fix test flakes caused by automatic lease renewal r=a-robinson a=a-robinson

Backport:
* 1/1 commits from "storage: Adding testing knob to disable automatic lease renewals" (#25625)
* 1/1 commits from "storage: Fix flaky TestReplicaNotLeaseHolderError" (#25676)
* 1/1 commits from "storage: Deflake TestReplicaLease" (#25781)

Please see individual PRs for details.

/cc @cockroachdb/release

Fixes #29189. The flakiness was introduced to the 2.0 branch by #28931.

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Co-authored-by: Ben Darnell <ben@cockroachlabs.com>