storage: proactively renew expiration-based leases#25322
storage: proactively renew expiration-based leases#25322craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
|
Review status: 0 of 2 files reviewed at latest revision, all discussions resolved. pkg/storage/replica.go, line 1544 at r1 (raw file):
I don't like the level of detailed knowledge of the lease request machinery here. Could we do something simpler like send a Get request to Replica.Send and let all the lease stuff happen under the covers? pkg/storage/replica_proposal.go, line 274 at r1 (raw file):
I'm slightly worried that we could have a lot of these goroutines pile up if they're not exiting for some reason. I was imagining a single store-level goroutine that would periodically check to see whether we have a replica of r1 and if so try to do something to refresh the lease. Comments from Reviewable |
|
TFTR! Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions. pkg/storage/replica_proposal.go, line 274 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Fair enough. Done. Although only for the leaseholder replica. Do you really think it should be done on all replicas of r1 rather than just on (what we think is) the leaseholder? pkg/storage/replica.go, line 1544 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
That doesn't really help with the timing issue, and I'm not sure sending a request is really simpler than calling I've changed to doing this, just calling it more frequently in place of the awkward timing logic. Comments from Reviewable |
|
Reviewed 2 of 3 files at r2. pkg/storage/replica_proposal.go, line 274 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Yes, I was thinking that all replicas of r1 would periodically try to acquire its lease (driven only by a ticker in the goroutine, no need for a trigger in leasePostApply). But doing it this way so that it just renews leases and leaves new acquisitions to be done lazily is fine too. pkg/storage/store.go, line 1560 at r2 (raw file):
I think we should go ahead and do this for all expiration-based leases now that it's not a goroutine per range. Comments from Reviewable |
|
Review status: 0 of 2 files reviewed at latest revision, 2 unresolved discussions. pkg/storage/store.go, line 1560 at r2 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. The new commit may be worth a look, although it should just be what you expect. Comments from Reviewable |
|
LGTM |
|
bors r+ |
Merge conflict (retrying...) |
Run a store-level background worker that proactively renews the lease for r1. It could easily be extended to also renew other expiration-based leases if we so desire in the future. Fixes cockroachdb#24753 Release note: None
This builds off the previous commit which proactively renewed only the first range's lease. Release note: None
|
bors r+ |
Build failed (retrying...) |
25216: sql: fix a bunch of minor spacing issues in the grammar r=BramGruneir a=BramGruneir I was trying to do some searching and couldn't find what I was looking for due to inconsistent spacing. We should move towards adding a linter for the grammer. Release note: None 25322: storage: proactively renew r1's lease r=a-robinson a=a-robinson This mechanism won't scale incredibly well to proactively renewing all expiration-based leases since it'd require a goroutine per lease. It should be fine if we only auto-renew r1's lease, though. Fixes #24753 Release note: None ---------------- @bdarnell, do you have any suggestions for testing this? I've verified it manually, but it's hard to properly isolate this code in a realistic way. We could ratchet the lease durations way down and wait until the lease's expiration bumps, but that creates a timing-based test that is either flaky if we use a really short lease duration or slow if we use a long one. Co-authored-by: Bram Gruneir <bram@cockroachlabs.com> Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Build succeeded |
|
It would appear this made |
This introduction of proactive lease renewals (cockroachdb#25322) made this test flaky in two ways. First, the test was (oddly) creating two Replica objects for range 1 on the same store (since cockroachdb#2996), leading to races when combined with the background lease renewal thread. Second, the test expects leases to expire so it needs to disable automatic renewals. Fixes cockroachdb#25748 Release note: None
25781: storage: Deflake TestReplicaLease r=tschottdorf a=bdarnell This introduction of proactive lease renewals (#25322) made this test flaky in two ways. First, the test was (oddly) creating two Replica objects for range 1 on the same store (since #2996), leading to races when combined with the background lease renewal thread. Second, the test expects leases to expire so it needs to disable automatic renewals. Fixes #25748 Release note: None Co-authored-by: Ben Darnell <ben@cockroachlabs.com>
Caused by the proactive renewal of expiration-based leases (see cockroachdb#25322 and cockroachdb#25625 for more detail). Fixes cockroachdb#25672 Release note: None
25676: storage: Fix flaky TestReplicaNotLeaseHolderError r=a-robinson a=a-robinson Caused by the proactive renewal of expiration-based leases (see #25322 and #25625 for more detail). Fixes #25731 Release note: None I'm not sure how many more of these flakes to expect over the next couple weeks. I'm currently running `stressrace` on all of `pkg/storage` on a gceworker to try and ferret them out if there are any more, but there's no guarantee that'll find all of them. 25816: ui: don't show login indicator unless login is enabled r=couchand a=couchand In #25005 I accidentally let the login indicator leak even if the environment variable wasn't set. This corrects that issue. Release note: None Fixes: #25843 25853: dep: Bump grpc-go to include perf improvements and data race fix r=a-robinson a=a-robinson Un-reverts #24883 to include grpc/grpc-go#2074 This also removed hdrhistogram-writer because it apparently isn't used anymore. Release note: None Maybe we want to wait until they put this into a more formal release SHA, though. Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com> Co-authored-by: Andrew Couch <andrew@cockroachlabs.com>
28931: backport-2.0: storage: proactively renew expiration-based leases r=a-robinson a=a-robinson Backport 2/2 commits from #25322. /cc @cockroachdb/release --- This mechanism won't scale incredibly well to proactively renewing all expiration-based leases since it'd require a goroutine per lease. It should be fine if we only auto-renew r1's lease, though. Fixes #24753 Release note: None ---------------- To fix #27731 in v2.0. Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Caused by the proactive renewal of expiration-based leases (see cockroachdb#25322 and cockroachdb#25625 for more detail). Fixes cockroachdb#25672 Release note: None
This introduction of proactive lease renewals (cockroachdb#25322) made this test flaky in two ways. First, the test was (oddly) creating two Replica objects for range 1 on the same store (since cockroachdb#2996), leading to races when combined with the background lease renewal thread. Second, the test expects leases to expire so it needs to disable automatic renewals. Fixes cockroachdb#25748 Release note: None
This mechanism won't scale incredibly well to proactively renewing all
expiration-based leases since it'd require a goroutine per lease. It
should be fine if we only auto-renew r1's lease, though.
Fixes #24753
Release note: None
@bdarnell, do you have any suggestions for testing this? I've verified it manually, but it's hard to properly isolate this code in a realistic way. We could ratchet the lease durations way down and wait until the lease's expiration bumps, but that creates a timing-based test that is either flaky if we use a really short lease duration or slow if we use a long one.