
backport-2.0: storage: proactively renew expiration-based leases #28931

Merged
craig[bot] merged 3 commits into cockroachdb:release-2.0 from a-robinson:backport2.0-25322 on Aug 28, 2018

Conversation

@a-robinson (Contributor)

Backport 2/2 commits from #25322.

/cc @cockroachdb/release


This mechanism won't scale incredibly well to proactively renewing all
expiration-based leases since it'd require a goroutine per lease. It
should be fine if we only auto-renew r1's lease, though.

Fixes #24753

Release note: None


To fix #27731 in v2.0.

Run a store-level background worker that proactively renews the lease
for r1. It could easily be extended to also renew other expiration-based
leases if we so desire in the future.

Fixes cockroachdb#24753

Release note: None
This builds off the previous commit which proactively renewed
only the first range's lease.

Release note: None
@a-robinson requested review from a team and petermattis on August 22, 2018 04:48
@cockroach-teamcity (Member)

This change is Reviewable

@petermattis (Collaborator) left a comment

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/replica_proposal.go, line 272 at r1 (raw file):

		// Notify the lease renewer worker that we want it to renew the first
		// range's expiration-based lease.
		r.store.expirationBasedLeaseChan <- r

This can block if the channel isn't drained promptly. Is that a problem? Can we ever get into a situation where range 1 is blocked below Raft because the lease renewer goroutine is acquiring a lease which itself depends on range 1 processing requests? Since the lease renewer contains a map[*Replica]struct{}, perhaps we should elevate that map to a member of Store, protect it with a lock, and make expirationBasedLeaseChan a pure signal channel.

Apologies if this was already discussed on the original PR.
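
The map-plus-signal design suggested here could be sketched roughly as follows. This is a minimal illustration with hypothetical names (`Store`, `enqueue`, `drain` are stand-ins, not the actual CockroachDB code): the set of replicas needing renewal lives in a locked map, and the channel carries only a wake-up signal, so enqueueing never blocks on a slow renewal worker.

```go
package main

import (
	"fmt"
	"sync"
)

// Replica is a stand-in for storage.Replica.
type Replica struct{ id int }

// Store holds the renewal set behind a mutex plus a 1-buffered
// signal channel, as suggested in the review comment above.
type Store struct {
	mu          sync.Mutex
	renewLeases map[*Replica]struct{}
	signal      chan struct{} // capacity 1: at most one pending wake-up
}

func NewStore() *Store {
	return &Store{
		renewLeases: make(map[*Replica]struct{}),
		signal:      make(chan struct{}, 1),
	}
}

// enqueue registers a replica for renewal and nudges the worker.
// The non-blocking send is safe because the worker re-checks the
// map after every wake-up, so a dropped signal loses nothing.
func (s *Store) enqueue(r *Replica) {
	s.mu.Lock()
	s.renewLeases[r] = struct{}{}
	s.mu.Unlock()
	select {
	case s.signal <- struct{}{}:
	default: // a wake-up is already pending
	}
}

// drain removes and returns all replicas currently awaiting renewal.
func (s *Store) drain() []*Replica {
	s.mu.Lock()
	defer s.mu.Unlock()
	repls := make([]*Replica, 0, len(s.renewLeases))
	for r := range s.renewLeases {
		repls = append(repls, r)
		delete(s.renewLeases, r)
	}
	return repls
}

func main() {
	s := NewStore()
	r := &Replica{id: 1}
	s.enqueue(r)
	s.enqueue(r) // idempotent: the map dedupes repeat requests
	<-s.signal
	fmt.Println(len(s.drain())) // prints 1
}
```

Unlike a bounded channel of replicas, this shape can never drop a renewal request: the worst case is a redundant map write.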


pkg/storage/replica_proposal.go, line 278 at r2 (raw file):

		// It's not worth blocking on a full channel here since the worst that can
		// happen is the lease times out and has to be reacquired when needed, but
		// log an error since it's pretty unexpected.

This feels like more evidence for a Store.renewLeases map.


pkg/storage/store.go, line 1541 at r2 (raw file):

				// holding a bunch of meta2 ranges just failed), so keep pulling until
				// we can't get any more.
			continuepulling:

Thinking about failure scenarios here. Refreshing of the range 1 lease is critical to avoid gossip partitions. Are any of the other expiration-based leases as important? What happens if lease acquisition of a non-range-1 lease blocks, preventing acquisition of the range 1 lease? Could that lead to a gossip partition and cascading badness?

Perhaps we should leave off the second commit for this backport. Perhaps I am being overly paranoid right now.

@a-robinson (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/replica_proposal.go, line 272 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This can block if the channel isn't drained promptly. Is that a problem? Can we ever get into a situation where range 1 is blocked below Raft because the lease renewer goroutine is acquiring a lease which itself depends on range 1 processing requests? Since the lease renewer contains a map[*Replica]struct{}, perhaps we should elevate that map to a member of Store, protect it with a lock, and make expirationBasedLeaseChan a pure signal channel.

Apologies if this was already discussed on the original PR.

Huh? Isn't that what the select statement with a default case is for? It could get into a state where we can't add new replicas because the channel is full, but it would never block.

Using a shared map with a signal would be reasonable, though.
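
For reference, the select-with-default pattern being discussed looks like this generic sketch (not the actual PR code): the send succeeds if the buffered channel has room and otherwise falls through to the default case instead of blocking the caller.

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 1)

	// trySend performs a non-blocking send: if the channel's buffer
	// is full, take the default branch and report the drop.
	trySend := func(v int) bool {
		select {
		case ch <- v:
			return true
		default:
			return false
		}
	}

	fmt.Println(trySend(1)) // true: the buffer had room
	fmt.Println(trySend(2)) // false: buffer full, value dropped
}
```

This is exactly the trade-off raised in the thread: the sender never blocks, but a notification can be silently dropped when the channel is full.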


pkg/storage/replica_proposal.go, line 278 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This feels like more evidence for a Store.renewLeases map.

Agreed. I'll make the change on master and cherry pick here once we've agreed on the other open question.


pkg/storage/store.go, line 1541 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Thinking about failure scenarios here. Refreshing of the range 1 lease is critical to avoid gossip partitions. Are any of the other expiration-based leases as important? What happens if lease acquisition of a non-range-1 lease blocks, preventing acquisition of the range 1 lease? Could that lead to a gossip partition and cascading badness?

Perhaps we should leave off the second commit for this backport. Perhaps I am being overly paranoid right now.

No other leases are as important. The liveness range is close, but it's getting pretty much constantly refreshed anyways by liveness heartbeats.

We'd have to fail to auto-refresh the r1 lease a lot of times in a row on all of its replicas in order to hit a gossip partition. It might be possible, though, if one of the other replicas' calls to redirectOnOrAcquireLease blocks forever on each of the nodes. In the common case it doesn't block, because when the lease is already valid it just kicks off an async renewal and immediately returns. But it does look possible for it to block forever if the lease is expired.

Putting a timeout on the context may be enough to get such bad replicas out of the queue (since they'd be deleted from repls and wouldn't be re-added unless their lease gets reacquired, meaning they're healthy again). Using multiple goroutines for the redirectOnOrAcquireLease calls would make such blockages less likely, but wouldn't inherently prevent them unless we allowed for unbounded goroutines (1 per replica).

I'd rather use the same code for 2.0 as for 2.1, particularly given that the two commits together have seen infinitely more testing than just the first commit by itself. If that means we have to add a little more logic, so be it.

@petermattis (Collaborator) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/replica_proposal.go, line 272 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Huh? Isn't that what the select statement with a default case is for? It could get into a state where we can't add new replicas because the channel is full, but it would never block.

Using a shared map with a signal would be reasonable, though.

I made this comment on the first commit where the code is r.store.expirationBasedLeaseChan <- r which might block forever. The next commit uses a select with a default case which avoids the blocking, but has the downside that we might never notify the lease renewer to renew the lease if the channel is full for some reason.


pkg/storage/replica_proposal.go, line 278 at r2 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

Agreed. I'll make the change on master and cherry pick here once we've agreed on the other open question.

Ack.


pkg/storage/store.go, line 1541 at r2 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

No other leases are as important. The liveness range is close, but it's getting pretty much constantly refreshed anyways by liveness heartbeats.

We'd have to fail to auto-refresh the r1 lease a lot of times in a row on all of its replicas in order to hit a gossip partition. It might be possible, though, if one of the other replicas' calls to redirectOnOrAcquireLease blocks forever on each of the nodes. In the common case it doesn't block, because when the lease is already valid it just kicks off an async renewal and immediately returns. But it does look possible for it to block forever if the lease is expired.

Putting a timeout on the context may be enough to get such bad replicas out of the queue (since they'd be deleted from repls and wouldn't be re-added unless their lease gets reacquired, meaning they're healthy again). Using multiple goroutines for the redirectOnOrAcquireLease calls would make such blockages less likely, but wouldn't inherently prevent them unless we allowed for unbounded goroutines (1 per replica).

I'd rather use the same code for 2.0 as for 2.1, particularly given that the two commits together have seen infinitely more testing than just the first commit by itself. If that means we have to add a little more logic, so be it.

You've convinced me that we should use the same code for 2.0 and 2.1. I think the auto-refresh loop can remain as-is for now, modulo switching to using a map with lock + channel for signaling.

@a-robinson (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/storage/replica_proposal.go, line 272 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I made this comment on the first commit where the code is r.store.expirationBasedLeaseChan <- r which might block forever. The next commit uses a select with a default case which avoids the blocking, but has the downside that we might never notify the lease renewer to renew the lease if the channel is full for some reason.

Ack, I didn't realize that.


pkg/storage/replica_proposal.go, line 278 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Ack.

Done in #28975. I'll backport after you've reviewed it there. Tested locally on a fresh cluster to confirm expiration-based leases stay in place.


pkg/storage/store.go, line 1541 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

You've convinced me that we should use the same code for 2.0 and 2.1. I think the auto-refresh loop can remain as-is for now, modulo switching to using a map with lock + channel for signaling.

Ack, made the change to a map.

a-robinson added a commit to a-robinson/cockroach that referenced this pull request Aug 22, 2018
a-robinson added a commit to a-robinson/cockroach that referenced this pull request Aug 27, 2018
@a-robinson (Contributor, Author)

Added the commit from #28975

@petermattis (Collaborator) left a comment

:lgtm:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@a-robinson (Contributor, Author)

bors r+

@craig (Contributor)

craig bot commented Aug 27, 2018

Build failed

craig bot pushed a commit that referenced this pull request Aug 27, 2018
28912: kv: teach DistSender about RangeFeeds, use for changefeeds r=nvanbenschoten a=nvanbenschoten

This change adds a RangeFeed method to DistSender and teaches it how
to split rangefeeds across ranges and how to manage their individual
lifecycle events. It then adjusts the changefeed poller to use RangeFeed
requests when the corresponding cluster setting is enabled.

28975: storage: Track renewable expiration-based leases in map instead of chan r=a-robinson a=a-robinson

As discussed on #28931

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
@a-robinson (Contributor, Author)

bors r+

@craig (Contributor)

craig bot commented Aug 28, 2018

Build failed

@a-robinson (Contributor, Author)

It looks like a retry should work judging by the build error:

[10:03:46][Build test binary] error An unexpected error occurred: "https://registry.yarnpkg.com/@protobufjs/aspromise/-/aspromise-1.1.1.tgz: Request failed \"500 Internal Server Error\"".

I've confirmed that the download works now. I'll try one more time, but it's weird to see two consecutive errors just building the test binary and verifying the archive...

bors r+

craig bot pushed a commit that referenced this pull request Aug 28, 2018
28931: backport-2.0: storage: proactively renew expiration-based leases r=a-robinson a=a-robinson

Backport 2/2 commits from #25322.

/cc @cockroachdb/release

---

This mechanism won't scale incredibly well to proactively renewing all
expiration-based leases since it'd require a goroutine per lease. It
should be fine if we only auto-renew r1's lease, though.

Fixes #24753

Release note: None

----------------

To fix #27731 in v2.0.

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
@craig (Contributor)

craig bot commented Aug 28, 2018

Build succeeded

@craig craig bot merged commit 388eb8a into cockroachdb:release-2.0 on Aug 28, 2018
craig bot pushed a commit that referenced this pull request Aug 30, 2018
29296: backport-2.0: Fix test flakes caused by automatic lease renewal r=a-robinson a=a-robinson

Backport:
  * 1/1 commits from "storage: Adding testing knob to disable automatic lease renewals" (#25625)
  * 1/1 commits from "storage: Fix flaky TestReplicaNotLeaseHolderError" (#25676)
  * 1/1 commits from "storage: Deflake TestReplicaLease" (#25781)

Please see individual PRs for details.

/cc @cockroachdb/release

Fixes #29189. The flakiness was introduced to the 2.0 branch by #28931.

Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
Co-authored-by: Ben Darnell <ben@cockroachlabs.com>