Skip to content

kvserver: actually renew expiration-based leases#65653

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:repl-renewal
Jun 10, 2021
Merged

kvserver: actually renew expiration-based leases#65653
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:repl-renewal

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented May 25, 2021

We were never looping around more than once as a result of
never resetting the timer. This was noticed by contributor
@llllash in #65393. Thank you!

Closes #65393.

It seems as though the mechanism to prevent ping-ponging of the r1 lease never quite worked, i.e. this issue wasn't actually closed: #24753

It's unclear why this hasn't caused more trouble over the last three years. However, looking at the original issue, it seems like you need very precise timing to get the problem to appear. Maybe that is a good enough explanation.

Release note: None

@tbg tbg requested a review from a team May 25, 2021 10:02
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I wonder why no existing tests caught this. We surely have something the exercises this logic, right? If we disable this entirely, does any test fail? Were the tests passing because we did renew the expiration-based lease, but only once?

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/kv/kvserver/store.go, line 1730 at r1 (raw file):

		renewalDuration := s.cfg.RangeLeaseActiveDuration() / 5
		for {
			var n int

nit: the purpose of n wasn't immediately clear to me. Consider numRenewableLeases. Then it's easier to understand that we only reset the timer if we have at least one renewable lease, without having to read the logic in the loop.


pkg/kv/kvserver/store.go, line 1732 at r1 (raw file):

			var n int
			s.renewableLeases.Range(func(k int64, v unsafe.Pointer) bool {
				n++

Do we want to want to add to the counter if we hit the s.renewableLeases.Delete(k) path?

Copy link
Copy Markdown
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this and the kvserver... tests do pass with the loop deactivated. This makes sense when I look at the original PR, which did not add a test: https://github.com/cockroachdb/cockroach/pull/25322/files
I added a test, PTAL.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/kv/kvserver/store.go, line 1732 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we want to want to add to the counter if we hit the s.renewableLeases.Delete(k) path?

I think that's fine - it'll give us an extra loop but that's basically free and keeps the code simpler.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 4 of 4 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/kv/kvserver/testing_knobs.go, line 340 at r2 (raw file):

	// LeaseRenewalSignalChan populates `Store.renewableLeasesSignal`.
	LeaseRenewalSignalChan chan struct{}
	// LeaseRenewalOnPostCycle is invoked after each lease renewal cycle.

We should mention what the meaning is of the provided map.

We were never looping around more than once as a result of
never resetting the timer. This was noticed by contributor
@llllash in cockroachdb#65393. Thank you!

Closes cockroachdb#65393.
Co-authored-by: linyimin <linyimin@baidu.com>

Release note: None
@tbg tbg requested a review from nvb June 10, 2021 10:36
Copy link
Copy Markdown
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r=nvanbenschoten
TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jun 10, 2021

Build succeeded:

@craig craig bot merged commit fcf1eda into cockroachdb:master Jun 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

stability: partitioned gossip network caused by ping-ponging of r1's lease

4 participants