Skip to content

kvserver: ignore lease validity when checking lease preferences#107967

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
kvoli:230801.lease-preferences-unknown-enqueue-option-2
Aug 2, 2023
Merged

kvserver: ignore lease validity when checking lease preferences#107967
craig[bot] merged 1 commit intocockroachdb:masterfrom
kvoli:230801.lease-preferences-unknown-enqueue-option-2

Conversation

@kvoli
Copy link
Copy Markdown
Contributor

@kvoli kvoli commented Aug 1, 2023

In #107507, we began eagerly enqueuing into the replicate queue, when
acquiring a replica acquired a new lease which violated lease
preferences. Lease preferences were only considered violated when the
lease itself was valid. In #107507, we saw that it is uncommon, but
possible for an invalid lease to be acquired, violate lease preferences
and not be enqueued as a result. The end result was a leaseholder
violating the applied lease preferences which would not be resolved
until the next scanner cycle.

Update the eager enqueue check to only check that the replica is the
incoming leaseholder when applying the lease, and that the replica
violates the applied lease preferences.

Note the purgatory error introduced in #107507, still checks that the
lease is valid and owned by the store before proceeding. It is a
condition that the lease must be valid+owned by the store to have a
change planned, so whilst it is possible the lease becomes invalid
somewhere in-between planning, when the replica applies a valid lease,
it will still be enqueued so purgatory is unnecessary.

Fixes: #107862
Release note: None

@kvoli kvoli added the backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only label Aug 1, 2023
@kvoli kvoli self-assigned this Aug 1, 2023
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Aug 1, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@kvoli kvoli force-pushed the 230801.lease-preferences-unknown-enqueue-option-2 branch 2 times, most recently from fbe6f62 to 1bb0d44 Compare August 1, 2023 20:15
@kvoli kvoli marked this pull request as ready for review August 1, 2023 20:25
@kvoli kvoli requested a review from a team as a code owner August 1, 2023 20:25
@kvoli kvoli requested review from a team, andrewbaptist and erikgrinaker August 1, 2023 20:25
Copy link
Copy Markdown

@andrewbaptist andrewbaptist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@kvoli
Copy link
Copy Markdown
Contributor Author

kvoli commented Aug 1, 2023

Thanks for the review! I'll wait for @erikgrinaker to take a look before merging.

Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/replica_proposal.go line 525 at r1 (raw file):

	// lag, or previously needing a snapshot. The replicate queue will ensure the
	// lease is valid and owned by the replica before processing.
	if iAmTheLeaseHolder &&

Removing the leaseChangingHands condition here seems problematic. With e.g. expiration-based leases, this will cause us to evaluate and enqueue thousands of times per second, rather than once per range on the initial acquisition.

Since lease preferences are unlikely to change, it should be sufficient to enqueue once when we acquire, and let the replicate queue deal with retries or async updates.

@kvoli kvoli force-pushed the 230801.lease-preferences-unknown-enqueue-option-2 branch from 1bb0d44 to 632748d Compare August 2, 2023 13:32
Copy link
Copy Markdown
Contributor Author

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker and @kvoli)


pkg/kv/kvserver/replica_proposal.go line 525 at r1 (raw file):

moving the leaseChangingHands condition here seems problematic. With e.g. expiration-based leases, this will cause us to evaluate and enqueue thousands of times per second, rather than once per range on the initial acquisition.

Yeah, this is a good point. I was a bit worried, however, unsure what the frequency would look like. I added back the leaseChangingHands check.

@kvoli kvoli requested a review from erikgrinaker August 2, 2023 14:21
Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 4 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli)


pkg/kv/kvserver/replica_range_lease.go line 1586 at r2 (raw file):

		// not, as the store either doesn't own the lease, or doesn't own a valid
		// lease.
		return true

Why do we return true here when there's a valid lease elsewhere, and in the common case that lease will satisfy the preferences? Should this just be false?

In cockroachdb#107507, we began eagerly enqueuing into the replicate queue, when
acquiring a replica acquired a new lease which violated lease
preferences. Lease preferences were only considered violated when the
lease itself was valid. In cockroachdb#107507, we saw that it is uncommon, but
possible for an invalid lease to be acquired, violate lease preferences
and not be enqueued as a result. The end result was a leaseholder
violating the applied lease preferences which would not be resolved
until the next scanner cycle.

Update the eager enqueue check to only check that the replica is the
incoming leaseholder when applying the lease, and that the replica
violates the applied lease preferences. The check now applies on any
lease acquisition, where previously it only occurred on the leaseholder
changing.

Note the purgatory error introduced in cockroachdb#107507, still checks that the
lease is valid and owned by the store before proceeding. It is a
condition that the lease must be valid+owned by the store to have a
change planned, so whilst it is possible the lease becomes invalid
somewhere in-between planning, when the replica applies a valid lease,
it will still be enqueued, so purgatory is unnecessary.

Fixes: cockroachdb#107862
Release note: None
@kvoli kvoli force-pushed the 230801.lease-preferences-unknown-enqueue-option-2 branch from 632748d to 92f268a Compare August 2, 2023 18:02
Copy link
Copy Markdown
Contributor Author

@kvoli kvoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker)


pkg/kv/kvserver/replica_range_lease.go line 1586 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Why do we return true here when there's a valid lease elsewhere, and in the common case that lease will satisfy the preferences? Should this just be false?

Woops... This should be false. You're right.

Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli)

@kvoli
Copy link
Copy Markdown
Contributor Author

kvoli commented Aug 2, 2023

TYFTRs!

bors r=erikgrinaker,andrewbaptist

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 2, 2023

Build succeeded:

@craig craig bot merged commit 46255c8 into cockroachdb:master Aug 2, 2023
@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Aug 2, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 92f268a to blathers/backport-release-23.1-107967: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only

Projects

None yet

Development

Successfully merging this pull request may close these issues.

roachtest: lease-preferences/full-first-preference-down failed

4 participants