kvserver: ignore lease validity when checking lease preferences#107967
Conversation
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
fbe6f62 to
1bb0d44
Compare
|
Thanks for the review! I'll wait for @erikgrinaker to take a look before merging. |
erikgrinaker
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @kvoli)
pkg/kv/kvserver/replica_proposal.go line 525 at r1 (raw file):
// lag, or previously needing a snapshot. The replicate queue will ensure the // lease is valid and owned by the replica before processing. if iAmTheLeaseHolder &&
Removing the leaseChangingHands condition here seems problematic. With e.g. expiration-based leases, this will cause us to evaluate and enqueue thousands of times per second, rather than once per range on the initial acquisition.
Since lease preferences are unlikely to change, it should be sufficient to enqueue once when we acquire, and let the replicate queue deal with retries or async updates.
1bb0d44 to
632748d
Compare
kvoli
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker and @kvoli)
pkg/kv/kvserver/replica_proposal.go line 525 at r1 (raw file):
moving the
leaseChangingHandscondition here seems problematic. With e.g. expiration-based leases, this will cause us to evaluate and enqueue thousands of times per second, rather than once per range on the initial acquisition.
Yeah, this is a good point. I was a bit worried, however, unsure what the frequency would look like. I added back the leaseChangingHands check.
erikgrinaker
left a comment
There was a problem hiding this comment.
Reviewed 3 of 4 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli)
pkg/kv/kvserver/replica_range_lease.go line 1586 at r2 (raw file):
// not, as the store either doesn't own the lease, or doesn't own a valid // lease. return true
Why do we return true here when there's a valid lease elsewhere, and in the common case that lease will satisfy the preferences? Should this just be false?
In cockroachdb#107507, we began eagerly enqueuing into the replicate queue, when acquiring a replica acquired a new lease which violated lease preferences. Lease preferences were only considered violated when the lease itself was valid. In cockroachdb#107507, we saw that it is uncommon, but possible for an invalid lease to be acquired, violate lease preferences and not be enqueued as a result. The end result was a leaseholder violating the applied lease preferences which would not be resolved until the next scanner cycle. Update the eager enqueue check to only check that the replica is the incoming leaseholder when applying the lease, and that the replica violates the applied lease preferences. The check now applies on any lease acquisition, where previously it only occurred on the leaseholder changing. Note the purgatory error introduced in cockroachdb#107507, still checks that the lease is valid and owned by the store before proceeding. It is a condition that the lease must be valid+owned by the store to have a change planned, so whilst it is possible the lease becomes invalid somewhere in-between planning, when the replica applies a valid lease, it will still be enqueued, so purgatory is unnecessary. Fixes: cockroachdb#107862 Release note: None
632748d to
92f268a
Compare
kvoli
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @erikgrinaker)
pkg/kv/kvserver/replica_range_lease.go line 1586 at r2 (raw file):
Previously, erikgrinaker (Erik Grinaker) wrote…
Why do we return true here when there's a valid lease elsewhere, and in the common case that lease will satisfy the preferences? Should this just be false?
Woops... This should be false. You're right.
erikgrinaker
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r3, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @kvoli)
|
TYFTRs! bors r=erikgrinaker,andrewbaptist |
|
Build succeeded: |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 92f268a to blathers/backport-release-23.1-107967: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
In #107507, we began eagerly enqueuing into the replicate queue, when
acquiring a replica acquired a new lease which violated lease
preferences. Lease preferences were only considered violated when the
lease itself was valid. In #107507, we saw that it is uncommon, but
possible for an invalid lease to be acquired, violate lease preferences
and not be enqueued as a result. The end result was a leaseholder
violating the applied lease preferences which would not be resolved
until the next scanner cycle.
Update the eager enqueue check to only check that the replica is the
incoming leaseholder when applying the lease, and that the replica
violates the applied lease preferences.
Note the purgatory error introduced in #107507, still checks that the
lease is valid and owned by the store before proceeding. It is a
condition that the lease must be valid+owned by the store to have a
change planned, so whilst it is possible the lease becomes invalid
somewhere in-between planning, when the replica applies a valid lease,
it will still be enqueued so purgatory is unnecessary.
Fixes: #107862
Release note: None