storage: prevent concurrent proposals and lease transfers#41473
Conversation
06ae164 to
bf195f7
Compare
|
I've now typed the latching code and I think it is just simpler and better than the first commit. |
nvb
left a comment
There was a problem hiding this comment.
I agree with your preference for the second commit. However, I think the first commit does still serve a purpose. By placing the failure code in leasePostApply, we also avoid ambiguous errors on non-cooperative leaseholder changes. Weren't you were seeing a case where a leaseholder had its lease revoked without it immediately knowing (due to a disjoint raft leader) and only later finding this out when it applied the raft entry with the new lease?
Reviewed 2 of 2 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)
pkg/storage/client_replica_test.go, line 2824 at r2 (raw file):
// but it potentially still has pending commands which it thinks might // have been proposed. This can occur if there are commands which are // proposed after lease transfer has been proposed but before the lease
"a lease transfer" or "the lease transfer"
pkg/storage/batcheval/cmd_lease_transfer.go, line 33 at r2 (raw file):
// current range descriptor (desc) but it could potentially change due to an // as of yet unapplied merge. spans.Add(spanset.SpanReadOnly, roachpb.Span{Key: keys.LocalMax, EndKey: keys.MaxKey})
This will play well with #39765. With that change, you could declare this span as non-MVCC so that it will interfere with writes at a higher timestamp.
The case where we'll avoid ambiguous errors in a non-cooperative lease transfer case requires the old leaseholder to append and apply the lease transfer before it realizes that its been removed. The only source of ambiguity is when the replica has been removed. I suspect this case is pretty rare as generally if a leaseholder has had its lease taken non-cooperatively it probably was partitioned. If it was removed while it was partitioned it seems pretty likely that it will receive a |
947304e to
52e7747
Compare
ajwerner
left a comment
There was a problem hiding this comment.
This is RFAL
cc @irfansharif
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/batcheval/cmd_lease_transfer.go, line 33 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This will play well with #39765. With that change, you could declare this span as non-MVCC so that it will interfere with writes at a higher timestamp.
Done.
irfansharif
left a comment
There was a problem hiding this comment.
Reviewed 4 of 4 files at r3.
Reviewable status:complete! 1 of 0 LGTMs obtained
irfansharif
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/storage/client_replica_test.go, line 2824 at r3 (raw file):
// but it potentially still has pending commands which it thinks might // have been proposed. This can occur if there are commands which are // proposed after thje lease transfer has been proposed but before the lease
"thje" -> "the".
…sfers This commit prevents ambiguous response errors which were previously possible due replicas which were formerly lease holders being removed while holding pending proposals. The change ensures that there cannot be pending proposals concurrent with a lease transfer. Release note: None
52e7747 to
ab9f433
Compare
ajwerner
left a comment
There was a problem hiding this comment.
@nvanbenschoten do you agree that this commit should not be backported? How do you feel about backporting the logic in the old first commit?
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif)
pkg/storage/client_replica_test.go, line 2824 at r3 (raw file):
Previously, irfansharif (irfan sharif) wrote…
"thje" -> "the".
Done.
irfansharif
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale)
nvb
left a comment
There was a problem hiding this comment.
do you agree that this commit should not be backported? How do you feel about backporting the logic in the old first commit?
I don't think we should backport any of this. The potential for stalls or other fallout seems too high to me and this isn't addressing a new issue.
Reviewed 1 of 4 files at r3, 1 of 1 files at r4.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale)
|
TFTR! bors r+ |
41473: storage: prevent concurrent proposals and lease transfers r=ajwerner a=ajwerner Before this PR we would not fail proposals when we learned about a lease transfer. Instead we waited for a raft election before attempting to repropose. Furthermore we generally just waited for those proposals to be committed but fail to apply before notifying clients. This lazy failure detection is problematic when the proposing replica is removed before the entries are committed as it can lead to an ambiguous result. Informs #37815. Release justification: low risk, high benefit changes to existing functionality. Release note: None Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Build succeeded |
Before this PR we would not fail proposals when we learned about a lease
transfer. Instead we waited for a raft election before attempting to repropose.
Furthermore we generally just waited for those proposals to be committed but
fail to apply before notifying clients. This lazy failure detection is
problematic when the proposing replica is removed before the entries are
committed as it can lead to an ambiguous result.
Informs #37815.
Release justification: low risk, high benefit changes to existing functionality.
Release note: None