kvserver: replica might crash evaluating a condition as leaseholder, …#78438
Conversation
|
nvm, I see that the restore roachtest isn't actually hitting the |
nvb
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @shralex)
pkg/kv/kvserver/replica_command.go, line 1213 at r1 (raw file):
if voterIncomingTarget == (roachpb.ReplicaDescriptor{}) { // Couldn't find a VOTER_INCOMING target. This should not happen since we only go into the
We'll want to update this comment with a discussion on how we can reach this point and what we expect to happen after we return nil in the case where we aren't actually the leaseholder and in the case where we are (mistakenly).
pkg/kv/kvserver/replica_command.go, line 1216 at r1 (raw file):
// JOINT config if the leaseholder is being removed, when there is a VOTER_INCOMING replica. // Killing the leaseholder to force lease transfer. log.Warningf(ctx, "no VOTER_INCOMING to transfer lease to. "+
Let's actually make this an Infof since it's not unexpected to end up here. And can we update the message to describe the situation more precisely? A rule of thumb for log messages is to ask the question: will logging this lead to L2 pages from users of Cockroach who closely monitor their logs for "concerning" looking logs?
…when it isn't In cockroachdb#77841, we only remove the leaseholder if there is a VOTER_INCOMING replica in the same JOINT config. But it is also possible that a node A was the lesaeholder, and then the lease gets transferred to node B, who tries to remove A. Since B isn't getting removed, it is fine to enter a JOINT config without a VOTER_INCOMING node. But A might still believe its the leaseholder, and this might cause it to crash when it finds out that its in a JOINT config and there is no INCOMING node. In this case, we can just continue, since an error will be thrown in replica_raft.go Release note: None Release justification: fixes flakiness caused by cockroachdb#77841
2a04669 to
7cb3bf0
Compare
nvb
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @shralex)
|
bors r+ |
|
Build succeeded: |
…when it isn't
In #77841, we only remove the
leaseholder if there is a VOTER_INCOMING replica in the same JOINT config.
But it is also possible that a node A was the lesaeholder, and then the lease
gets transferred to node B, who tries to remove A. Since B isn't getting removed,
it is fine to enter a JOINT config without a VOTER_INCOMING node. But A
might still believe its the leaseholder, and this might cause it to
crash when it finds out that its in a JOINT config and there is no INCOMING node.
In this case, we can just continue, since an error will be thrown in replica_raft.go
This resolves #77937
Release note: None
Release justification: fixes flakiness caused by #77841