KAFKA-10634: Adding LeaderId to Voters list in LeaderChangeMessage#9539
Conversation
jsancio
left a comment
There was a problem hiding this comment.
Thanks for the PR @vamossagar12
a6d873c to
2689cb7
Compare
hachikuji
left a comment
There was a problem hiding this comment.
Thanks for the patch. Left a few minor comments.
2689cb7 to
cc9c163
Compare
Thanks @hachikuji , i have replied on the comments.. |
cc9c163 to
b44f931
Compare
jsancio
left a comment
There was a problem hiding this comment.
Thanks for the additional changes @vamossagar12 . Left a few comments.
Thanks @jsancio , I had a couple of basic questions.. |
944f9af to
ce275e2
Compare
|
@jsancio , made the changes.. plz review when you get a chance. |
|
@jsancio , I see some timeout related errors in the CI/CD checks. I am not sure if these are relevant to the changes I have made? |
There was a problem hiding this comment.
I think the terminology may be causing some confusion here. The "granting" voters are those who have voted for the leader and which the leader is using as the basis of its election. The "endorsing" voters are those who have acknowledged the leader. The set of endorsing voters changes over time as the voters convert to followers and begin fetching. We use nonEndorsingVoters to know which voters still need to be sent BeginQuorumEpoch. Note that it is possible for a voter to have granted a vote, but not yet endorsed the election (because it does not know the vote succeeded).
We should not conflate these sets, but maybe we can come up with better terms to avoid the confusion. I would stick with "granting" voters as the set of voters who granted the leader's candidacy. However, maybe we can change "endorsing" to "acknowledged"? These are the voters who have acknowledged the election result. What do you think?
There was a problem hiding this comment.
Yeah that makes sense. As you explained, endorsing are those voters who have acknowledged the BeginQuorumEpoch request from the leader, so eventually if the leader is accepted, this set will be all the voters who voted.
Regarding endorsing v/s acknowledged, can we also add what has been acknowledged? Like, endoresedQuorumEpoch or endorsedLeaderCandidacy or something like this. Just adding context, might clear the confusion. WDYT @hachikuji , @jsancio
There was a problem hiding this comment.
Good to know the subtle difference. "acknowledged" sounds good to me.
There was a problem hiding this comment.
Ok. I will change it to acknowledged then..
There was a problem hiding this comment.
@jsancio , going by all the discussions here, I needed to revert some of the changes you had suggested in #9539 (comment) as a couple of test cases were failing. The issue is on this line:
Note that, this is a deviation from the original code which was:
boolean hasEndorsedLeader = voterId == localId;
and the test case which was failing because of this was this line in particular:
Reason being, when transitionToLeader is invoked from QuorumState upon becoming the leader, it passes in the set of granting voters. Keeping in mind the explanation provided by @hachikuji above, these are all the voters who voted for the current leader.
If you check the above erroneous condition, we are checking if a voter has endorsed the voting by checking if it is present in the granting voters list:
boolean hasEndorsedLeader = grantingVoters.contains(voterId);
So, the problem is, this condition will force all voters to be considered as having endorsed the Leader even before the BeginQuorumEpoch request response dance has begun.
The test cases in question fails because the pollLeader() method in KafkaRaftClient, checks the nonEndorsingVoters and sends the BeginQuorumEpoch request to nonEndorsed ones here:
long timeUntilSend = maybeSendRequests(
currentTimeMs,
state.nonEndorsingVoters(),
this::buildBeginQuorumEpochRequest
);
but it never gets to do that because every granting voter has been marked as an endorsed voter.
Lastly, we don't really need the acknowledgedVoters here because all we need are the granting voters in the LeaderChangeMessage i.e the voters who voted for the Leader in the current epoch.
So, I am reverting the changes and now, the test cases have all passed as well. There were some timeout related errors as well which I believe which got resolved as well.
ce275e2 to
f57c6c5
Compare
f57c6c5 to
16432f8
Compare
hachikuji
left a comment
There was a problem hiding this comment.
LGTM. Thanks for the patience on this PR. I think we should merge this and leave the discussed name change for a follow-up. I have filed https://issues.apache.org/jira/browse/KAFKA-10828.
|
Thanks @hachikuji . I have assigned the name change issue to myself. I see 1 test failures but those don't seem to be connected to this issue. and the other one being: |
More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.
Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.
Committer Checklist (excluded from commit message)