Skip to content

KAFKA-10634: Adding LeaderId to Voters list in LeaderChangeMessage#9539

Merged
hachikuji merged 1 commit into
apache:trunkfrom
vamossagar12:KAFKA-10634
Dec 9, 2020
Merged

KAFKA-10634: Adding LeaderId to Voters list in LeaderChangeMessage#9539
hachikuji merged 1 commit into
apache:trunkfrom
vamossagar12:KAFKA-10634

Conversation

@vamossagar12

Copy link
Copy Markdown
Contributor

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@jsancio jsancio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @vamossagar12

Comment thread raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java Outdated
@vamossagar12 vamossagar12 force-pushed the KAFKA-10634 branch 2 times, most recently from a6d873c to 2689cb7 Compare November 6, 2020 04:11

@hachikuji hachikuji left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. Left a few minor comments.

Comment thread clients/src/main/resources/common/message/LeaderChangeMessage.json Outdated
Comment thread raft/src/main/java/org/apache/kafka/raft/QuorumState.java Outdated
Comment thread raft/src/test/java/org/apache/kafka/raft/RaftClientTestContext.java Outdated
@vamossagar12

Copy link
Copy Markdown
Contributor Author

Thanks for the patch. Left a few minor comments.

Thanks @hachikuji , i have replied on the comments..

@jsancio jsancio left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional changes @vamossagar12 . Left a few comments.

Comment thread raft/src/main/java/org/apache/kafka/raft/KafkaRaftClient.java Outdated
Comment thread clients/src/main/resources/common/message/LeaderChangeMessage.json Outdated
Comment thread clients/src/main/resources/common/message/LeaderChangeMessage.json Outdated
Comment thread raft/src/main/java/org/apache/kafka/raft/LeaderState.java Outdated
@vamossagar12

Copy link
Copy Markdown
Contributor Author

Thanks for the additional changes @vamossagar12 . Left a few comments.

Thanks @jsancio , I had a couple of basic questions..

@vamossagar12 vamossagar12 force-pushed the KAFKA-10634 branch 2 times, most recently from 944f9af to ce275e2 Compare November 27, 2020 18:06
@vamossagar12

Copy link
Copy Markdown
Contributor Author

@jsancio , made the changes.. plz review when you get a chance.

@vamossagar12

Copy link
Copy Markdown
Contributor Author

@jsancio , I see some timeout related errors in the CI/CD checks. I am not sure if these are relevant to the changes I have made?

Comment thread clients/src/main/resources/common/message/LeaderChangeMessage.json Outdated

@hachikuji hachikuji Dec 1, 2020

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the terminology may be causing some confusion here. The "granting" voters are those who have voted for the leader and which the leader is using as the basis of its election. The "endorsing" voters are those who have acknowledged the leader. The set of endorsing voters changes over time as the voters convert to followers and begin fetching. We use nonEndorsingVoters to know which voters still need to be sent BeginQuorumEpoch. Note that it is possible for a voter to have granted a vote, but not yet endorsed the election (because it does not know the vote succeeded).

We should not conflate these sets, but maybe we can come up with better terms to avoid the confusion. I would stick with "granting" voters as the set of voters who granted the leader's candidacy. However, maybe we can change "endorsing" to "acknowledged"? These are the voters who have acknowledged the election result. What do you think?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that makes sense. As you explained, endorsing are those voters who have acknowledged the BeginQuorumEpoch request from the leader, so eventually if the leader is accepted, this set will be all the voters who voted.
Regarding endorsing v/s acknowledged, can we also add what has been acknowledged? Like, endoresedQuorumEpoch or endorsedLeaderCandidacy or something like this. Just adding context, might clear the confusion. WDYT @hachikuji , @jsancio

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know the subtle difference. "acknowledged" sounds good to me.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I will change it to acknowledged then..

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsancio , going by all the discussions here, I needed to revert some of the changes you had suggested in #9539 (comment) as a couple of test cases were failing. The issue is on this line:

https://github.com/apache/kafka/blob/ce275e29beb48611425708fbe67d28d17329845a/raft/src/main/java/org/apache/kafka/raft/LeaderState.java#L52

Note that, this is a deviation from the original code which was:

boolean hasEndorsedLeader = voterId == localId;

and the test case which was failing because of this was this line in particular:

https://github.com/apache/kafka/pull/9539/files#diff-01c9950503611c357b85ef5b8850d2fbc6060075ce028bf4f86dfc6cbd63fd3cR255

Reason being, when transitionToLeader is invoked from QuorumState upon becoming the leader, it passes in the set of granting voters. Keeping in mind the explanation provided by @hachikuji above, these are all the voters who voted for the current leader.

If you check the above erroneous condition, we are checking if a voter has endorsed the voting by checking if it is present in the granting voters list:

boolean hasEndorsedLeader = grantingVoters.contains(voterId);

So, the problem is, this condition will force all voters to be considered as having endorsed the Leader even before the BeginQuorumEpoch request response dance has begun.

The test cases in question fails because the pollLeader() method in KafkaRaftClient, checks the nonEndorsingVoters and sends the BeginQuorumEpoch request to nonEndorsed ones here:

long timeUntilSend = maybeSendRequests(
            currentTimeMs,
            state.nonEndorsingVoters(),
            this::buildBeginQuorumEpochRequest
        );

but it never gets to do that because every granting voter has been marked as an endorsed voter.

Lastly, we don't really need the acknowledgedVoters here because all we need are the granting voters in the LeaderChangeMessage i.e the voters who voted for the Leader in the current epoch.

So, I am reverting the changes and now, the test cases have all passed as well. There were some timeout related errors as well which I believe which got resolved as well.

@hachikuji hachikuji left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the patience on this PR. I think we should merge this and leave the discussed name change for a follow-up. I have filed https://issues.apache.org/jira/browse/KAFKA-10828.

@vamossagar12

vamossagar12 commented Dec 9, 2020

Copy link
Copy Markdown
Contributor Author

Thanks @hachikuji . I have assigned the name change issue to myself. I see 1 test failures but those don't seem to be connected to this issue.
1 of them is

kafka.network.ConnectionQuotasTest > testListenerConnectionRateLimitWhenActualRateAboveLimit FAILED
08:54:04      java.util.concurrent.ExecutionException: java.lang.AssertionError: Expected rate (30 +- 7), but got 39.12107974180088 (600 connections / 15.337 sec) expected:<30.0> but was:<39.12107974180088>
08:54:04          at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
08:54:04          at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)
08:54:04          at kafka.network.ConnectionQuotasTest.$anonfun$testListenerConnectionRateLimitWhenActualRateAboveLimit$3(ConnectionQuotasTest.scala:414)
08:54:04          at scala.collection.immutable.List.foreach(List.scala:333)
08:54:04          at kafka.network.ConnectionQuotasTest.testListenerConnectionRateLimitWhenActualRateAboveLimit(ConnectionQuotasTest.scala:414)
08:54:04  
08:54:04          Caused by:
08:54:04          java.lang.AssertionError: Expected rate (30 +- 7), but got 39.12107974180088 (600 connections / 15.337 sec) expected:<30.0> but was:<39.12107974180088>
08:54:04              at org.junit.Assert.fail(Assert.java:89)
08:54:04              at org.junit.Assert.failNotEquals(Assert.java:835)
08:54:04              at org.junit.Assert.assertEquals(Assert.java:555)
08:54:04              at kafka.network.ConnectionQuotasTest.acceptConnectionsAndVerifyRate(ConnectionQuotasTest.scala:871)
08:54:04              at kafka.network.ConnectionQuotasTest.$anonfun$testListenerConnectionRateLimitWhenActualRateAboveLimit$2(ConnectionQuotasTest.scala:412)
08:54:04

and the other one being:

10:12:53  kafka.server.ScramServerStartupTest > testAuthentications FAILED
10:12:53      org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /config/changes
10:12:53          at org.apache.zookeeper.KeeperException.create(KeeperException.java:120)
10:12:53          at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
10:12:53          at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:583)
10:12:53          at kafka.zk.KafkaZkClient.createRecursive(KafkaZkClient.scala:1731)

@hachikuji hachikuji merged commit 99b5e4f into apache:trunk Dec 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants