KAFKA-9212; Ensure LeaderAndIsr state updated in controller context during reassignment#7795
Conversation
junrao
left a comment
There was a problem hiding this comment.
@hachikuji : Thanks for the PR. LGTM
| for (MetadataResponse.PartitionMetadata partitionMetadata : metadata.partitionMetadata()) { | ||
|
|
||
| Consumer<PartitionInfo> addToPartitions = partitionInfo -> { | ||
| int epoch = partitionMetadata.leaderEpoch().orElse(RecordBatch.NO_PARTITION_LEADER_EPOCH); |
There was a problem hiding this comment.
Note this was a bug. We were using the leader epoch from the response even if it was stale and we had taken the PartitionInfo from the previous update.
There was a problem hiding this comment.
Do we have a test that covers this bug too?
There was a problem hiding this comment.
I believe testStaleMetadata() does that
ijuma
left a comment
There was a problem hiding this comment.
Thanks for the fix. Looks good overall, just a couple of comments below.
| assertEquals(-1, info.epoch()); | ||
| } | ||
|
|
||
| for (short version = 9; version <= ApiKeys.METADATA.oldestVersion(); version++) { |
There was a problem hiding this comment.
ApiKeys.METADATA.oldestVersion() -> ApiKeys.METADATA.latestVersion()?
| finishedUpdates.headOption.map { | ||
| case (partition, Right(leaderAndIsr)) => | ||
| finalLeaderIsrAndControllerEpoch = Some(LeaderIsrAndControllerEpoch(leaderAndIsr, epoch)) | ||
| finishedUpdates.get(partition).exists { |
There was a problem hiding this comment.
Nit: I think it would probably be clearer to just go with a Some/None pattern match given the side-effects we're doing below (throwing exceptions, mutations, etc).
|
One more question, did the reassignment test fail without the fix? |
| } | ||
|
|
||
| @Test | ||
| public void testIgnoreLeaderEpochInOlderMetadataResponse() { |
There was a problem hiding this comment.
nit: Could we also briefly explain the issue in the tests? Personally, I tend to read tests to understand the expected behavior and the issue with versions earlier than 9 is not immediately apparent
| } | ||
|
|
||
| @Test | ||
| def testProduceAndConsumeWithReassignmentInProgress(): Unit = { |
Yes, it does |
|
I pushed a commit with the test fix and a couple of other minor change. |
|
Merged to trunk. I think we should cherry-pick to 2.4 and 2.3 as well, but there are some conflicts. Will check if they're trivial or require a PR. |
|
Submitted #7800 for the 2.4 cherry-pick. That version should hopefully apply cleanly to 2.3 as well. |
…uring reassignment (apache#7795) KIP-320 improved fetch semantics by adding leader epoch validation. This relies on reliable propagation of leader epoch information from the controller. Unfortunately, we have encountered a bug during partition reassignment in which the leader epoch in the controller context does not get properly updated. This causes UpdateMetadata requests to be sent with stale epoch information which results in the metadata caches on the brokers falling out of sync. This bug has existed for a long time, but it is only a problem due to the new epoch validation done by the client. Because the client includes the stale leader epoch in its requests, the leader rejects them, yet the stale metadata cache on the brokers prevents the consumer from getting the latest epoch. Hence the consumer cannot make progress while a reassignment is ongoing. Although it is straightforward to fix this problem in the controller for the new releases (which this patch does), it is not so easy to fix older brokers which means new clients could still encounter brokers with this bug. To address this problem, this patch also modifies the client to treat the leader epoch returned from the Metadata response as "unreliable" if it comes from an older version of the protocol. The client in this case will discard the returned epoch and it won't be included in any requests. Also, note that the correct epoch is still forwarded to replicas correctly in the LeaderAndIsr request, so this bug does not affect replication. Reviewers: Jun Rao <junrao@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
…uring reassignment (apache#7795) KIP-320 improved fetch semantics by adding leader epoch validation. This relies on reliable propagation of leader epoch information from the controller. Unfortunately, we have encountered a bug during partition reassignment in which the leader epoch in the controller context does not get properly updated. This causes UpdateMetadata requests to be sent with stale epoch information which results in the metadata caches on the brokers falling out of sync. This bug has existed for a long time, but it is only a problem due to the new epoch validation done by the client. Because the client includes the stale leader epoch in its requests, the leader rejects them, yet the stale metadata cache on the brokers prevents the consumer from getting the latest epoch. Hence the consumer cannot make progress while a reassignment is ongoing. Although it is straightforward to fix this problem in the controller for the new releases (which this patch does), it is not so easy to fix older brokers which means new clients could still encounter brokers with this bug. To address this problem, this patch also modifies the client to treat the leader epoch returned from the Metadata response as "unreliable" if it comes from an older version of the protocol. The client in this case will discard the returned epoch and it won't be included in any requests. Also, note that the correct epoch is still forwarded to replicas correctly in the LeaderAndIsr request, so this bug does not affect replication. Reviewers: Jun Rao <junrao@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
…uring reassignment (#7795) (#7805) KIP-320 improved fetch semantics by adding leader epoch validation. This relies on reliable propagation of leader epoch information from the controller. Unfortunately, we have encountered a bug during partition reassignment in which the leader epoch in the controller context does not get properly updated. This causes UpdateMetadata requests to be sent with stale epoch information which results in the metadata caches on the brokers falling out of sync. This bug has existed for a long time, but it is only a problem due to the new epoch validation done by the client. Because the client includes the stale leader epoch in its requests, the leader rejects them, yet the stale metadata cache on the brokers prevents the consumer from getting the latest epoch. Hence the consumer cannot make progress while a reassignment is ongoing. Although it is straightforward to fix this problem in the controller for the new releases (which this patch does), it is not so easy to fix older brokers which means new clients could still encounter brokers with this bug. To address this problem, this patch also modifies the client to treat the leader epoch returned from the Metadata response as "unreliable" if it comes from an older version of the protocol. The client in this case will discard the returned epoch and it won't be included in any requests. Also, note that the correct epoch is still forwarded to replicas correctly in the LeaderAndIsr request, so this bug does not affect replication. Reviewers: Jun Rao <junrao@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>
KIP-320 improved fetch semantics by adding leader epoch validation. This relies on reliable propagation of leader epoch information from the controller. Unfortunately, we have encountered a bug during partition reassignment in which the leader epoch in the controller context does not get properly updated. This causes UpdateMetadata requests to be sent with stale epoch information which results in the metadata caches on the brokers falling out of sync.
This bug has existed for a long time, but it is only a problem due to the new epoch validation done by the client. Because the client includes the stale leader epoch in its requests, the leader rejects them, yet the stale metadata cache on the brokers prevents the consumer from getting the latest epoch. Hence the consumer cannot make progress while a reassignment is ongoing.
Although it is straightforward to fix this problem in the controller for the new releases (which this patch does), it is not so easy to fix older brokers which means new clients could still encounter brokers with this bug. To address this problem, this patch also modifies the client to treat the leader epoch returned from the Metadata response as "unreliable" if it comes from an older version of the protocol. The client in this case will discard the returned epoch and it won't be included in any requests.
Also, note that the correct epoch is still forwarded to replicas correctly in the LeaderAndIsr request, so this bug does not affect replication.
Committer Checklist (excluded from commit message)