KAFKA-9212; Ensure LeaderAndIsr state updated in controller context during reassignment by hachikuji · Pull Request #7795 · apache/kafka

hachikuji · 2019-12-07T02:28:49Z

KIP-320 improved fetch semantics by adding leader epoch validation. This relies on reliable propagation of leader epoch information from the controller. Unfortunately, we have encountered a bug during partition reassignment in which the leader epoch in the controller context does not get properly updated. This causes UpdateMetadata requests to be sent with stale epoch information which results in the metadata caches on the brokers falling out of sync.

This bug has existed for a long time, but it is only a problem due to the new epoch validation done by the client. Because the client includes the stale leader epoch in its requests, the leader rejects them, yet the stale metadata cache on the brokers prevents the consumer from getting the latest epoch. Hence the consumer cannot make progress while a reassignment is ongoing.
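To make the failure mode concrete, here is a minimal sketch of the kind of epoch check KIP-320 introduced on the leader. The class and method names are hypothetical and the logic is simplified; only the two error names, `FENCED_LEADER_EPOCH` and `UNKNOWN_LEADER_EPOCH`, correspond to real Kafka error codes.

```java
import java.util.OptionalInt;

// Hypothetical, simplified sketch; not Kafka's actual broker code.
final class LeaderEpochCheck {
    enum Result { OK, FENCED_LEADER_EPOCH, UNKNOWN_LEADER_EPOCH }

    // Compare the epoch sent by the client with the epoch the leader currently holds.
    static Result validate(OptionalInt requestEpoch, int currentLeaderEpoch) {
        if (!requestEpoch.isPresent()) {
            return Result.OK; // no epoch supplied, so there is nothing to validate
        }
        int epoch = requestEpoch.getAsInt();
        if (epoch < currentLeaderEpoch) {
            return Result.FENCED_LEADER_EPOCH; // client metadata is stale
        }
        if (epoch > currentLeaderEpoch) {
            return Result.UNKNOWN_LEADER_EPOCH; // leader is behind the client
        }
        return Result.OK;
    }

    public static void main(String[] args) {
        // Reassignment bumped the epoch to 6, but the cached metadata still says 5.
        System.out.println(validate(OptionalInt.of(5), 6));    // prints FENCED_LEADER_EPOCH
        System.out.println(validate(OptionalInt.empty(), 6));  // prints OK
    }
}
```

If every metadata refresh keeps returning epoch 5, the first case repeats indefinitely, which is the stuck consumer described above.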

Although it is straightforward to fix this problem in the controller for new releases (which this patch does), it is not so easy to fix older brokers, which means new clients could still encounter brokers with this bug. To address this, the patch also modifies the client to treat the leader epoch returned in the Metadata response as "unreliable" if it comes from an older version of the protocol. In that case the client discards the returned epoch and does not include it in any requests.
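The client-side rule can be sketched roughly as follows. The class, method, and constant names are hypothetical; the cutoff of version 9 is an assumption taken from the test discussion in this PR, which refers to "versions earlier than 9".

```java
import java.util.Optional;

// Hypothetical sketch of the client-side filter; not the actual client code.
final class MetadataEpochFilter {
    // Assumption: Metadata responses below this version may carry stale epochs.
    static final short FIRST_RELIABLE_VERSION = 9;

    // Keep the reported epoch only when the response version is trusted.
    static Optional<Integer> usableEpoch(short responseVersion, Optional<Integer> reportedEpoch) {
        return responseVersion >= FIRST_RELIABLE_VERSION ? reportedEpoch : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(usableEpoch((short) 8, Optional.of(5))); // epoch is discarded
        System.out.println(usableEpoch((short) 9, Optional.of(5))); // epoch is kept
    }
}
```

With the epoch discarded, requests to an older broker carry no leader epoch, so they cannot be rejected because of stale metadata.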

Note also that the correct epoch is still forwarded to replicas in the LeaderAndIsr request, so this bug does not affect replication.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

junrao

@hachikuji : Thanks for the PR. LGTM

hachikuji · 2019-12-07T21:30:22Z

                for (MetadataResponse.PartitionMetadata partitionMetadata : metadata.partitionMetadata()) {
-
-                    Consumer<PartitionInfo> addToPartitions = partitionInfo -> {
-                        int epoch = partitionMetadata.leaderEpoch().orElse(RecordBatch.NO_PARTITION_LEADER_EPOCH);


Note this was a bug. We were using the leader epoch from the response even if it was stale and we had taken the PartitionInfo from the previous update.

Do we have a test that covers this bug too?

I believe testStaleMetadata() does that
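For readers following the thread, the intended behaviour can be sketched as below: when an update is stale, the cached partition info and its epoch are kept together rather than pairing old info with the epoch from the response. All names here are hypothetical and the structure is far simpler than the real `Metadata` class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual client Metadata implementation.
final class EpochAwareCache {
    static final class Entry {
        final int leaderId;
        final int epoch;

        Entry(int leaderId, int epoch) {
            this.leaderId = leaderId;
            this.epoch = epoch;
        }
    }

    private final Map<String, Entry> byPartition = new HashMap<>();

    // Applies an update and returns the entry that is cached afterwards.
    Entry update(String partition, int leaderId, int epoch) {
        Entry previous = byPartition.get(partition);
        if (previous != null && epoch < previous.epoch) {
            // Stale update: keep the previous leader and the previous epoch as a pair.
            return previous;
        }
        Entry fresh = new Entry(leaderId, epoch);
        byPartition.put(partition, fresh);
        return fresh;
    }

    public static void main(String[] args) {
        EpochAwareCache cache = new EpochAwareCache();
        cache.update("topic-0", 1, 5);
        Entry kept = cache.update("topic-0", 2, 3); // older epoch is ignored
        System.out.println(kept.leaderId + " @ epoch " + kept.epoch);
    }
}
```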

ijuma

Thanks for the fix. Looks good overall, just a couple of comments below.

ijuma · 2019-12-08T07:18:08Z

+            assertEquals(-1, info.epoch());
+        }
+
+        for (short version = 9; version <= ApiKeys.METADATA.oldestVersion(); version++) {


ApiKeys.METADATA.oldestVersion() -> ApiKeys.METADATA.latestVersion()?

ijuma · 2019-12-08T07:23:42Z

-          finishedUpdates.headOption.map {
-            case (partition, Right(leaderAndIsr)) =>
-              finalLeaderIsrAndControllerEpoch = Some(LeaderIsrAndControllerEpoch(leaderAndIsr, epoch))
+          finishedUpdates.get(partition).exists {


Nit: I think it would probably be clearer to just go with a Some/None pattern match given the side-effects we're doing below (throwing exceptions, mutations, etc).

ijuma · 2019-12-08T07:59:15Z

One more question, did the reassignment test fail without the fix?

stanislavkozlovski · 2019-12-08T11:32:14Z

    }

+    @Test
+    public void testIgnoreLeaderEpochInOlderMetadataResponse() {


nit: Could we also briefly explain the issue in the tests? Personally, I tend to read tests to understand the expected behavior, and the issue with versions earlier than 9 is not immediately apparent.

Is it OK now?

stanislavkozlovski · 2019-12-08T11:37:11Z

  }

+  @Test
+  def testProduceAndConsumeWithReassignmentInProgress(): Unit = {


stanislavkozlovski · 2019-12-08T11:38:00Z

@ijuma

> One more question, did the reassignment test fail without the fix?

Yes, it does.

ijuma · 2019-12-08T17:33:08Z

I pushed a commit with the test fix and a couple of other minor changes.

ijuma

LGTM

ijuma · 2019-12-08T19:40:59Z

Merged to trunk. I think we should cherry-pick to 2.4 and 2.3 as well, but there are some conflicts. Will check if they're trivial or require a PR.

ijuma · 2019-12-09T02:25:40Z

Submitted #7800 for the 2.4 cherry-pick. That version should hopefully apply cleanly to 2.3 as well.

…uring reassignment (apache#7795)

Reviewers: Jun Rao <junrao@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>

…uring reassignment (#7795) (#7805)

KAFKA-9212; Update LeaderAndIsr state in controller context after reassignment

f21698b

hachikuji force-pushed the KAFKA-9212 branch from 6863afc to f21698b on December 7, 2019 02:43

hachikuji changed the title ~~KAFKA-9212; Update LeaderAndIsr state in controller context after reassignment~~ KAFKA-9212; Ensure LeaderAndIsr state updated in controller context during reassignment Dec 7, 2019

junrao approved these changes Dec 7, 2019

Add Metadata test cases and fix bug handling stale updates

1616db1

hachikuji commented Dec 7, 2019

ijuma reviewed Dec 8, 2019

stanislavkozlovski reviewed Dec 8, 2019

Fix test and minor tweaks

cfb88f4

ijuma approved these changes Dec 8, 2019

ijuma merged commit 5d0cb14 into apache:trunk Dec 8, 2019
