KAFKA-9069: Flaky Test AdminClientIntegrationTest.testCreatePartitions#7619
KAFKA-9069: Flaky Test AdminClientIntegrationTest.testCreatePartitions#7619huxihx wants to merge 1 commit into
Conversation
|
@guozhangwang @hachikuji Please review this patch. Thanks. |
There was a problem hiding this comment.
DescribeTopicsOptions is a public API so if we do want to add more interfaces we need to do a small KIP..
On the other hand, I'm wondering how certain we are that the root cause of the flaky test failure is due to not reading from the controller? Eventually the metadata should be propagated to other nodes as well and our default waitUntilTrue is taking 15 secs, so not sure if we should fix the test if it just taking more than 15 secs by changing the behavior of the admin client to always talk to controller (since it would also introduce large amount of loads on the controller).
There was a problem hiding this comment.
Regardless of whether failing reading from controller is the root cause, do you think it makes sense to offer an alternative option for users to let them select which nodes to read the metadata?
There was a problem hiding this comment.
I agree that the current metadata propagation staleness could be an issue, in fact one of the major motivations of KIP-500 is to make the metadata propagation more controllable and up-to-date.
However, I think let users select if they should read from controller or not could be a risky option, since once opens up this opportunity I'd predict everyone would select to read from controller to get most up-to-date metadata which would bring a large load on the controller.
There was a problem hiding this comment.
So, what about retrofitting numPartitions to accept an expected partition number and having getTopicMetadata more time to await?
There was a problem hiding this comment.
That sounds promising. Could you provide a revision of the PR so I can take a closer look at the proposal?
There was a problem hiding this comment.
Seems getTopicMetadata is only called at the beginning of testCreatePartitions, but from the reported trace it is not thrown from here.
|
retest this please. |
There was a problem hiding this comment.
On line 580, I think we should change it to assertEqual(3, numParititons(topic1, expectedNumPartitionsOpt = Some(3))).
There was a problem hiding this comment.
On line 612, I think we should change it to assertEqual(3, numParititons(topic1, expectedNumPartitionsOpt = Some(3))). Even though we know that it had to have succeeded once, future describe may go to a different broker which may return a different value.
In other words tests should always wait for describe to return the expected value an not assume that all describes will return the expected value.
There was a problem hiding this comment.
There are a few more cases. I think every call to numPartitions should be changed to pass the expected number of partitions. Having said that maybe we should change this signature to always take an Int. E.g.
def assertNumPartitions(topic: String, expectedNumPartitions: Int): Unit|
Merging to trunk and 2.4 (copy the changes to old classes). |
https://issues.apache.org/jira/browse/KAFKA-9069 Make `getTopicMetadata` in AdminClientIntegrationTest always read metadata from controller to get a consistent view. Reviewers: Guozhang Wang <wangguoz@gmail.com>, José Armando García Sancio <jsancio@gmail.com> Closes #7619 from huxihx/KAFKA-9069
https://issues.apache.org/jira/browse/KAFKA-9069
Make
getTopicMetadatain AdminClientIntegrationTest always read metadata from controller to get a consistent view.More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.
Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.
Committer Checklist (excluded from commit message)