Skip to content

KAFKA-8503: Ignore retries config if a custom timeout is provided#6913

Closed
huxihx wants to merge 4 commits into
apache:trunkfrom
huxihx:KAFKA-8503
Closed

KAFKA-8503: Ignore retries config if a custom timeout is provided#6913
huxihx wants to merge 4 commits into
apache:trunkfrom
huxihx:KAFKA-8503

Conversation

@huxihx

@huxihx huxihx commented Jun 11, 2019

Copy link
Copy Markdown
Contributor

https://issues.apache.org/jira/browse/KAFKA-8503

When custom timeout is provided, retries config could be ignored for individual APIs in KafkaAdminClient. Tweaked code path for Call.fail to pass NPathComplexity check.

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@huxihx

huxihx commented Jun 12, 2019

Copy link
Copy Markdown
Contributor Author

@hachikuji Please review this patch. Thanks.

@huxihx

huxihx commented Sep 6, 2019

Copy link
Copy Markdown
Contributor Author

retest this please.

@huxihx

huxihx commented Sep 12, 2019

Copy link
Copy Markdown
Contributor Author

@hachikuji Could you take some time to review this patch? Thanks.

@hachikuji

Copy link
Copy Markdown
Contributor

@huxihx I wrote this KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-533%3A+Add+default+api+timeout+to+AdminClient. Once this is approved, we can complete this work.

@hachikuji

Copy link
Copy Markdown
Contributor

@huxihx KIP-533 has been approved: https://cwiki.apache.org/confluence/display/KAFKA/KIP-533%3A+Add+default+api+timeout+to+AdminClient. Would you like to update this patch with the proposal?

@huxihx

huxihx commented Nov 29, 2019

Copy link
Copy Markdown
Contributor Author

@hachikuji Please review this patch. Thanks.

@hachikuji hachikuji left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Left a few comments. It would be helpful to have a basic unit or integration test.

+ "than 1/3 of that value. It can be adjusted even lower to control the expected time for normal rebalances.";

public static final String DEFAULT_API_TIMEOUT_MS_CONFIG = "default.api.timeout.ms";
public static final String DEFAULT_API_TIMEOUT_MS_DOC = "Specifies the timeout (in milliseconds) for client APIs that could block. " +

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need to generalize this description a little bit. The AdminClient APIs do not block.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simply dropping could block is a good option?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that sounds fine.

*/
public Integer timeoutMs() {
return timeoutMs;
public Integer apiTimeoutMs() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are incompatible; we need to keep the existing API. I think it's fine to update the javadoc.

this.defaultTimeoutMs = config.getInt(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG);
this.defaultTimeoutMs = config.getInt(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG);
// Normally, the default api timeout should be equal or larger than the request timeout.
this.requestTimeoutMs = Math.min(defaultTimeoutMs, config.getInt(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So our options here are either to raise an error to the user or adjust one of the configurations. Since default.api.timeout.ms is a new configuration, it is possible that a user has explicitly provided a request.timeout.ms which conflicts with the default default.api.timeout.ms. I think the logic should be something like the following:

  1. If a default.api.timeout.ms has been explicitly specified, raise an error if it conflicts with request.timeout.ms.
  2. If no default.api.timeout.ms has been configured, then set its value as the max of the default and request.timeout.ms. Also we should probably log a warning.
  3. Otherwise, use the provided values for both configurations.

@huxihx

huxihx commented Dec 6, 2019

Copy link
Copy Markdown
Contributor Author

retest this please.

@hachikuji hachikuji left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. Left one more comment. Note we still need test a test case which verifies basic timeout behavior.

}
Call call = calls.remove(0);
int timeoutMs = calcTimeoutMsRemainingAsInt(now, call.deadlineMs);
int timeoutMs = Math.min(requestTimeoutMs, calcTimeoutMsRemainingAsInt(now, call.deadlineMs));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to passing through the timeout to createRequest, we also need to pass it through in newClientRequest. Only some of the APIs support an explicit request timeout. This appears to have been a pre-existing bug.

@huxihx

huxihx commented Dec 11, 2019

Copy link
Copy Markdown
Contributor Author

testHandleTimeout which is now disabled due to flaky failures seems to be a good starting point for testing the timeout behavior. Do we need a test case which should cover the new-added config or the request timeout or even both?

@hachikuji

hachikuji commented Dec 12, 2019

Copy link
Copy Markdown
Contributor

Mainly we need a test case which verifies that the config is actually respected. We don't have to do this for every API. I would say pick one and verify 1) that the default api timeout works, and 2) that it can be overridden by the Options.

@huxihx

huxihx commented Dec 13, 2019

Copy link
Copy Markdown
Contributor Author

retest this please.

3 similar comments
@huxihx

huxihx commented Dec 16, 2019

Copy link
Copy Markdown
Contributor Author

retest this please.

@huxihx

huxihx commented Dec 17, 2019

Copy link
Copy Markdown
Contributor Author

retest this please.

@huxihx

huxihx commented Dec 19, 2019

Copy link
Copy Markdown
Contributor Author

retest this please.

@hachikuji

Copy link
Copy Markdown
Contributor

@huxihx I opened a PR into your branch with some of the missing test cases. Please take a look: huxihx#1.

@hachikuji

Copy link
Copy Markdown
Contributor

@huxihx Do you have time to follow up here? I would like to get this merged for 2.5 if possible.

@hachikuji

Copy link
Copy Markdown
Contributor

@huxihx Closing this. We will merge #8011 with co-author attribution. Thanks for the contribution!

@hachikuji hachikuji closed this Jan 30, 2020
hachikuji added a commit that referenced this pull request Jan 31, 2020
This PR implements `default.api.timeout.ms` as documented by KIP-533. This is a rebased version of #6913 with some additional test cases and small cleanups.

Reviewers: David Arthur <mumrah@gmail.com>

Co-authored-by: huxi <huxi_2b@hotmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants