[LI-HOTFIX] back-port the fix for KAFKA-8950 - KafkaConsumer stops fetching#79
Merged
Conversation
…tching TICKET=KAFKA-8950 EXIT_CRITERIA = when code is rebased on top of 2.4+ to get the upstream fix below is the original commit from commit c8676c9: KAKFA-8950: Fix KafkaConsumer Fetcher breaking on concurrent disconnect (apache#7511) The KafkaConsumer Fetcher can sometimes get into an invalid state where it believes that there are ongoing fetch requests, but in fact there are none. This may be caused by the heartbeat thread concurrently handling a disconnection event just after the fetcher thread submits a request which would cause the Fetcher to enter an invalid state where it believes it has ongoing requests to the disconnected node but in fact it does not. This is due to a thread safety issue in the Fetcher where it was possible for the ordering of the modifications to the nodesWithPendingFetchRequests to be incorrect - the Fetcher was adding it after the listener had already been invoked, which would mean that pending node never gets removed again. This PR addresses that thread safety issue by ensuring that the pending node is added to the nodesWithPendingFetchRequests before the listener is added to the future, ensuring the finally block is called after the node is added. Reviewers: Tom Lee, Jason Gustafson <jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
Author
|
upstream commit cleanly applied, tests included |
jonlee2
approved these changes
Apr 15, 2020
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
TICKET=KAFKA-8950
EXIT_CRITERIA = when code is rebased on top of 2.4+ to get the upstream fix
below is the original commit from commit c8676c9:
KAKFA-8950: Fix KafkaConsumer Fetcher breaking on concurrent disconnect (apache#7511)
The KafkaConsumer Fetcher can sometimes get into an invalid state where it believes that there are ongoing fetch requests, but in fact there are none. This may be caused by the heartbeat thread concurrently handling a disconnection event just after the fetcher thread submits a request which would cause the Fetcher to enter an invalid state where it believes it has ongoing requests to the disconnected node but in fact it does not. This is due to a thread safety issue in the Fetcher where it was possible for the ordering of the modifications to the nodesWithPendingFetchRequests to be incorrect - the Fetcher was adding it after the listener had already been invoked, which would mean that pending node never gets removed again.
This PR addresses that thread safety issue by ensuring that the pending node is added to the nodesWithPendingFetchRequests before the listener is added to the future, ensuring the finally block is called after the node is added.
Reviewers: Tom Lee, Jason Gustafson jason@confluent.io, Rajini Sivaram rajinisivaram@googlemail.com
More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.
Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.
Committer Checklist (excluded from commit message)