
[LI-HOTFIX] back-port the fix for KAFKA-8950 - KafkaConsumer stops fetching#79

Merged
radai-rosenblatt merged 1 commit into linkedin:2.3-li-1 from radai-rosenblatt:2.3-li-1 on Apr 15, 2020

@radai-rosenblatt

TICKET=KAFKA-8950

EXIT_CRITERIA = when code is rebased on top of 2.4+ to get the upstream fix

Below is the original upstream commit (c8676c9):
KAFKA-8950: Fix KafkaConsumer Fetcher breaking on concurrent disconnect (apache#7511)

The KafkaConsumer Fetcher can sometimes get into an invalid state where it believes there are ongoing fetch requests when in fact there are none. This may happen when the heartbeat thread handles a disconnection event concurrently, just after the fetcher thread submits a request: the Fetcher then believes it has an ongoing request to the disconnected node when it does not. The cause is a thread-safety issue in the Fetcher that allowed the modifications to nodesWithPendingFetchRequests to happen in the wrong order. The Fetcher added the node after the listener had already been invoked, so the pending node was never removed again.
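
The interleaving can be reproduced deterministically on a single thread. The sketch below is a simplified model, not Kafka's Fetcher code: it assumes `java.util.concurrent.CompletableFuture` as a stand-in for the client's request future (whose `whenComplete` runs the action immediately on the calling thread when the future is already complete, matching the "listener had already been invoked" case), and a plain set called `pendingNodes` in place of `nodesWithPendingFetchRequests`. The class and method names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the race; not Kafka's actual Fetcher code.
public class StuckPendingNodeDemo {
    // Stand-in for Fetcher.nodesWithPendingFetchRequests (assumed name mapping).
    static final Set<Integer> pendingNodes = ConcurrentHashMap.newKeySet();

    static Set<Integer> runBuggyOrdering() {
        pendingNodes.clear();
        int nodeId = 1;
        CompletableFuture<String> future = new CompletableFuture<>();

        // Heartbeat thread (simulated): handles the disconnect and fails the
        // future before the fetcher thread has attached its listener.
        future.completeExceptionally(new RuntimeException("disconnected"));

        // Fetcher thread, buggy order: attach the listener first. Because the
        // future is already complete, the action runs right here, so remove()
        // is a no-op on an empty set.
        future.whenComplete((response, error) -> pendingNodes.remove(nodeId));

        // ...then record the pending node. Nothing will ever remove it.
        pendingNodes.add(nodeId);
        return pendingNodes;
    }

    public static void main(String[] args) {
        // prints "pending after buggy ordering: [1]"
        System.out.println("pending after buggy ordering: " + runBuggyOrdering());
    }
}
```

Node 1 stays in the set indefinitely, which is the "believes it has ongoing requests" state described above.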

This PR fixes the thread-safety issue by adding the pending node to nodesWithPendingFetchRequests before the listener is added to the future. This guarantees that the listener's finally block, which removes the node, runs after the node has been added.
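
A matching sketch of the fixed ordering, under the same assumptions (`CompletableFuture` standing in for the request future, a hypothetical `pendingNodes` set in place of `nodesWithPendingFetchRequests`, hypothetical class and method names). The node is recorded before the listener is attached, so the removal in the `finally` block always runs after the add, whether the disconnect lands before or after the listener is attached.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the fix; not Kafka's actual Fetcher code.
public class FixedOrderingDemo {
    // Stand-in for Fetcher.nodesWithPendingFetchRequests (assumed name mapping).
    static final Set<Integer> pendingNodes = ConcurrentHashMap.newKeySet();

    static Set<Integer> runFixedOrdering(boolean disconnectBeforeListener) {
        pendingNodes.clear();
        int nodeId = 1;
        CompletableFuture<String> future = new CompletableFuture<>();

        // Optionally simulate the heartbeat thread failing the future early.
        if (disconnectBeforeListener) {
            future.completeExceptionally(new RuntimeException("disconnected"));
        }

        // Fixed order: record the pending node before attaching the listener.
        pendingNodes.add(nodeId);
        future.whenComplete((response, error) -> {
            try {
                // A real fetcher would handle the response or error here.
            } finally {
                // Always runs after the add above, in either interleaving.
                pendingNodes.remove(nodeId);
            }
        });

        // Normal path: the response arrives after the listener is attached.
        if (!disconnectBeforeListener) {
            future.complete("fetch response");
        }
        return pendingNodes;
    }

    public static void main(String[] args) {
        System.out.println("early disconnect: " + runFixedOrdering(true));  // prints "early disconnect: []"
        System.out.println("normal response: " + runFixedOrdering(false));  // prints "normal response: []"
    }
}
```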

Reviewers: Tom Lee, Jason Gustafson <jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>


Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@radai-rosenblatt (author)

upstream commit cleanly applied, tests included

@gitlw gitlw left a comment

Thanks!

@radai-rosenblatt radai-rosenblatt merged commit 368244a into linkedin:2.3-li-1 Apr 15, 2020