KAFKA-9184 (port on 2.3): Redundant task creation and periodic rebalances after zombie Connect worker rejoins the group #7783
Merged
Contributor
Author
…nces after zombie Connect worker rejoins the group (apache#7771)

Check connectivity with the broker coordinator at intervals and stop tasks if the coordinator is unreachable, by setting `assignmentSnapshot` to null and resetting the rebalance delay when there are no lost tasks. Because `assignmentSnapshot` is now sometimes set to null and is read from other methods and threads, this member was made volatile and local references are used to ensure consistent reads.

Adapted existing unit tests to verify additional debug calls, added more specific log messages to `DistributedHerder`, and added a new integration test that verifies the behavior when the brokers are stopped and restarted only after the workers lose their heartbeats with the broker coordinator.

Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Greg Harris <gregh@confluent.io>, Randall Hauch <rhauch@gmail.com>
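The volatile-plus-local-reference pattern described in the commit message can be illustrated with a minimal sketch. This is not the actual `DistributedHerder` code: the class `HerderSketch`, the nested `Assignment` type, and the methods `onAssigned`, `checkCoordinator`, and `assignedTaskCount` are hypothetical names chosen for illustration, and the reachability check is assumed to be supplied by the caller as a boolean.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a herder; NOT the real DistributedHerder. */
final class HerderSketch {

    /** Immutable view of the tasks assigned to this worker (assumed shape). */
    static final class Assignment {
        final List<String> tasks;

        Assignment(List<String> tasks) {
            // Defensive copy so the snapshot cannot change after publication.
            this.tasks = Collections.unmodifiableList(new ArrayList<>(tasks));
        }
    }

    // volatile: a write from one thread is visible to later reads
    // from any other thread.
    private volatile Assignment assignmentSnapshot;

    /** Publishes a new assignment, e.g. after a rebalance completes. */
    void onAssigned(Assignment assignment) {
        assignmentSnapshot = assignment;
    }

    /**
     * Intended to be called at intervals. The reachability flag is an
     * assumption of this sketch; how it is computed is out of scope here.
     */
    void checkCoordinator(boolean coordinatorReachable) {
        if (!coordinatorReachable) {
            // Dropping the snapshot signals that current tasks are stale.
            assignmentSnapshot = null;
        }
    }

    /** Reads the field once into a local so the null check and use agree. */
    int assignedTaskCount() {
        Assignment snapshot = assignmentSnapshot; // single volatile read
        if (snapshot == null) {
            return 0;
        }
        // Safe even if another thread nulls the field right now,
        // because only the local reference is used from here on.
        return snapshot.tasks.size();
    }
}
```

Reading the field twice (once for the null check and once for the use) could observe two different values if another thread clears it in between; copying it to a local first avoids that race without taking a lock.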
386ba23 to
766336e
Compare
Contributor
Author
I think the only notable changes besides what's getting ported from #7771
rhauch
approved these changes
Dec 5, 2019
rhauch
left a comment
Contributor
LGTM. Thanks, @kkonstantine ! This made backporting #7771 much easier!!