KAFKA-9184 (port on 2.3): Redundant task creation and periodic rebalances after zombie Connect worker rejoins the group #7783

Merged
rhauch merged 1 commit into apache:2.3 from kkonstantine:kafka-9184-port-on-2.3
Dec 5, 2019

@kkonstantine

Contributor

Check connectivity with the broker coordinator at intervals and, if the coordinator is unreachable, stop tasks by setting assignmentSnapshot to null. Also reset the rebalance delay when there are no lost tasks. Because assignmentSnapshot is now sometimes set to null and is read from other methods and threads, this member was made volatile, and local references are used to ensure consistent reads.
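For readers unfamiliar with the volatile-plus-local-reference pattern mentioned above, here is a minimal sketch of the idea. It is not the actual DistributedHerder code: the class `SnapshotReadSketch`, the nested `Assignment` type, and every method name are hypothetical stand-ins, and only the field name `assignmentSnapshot` comes from this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the herder; only the field name assignmentSnapshot is from the PR.
class SnapshotReadSketch {

    // Simplified, immutable snapshot of an assignment (just a list of task ids).
    static final class Assignment {
        final List<String> tasks;

        Assignment(List<String> tasks) {
            this.tasks = Collections.unmodifiableList(new ArrayList<>(tasks));
        }
    }

    // volatile so that a write (including a write of null) made by one thread
    // is visible to reads performed on other threads.
    private volatile Assignment assignmentSnapshot;

    // Hypothetical callback: a new assignment was received.
    void onAssigned(List<String> tasks) {
        assignmentSnapshot = new Assignment(tasks);
    }

    // Hypothetical callback: the coordinator could not be reached.
    void onCoordinatorUnreachable() {
        assignmentSnapshot = null;
    }

    int taskCount() {
        // Read the volatile field exactly once into a local reference. Checking the
        // field for null and then dereferencing the field again would be two separate
        // reads, and another thread could set it to null in between.
        Assignment snapshot = assignmentSnapshot;
        return snapshot == null ? 0 : snapshot.tasks.size();
    }
}
```

The point of the local variable is that the null check and the dereference operate on the same object, regardless of what another thread writes to the field in the meantime.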

Adapted existing unit tests to verify additional debug calls, added more specific log messages to DistributedHerder, and added a new integration test that verifies the behavior when the brokers are stopped and restarted only after the workers lose their heartbeats with the broker coordinator.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@kkonstantine

Contributor Author

@rhauch this PR ports #7771 on top of 2.3 and it's meant to be merged only on that branch. Please take a look when you get the chance. Cheers

…nces after zombie Connect worker rejoins the group (apache#7771)

Check connectivity with broker coordinator in intervals and stop tasks if coordinator is unreachable by setting `assignmentSnapshot` to null and resetting rebalance delay when there are no lost tasks. And, because we're now sometimes setting `assignmentSnapshot` to null and reading it from other methods and thread, made this member volatile and used local references to ensure consistent reads.

Adapted existing unit tests to verify additional debug calls, added more specific log messages to `DistributedHerder`, and added a new integration test that verifies the behavior when the brokers are stopped and restarted only after the workers lose their heartbeats with the broker coordinator.

Author: Konstantine Karantasis <konstantine@confluent.io>
Reviewers: Greg Harris <gregh@confluent.io>, Randall Hauch <rhauch@gmail.com>
@kkonstantine kkonstantine force-pushed the kafka-9184-port-on-2.3 branch from 386ba23 to 766336e Compare December 5, 2019 01:21
@kkonstantine

Contributor Author

I think the only notable changes, besides what's being ported from #7771, are that `generationId` now needs to guard against null with `return generation == null ? OffsetCommitRequest.DEFAULT_GENERATION_ID : generation.generationId;`, and that some test code had to be ported as well to make `ConnectWorkerIntegrationTest#testRestartFailedTask` work in 2.3 as it does in more recent branches.
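A rough, self-contained sketch of the null guard described above follows. It does not use the real Kafka classes: `GenerationIdSketch`, the nested `Generation` type, and the setter are hypothetical, and the local `DEFAULT_GENERATION_ID` constant is an assumed stand-in for `OffsetCommitRequest.DEFAULT_GENERATION_ID` (its value of -1 here is an assumption for illustration).

```java
// Hypothetical stand-in; the real code lives in Kafka's coordinator classes.
class GenerationIdSketch {

    // Assumed stand-in for OffsetCommitRequest.DEFAULT_GENERATION_ID.
    static final int DEFAULT_GENERATION_ID = -1;

    // Simplified stand-in for the coordinator's generation object.
    static final class Generation {
        final int generationId;

        Generation(int generationId) {
            this.generationId = generationId;
        }
    }

    // May be null, for example when the worker is not currently part of a group generation.
    private Generation generation;

    // Hypothetical setter used only to drive the example.
    void setGeneration(Generation generation) {
        this.generation = generation;
    }

    int generationId() {
        // Fall back to the default id instead of dereferencing a null generation.
        Generation generation = this.generation;
        return generation == null ? DEFAULT_GENERATION_ID : generation.generationId;
    }
}
```

Without the guard, a caller asking for the generation id while no generation is available would hit a NullPointerException instead of receiving the default id.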

@rhauch rhauch left a comment

Contributor

LGTM. Thanks, @kkonstantine! This made backporting #7771 much easier!!

@rhauch rhauch merged commit e851d5b into apache:2.3 Dec 5, 2019
@kkonstantine kkonstantine deleted the kafka-9184-port-on-2.3 branch December 5, 2019 02:37