KAFKA-8869: Remove task configs for deleted connectors from config snapshot #8444
Merged
Contributor (author): @ncliang, @gharris1727, @MichaelDrogalis, would you mind taking a look at this when you have a chance?
Contributor (author): @rhauch, @kkonstantine, would one of you mind taking a look when you have a chance?
Contributor: ok to test
Contributor: jdk8: success
kkonstantine pushed a commit that referenced this pull request on May 21, 2020:
…apshot (#8444) Currently, if a connector is deleted, its task configurations will remain in the config snapshot tracked by the KafkaConfigBackingStore. This causes issues with incremental cooperative rebalancing, which utilizes that config snapshot to determine which connectors and tasks need to be assigned across the cluster. Specifically, it first checks to see which connectors are present in the config snapshot, and then, for each of those connectors, queries the snapshot for that connector's task configs. The lifecycle of a connector is for its configuration to be written to the config topic, that write to be picked up by the workers in the cluster and trigger a rebalance, the connector to be assigned to and started by a worker, task configs to be generated by the connector and then written to the config topic, that write to be picked up by the workers in the cluster and trigger a second rebalance, and finally, the tasks to be assigned to and started by workers across the cluster. There is a brief period in between the first time the connector is started and when the second rebalance has completed during which those stale task configs from a previously-deleted version of the connector will be used by the framework to start tasks for that connector. This fix aims to eliminate that window by preemptively clearing the task configs from the config snapshot for a connector whenever it has been deleted. An existing unit test is modified to verify this behavior, and should provide sufficient guarantees that the bug has been fixed. Reviewers: Nigel Liang <nigel@nigelliang.com>, Konstantine Karantasis <konstantine@confluent.io>
Contributor: Merged to
Kvicii pushed a commit to Kvicii/kafka that referenced this pull request on May 22, 2020:
* 'trunk' of github.com:apache/kafka:
  KAFKA-9980: Fix bug where alterClientQuotas could not set default client quotas (apache#8658)
  KAFKA-9780: Deprecate commit records without record metadata (apache#8379)
  MINOR: Deploy VerifiableClient in constructor to avoid test timeouts (apache#8651)
  MINOR: Added unit tests for ConnectionQuotas (apache#8650)
  MINOR: Correct MirrorMaker2 integration test configs for Connect internal topics (apache#8653)
  KAFKA-9855 - return cached Structs for Schemas with no fields (apache#8472)
  KAFKA-9950: Construct new ConfigDef for MirrorTaskConfig before defining new properties (apache#8608)
  KAFKA-8869: Remove task configs for deleted connectors from config snapshot (apache#8444)
  KAFKA-9409: Supplement immutability of ClusterConfigState class in Connect (apache#7942)
Jira
Currently, if a connector is deleted, its task configurations remain in the config snapshot tracked by the `KafkaConfigBackingStore`. This causes issues with incremental cooperative rebalancing, which uses that config snapshot to determine which connectors and tasks need to be assigned across the cluster. Specifically, it first checks which connectors are present in the config snapshot and then, for each of those connectors, queries the snapshot for that connector's task configs.

The lifecycle of a connector is:
1. Its configuration is written to the config topic.
2. The workers in the cluster pick up that write, which triggers a rebalance.
3. The connector is assigned to and started by a worker.
4. The connector generates task configs, which are written to the config topic.
5. The workers in the cluster pick up that write, which triggers a second rebalance.
6. Finally, the tasks are assigned to and started by workers across the cluster.
There is a brief period between the first time the connector is started and the completion of the second rebalance during which the framework uses the stale task configs left over from a previously deleted version of the connector to start tasks for that connector.
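To make that window concrete, here is a minimal Java sketch of the two-step lookup described above. `Snapshot`, `TaskId`, and `tasksToAssign` are hypothetical simplifications, not the framework's actual `ClusterConfigState` API or assignor code: the method lists the connectors present in the snapshot, then collects whatever task configs the snapshot still holds for each one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class StaleTaskConfigDemo {

    // Hypothetical task identifier: (connector name, task number).
    static final class TaskId implements Comparable<TaskId> {
        final String connector;
        final int task;

        TaskId(String connector, int task) {
            this.connector = connector;
            this.task = task;
        }

        @Override
        public int compareTo(TaskId other) {
            int byConnector = connector.compareTo(other.connector);
            return byConnector != 0 ? byConnector : Integer.compare(task, other.task);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof TaskId && compareTo((TaskId) o) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(connector, task);
        }

        @Override
        public String toString() {
            return connector + "-" + task;
        }
    }

    // Hypothetical, heavily simplified stand-in for the config snapshot.
    static final class Snapshot {
        final Set<String> connectors = new TreeSet<>();
        final Map<TaskId, Map<String, String>> taskConfigs = new TreeMap<>();

        // Every task ID the snapshot still holds for the given connector.
        List<TaskId> tasks(String connector) {
            List<TaskId> ids = new ArrayList<>();
            for (TaskId id : taskConfigs.keySet()) {
                if (id.connector.equals(connector)) {
                    ids.add(id);
                }
            }
            return ids;
        }
    }

    // Step 1: connectors present in the snapshot. Step 2: each one's task configs.
    static Map<String, List<TaskId>> tasksToAssign(Snapshot snapshot) {
        Map<String, List<TaskId>> result = new TreeMap<>();
        for (String connector : snapshot.connectors) {
            result.put(connector, snapshot.tasks(connector));
        }
        return result;
    }

    public static void main(String[] args) {
        Snapshot snapshot = new Snapshot();
        // A deleted "orders" connector left its task config behind...
        snapshot.taskConfigs.put(new TaskId("orders", 0), Map.of("topic", "old-topic"));
        // ...and "orders" has been re-created but has not generated new task configs yet.
        snapshot.connectors.add("orders");
        System.out.println(tasksToAssign(snapshot)); // prints {orders=[orders-0]}
    }
}
```

Because the re-created `orders` connector has not yet produced its own task configs, the lookup returns the leftover `orders-0` entry, which is exactly the kind of stale config the description says can be used to start tasks.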
This fix aims to eliminate that window by preemptively clearing the task configs from the config snapshot for a connector whenever it has been deleted.
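A minimal sketch of the idea behind the fix, assuming (as with Connect's config topic) that a connector deletion is read back as a record with a null value. The class, fields, and `onConnectorConfigRecord` method are hypothetical and are not the actual `KafkaConfigBackingStore` code; only the step of dropping every task config that belongs to the deleted connector comes from the description above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ConnectorDeletionSketch {

    // Hypothetical task identifier: (connector name, task number).
    static final class TaskId {
        final String connector;
        final int task;

        TaskId(String connector, int task) {
            this.connector = connector;
            this.task = task;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TaskId)) {
                return false;
            }
            TaskId other = (TaskId) o;
            return task == other.task && connector.equals(other.connector);
        }

        @Override
        public int hashCode() {
            return Objects.hash(connector, task);
        }

        @Override
        public String toString() {
            return connector + "-" + task;
        }
    }

    final Map<String, Map<String, String>> connectorConfigs = new HashMap<>();
    final Map<TaskId, Map<String, String>> taskConfigs = new HashMap<>();

    // Applies a connector config record; assumption: a null config means "deleted".
    void onConnectorConfigRecord(String connector, Map<String, String> config) {
        if (config == null) {
            connectorConfigs.remove(connector);
            // Preemptively clear this connector's task configs so that a later
            // connector with the same name cannot pick them up.
            taskConfigs.keySet().removeIf(id -> id.connector.equals(connector));
        } else {
            connectorConfigs.put(connector, config);
        }
    }

    public static void main(String[] args) {
        ConnectorDeletionSketch store = new ConnectorDeletionSketch();
        store.onConnectorConfigRecord("orders", Map.of("tasks.max", "1"));
        store.taskConfigs.put(new TaskId("orders", 0), Map.of("topic", "old-topic"));
        store.taskConfigs.put(new TaskId("payments", 0), Map.of("topic", "payments"));

        store.onConnectorConfigRecord("orders", null); // deletion
        System.out.println(store.taskConfigs.keySet()); // prints [payments-0]
    }
}
```

Clearing the entries at the moment the deletion is processed means the snapshot never holds task configs for a connector that no longer exists, so there is nothing stale for a re-created connector to inherit.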
An existing unit test is modified to verify this behavior. It should provide sufficient guarantees that the bug has been fixed, since the cause of the behavior has been narrowed down to incorrect values in the `taskConfigs` field of the `ClusterConfigState` provided by the `KafkaConfigBackingStore`.
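As an illustration of the kind of invariant such a test can check, here is a hedged, self-contained sketch (plain Java, no test framework). It is not the modified unit test from this PR; `TaskId` and `orphanedTasks` are hypothetical, and it assumes the check runs right after a deletion has been processed, when no task config should reference a connector that is missing from the snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SnapshotInvariantCheck {

    // Hypothetical task identifier: (connector name, task number).
    static final class TaskId {
        final String connector;
        final int task;

        TaskId(String connector, int task) {
            this.connector = connector;
            this.task = task;
        }

        @Override
        public String toString() {
            return connector + "-" + task;
        }
    }

    // Returns task IDs whose owning connector is absent from the snapshot.
    // An empty result means no stale task configs survived the deletion.
    static List<TaskId> orphanedTasks(Set<String> connectors, List<TaskId> taskIds) {
        List<TaskId> orphaned = new ArrayList<>();
        for (TaskId id : taskIds) {
            if (!connectors.contains(id.connector)) {
                orphaned.add(id);
            }
        }
        return orphaned;
    }

    public static void main(String[] args) {
        List<TaskId> tasks = List.of(new TaskId("payments", 0), new TaskId("orders", 0));
        // "orders" was deleted but its task config was not cleared, so the check flags it.
        System.out.println(orphanedTasks(Set.of("payments"), tasks)); // prints [orders-0]
    }
}
```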