Skip to content

KAFKA-8069: Setting expireTimestamp to None if it is the default value after loading v1 offset records from __consumer_offsets#3

Merged
hzxa21 merged 2 commits into
linkedin:2.0-lifrom
hzxa21:KAFKA-8069
Mar 8, 2019
Merged

Conversation

@hzxa21

@hzxa21 hzxa21 commented Mar 8, 2019

Copy link
Copy Markdown

After the 2.1 release, if the broker hasn't been upgrade to the latest inter-broker protocol version, the committed offsets stored in the __consumer_offset topic will get cleaned up way earlier than it should be when the offsets are loaded back from the __consumer_offset topic in GroupCoordinator, which will happen during leadership transition or after broker bounce. This patch fixes the bug by setting expireTimestamp to None if it is the default value after loading v1 offset records from __consumer_offsets.

Details for the bug can be found in https://issues.apache.org/jira/browse/KAFKA-8069

…e after loading v1 offset records from __consumer_offsets

After the 2.1 release, if the broker hasn't been upgrade to the latest inter-broker protocol version,
the committed offsets stored in the __consumer_offset topic will get cleaned up way earlier than it
should be when the offsets are loaded back from the __consumer_offset topic in GroupCoordinator, which
will happen during leadership transition or after broker bounce. This patch fixes the bug by setting
expireTimestamp to None if it is the default value after loading v1 offset records from __consumer_offsets
@hzxa21 hzxa21 requested a review from jonlee2 March 8, 2019 05:51
@hzxa21 hzxa21 changed the title KAFKA-8069: Setting expireTimestamp to None if it is the default valu… KAFKA-8069: Setting expireTimestamp to None if it is the default value after loading v1 offset records from __consumer_offsets Mar 8, 2019
@hzxa21 hzxa21 requested review from jjkoshy, onurkaraman and xiowu0 March 8, 2019 05:52

@xiowu0 xiowu0 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read through upstream ticket. fix LGTM.

Comment thread core/src/test/scala/unit/kafka/server/OffsetCommitTest.scala

@jonlee2 jonlee2 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@hzxa21 hzxa21 merged commit 8d1fddd into linkedin:2.0-li Mar 8, 2019
kidkun pushed a commit to kidkun/kafka that referenced this pull request Dec 14, 2019
…pache#7305)

A partition log in initialized in following steps:

1. Fetch log config from ZK
2. Call LogManager.getOrCreateLog which creates the Log object, then
3. Registers the Log object

Step linkedin#3 enables Configuration update thread to deliver configuration
updates to the log. But if any update arrives between step linkedin#1 and linkedin#3
then that update is missed. It breaks following use case:

1. Create a topic with default configuration, and immediately after that
2. Update the configuration of topic

There is a race condition here and in random cases update made in
second step will get dropped.

This change fixes it by tracking updates arriving between step linkedin#1 and linkedin#3
Once a Partition is done initializing log, it checks if it has missed any
update. If yes, then the configuration is read from ZK again.

Added unit tests to make sure a dirty configuration is refreshed. Tested
on local cluster to make sure that topic configuration and updates are
handled correctly.

Reviewers: Jason Gustafson <jason@confluent.io>
gitlw pushed a commit to gitlw/kafka that referenced this pull request May 4, 2020
…pache#7305)

A partition log in initialized in following steps:

1. Fetch log config from ZK
2. Call LogManager.getOrCreateLog which creates the Log object, then
3. Registers the Log object

Step linkedin#3 enables Configuration update thread to deliver configuration
updates to the log. But if any update arrives between step linkedin#1 and linkedin#3
then that update is missed. It breaks following use case:

1. Create a topic with default configuration, and immediately after that
2. Update the configuration of topic

There is a race condition here and in random cases update made in
second step will get dropped.

This change fixes it by tracking updates arriving between step linkedin#1 and linkedin#3
Once a Partition is done initializing log, it checks if it has missed any
update. If yes, then the configuration is read from ZK again.

Added unit tests to make sure a dirty configuration is refreshed. Tested
on local cluster to make sure that topic configuration and updates are
handled correctly.

Reviewers: Jason Gustafson <jason@confluent.io>
earlcoder added a commit that referenced this pull request Apr 28, 2026
…e (audit P0 #3)

The 'maintenance.broker.list' dynamic config still parsed but had no
runtime effect: KafkaConfig.getMaintenanceBrokerList was 'Seq.empty // LI
stub', so the controller no longer factored maintenance brokers into its
decisions, and the MaintenanceBrokerCount gauge was missing.

Restoration scope (this commit):
- DynamicConfig.Broker: register MaintenanceBrokerListProp via
  brokerConfigDef.define(...), restore DefaultMaintenanceBrokerList,
  MaintenanceBrokerListDoc, ClusterLevelConfigs, the
  getMaintenanceBrokerListFromString helper, and
  MaintenanceBrokerListValidator.
- DynamicBrokerConfig: restore getMaintenanceBrokerList that reads
  the prop from dynamicDefaultConfigs.
- KafkaConfig.getMaintenanceBrokerList: replace stub with delegation
  to dynamicConfig.
- KafkaController: restore MaintenanceBrokerCount gauge so LI ops
  dashboards see active-controller maintenance broker count.

ControllerContext.partitionUnassignableBrokerIds(Seq[Int]) is already
present in 3.6-li, so the helper is intact.

Out-of-scope follow-up: the rearrangePartitionReplicaAssignmentForNewTopics
LI method (which used to consume getMaintenanceBrokerList at the topic
auto-creation path) was removed entirely from 3.6-li KafkaController.
Restoring that requires re-integrating the LI logic into the new 3.6
auto-create flow and is filed as a separate audit P2 item.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants