Skip to content

fix: use least loaded broker to refresh metadata#2645

Merged
dnwe merged 1 commit intoIBM:mainfrom
HaoSun1993:fix-stale-broker-issue
Sep 12, 2023
Merged

fix: use least loaded broker to refresh metadata#2645
dnwe merged 1 commit intoIBM:mainfrom
HaoSun1993:fix-stale-broker-issue

Conversation

@HaoSun1993
Copy link
Copy Markdown
Contributor

@HaoSun1993 HaoSun1993 commented Sep 7, 2023

Seed brokers never change after client initialization. If the first seed
broker became stale (still online, but moved to other Kafka cluster),
Sarama client may use this stale broker to get the wrong metadata. To
avoid using the stale broker to do metadata refresh, we will choose the
least loaded broker in the cached broker list which is the similar to
how the Java client implementation works:

https://github.com/apache/kafka/blob/7483991a/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L671-L736

Contributes-to: #2637

@HaoSun1993 HaoSun1993 force-pushed the fix-stale-broker-issue branch from 8d06ab9 to ede5508 Compare September 8, 2023 02:24
for _, broker := range brokers {
err := broker.Open(client.Config())
if err != nil {
if err != nil && err != ErrAlreadyConnected {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the functional tests to skip the ErrAlreadyConnected error because this change will use one broker from the broker list to refresh metadata, which is already connected when reaching the following validation.

@HaoSun1993 HaoSun1993 closed this Sep 8, 2023
@HaoSun1993 HaoSun1993 deleted the fix-stale-broker-issue branch September 8, 2023 02:27
@HaoSun1993 HaoSun1993 restored the fix-stale-broker-issue branch September 8, 2023 02:28
@HaoSun1993 HaoSun1993 reopened this Sep 8, 2023
@HaoSun1993
Copy link
Copy Markdown
Contributor Author

@dnwe Could you please review my change? Thanks

@dnwe dnwe force-pushed the fix-stale-broker-issue branch from 19b1221 to f5c9e47 Compare September 12, 2023 08:17
@dnwe dnwe changed the title Fix Sarama client using a stale broker to refresh metadata #2637 fix: use least loaded broker to refresh metadata Sep 12, 2023
Seed brokers never change after client initialization. If the first seed
broker became stale (still online, but moved to other Kafka cluster),
Sarama client may use this stale broker to get the wrong metadata. To
avoid using the stale broker to do metadata refresh, we will choose the
least loaded broker in the cached broker list which is the similar to
how the Java client implementation works:

https://github.com/apache/kafka/blob/7483991a/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L671-L736

Contributes-to: IBM#2637

Signed-off-by: Hao Sun <haos@uber.com>
@dnwe dnwe force-pushed the fix-stale-broker-issue branch from f5c9e47 to 98ec384 Compare September 12, 2023 08:18
Copy link
Copy Markdown
Collaborator

@dnwe dnwe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me. One minor query on the existing deregisterBroker behaviour for the seedBrokers list, but happy to approve and merge as-is

Comment on lines +766 to 780
// deregisterBroker removes a broker from the broker list, and if it's
// not in the broker list, removes it from seedBrokers.
func (client *client) deregisterBroker(broker *Broker) {
client.lock.Lock()
defer client.lock.Unlock()

_, ok := client.brokers[broker.ID()]
if ok {
Logger.Printf("client/brokers deregistered broker #%d at %s", broker.ID(), broker.Addr())
delete(client.brokers, broker.ID())
return
}
if len(client.seedBrokers) > 0 && broker == client.seedBrokers[0] {
client.deadSeeds = append(client.deadSeeds, broker)
client.seedBrokers = client.seedBrokers[1:]
Copy link
Copy Markdown
Collaborator

@dnwe dnwe Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we only seem to deregister the broker from the seedBrokers list if it's the first element in the list (after the most recent shuffle) — is that still the desired behaviour?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is expected. seed brokers only shuffle during client initialization or hard RefreshBrokers. After that, the cached broker list is empty, and the client will use the first seed broker to fetch metadata. deregisterBroker method deregisters the first seed broker will apply that moment when this first seed broker is unavailable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know when we have a new release including this change so that our side can use it? @dnwe

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants