Skip to content

Conversation

@zhaizhibo
Copy link
Contributor

Motivation

2025-08-17T18:30:56,880+0800 [metadata-store-10-1] INFO  org.apache.pulsar.broker.resources.MetadataStoreCacheLoader.lambda$init$0(MetadataStoreCacheLoader.java:68) - Successfully updated broker info [xx.xxx.238.222:24010, xx.xxx.238.226:24010, xx.xxx.64.175:24010, xx.xxx.86.50:24010, xx.xxx.38.249:24010, xx.xxx.168.210:24010]
2025-08-17T18:30:56,880+0800 [metadata-store-10-1] INFO  org.apache.pulsar.broker.resources.MetadataStoreCacheLoader.lambda$init$0(MetadataStoreCacheLoader.java:68) - Successfully updated broker info [xx.xxx.238.222:24010, xx.xxx.238.226:24010, xx.xxx.86.50:24010, xx.xxx.38.249:24010, xx.xxx.168.210:24010]

The second update (with 5 brokers) was actually processed before the first update (with 6 brokers), but due to concurrent operations, the incorrect result was cached.

When updating the availableBrokers in MetadataStoreCacheLoader, there is a possibility that the actual processing order of ZooKeeper updates may become inconsistent. This could occur due to:

  1. Uncertain callback execution order after loadReportResources.getChildrenAsync(LOADBALANCE_BROKERS_ROOT)
  2. Uncertain execution sequence in the common-forkjoin thread pool via .thenApplyAsync

Modifications

  1. ensure sequential processing – Only one update operation proceeds at a time, with subsequent updates waiting until the current completes
  2. Added scheduled update for availableBrokers list.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Aug 18, 2025
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @zhaizhibo! Please check the review comments.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good now. I think preserving the original logic to do the initial fetch of brokers could be useful. There's a comment about that.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the contribution @zhaizhibo

@lhotari lhotari changed the title [Bug] Race condition in MetadataStoreCacheLoader causes inconsistent availableBroker list caching [fix][broker] Fix race condition in MetadataStoreCacheLoader causing inconsistent availableBroker list caching Aug 19, 2025
@lhotari
Copy link
Member

lhotari commented Aug 19, 2025

There's a checkstyle error:

[INFO] There is 1 error reported by Checkstyle 10.14.2 with /home/runner/work/pulsar/pulsar/buildtools/src/main/resources/pulsar/checkstyle.xml ruleset.
Error:  src/main/java/org/apache/pulsar/broker/resources/MetadataStoreCacheLoader.java:[30,1] (imports) ImportOrder: Extra separation in import group before 'org.apache.bookkeeper.common.util.OrderedScheduler'

If you are using IntelliJ IDE, here are instructions how to setup the correct code style: https://pulsar.apache.org/contribute/setup-ide/#configure-code-style
Organizing imports in IntelliJ after using the correct code style would fix the issue.

Before pushing changes, this is the fastest command to validate that the project compiles and checkstyle passes for the project:

mvn -Pcore-modules,-main -T 1C clean install -DskipTests -Dspotbugs.skip=true -DnarPluginPhase=none

@zhaizhibo
Copy link
Contributor Author

Looks pretty good now. I think preserving the original logic to do the initial fetch of brokers could be useful. There's a comment about that.

There's a checkstyle error:

[INFO] There is 1 error reported by Checkstyle 10.14.2 with /home/runner/work/pulsar/pulsar/buildtools/src/main/resources/pulsar/checkstyle.xml ruleset.
Error:  src/main/java/org/apache/pulsar/broker/resources/MetadataStoreCacheLoader.java:[30,1] (imports) ImportOrder: Extra separation in import group before 'org.apache.bookkeeper.common.util.OrderedScheduler'

If you are using IntelliJ IDE, here are instructions how to setup the correct code style: https://pulsar.apache.org/contribute/setup-ide/#configure-code-style Organizing imports in IntelliJ after using the correct code style would fix the issue.

Before pushing changes, this is the fastest command to validate that the project compiles and checkstyle passes for the project:

mvn -Pcore-modules,-main -T 1C clean install -DskipTests -Dspotbugs.skip=true -DnarPluginPhase=none

done. The original log message 'Successfully updated xxx' was useful, so I added it back.

@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 93.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 74.25%. Comparing base (19b7c27) to head (9034a4a).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...sar/broker/resources/MetadataStoreCacheLoader.java 93.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24639      +/-   ##
============================================
- Coverage     74.48%   74.25%   -0.24%     
+ Complexity    33277    32787     -490     
============================================
  Files          1882     1882              
  Lines        146854   146859       +5     
  Branches      16866    16868       +2     
============================================
- Hits         109389   109047     -342     
- Misses        28861    29130     +269     
- Partials       8604     8682      +78     
Flag Coverage Δ
inttests 26.58% <93.33%> (-0.12%) ⬇️
systests 23.31% <73.33%> (-0.07%) ⬇️
unittests 73.74% <86.66%> (-0.25%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...sar/broker/resources/MetadataStoreCacheLoader.java 77.04% <93.33%> (+3.36%) ⬆️

... and 92 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dao-jun dao-jun merged commit d3c615c into apache:master Aug 21, 2025
96 of 98 checks passed
lhotari pushed a commit that referenced this pull request Aug 26, 2025
…inconsistent availableBroker list caching (#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
(cherry picked from commit d3c615c)
lhotari pushed a commit that referenced this pull request Aug 26, 2025
…inconsistent availableBroker list caching (#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
(cherry picked from commit d3c615c)
lhotari pushed a commit that referenced this pull request Aug 26, 2025
…inconsistent availableBroker list caching (#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
(cherry picked from commit d3c615c)
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Aug 26, 2025
…inconsistent availableBroker list caching (apache#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
(cherry picked from commit d3c615c)
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 28, 2025
…inconsistent availableBroker list caching (apache#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
(cherry picked from commit d3c615c)
(cherry picked from commit a05e727)
ganesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Aug 29, 2025
…inconsistent availableBroker list caching (apache#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
(cherry picked from commit d3c615c)
(cherry picked from commit b2f12a2)
@zhaizhibo zhaizhibo deleted the fix_incorrect_availableBrokers_in_metadataStoreCacheLoader branch September 2, 2025 11:39
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 3, 2025
…inconsistent availableBroker list caching (apache#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
(cherry picked from commit d3c615c)
(cherry picked from commit b2f12a2)
Technoboy- pushed a commit to Technoboy-/pulsar that referenced this pull request Sep 10, 2025
…inconsistent availableBroker list caching (apache#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
(cherry picked from commit d3c615c)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 12, 2025
…inconsistent availableBroker list caching (apache#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
(cherry picked from commit d3c615c)
(cherry picked from commit a05e727)
@lhotari lhotari added this to the 4.1.0 milestone Sep 17, 2025
KannarFr pushed a commit to CleverCloud/pulsar that referenced this pull request Sep 22, 2025
…inconsistent availableBroker list caching (apache#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
…inconsistent availableBroker list caching (apache#24639)

Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants