-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][broker] fix prepareInitPoliciesCacheAsync in SystemTopicBasedTopicPoliciesService #24980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] fix prepareInitPoliciesCacheAsync in SystemTopicBasedTopicPoliciesService #24980
Conversation
lhotari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for the great work @TakaHiR07
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #24980 +/- ##
=========================================
Coverage 74.29% 74.29%
- Complexity 34026 34066 +40
=========================================
Files 1920 1920
Lines 150252 150252
Branches 17428 17428
=========================================
+ Hits 111634 111636 +2
- Misses 29706 29735 +29
+ Partials 8912 8881 -31
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes issues in SystemTopicBasedTopicPoliciesService.prepareInitPoliciesCacheAsync related to duplicate cleanup execution and recursive update errors when exceptions occur during policy cache initialization.
Key Changes:
- Replaced
computeIfAbsentwithputIfAbsentto avoid recursive update errors when modifying the map during computation - Consolidated exception handling to ensure
cleanPoliciesCacheInitMapis called only once per exception - Removed redundant cleanup calls from
initPolicesCachemethod to prevent double cleanup - Simplified reader creation logic in
newReaderby removing special exception handling that is now redundant
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| SystemTopicBasedTopicPoliciesService.java | Refactored prepareInitPoliciesCacheAsync to use putIfAbsent instead of computeIfAbsent, consolidated exception handling to prevent double cleanup, simplified newReader logic, and changed cleanPoliciesCacheInitMap visibility for testing |
| SystemTopicBasedTopicPoliciesServiceTest.java | Added two comprehensive test cases to verify correct cleanup behavior when exceptions occur during reader creation and policy cache initialization |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...src/test/java/org/apache/pulsar/broker/service/SystemTopicBasedTopicPoliciesServiceTest.java
Show resolved
Hide resolved
...src/test/java/org/apache/pulsar/broker/service/SystemTopicBasedTopicPoliciesServiceTest.java
Outdated
Show resolved
Hide resolved
...src/test/java/org/apache/pulsar/broker/service/SystemTopicBasedTopicPoliciesServiceTest.java
Outdated
Show resolved
Hide resolved
...src/test/java/org/apache/pulsar/broker/service/SystemTopicBasedTopicPoliciesServiceTest.java
Outdated
Show resolved
Hide resolved
...src/test/java/org/apache/pulsar/broker/service/SystemTopicBasedTopicPoliciesServiceTest.java
Outdated
Show resolved
Hide resolved
|
Which logic could cause this issue ? |
Is this bug existed in the 3.0.x , not the latest version ? |
@Technoboy- restart broker with version-3.0.x. Restart broker-1, and after a few time restart broker-2. When load topic and getTopicPolicy on broker-1, the corresponding __change_event topic on broker-2 is unload. I don't use the latest version. Maybe in latest version, this concurrent case is avoid by pr-24658. But it still catch the exception and cleanCacheAndPolicyMap twice, this is dangerous. |
How could the latest code cause the issue ? I'm not understand |
You can see the code in branch-3.0. Latest code is a bit different, the concurrent case is found on branch-3.0 |
So it's better to fix it to branch-3.0, for the master branch, I don't think it's needed. |
@Technoboy- I think it is better to also fix in master branch. Since the current code in master branch is for improvement, not a true fix, and still have risk. |
I think that this problem exists also in master branch and therefore merging this PR and cherry-picking it to maintenance branches makes sense. |
|
Depends on #24658 for branch-4.0 and branch-4.1 |
|
Flaky test #25081, please take a look |
…picPoliciesService (apache#24980) Co-authored-by: fanjianye <fanjianye@bigo.sg> (cherry picked from commit 47b8d5d) (cherry picked from commit 9ca5241)
…picPoliciesService (apache#24980) Co-authored-by: fanjianye <fanjianye@bigo.sg> (cherry picked from commit 47b8d5d) (cherry picked from commit 9ca5241)
…picPoliciesService (apache#24980) Co-authored-by: fanjianye <fanjianye@bigo.sg> (cherry picked from commit 47b8d5d) (cherry picked from commit 9ca5241)
Fixes #24977
Motivation
As shown in the issue, fix two problem: 1. cleanCacheAndCloseReader() executed twice cause concurrent error, which result in too many orphan reader remain in SystemTopicBasedTopicPoliciesService 2. double update in policyCacheInitMap cause recursive update error
Modifications
There is one point should be consider in this pr
Besides, this case still exist: if failed to close reader in cleanCacheAndCloseReader(), this closing reader maybe have chance to reconnect and become orphan reader. This is not this pr's work.
Verifying this change
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-complete