[improve][broker] Optimize Reader creation in TopicPoliciesService #24658
@dao-jun Do you see the logs from https://github.com/apache/pulsar/pull/24658/files#diff-9d2948d863c111e4be6d508a1c573667a1326b98c4314e917ba9e344bb61dc27L546? Or what is the reason for the reader close failure? The behavior you mentioned does not seem expected. If the user triggered reader.close(), the reader should not reconnect again for any reason.
I haven't had a chance to track down the Pulsar client bug; @nodece will handle it.
Codecov Report❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #24658      +/-   ##
============================================
+ Coverage     74.01%   74.20%    +0.19%
+ Complexity    33224    32832      -392
============================================
  Files          1858     1885       +27
  Lines        146500   146977      +477
  Branches      16880    16930       +50
============================================
+ Hits         108425   109070      +645
+ Misses        29394    29209      -185
- Partials       8681     8698       +17
BewareMyPower
left a comment
Adding a new parameter and keeping two overloads of cleanCacheAndCloseReader (now cleanCache) makes the code hard to read.
Method invocations like clean(ns, true, false) or clean(ns, true, true) really harm readability. Since you're already touching this part, could you refactor further by splitting the original cleanCacheAndCloseReader method into three methods? For example:
cleanWriterCache
cleanReaderCache
cleanPolicyInitCache
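To illustrate the suggestion, here is a minimal sketch of replacing boolean flags with three single-purpose methods. Only the three method names come from the comment above; the class name, the map fields, their value types, and the register/has* helpers are hypothetical stand-ins and do not reflect the actual fields of SystemTopicBasedTopicPoliciesService.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the caches below are simplified stand-ins, not the real service fields.
class CacheCleanupSketch {
    private final Map<String, String> writerCache = new ConcurrentHashMap<>();
    private final Map<String, String> readerCache = new ConcurrentHashMap<>();
    private final Map<String, Boolean> policyInitCache = new ConcurrentHashMap<>();

    // Populate all three caches for a namespace (illustrative helper).
    void register(String ns) {
        writerCache.put(ns, "writer-" + ns);
        readerCache.put(ns, "reader-" + ns);
        policyInitCache.put(ns, Boolean.TRUE);
    }

    // One method per cache, so call sites state their intent without boolean flags.
    void cleanWriterCache(String ns) {
        writerCache.remove(ns);
    }

    void cleanReaderCache(String ns) {
        readerCache.remove(ns);
    }

    void cleanPolicyInitCache(String ns) {
        policyInitCache.remove(ns);
    }

    boolean hasWriter(String ns) {
        return writerCache.containsKey(ns);
    }

    boolean hasReader(String ns) {
        return readerCache.containsKey(ns);
    }

    boolean hasPolicyInit(String ns) {
        return policyInitCache.containsKey(ns);
    }

    public static void main(String[] args) {
        CacheCleanupSketch sketch = new CacheCleanupSketch();
        sketch.register("tenant/ns");
        // Reads as "drop the reader and the init marker", instead of clean(ns, true, false).
        sketch.cleanReaderCache("tenant/ns");
        sketch.cleanPolicyInitCache("tenant/ns");
        System.out.println("writer kept: " + sketch.hasWriter("tenant/ns"));
        System.out.println("reader kept: " + sketch.hasReader("tenant/ns"));
    }
}
```

With this shape, a caller that previously passed a combination of flags would instead call exactly the cleanup methods it needs.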
@Technoboy- Makes sense. I agree this is an improvement to avoid creating the subscription multiple times, but we still need to find the root cause of the issue.
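One common way to avoid creating the same resource repeatedly is to cache a future per key and create it only when the key is absent. The sketch below shows that general pattern; it is not the code from this PR. FakeReader, ReaderCacheSketch, getOrCreateReader, and the created counter are all hypothetical names used for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "create once per namespace"; not the actual Pulsar implementation.
class ReaderCacheSketch {
    // Stand-in for a real reader handle.
    static final class FakeReader {
        final String namespace;

        FakeReader(String namespace) {
            this.namespace = namespace;
        }
    }

    private final ConcurrentHashMap<String, CompletableFuture<FakeReader>> readerCache =
            new ConcurrentHashMap<>();
    // Counts how many readers were actually created.
    final AtomicInteger created = new AtomicInteger();

    CompletableFuture<FakeReader> getOrCreateReader(String ns) {
        // computeIfAbsent runs the mapping function at most once per absent key,
        // so concurrent callers for the same namespace share one future.
        return readerCache.computeIfAbsent(ns, key -> {
            created.incrementAndGet();
            return CompletableFuture.completedFuture(new FakeReader(key));
        });
    }

    public static void main(String[] args) {
        ReaderCacheSketch cache = new ReaderCacheSketch();
        cache.getOrCreateReader("tenant/ns");
        cache.getOrCreateReader("tenant/ns");
        System.out.println("readers created: " + cache.created.get());
    }
}
```

In a real service the mapping function would start an asynchronous creation and return its future immediately, since blocking inside computeIfAbsent holds up other updates to the same map bin.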
# Conflicts: # pulsar-broker/src/main/java/org/apache/pulsar/broker/service/SystemTopicBasedTopicPoliciesService.java
There seems to be a remaining race condition: a namespace might be removed while it is being initialized again. One possible solution would be to move the logic into a separate class. In the map for namespaces, the value could be an instance of this class that tracks which state the namespace is in. This would make it possible to handle the different cases properly and avoid race conditions.
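The idea in this comment could be sketched roughly as follows. This is an assumption-laden illustration, not a proposal for the actual code: the class names, the three states, and the methods init, remove, markReady, and beginClose are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a per-namespace entry with an explicit state as the map value.
class NamespaceStateSketch {
    enum State { INITIALIZING, READY, CLOSING }

    static final class NamespaceEntry {
        private State state = State.INITIALIZING;

        // Succeeds only if the entry was not closed while initialization was running.
        synchronized boolean markReady() {
            if (state != State.INITIALIZING) {
                return false;
            }
            state = State.READY;
            return true;
        }

        // Returns false if another caller already started closing this entry.
        synchronized boolean beginClose() {
            if (state == State.CLOSING) {
                return false;
            }
            state = State.CLOSING;
            return true;
        }

        synchronized State state() {
            return state;
        }
    }

    private final ConcurrentHashMap<String, NamespaceEntry> namespaces = new ConcurrentHashMap<>();

    // Reuse a live entry; replace one that is already closing with a fresh entry.
    NamespaceEntry init(String ns) {
        return namespaces.compute(ns, (key, existing) ->
                existing == null || existing.state() == State.CLOSING
                        ? new NamespaceEntry()
                        : existing);
    }

    boolean remove(String ns) {
        NamespaceEntry entry = namespaces.get(ns);
        if (entry == null || !entry.beginClose()) {
            return false;
        }
        // Remove only if the map still holds this exact entry; a concurrent
        // re-initialization may already have replaced it with a new one.
        return namespaces.remove(ns, entry);
    }

    public static void main(String[] args) {
        NamespaceStateSketch service = new NamespaceStateSketch();
        NamespaceEntry first = service.init("tenant/ns");
        service.remove("tenant/ns");
        NamespaceEntry second = service.init("tenant/ns");
        // The stale entry can no longer become READY; the new one can.
        System.out.println("stale markReady: " + first.markReady());
        System.out.println("fresh markReady: " + second.markReady());
    }
}
```

The point of the design is that an initialization that loses the race observes a failed markReady and can retry against a fresh entry, rather than leaving a half-initialized namespace in the map.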
The concurrency control is based on |
This is an improvement, not a bugfix.
…pache#24658) Co-authored-by: Zixuan Liu <nodeces@gmail.com> (cherry picked from commit 0cda4f4) Signed-off-by: Zixuan Liu <nodeces@gmail.com>
…pache#24658) Co-authored-by: Zixuan Liu <nodeces@gmail.com>
Cherry-picking to branch-4.0 and branch-4.1 since #24980 depends on this PR.
…pache#24658) Co-authored-by: Zixuan Liu <nodeces@gmail.com> (cherry picked from commit 0cda4f4) (cherry picked from commit 0868e21)


Motivation
In our cluster, we encountered an exception on a change_events partition:
Thousands of "get last message id failed" errors were thrown due to a "No such ledger" exception.
SystemTopicBasedTopicPoliciesService will create thousands of Reader instances.
For some reason, none of the Readers were closed successfully (maybe we were unloading the namespace at the time). After the change_events partition was transferred to a new broker, all the Readers reconnected to the new broker and were never closed.
There were more than 20k consumers on the change_events partition:
Modifications
Don't clean the readerCache and close the Reader unless:
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc
doc-required
doc-not-needed
doc-complete