-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][broker] Fix incorrect topic loading latency metric and timeout might not be respected #24785
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] Fix incorrect topic loading latency metric and timeout might not be respected #24785
Conversation
…might not be respected
5365999 to
a67cfca
Compare
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/TopicLoadingContext.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/main/java/org/apache/pulsar/broker/service/TopicLoadingContext.java
Show resolved
Hide resolved
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #24785 +/- ##
=============================================
+ Coverage 38.67% 74.20% +35.53%
- Complexity 13237 33387 +20150
=============================================
Files 1850 1912 +62
Lines 144540 149072 +4532
Branches 16763 17299 +536
=============================================
+ Hits 55895 110620 +54725
+ Misses 81156 29608 -51548
- Partials 7489 8844 +1355
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
@BewareMyPower There are some merge conflicts when cherry-picking this to branch-4.0. Could you please handle cherry-picking to branch-4.0? Perhaps there are some dependent PRs that need to be cherry-pick or there's a need to backport. |
|
Sure, I will do that |
|
The cherry-picking is done. But when I cherry-picked this PR, I found a regression of this PR that if the topic policies failed to get, the topic loading future won't be complete. I will push a fix soon. |
…might not be respected (apache#24785)
…might not be respected (apache#24785) (cherry picked from commit 5b2562d) (cherry picked from commit a399b5c)
…might not be respected (apache#24785) (cherry picked from commit 5b2562d) (cherry picked from commit a399b5c)
Motivation
The topic loading timeout starts at
loadOrCreatePersistentTopic:pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Lines 1661 to 1663 in 46a76e9
However, there could be two asynchronous tasks before that:
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Line 1165 in 46a76e9
__change_eventstopic will be replayed)pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Line 1171 in 46a76e9
The 1st task should be okay in most time, but the 2nd task could be time-consuming and risky. I have seen when replaying
__change_eventstopic was stuck in production env. If this case happens, there is no metric or log and it can be only aware from client side.P.S. as the comment shows, it actually does not make sense to execute the 2nd task here but I want to improve the topic loading process in other PRs.
The topic loading metric (
pulsar_topic_load_times) starts even later atcreatePersistent0:pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/BrokerService.java
Line 1769 in 46a76e9
It does not even include the time when the topic loading is in
BrokerService#pendingTopicLoadingQueue, which happens when the number of pending topic loading tasks exceeds themaxConcurrentTopicLoadRequestconfig.Modifications
TopicLoadingContextto add latency info and create this context from the beginning beforecheckNonPartitionedTopicExists, so the latency could include the time beforecreatePersistent0maxConcurrentTopicLoadRequestlimitation, log the pending time as wellVerifying this change
Here are example outputs from
BrokerServiceTest.testConcurrentLoadTopicExceedLimitShouldNotBeAutoCreatedqueuedmeans how long it stays inpendingTopicLoadingQueue.Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: