Skip to content

Conversation

@BewareMyPower
Copy link
Contributor

Motivation

The topic loading timeout starts at loadOrCreatePersistentTopic:

final CompletableFuture<Optional<Topic>> topicFuture = FutureUtil.createFutureWithTimeout(
Duration.ofSeconds(pulsar.getConfiguration().getTopicLoadTimeoutSeconds()), executor(),
() -> FAILED_TO_LOAD_TOPIC_TIMEOUT_EXCEPTION);

However, there could be two asynchronous tasks before that:

  1. Check if the topic exists

return checkNonPartitionedTopicExists(topicName).thenCompose(exists -> {

  1. Get the topic policies (__change_events topic will be replayed)

return getTopicPoliciesBypassSystemTopic(topicName, TopicPoliciesService.GetType.LOCAL_ONLY)

The 1st task should be okay in most time, but the 2nd task could be time-consuming and risky. I have seen when replaying __change_events topic was stuck in production env. If this case happens, there is no metric or log and it can be only aware from client side.

P.S. as the comment shows, it actually does not make sense to execute the 2nd task here but I want to improve the topic loading process in other PRs.

The topic loading metric (pulsar_topic_load_times) starts even later at createPersistent0:

final long topicCreateTimeMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime());

It does not even include the time when the topic loading is in BrokerService#pendingTopicLoadingQueue, which happens when the number of pending topic loading tasks exceeds the maxConcurrentTopicLoadRequest config.

Modifications

  • Refactor TopicLoadingContext to add latency info and create this context from the beginning before checkNonPartitionedTopicExists, so the latency could include the time before createPersistent0
  • Log latency when the topic is loaded successfully or fails to load, specially, if the topic has been added to the pending queue due to maxConcurrentTopicLoadRequest limitation, log the pending time as well

Verifying this change

  • Make sure that the change passes the CI checks.

Here are example outputs from BrokerServiceTest.testConcurrentLoadTopicExceedLimitShouldNotBeAutoCreated

Created topic persistent://prop/concurrentLoad/my-topic_0 - dedup is disabled (latency: 156 ms)
Created topic persistent://prop/concurrentLoad/my-topic_1 - dedup is disabled (latency: 20 ms)
Created topic persistent://prop/concurrentLoad/my-topic_2 - dedup is disabled (latency: 20 ms)
Created topic persistent://prop/concurrentLoad/my-topic_0 - dedup is disabled (latency: 4 ms)
Created topic persistent://prop/concurrentLoad/my-topic_1 - dedup is disabled (latency: 7 ms (queued: 3))
Created topic persistent://prop/concurrentLoad/my-topic_2 - dedup is disabled (latency: 12 ms (queued: 7))

queued means how long it stays in pendingTopicLoadingQueue.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@BewareMyPower BewareMyPower force-pushed the bewaremypower/delay-topic-policies-load branch from 5365999 to a67cfca Compare September 26, 2025 02:20
@codecov-commenter
Copy link

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.20%. Comparing base (1050f48) to head (8e50c15).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@              Coverage Diff              @@
##             master   #24785       +/-   ##
=============================================
+ Coverage     38.67%   74.20%   +35.53%     
- Complexity    13237    33387    +20150     
=============================================
  Files          1850     1912       +62     
  Lines        144540   149072     +4532     
  Branches      16763    17299      +536     
=============================================
+ Hits          55895   110620    +54725     
+ Misses        81156    29608    -51548     
- Partials       7489     8844     +1355     
Flag Coverage Δ
inttests 26.60% <83.58%> (-0.09%) ⬇️
systests 22.75% <73.13%> (-0.06%) ⬇️
unittests 73.73% <100.00%> (+38.86%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...rg/apache/pulsar/broker/service/BrokerService.java 83.28% <100.00%> (+25.36%) ⬆️
...che/pulsar/broker/service/TopicLoadingContext.java 100.00% <100.00%> (ø)

... and 1410 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Technoboy- Technoboy- added this to the 4.2.0 milestone Sep 30, 2025
@Technoboy- Technoboy- merged commit 5b2562d into apache:master Sep 30, 2025
112 of 118 checks passed
@BewareMyPower BewareMyPower deleted the bewaremypower/delay-topic-policies-load branch September 30, 2025 06:21
lhotari pushed a commit that referenced this pull request Oct 4, 2025
…might not be respected (#24785)

(cherry picked from commit 5b2562d)
@lhotari
Copy link
Member

lhotari commented Oct 8, 2025

@BewareMyPower There are some merge conflicts when cherry-picking this to branch-4.0. Could you please handle cherry-picking to branch-4.0? Perhaps there are some dependent PRs that need to be cherry-pick or there's a need to backport.

@BewareMyPower
Copy link
Contributor Author

Sure, I will do that

BewareMyPower added a commit that referenced this pull request Oct 8, 2025
…might not be respected (#24785)

(cherry picked from commit 5b2562d)
@BewareMyPower
Copy link
Contributor Author

The cherry-picking is done. But when I cherry-picked this PR, I found a regression of this PR that if the topic policies failed to get, the topic loading future won't be complete. I will push a fix soon.

walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request Oct 15, 2025
…might not be respected (apache#24785)

(cherry picked from commit 5b2562d)
(cherry picked from commit a399b5c)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 6, 2025
…might not be respected (apache#24785)

(cherry picked from commit 5b2562d)
(cherry picked from commit a399b5c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants