[fix][broker] Fix the issue of topics possibly being deleted.#21704
Conversation
|
|
||
| List<String> configuredClusters = topicPolicies.getReplicationClusters().get(); | ||
| if (CollectionUtils.isEmpty(configuredClusters)) { | ||
| log.warn("[{}] No replication clusters configured", name); |
There was a problem hiding this comment.
It should be an abnormal situation, please add some explanation here to indicate what happens here. :)
There was a problem hiding this comment.
Referring to the above flow diagram, method onUpdate()->checkReplication() is called before method this.updateTopicPolicyByNamespacePolicy(policies).
There was a problem hiding this comment.
Yes, maybe we can refine the log to let other users know about the possibility of a problem. :)
mattisonchao
left a comment
There was a problem hiding this comment.
LGTM, Is it possible to add a test? Maybe running it many times can reproduce it, which is also acceptable. :)
Its difficult to reproduce with unit tests, manually |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #21704 +/- ##
=============================================
+ Coverage 36.75% 73.37% +36.62%
- Complexity 12271 32767 +20496
=============================================
Files 1717 1893 +176
Lines 131197 140761 +9564
Branches 14339 15509 +1170
=============================================
+ Hits 48220 103290 +55070
+ Misses 76596 29360 -47236
- Partials 6381 8111 +1730
Flags with carried forward coverage won't be shown. Click here to find out more.
|
|
@crossoverJie, can we just put a time-consuming task in the thread pool to simulate the It's a very critical fix, if don't have a test to protect the change, things will go wrong again in the future. Here is an example: #21686 |
Co-authored-by: Jiwe Guo <technoboy@apache.org> (cherry picked from commit 84ea1ca)
Co-authored-by: Jiwe Guo <technoboy@apache.org> (cherry picked from commit 84ea1ca)
Co-authored-by: Jiwe Guo <technoboy@apache.org> (cherry picked from commit 84ea1ca)
Co-authored-by: Jiwe Guo <technoboy@apache.org> (cherry picked from commit 84ea1ca)
|
Sorry, to clear the root cause of the issue, I left a comment: The
|
| time | task: topic loading |
task: modifying topic policies |
|---|---|---|
| 1 | Registered the topic to accept new changes of Topic Policies; the attribute replicationClusters is empty now |
|
| 2 | Modify topic policies | |
| 3 | Trigger a task checkReplication, and then it will try to delete the topic due to replicationClusters being empty |
|
| 4 | Initialize the attribute replicationClusters from Namespace level Replication settings |
Detail steps of the issue: conditions-2
| time | task: topic loading |
task: TopicPoliciesService initialize after brokers restart |
|---|---|---|
| 1 | Registered the topic to accept new changes of Topic Policies; the attribute replicationClusters is empty now |
|
| 2 | Read all topic policies and try to update topic-level policies for every topic under this broker after the Broker restart | |
| 3 | Trigger a task checkReplication, and then it will try to delete the topic due to replicationClusters being empty |
|
| 4 | Initialize the attribute replicationClusters from Namespace level Replication settings |
When there is more than one broker under your cluster, the probability of shedding topics to a new broker(when it has just started right now) is low, so the probability is high if there is only 1 broker.
Workaround-1
Before you upgrade to the new version(which contains this fix), disable the feature topicLevelPoliciesEnabled, but it also makes all topic-level policies invalidate you set before.
Workaround-2
Before you upgrade to the new version(which contains this fix), do not restart brokers or modify any topic-level policies.
|
@poorbarcode But based on the background of |
Ah, I see. Pulsar reads all topic policies and calls @crossoverJie I updated the comment above, please review it, thanks |
|
@poorbarcode Great job, thanks for your supplement👍. |


Fixes #21653
Motivation
Reproduce process:
#21653 (comment)
Modifications
The
registerTopicPolicyListener()function was moved fromPersistentTopic()toinitTopicPolicy().Flow diagram:
Current:

After:

Verifying this change
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: crossoverJie#19