Skip to content

Conversation

@mattisonchao
Copy link
Member

@mattisonchao mattisonchao commented Oct 26, 2023

Motivation

PR #21231 made user topic creation rely on system topic __change_event if the user is enabling topicLevelPoliciesEnabled.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create __change_event reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.

Modifications

  • Get the topic policy before loading.

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@mattisonchao mattisonchao self-assigned this Oct 26, 2023
@mattisonchao mattisonchao marked this pull request as draft October 26, 2023 02:14
@mattisonchao mattisonchao added this to the 3.2.0 milestone Oct 26, 2023
@mattisonchao mattisonchao changed the title [fix][broker] Fix bundle stuck in unloading status [fix][broker] Fix namespace bundle stuck in unloading status Oct 26, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Oct 26, 2023
@mattisonchao mattisonchao added the category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost label Oct 26, 2023
@mattisonchao mattisonchao added the release/blocker Indicate the PR or issue that should block the release until it gets resolved label Oct 26, 2023
@mattisonchao mattisonchao marked this pull request as ready for review October 26, 2023 10:12
@mattisonchao mattisonchao reopened this Oct 26, 2023
@mattisonchao mattisonchao marked this pull request as draft November 1, 2023 23:04
@mattisonchao mattisonchao marked this pull request as ready for review November 2, 2023 11:14
@mattisonchao mattisonchao reopened this Nov 2, 2023
});
}
return loadOrCreatePersistentTopic(tpName, createIfMissing, properties);
final CompletableFuture<Optional<TopicPolicies>> topicPoliciesFuture =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test to cover the getTopic method and the TopicPolicies are loaded?

Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change looks good.
Just left a few minor comments.

@codecov-commenter
Copy link

codecov-commenter commented Nov 8, 2023

Codecov Report

Attention: Patch coverage is 94.33962% with 9 lines in your changes missing coverage. Please review.

Project coverage is 73.24%. Comparing base (eebd821) to head (3bda31b).
Report is 1377 commits behind head on master.

Files with missing lines Patch % Lines
...rg/apache/pulsar/broker/service/BrokerService.java 95.45% 5 Missing and 2 partials ⚠️
.../service/SystemTopicBasedTopicPoliciesService.java 60.00% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #21445      +/-   ##
============================================
+ Coverage     73.23%   73.24%   +0.01%     
+ Complexity    32674    32648      -26     
============================================
  Files          1892     1892              
  Lines        140632   140645      +13     
  Branches      15467    15467              
============================================
+ Hits         102986   103010      +24     
+ Misses        29565    29519      -46     
- Partials       8081     8116      +35     
Flag Coverage Δ
inttests 24.14% <69.18%> (-0.03%) ⬇️
systests 24.86% <64.77%> (+0.13%) ⬆️
unittests 72.52% <94.33%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
.../service/SystemTopicBasedTopicPoliciesService.java 73.10% <60.00%> (-1.13%) ⬇️
...rg/apache/pulsar/broker/service/BrokerService.java 80.89% <95.45%> (-0.04%) ⬇️

... and 89 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@codelipenghui codelipenghui merged commit 428c18c into apache:master Nov 8, 2023
Technoboy- pushed a commit that referenced this pull request Nov 10, 2023
PR #21231 made user topic creation rely on system topic `__change_event` if the user is enabling `topicLevelPoliciesEnabled`.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create `__change_event` reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.

- Get the topic policy before loading.
@mattisonchao mattisonchao deleted the fix/unloading_stuck branch November 10, 2023 13:12
nborisov pushed a commit to nborisov/pulsar that referenced this pull request Nov 13, 2023
…21445)

### Motivation

PR apache#21231 made user topic creation rely on system topic `__change_event` if the user is enabling `topicLevelPoliciesEnabled`.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create `__change_event` reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.


### Modifications

- Get the topic policy before loading.
poorbarcode pushed a commit that referenced this pull request Nov 14, 2023
Technoboy- added a commit that referenced this pull request Nov 16, 2023
coderzc pushed a commit to coderzc/pulsar that referenced this pull request Nov 21, 2023
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Dec 2, 2023
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 20, 2023
…21445)

PR apache#21231 made user topic creation rely on system topic `__change_event` if the user is enabling `topicLevelPoliciesEnabled`.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create `__change_event` reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.

- Get the topic policy before loading.
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Dec 20, 2023
…21445)

PR apache#21231 made user topic creation rely on system topic `__change_event` if the user is enabling `topicLevelPoliciesEnabled`.

It will introduce a race condition with namespace bundle unloading. All creating topics want to create `__change_event` reader but constantly fail by namespace-bundle inactive and retry mechanism. Unfortunately, the namespace-bundle unloading operation is waiting for all the topics to be completed and then release inactive status. Therefore, they will be stuck in a deadlock until one gets a timeout.

- Get the topic policy before loading.
BewareMyPower added a commit to BewareMyPower/pulsar that referenced this pull request Feb 2, 2024
…licies before loading topics

### Motivation

apache#21445 introduces an implicit
requirement that if the topic policies cannot be loaded,
`BrokerService#getTopic` will fail and then the client will retry
loading the topic.

It could break some existing usages like the tests in the C++
client: apache/pulsar-client-cpp#394

This change is applied only to avoid the race condition when unloading a
namespace bundle. However, the client should not fail if the topic-level
policies are not available.

### Modifications

If the topic policies cannot be loaded due to a non-retryable error, we
should not fail the `getTopic`. For retryable errors, the client will
still retry until the broker gets the topic-level policies successfully
after some attempts.

Modify `TokenAuthenticatedProducerConsumerTest` to protect the change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/broker category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost cherry-picked/branch-2.10 cherry-picked/branch-2.11 cherry-picked/branch-3.0 cherry-picked/branch-3.1 doc-not-needed Your PR changes do not impact docs ready-to-test release/blocker Indicate the PR or issue that should block the release until it gets resolved release/2.10.6 release/2.11.3 release/3.0.2 release/3.1.2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants