Skip to content

[Broker] Fix Broker HealthCheck Endpoint Exposes Race Conditions#14367

Merged
codelipenghui merged 17 commits into
apache:masterfrom
mattisonchao:fix_14362
Mar 4, 2022
Merged

[Broker] Fix Broker HealthCheck Endpoint Exposes Race Conditions#14367
codelipenghui merged 17 commits into
apache:masterfrom
mattisonchao:fix_14362

Conversation

@mattisonchao

@mattisonchao mattisonchao commented Feb 18, 2022

Copy link
Copy Markdown
Member

Master Issue: #14362

Motivation

See #14362

According to relative PR #7724, we will force delete all subscriptions when calling healthCheck REST API. but it has a race condition when two threads call this API.
Please consider this case:

Thread A: Clean up all subscriptions, then create a reader.
Thread B: Prepare to force delete all subscriptions.

So, in this case, the reader of thread A is deleted and then an NPE or other exception occurs.

Modifications

  • Use Completable#handle to fix problem 1, the reader needs to be closed regardless of whether there is an exception.
  • Recheck the subscription after closing reading, and force deletion if it still exists after closing reading.
  • Added multi-threaded tests for health checks.

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

  • no-need-doc

@github-actions github-actions Bot added the doc-not-needed Your PR changes do not impact docs label Feb 18, 2022
@mattisonchao

Copy link
Copy Markdown
Member Author

I need more eyes to help me review this change to avoid regression.

@codelipenghui @michaeljmarshall @Technoboy-

@michaeljmarshall

Copy link
Copy Markdown
Member

Thanks for working on this @mattisonchao!

@michaeljmarshall

Copy link
Copy Markdown
Member

@mattisonchao - I should have added in my review that your tests look good.

@michaeljmarshall michaeljmarshall added area/broker type/bug The PR fixed a bug or issue reported a bug labels Feb 18, 2022
@michaeljmarshall michaeljmarshall modified the milestones: 2.10.0, 2.11.0 Feb 18, 2022
@mattisonchao mattisonchao changed the title [Issue 14362] Fix Broker HealthCheck Endpoint Exposes Race Conditions [WIP][Issue 14362] Fix Broker HealthCheck Endpoint Exposes Race Conditions Feb 18, 2022
@mattisonchao

Copy link
Copy Markdown
Member Author

@michaeljmarshall @Technoboy-
I changed logic of this PR, PTAL :)

@mattisonchao mattisonchao changed the title [WIP][Issue 14362] Fix Broker HealthCheck Endpoint Exposes Race Conditions [Broker] Fix Broker HealthCheck Endpoint Exposes Race Conditions Feb 21, 2022
@mattisonchao

Copy link
Copy Markdown
Member Author

/pulsarbot rerun-failure-checks

@mattisonchao

Copy link
Copy Markdown
Member Author

@codelipenghui @Technoboy- @eolivelli @lhotari @michaeljmarshall @Jason918

PTAL, when you have time ~

@Technoboy-

Copy link
Copy Markdown
Contributor

@codelipenghui @Technoboy- @eolivelli @lhotari @michaeljmarshall @Jason918

PTAL, when you have time ~

LGTM.

@eolivelli eolivelli left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

great work !

@michaeljmarshall

Copy link
Copy Markdown
Member

I will review this within the next couple hours, thank you for your work @mattisonchao!

@michaeljmarshall michaeljmarshall left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@codelipenghui codelipenghui merged commit 4f1e39b into apache:master Mar 4, 2022
@eolivelli

Copy link
Copy Markdown
Contributor

@codelipenghui if we cut a new rc for 2.10 is would be great to have this patch.
It is a big problem for k8s users

@eolivelli

Copy link
Copy Markdown
Contributor

@mattisonchao
can you please port this patch to branch-2.8 and branch-2.9 ?
the patch does not apply cleanly

@codelipenghui codelipenghui added cherry-picked/branch-2.8 Archived: 2.8 is end of life and removed cherry-picked/branch-2.8 Archived: 2.8 is end of life labels Mar 9, 2022
gaoran10 pushed a commit to gaoran10/pulsar that referenced this pull request Mar 11, 2022
gaoran10 pushed a commit to gaoran10/pulsar that referenced this pull request Mar 11, 2022
codelipenghui pushed a commit that referenced this pull request Mar 12, 2022
@codelipenghui codelipenghui modified the milestones: 2.11.0, 2.10.0 Mar 12, 2022
codelipenghui pushed a commit that referenced this pull request Mar 14, 2022
…#14658)

Original PR #14367

Because the PR #14367 is based on PR #14091, so I want to cherry-pick these two PRs to branch-2.9, the PR #13525 is also needed.

---

Fix Issue: #14362

### Motivation

According to relative PR #7724, we will force delete all subscriptions when calling ``healthCheck`` REST API. but it has a race condition when two threads call this API.
Please consider this case:

> Thread A: Clean up all subscriptions, then create a reader.
> Thread B: Prepare to force delete all subscriptions.

So, in this case, the reader of thread A is deleted and then an NPE or other exception occurs.

### Modifications

- Use ``Completable#handle`` to fix problem 1, the reader needs to be closed regardless of whether there is an exception.
- Recheck the subscription after closing reading, and force deletion if it still exists after closing reading.
- Added multi-threaded tests for health checks.
@gaoran10

Copy link
Copy Markdown
Contributor

The #14658 cherry-pick this PR to branch-2.9.

@gaoran10 gaoran10 added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Mar 14, 2022
@codelipenghui codelipenghui added the cherry-picked/branch-2.8 Archived: 2.8 is end of life label Mar 18, 2022
Nicklee007 pushed a commit to Nicklee007/pulsar that referenced this pull request Apr 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/broker cherry-picked/branch-2.8 Archived: 2.8 is end of life cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/2.8.4 release/2.9.2 type/bug The PR fixed a bug or issue reported a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants