Skip to content

Healthcheck v2 failed#13525

Merged
massakam merged 2 commits into
apache:masterfrom
gaozhangmin:healthcheck-v2-failed
Jan 7, 2022
Merged

Healthcheck v2 failed#13525
massakam merged 2 commits into
apache:masterfrom
gaozhangmin:healthcheck-v2-failed

Conversation

@gaozhangmin

@gaozhangmin gaozhangmin commented Dec 27, 2021

Copy link
Copy Markdown
Contributor

fix #13520

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

    (Please explain why)

  • doc

    (If this PR contains doc changes)

@github-actions github-actions Bot added the doc-not-needed Your PR changes do not impact docs label Dec 27, 2021

@liudezhi2098 liudezhi2098 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test?

@gaozhangmin gaozhangmin force-pushed the healthcheck-v2-failed branch 3 times, most recently from c32a2ef to b50c1ab Compare December 28, 2021 07:03
@gaozhangmin

gaozhangmin commented Dec 28, 2021

Copy link
Copy Markdown
Contributor Author

@liudezhi2098 The test is already covered by org.apache.pulsar.broker.admin.AdminApiHealthCheckTest#testHealthCheckupV2
The previous unit test testHealthCheckupV2 created a heartbeat namespace Manually before running health check.
But, In fact, the heartbeat namespace would not be created in cluster,so, no policies znode under admin. that's why this issue happened

@eolivelli eolivelli left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks like adding some tweaks specific for this special topic.

Did you check if there is a better way?
(I am sorry I don't have time today to send you a suggestion)

@gaozhangmin gaozhangmin force-pushed the healthcheck-v2-failed branch 4 times, most recently from 49c2da7 to e1c4aec Compare December 30, 2021 11:54
@gaozhangmin

Copy link
Copy Markdown
Contributor Author

This PR looks like adding some tweaks specific for this special topic.

@eolivelli The direct reason is that heartbeat namespace was not created. just registered by registerBootstrapNamespaces during startup. The optional fix maybe, we can create this heartbeat namespace during startup.

@gaozhangmin

Copy link
Copy Markdown
Contributor Author

/pulsarbot run-failure-checks

@massakam

massakam commented Jan 5, 2022

Copy link
Copy Markdown
Contributor

There may also be namespaces under the pulsar tenant that are not for healthcheck. For example, it is possible for a user to create a namespace named pulsar/test. For such namespaces, we should not skip these checks.

@gaozhangmin gaozhangmin force-pushed the healthcheck-v2-failed branch from e1c4aec to b168fa6 Compare January 5, 2022 06:51
@gaozhangmin

Copy link
Copy Markdown
Contributor Author

There may also be namespaces under the pulsar tenant that are not for healthcheck. For example, it is possible for a user to create a namespace named pulsar/test. For such namespaces, we should not skip these checks.

Fixed in the last commit.

@massakam

massakam commented Jan 5, 2022

Copy link
Copy Markdown
Contributor

I think NonPersistentTopic.java should be modified as well as PersistentTopic.java.

public CompletableFuture<Void> checkReplication() {
TopicName name = TopicName.get(topic);
if (!name.isGlobal()) {
return CompletableFuture.completedFuture(null);
}

@gaozhangmin gaozhangmin requested a review from eolivelli January 6, 2022 02:18
@gaozhangmin gaozhangmin requested a review from massakam January 6, 2022 04:36

@massakam massakam left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@massakam massakam added this to the 2.10.0 milestone Jan 6, 2022

@eolivelli eolivelli left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gaozhangmin

Copy link
Copy Markdown
Contributor Author

/pulsarbot run-failure-checks

@gaozhangmin gaozhangmin force-pushed the healthcheck-v2-failed branch from 5539b5c to edde84b Compare January 7, 2022 03:50

@Jason918 Jason918 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@massakam massakam merged commit b38d850 into apache:master Jan 7, 2022
gaoran10 pushed a commit to gaoran10/pulsar that referenced this pull request Mar 11, 2022
* fix healthcheck v2

* fix failed testHealthCheckupV2 because of error web port

Co-authored-by: gavingaozhangmin <gavingaozhangmin@didiglobal.com>
(cherry picked from commit b38d850)
gaoran10 pushed a commit to gaoran10/pulsar that referenced this pull request Mar 13, 2022
* fix healthcheck v2

* fix failed testHealthCheckupV2 because of error web port

Co-authored-by: gavingaozhangmin <gavingaozhangmin@didiglobal.com>
(cherry picked from commit b38d850)
codelipenghui pushed a commit that referenced this pull request Mar 14, 2022
…#14658)

Original PR #14367

Because the PR #14367 is based on PR #14091, so I want to cherry-pick these two PRs to branch-2.9, the PR #13525 is also needed.

---

Fix Issue: #14362

### Motivation

According to relative PR #7724, we will force delete all subscriptions when calling ``healthCheck`` REST API. but it has a race condition when two threads call this API.
Please consider this case:

> Thread A: Clean up all subscriptions, then create a reader.
> Thread B: Prepare to force delete all subscriptions.

So, in this case, the reader of thread A is deleted and then an NPE or other exception occurs.

### Modifications

- Use ``Completable#handle`` to fix problem 1, the reader needs to be closed regardless of whether there is an exception.
- Recheck the subscription after closing reading, and force deletion if it still exists after closing reading.
- Added multi-threaded tests for health checks.
@gaoran10

Copy link
Copy Markdown
Contributor

The #14658 cherry-pick this PR to branch-2.9.

@gaoran10 gaoran10 added the cherry-picked/branch-2.9 Archived: 2.9 is end of life label Mar 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/broker cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/2.9.2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Broker health check with TopicVersion.V2 failed

6 participants