-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[improve][client]PIP-425:Support connecting with next available endpoint for multi-endpoint serviceUrls #24387
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fbf6613 to
e510a8c
Compare
…nt for multi-endpoint serviceUrls (#24394) Fixes #22934 (comment) Main Issue: #22934 (comment) Implementation: #24387 ### Motivation As #22934 and #22933 mentioned, when most of the nodes in serviceurl are down (but there is at least one available node), creating consumers and producers through PulsarClient will most likely fail. I think this is not as expected. If the code is robust enough, as long as there is one available node, it should be accessible normally. Therefore, this pip is going to optimize the logic, remove unavailable nodes through the feedback mechanism, and improve the success rate of PulsarClient requests. By the way, #22935 removes faulty nodes through a regular health check mechanism, but this brings new problems (frequent creation of connections and increased system load), so this solution is abandoned. See #22934 (comment) for more details!
72f52aa to
5d408c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves the Pulsar client’s handling of multi-endpoint service URLs by implementing endpoint quarantine and host availability tracking, ensuring higher connection success when some nodes are unavailable.
- Introduces quarantine logic with configurable initial and maximum durations in the ServiceNameResolver.
- Adjusts connection acquisition in multiple components (HTTP client, connection pool, lookup service) to mark host availability.
- Updates tests and client builder APIs to verify and configure these new quarantine behaviors.
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| pulsar-client/src/test/java/org/apache/pulsar/client/impl/PulsarServiceNameResolverTest.java | Added tests to verify host quarantine and recovery behavior. |
| pulsar-client/src/test/java/org/apache/pulsar/client/impl/ClientTestFixtures.java | Updated connection mocks to support the new resolver API. |
| pulsar-client/src/test/java/org/apache/pulsar/client/impl/BinaryProtoLookupServiceTest.java | Adjusted tests for resolver changes and added additional connection pathways. |
| pulsar-client/src/main/java/org/apache/pulsar/client/impl/conf/ClientConfigurationData.java | Introduced new configuration properties for quarantine durations. |
| pulsar-client/src/main/java/org/apache/pulsar/client/impl/ServiceNameResolver.java | Added default method for marking host availability. |
| pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarServiceNameResolver.java | Integrated quarantine logic, host tracking, and synchronized service URL updates. |
| pulsar-client/src/main/java/org/apache/pulsar/client/impl/HttpClient.java | Integrated host availability notifications during HTTP failures. |
| pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionPool.java | Added a getConnection method that leverages the new resolver interface. |
| pulsar-client/src/main/java/org/apache/pulsar/client/impl/ClientBuilderImpl.java and pulsar-client-api/src/main/java/org/apache/pulsar/client/api/ClientBuilder.java | Exposed client builder options for configuring quarantine durations. |
| pulsar-broker/src/test/java/org/apache/pulsar/client/impl/CreateConsumerProducerTest.java | Added extensive integration tests to validate unhealthy endpoint removal and reconnection behavior. |
Comments suppressed due to low confidence (1)
pulsar-broker/src/test/java/org/apache/pulsar/client/impl/CreateConsumerProducerTest.java:238
- [nitpick] The method name 'creatConsumerAndProducers' appears to contain a typo. Consider renaming it to 'createConsumerAndProducers' for clarity.
private int creatConsumerAndProducers(PulsarClientImpl pulsarClient, int createCount, String topic) {
|
The CI tests have passed in the fork repository AuroraTwinkle#8 |
nodece
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I've completed a partial review of the code.
pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarServiceNameResolver.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarServiceNameResolver.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarServiceNameResolver.java
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarServiceNameResolver.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarServiceNameResolver.java
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarServiceNameResolver.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarServiceNameResolver.java
Outdated
Show resolved
Hide resolved
pulsar-client/src/main/java/org/apache/pulsar/client/impl/PulsarServiceNameResolver.java
Show resolved
Hide resolved
…arServiceNameResolver.java Co-authored-by: Zixuan Liu <nodeces@gmail.com>
…arServiceNameResolver.java Co-authored-by: Zixuan Liu <nodeces@gmail.com>
…arServiceNameResolver.java Co-authored-by: Zixuan Liu <nodeces@gmail.com>
…arServiceNameResolverTest.java Co-authored-by: Zixuan Liu <nodeces@gmail.com>
…arServiceNameResolverTest.java Co-authored-by: Zixuan Liu <nodeces@gmail.com>
…teConsumerProducerTest.java Co-authored-by: Zixuan Liu <nodeces@gmail.com>
ed18237 to
26ce95e
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #24387 +/- ##
============================================
+ Coverage 73.57% 74.34% +0.77%
+ Complexity 32624 32545 -79
============================================
Files 1877 1869 -8
Lines 139502 146034 +6532
Branches 15299 16747 +1448
============================================
+ Hits 102638 108569 +5931
+ Misses 28908 28880 -28
- Partials 7956 8585 +629
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
liangyepianzhou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…nt for multi-endpoint serviceUrls (apache#24394) Fixes apache#22934 (comment) Main Issue: apache#22934 (comment) Implementation: apache#24387 ### Motivation As apache#22934 and apache#22933 mentioned, when most of the nodes in serviceurl are down (but there is at least one available node), creating consumers and producers through PulsarClient will most likely fail. I think this is not as expected. If the code is robust enough, as long as there is one available node, it should be accessible normally. Therefore, this pip is going to optimize the logic, remove unavailable nodes through the feedback mechanism, and improve the success rate of PulsarClient requests. By the way, apache#22935 removes faulty nodes through a regular health check mechanism, but this brings new problems (frequent creation of connections and increased system load), so this solution is abandoned. See apache#22934 (comment) for more details!
…int for multi-endpoint serviceUrls (apache#24387) Co-authored-by: duanlinlin <duanlinllin@xiaohongshu.com> Co-authored-by: Zixuan Liu <nodeces@gmail.com>
…nt for multi-endpoint serviceUrls (apache#24394) Fixes apache#22934 (comment) Main Issue: apache#22934 (comment) Implementation: apache#24387 ### Motivation As apache#22934 and apache#22933 mentioned, when most of the nodes in serviceurl are down (but there is at least one available node), creating consumers and producers through PulsarClient will most likely fail. I think this is not as expected. If the code is robust enough, as long as there is one available node, it should be accessible normally. Therefore, this pip is going to optimize the logic, remove unavailable nodes through the feedback mechanism, and improve the success rate of PulsarClient requests. By the way, apache#22935 removes faulty nodes through a regular health check mechanism, but this brings new problems (frequent creation of connections and increased system load), so this solution is abandoned. See apache#22934 (comment) for more details!
…int for multi-endpoint serviceUrls (apache#24387) Co-authored-by: duanlinlin <duanlinllin@xiaohongshu.com> Co-authored-by: Zixuan Liu <nodeces@gmail.com>
Fixes #22934 (comment)
Main Issue: #22934 (comment)
PIP: #24394
Motivation
As #22934 and #22933 mentioned, when most of the nodes in serviceurl are down (but there is at least one available node), creating consumers and producers through PulsarClient will most likely fail. I think this is not as expected. If the code is robust enough, as long as there is one available node, it should be accessible normally. Therefore, this PR is going to optimize the code logic, remove unavailable nodes through the feedback mechanism, and improve the success rate of PulsarClient requests.
By the way, #22935 removes faulty nodes through a regular health check mechanism, but this brings new problems (frequent creation of connections and increased system load), so this solution is abandoned. See #22934 (comment) for more details!
Modifications
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository:
AuroraTwinkle#8