[improve][broker] Prevent range conflicts with Key Shared sticky consumers when TCP/IP connections get orphaned #20026

lhotari · 2023-04-06T08:23:19Z

Motivation

During Kubernetes upgrades, it's possible to get into a state where a TCP/IP connection still appears to be alive on the broker although it has been orphaned. The orphaned connection keeps consuming resources and causes resource conflicts with producers and consumers.
This PR targets the issue for consumers that use the Key Shared subscription type with sticky hash ranges. It prevents the "Range conflict with consumer ..." error, which happens when the client's previous connection is still active on the broker while the client is reconnecting on a new connection.

Modifications

make Dispatch.addConsumer and StickyKeyConsumerSelector.addConsumer asynchronous
add connectionLivenessCheckTimeoutMillis setting that defaults to 5000ms
when a conflicting consumer is found, first do an active check with Pulsar's Ping command to see if the connection is alive.
resume the attempt to add the consumer after the check completes
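
The flow in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `Connection`, `RangeSelector`, and their methods are hypothetical stand-ins, the Ping-based check is replaced by a future that the caller supplies, and dropping the stale owner directly is an assumption made to keep the sketch short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class LivenessCheckSketch {

    /** Hypothetical stand-in for a broker-side connection. */
    interface Connection {
        /** Completes with true if the peer answered the probe, false otherwise. */
        CompletableFuture<Boolean> checkLiveness();
    }

    /** Hypothetical selector in which each range key has at most one owner. */
    static class RangeSelector {
        private final Map<String, Connection> owners = new HashMap<>();

        synchronized CompletableFuture<Void> addConsumer(String range, Connection cnx) {
            Connection existing = owners.get(range);
            if (existing == null || existing == cnx) {
                // No conflict: register immediately.
                owners.put(range, cnx);
                return CompletableFuture.completedFuture(null);
            }
            // Conflict: probe the connection that currently owns the range,
            // then resume the attempt once the probe has completed.
            return existing.checkLiveness().thenCompose(alive -> {
                if (alive) {
                    return CompletableFuture.<Void>failedFuture(
                            new IllegalStateException("Range conflict on " + range));
                }
                synchronized (this) {
                    // Assumption: a connection that fails the probe gives up its range.
                    owners.remove(range, existing);
                }
                return addConsumer(range, cnx);
            });
        }
    }
}
```

The point of returning a future is that the caller is not blocked while the probe is outstanding; a reconnecting client only sees the conflict error if the old connection actually answers.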

Documentation

doc
doc-required
doc-not-needed
doc-complete

michaeljmarshall

LGTM, great work @lhotari!

michaeljmarshall · 2023-04-10T19:30:04Z

.../main/java/org/apache/pulsar/broker/service/HashRangeExclusiveStickyKeyConsumerSelector.java

-    public void addConsumer(Consumer consumer) throws BrokerServiceException.ConsumerAssignException {
-        validateKeySharedMeta(consumer);
+    public synchronized CompletableFuture<Void> addConsumer(Consumer consumer) {
+        return validateKeySharedMeta(consumer).thenRun(() -> {


Nit: a minor optimization could assign validateKeySharedMeta to a local variable, and then check if that future is already completed. When it is, we know we had the synchronized lock when the validation was done, which would allow us to skip the secondary call to findConflictingConsumer. I am not sure how expensive that call is, but I assume it has some cost that adds up with many key shared consumers.

Since it is an optimization, I don't think we should hold up this PR for that.
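
A rough sketch of the fast path suggested above, under stated assumptions: `FastPathSketch`, `validate`, and the two counters are hypothetical and only model the idea; they are not the selector code in this PR. If the validation future is already complete when it is returned, the validation ran while the lock was held, so the second conflict check can be skipped.

```java
import java.util.concurrent.CompletableFuture;

public class FastPathSketch {
    int conflictChecks = 0; // how many times the (hypothetical) conflict scan ran
    int registered = 0;     // how many consumers were registered

    /** Hypothetical validation; the caller supplies the result to simulate sync or async completion. */
    private CompletableFuture<Void> validate(CompletableFuture<Void> result) {
        conflictChecks++;
        return result;
    }

    public synchronized CompletableFuture<Void> addConsumer(CompletableFuture<Void> validationResult) {
        CompletableFuture<Void> validation = validate(validationResult);
        if (validation.isDone() && !validation.isCompletedExceptionally()) {
            // Validation finished while we held the lock: no second scan needed.
            registered++;
            return validation;
        }
        // Validation is still pending (or failed): re-check under the lock once it completes.
        return validation.thenRun(() -> {
            synchronized (this) {
                conflictChecks++;
                registered++;
            }
        });
    }
}
```

In this model the synchronous case performs one conflict scan instead of two; whether that saving matters in practice depends on how expensive the real scan is.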

poorbarcode · 2023-09-14T04:11:13Z

@lhotari

PR #21155 fixes an issue in which a producer's message sends time out because it cannot reconnect successfully. The root cause is that the client created a new connection to re-register the producer after it assumed the old connection was invalid, while the broker still considered the old connection valid, so the client got the error "Producer with name 'st-0-5' is already connected to topic".

The fix in PR #21155 tries to start a new heartbeat check when the broker receives a registration for the same producer over a different connection.

For this, PR #21155 uses the utility method ServerCnx.checkConnectionLiveness (introduced in the current PR).
I want to cherry-pick ServerCnx.checkConnectionLiveness into branch-2.10 and branch-2.11 along with PR #21155.

I have also started a discussion about doing this.

lhotari added area/broker ready-to-test labels Apr 6, 2023

lhotari requested review from eolivelli, merlimat, michaeljmarshall and nicoloboschi April 6, 2023 08:23

lhotari self-assigned this Apr 6, 2023

github-actions bot added the doc-not-needed Your PR changes do not impact docs label Apr 6, 2023

lhotari mentioned this pull request Apr 6, 2023

[improve][broker] Prevent range conflicts with Key Shared sticky consumers when TCP/IP connections get orphaned datastax/pulsar#174

Merged

lhotari force-pushed the lh-broker-dead-connection-detection branch from d9c6aef to 4203566 Compare April 6, 2023 11:45

lhotari force-pushed the lh-broker-dead-connection-detection branch from 4203566 to ef9698e Compare April 6, 2023 15:21

lhotari requested review from codelipenghui and massakam April 8, 2023 16:45

michaeljmarshall approved these changes Apr 10, 2023

lhotari merged commit 08b28f5 into apache:master Apr 10, 2023

lhotari added this to the 3.0.0 milestone Apr 10, 2023

Demogorgon314 pushed a commit to Demogorgon314/pulsar that referenced this pull request Apr 11, 2023

[improve][broker] Prevent range conflicts with Key Shared sticky cons…

e8e7326

…umers when TCP/IP connections get orphaned (apache#20026)

lhotari added a commit to datastax/pulsar that referenced this pull request Apr 13, 2023

[improve][broker] Prevent range conflicts with Key Shared sticky cons…

167ff8b

…umers when TCP/IP connections get orphaned (#174) upstream PR apache#20026

poorbarcode mentioned this pull request May 28, 2023

[fix][broker] Fix recentlyJoinedConsumers to address the out-of-order issue #20179

Closed

15 tasks

equanz mentioned this pull request Jul 24, 2023

[improve][pip] PIP-282: Change definition of the recently joined consumers position #20776

Merged

14 tasks

This was referenced Sep 14, 2023

[fix] [broker] Make specified producer could override the previous one #21155

Merged

[fix] [broker] Make the new exclusive consumer instead the inactive one faster #21183

Merged
