[Bug] Subscription consumption stuck on consumer reconnect #21199

@ghost

Description

Search before asking

  • I searched in the issues and found nothing similar.

Version

Pulsar broker: 2.8.4
Java Pulsar client: 2.8.4

Minimal reproduce step

Non-partitioned topic. Batching is disabled on both producer and consumer. No acknowledge timeout is configured. There are 5 subscriptions, each with 12 consumers.

One consumer of one subscription fails to process a message and doesn't ack it.
After a failure, I give the consumer one more minute to keep processing other messages and to ack those that are processed successfully. After that minute, I recreate the consumer so the unacked messages are redelivered and reprocessed, which would help if the error was transient.
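
For reference, the "recreate after a minute" logic described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual application code: the class name `RecreateAfterGrace` and its methods are hypothetical, and the points where the Pulsar client would be called (`consumer.acknowledge(msg)`, `consumer.close()`, and a new `subscribe()`) are only indicated in comments.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: tracks the first processing failure and decides
// when the consumer should be closed and subscribed again.
final class RecreateAfterGrace {
    private final Duration grace;
    private Instant firstFailure; // null while no message has failed

    RecreateAfterGrace(Duration grace) {
        this.grace = grace;
    }

    // Records the outcome of one message. Returns true if the message
    // should be acked (in the real consumer: consumer.acknowledge(msg)).
    // A failed message is left unacked on purpose.
    boolean onProcessed(boolean success, Instant now) {
        if (!success && firstFailure == null) {
            firstFailure = now;
        }
        return success;
    }

    // True once the grace period since the first failure has elapsed.
    // At that point the real consumer would call consumer.close() and
    // subscribe again so the unacked messages are redelivered.
    boolean shouldRecreate(Instant now) {
        return firstFailure != null && !now.isBefore(firstFailure.plus(grace));
    }

    // Call after the consumer has been recreated.
    void onRecreated() {
        firstFailure = null;
    }

    public static void main(String[] args) {
        RecreateAfterGrace policy = new RecreateAfterGrace(Duration.ofMinutes(1));
        Instant start = Instant.now();
        boolean ack = policy.onProcessed(false, start);
        System.out.println("ack failed message: " + ack);
        System.out.println("recreate immediately: " + policy.shouldRecreate(start));
        System.out.println("recreate after grace: "
                + policy.shouldRecreate(start.plus(Duration.ofMinutes(1))));
    }
}
```

During the grace period the consumer keeps receiving and acking other messages as usual; only the failed message stays unacked until the consumer is recreated.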

What did you expect to see?

I expected the subscription backlog to keep being consumed, both by the consumer with the 1 failed message and by the other 11 consumers.

What did you see instead?

If a consumer fails to process one message, processing of all other messages with other keys also stalls, including on the other 11 consumers of the subscription.
All the other subscriptions on the topic and their consumers continue processing as expected.

As a symptom, the stuck subscription has "waitingReadOp" : false and "subscriptionHavePendingRead" : false, while the other subscriptions have both fields set to true.

stats.txt
stats-internal.txt

Anything else?

The message rate is about 50 messages per second. The same scenario with only a few (1 to 5) messages per minute works as expected, so I suspect a race condition.

Are you willing to submit a PR?

  • I'm willing to submit a PR!

Labels

type/bug