@nborisov
Contributor

Fixes #23845

Motivation

There is an uncovered scenario for draining hashes in Key_Shared subscriptions in PersistentStickyKeyDispatcherMultipleConsumers. The detailed scenario is described in #23845 (comment).
As a result, the draining hashes can contain entries for a consumer that has already been stopped. This causes consumption to get stuck.

Modifications

Decrease the draining hash entry's refCount when its hash range is returned to the initial owner, which still holds the entry in its pending acks.
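
To make the idea concrete, here is a minimal sketch of a ref-counted draining-hash table. It is not the actual DrainingHashesTracker code: the class name DrainingHashesSketch and its methods are hypothetical, and the model assumes a hash has only one draining owner at a time. It only illustrates why an entry that is never released keeps blocking delivery, and how releasing it when the range returns to the original owner unblocks it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of a draining-hash table. Not Pulsar's real class. */
final class DrainingHashesSketch {

    /** One entry per sticky-key hash that still has unacked messages on its old owner. */
    private static final class Entry {
        final String owner; // consumer that holds the pending acks for this hash
        int refCount;       // number of pending (unacked) messages for this hash

        Entry(String owner) {
            this.owner = owner;
        }
    }

    private final Map<Integer, Entry> entries = new HashMap<>();

    /** Record one pending message for a hash owned by {@code owner}. */
    void addEntry(String owner, int hash) {
        entries.computeIfAbsent(hash, h -> new Entry(owner)).refCount++;
    }

    /** Called on ack: decrement and drop the entry once nothing is pending. */
    void reduceRefCount(String consumer, int hash) {
        Entry e = entries.get(hash);
        if (e != null && e.owner.equals(consumer) && --e.refCount <= 0) {
            entries.remove(hash);
        }
    }

    /**
     * Assumed handling of the scenario in this PR: the hash range moved back to the
     * consumer that already holds the pending acks, so there is nothing left to drain.
     * In this simplified model, dropping the owner's references empties the entry.
     */
    void onHashReturnedTo(String consumer, int hash) {
        Entry e = entries.get(hash);
        if (e != null && e.owner.equals(consumer)) {
            entries.remove(hash);
        }
    }

    /** A hash is blocked for every consumer other than the draining owner. */
    boolean isBlockedFor(String consumer, int hash) {
        Entry e = entries.get(hash);
        return e != null && !e.owner.equals(consumer);
    }

    /** True while an entry exists for the hash. */
    boolean isDraining(int hash) {
        return entries.containsKey(hash);
    }
}
```

In this sketch, if onHashReturnedTo is never called and the owner later disconnects, the entry for the hash stays in the map and isBlockedFor keeps returning true for every other consumer, which is one way to picture the stuck consumption described in the motivation.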

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added integration test org.apache.pulsar.client.api.KeySharedSubscriptionTest#testMessageDeliveredFromDrainingHashes to verify the scenario
  • Existing unit tests were modified to check the changes applied in this PR:
      • org.apache.pulsar.broker.service.ConsistentHashingStickyKeyConsumerSelectorTest#testShouldNotSwapExistingConsumers
      • org.apache.pulsar.broker.service.ConsumerHashAssignmentsSnapshotTest#testResolveConsumerRemovedHashRanges_NoChanges
      • org.apache.pulsar.broker.service.ConsumerHashAssignmentsSnapshotTest#testResolveConsumerUpdatedHashRanges_RangeAdded
      • org.apache.pulsar.broker.service.ConsumerHashAssignmentsSnapshotTest#testResolveConsumerRemovedHashRanges_RangeUpdated
      • org.apache.pulsar.broker.service.ConsumerHashAssignmentsSnapshotTest#testResolveConsumerUpdatedHashRanges_OverlappingRanges

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: nborisov#6

Nikolai Borisov added 2 commits September 12, 2025 13:29
…deliver messages from the replay queue after a consumer disconnects and leaves a backlog unless new messages are produced
Member

@lhotari lhotari left a comment

LGTM 🎯 perfect job, @nborisov

@codecov-commenter

Codecov Report

❌ Patch coverage is 92.30769% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.14%. Comparing base (da0d116) to head (9e6ef28).
⚠️ Report is 55 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ersistentStickyKeyDispatcherMultipleConsumers.java | 71.42% | 0 Missing and 2 partials ⚠️ |
| ...e/pulsar/broker/service/DrainingHashesTracker.java | 75.00% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #24736      +/-   ##
============================================
- Coverage     74.25%   74.14%   -0.11%     
- Complexity    33174    33595     +421     
============================================
  Files          1884     1900      +16     
  Lines        146943   148396    +1453     
  Branches      16882    17208     +326     
============================================
+ Hits         109110   110029     +919     
- Misses        29163    29574     +411     
- Partials       8670     8793     +123     

| Flag | Coverage Δ |
|---|---|
| inttests | 26.35% <71.79%> (-0.32%) ⬇️ |
| systests | 22.74% <71.79%> (-0.56%) ⬇️ |
| unittests | 73.68% <92.30%> (-0.07%) ⬇️ |

| Files with missing lines | Coverage Δ |
|---|---|
| ...roker/service/ConsumerHashAssignmentsSnapshot.java | 86.40% <100.00%> (+0.69%) ⬆️ |
| ...pulsar/broker/service/ImpactedConsumersResult.java | 92.30% <100.00%> (+4.80%) ⬆️ |
| ...pache/pulsar/broker/service/UpdatedHashRanges.java | 80.00% <100.00%> (ø) |
| ...e/pulsar/broker/service/DrainingHashesTracker.java | 83.90% <75.00%> (+1.55%) ⬆️ |
| ...ersistentStickyKeyDispatcherMultipleConsumers.java | 84.61% <71.42%> (-1.01%) ⬇️ |

... and 218 files with indirect coverage changes

@nborisov
Contributor Author

@lhotari Could you please clarify if I need to add more coverage for changed files? Not sure how to deal with the report: it shows lines from PersistentStickyKeyDispatcherMultipleConsumers which were covered by org.apache.pulsar.client.api.KeySharedSubscriptionTest#testMessageDeliveredFromDrainingHashes

@lhotari lhotari merged commit 80beab6 into apache:master Sep 12, 2025
54 of 55 checks passed
@lhotari
Member

lhotari commented Sep 12, 2025

> @lhotari Could you please clarify if I need to add more coverage for changed files? Not sure how to deal with the report: it shows lines from PersistentStickyKeyDispatcherMultipleConsumers which were covered by org.apache.pulsar.client.api.KeySharedSubscriptionTest#testMessageDeliveredFromDrainingHashes

The coverage report isn't very accurate. Coverage reports get merged from a large number of build jobs, and there seem to be gaps. We currently use it as a tool to get some level of visibility into code coverage.

lhotari pushed a commit that referenced this pull request Sep 12, 2025
… from the replay queue after a consumer disconnects and leaves a backlog (#24736)

Co-authored-by: Nikolai Borisov <nikolai.borisov@onde.app>
(cherry picked from commit 80beab6)
ganesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 18, 2025
… from the replay queue after a consumer disconnects and leaves a backlog (apache#24736)

Co-authored-by: Nikolai Borisov <nikolai.borisov@onde.app>
(cherry picked from commit 80beab6)
(cherry picked from commit 63c25e6)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 18, 2025
… from the replay queue after a consumer disconnects and leaves a backlog (apache#24736)

Co-authored-by: Nikolai Borisov <nikolai.borisov@onde.app>
(cherry picked from commit 80beab6)
(cherry picked from commit 63c25e6)
KannarFr pushed a commit to CleverCloud/pulsar that referenced this pull request Sep 22, 2025
… from the replay queue after a consumer disconnects and leaves a backlog (apache#24736)

Co-authored-by: Nikolai Borisov <nikolai.borisov@onde.app>
walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
… from the replay queue after a consumer disconnects and leaves a backlog (apache#24736)

Co-authored-by: Nikolai Borisov <nikolai.borisov@onde.app>