@shibd
Member

@shibd shibd commented Sep 11, 2025

Motivation

This PR fixes a critical bug where the KeyShared sticky mode consumer in Pulsar would incorrectly consume messages from hash ranges not explicitly assigned to it.

Specifically, if a single KeyShared consumer is configured with non-contiguous sticky ranges, e.g.:

  • [0, 9999]
  • [20000, 29999]
  • [40000, 49999]

The consumer would incorrectly receive messages for keys whose hashes fall into the gaps (e.g., 15000, which lies between 9999 and 20000).

The root cause lies in the HashRangeExclusiveStickyKeyConsumerSelector's select method, which previously stored only the start/end points of ranges in its rangeMap. This led to an erroneous interpretation where any hash between a range's end and the next range's start would be assigned to the previous range's consumer.

Map.Entry<Integer, Consumer> ceilingEntry = rangeMap.ceilingEntry(hash);
Map.Entry<Integer, Consumer> floorEntry = rangeMap.floorEntry(hash);
Consumer ceilingConsumer = ceilingEntry != null ? ceilingEntry.getValue() : null;
Consumer floorConsumer = floorEntry != null ? floorEntry.getValue() : null;
if (floorConsumer != null && floorConsumer.equals(ceilingConsumer)) {
    return ceilingConsumer;
} else {
    return null;
}

In the scenario above, this consumer would actually receive messages in the range of 0 ~ 49999 (when there is only one consumer).
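
The gap behavior can be reproduced outside the broker with a plain TreeMap. The following is a minimal sketch, not the Pulsar class itself: it assumes the old rangeMap held one entry per range endpoint, uses a String in place of Consumer, and the names StickyRangeGapDemo and buildEndpointMap are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class StickyRangeGapDemo {
    // Hypothetical stand-in for the old rangeMap: each range contributes two
    // keys (its start and its end), both mapped to the owning consumer's name.
    static TreeMap<Integer, String> buildEndpointMap() {
        TreeMap<Integer, String> rangeMap = new TreeMap<>();
        int[][] ranges = {{0, 9999}, {20000, 29999}, {40000, 49999}};
        for (int[] r : ranges) {
            rangeMap.put(r[0], "consumer-1");
            rangeMap.put(r[1], "consumer-1");
        }
        return rangeMap;
    }

    // Same floor/ceiling comparison as the snippet quoted above.
    static String select(TreeMap<Integer, String> rangeMap, int hash) {
        Map.Entry<Integer, String> ceilingEntry = rangeMap.ceilingEntry(hash);
        Map.Entry<Integer, String> floorEntry = rangeMap.floorEntry(hash);
        String ceilingConsumer = ceilingEntry != null ? ceilingEntry.getValue() : null;
        String floorConsumer = floorEntry != null ? floorEntry.getValue() : null;
        if (floorConsumer != null && floorConsumer.equals(ceilingConsumer)) {
            return ceilingConsumer;
        }
        return null;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> rangeMap = buildEndpointMap();
        System.out.println(select(rangeMap, 5000));   // inside [0, 9999]
        System.out.println(select(rangeMap, 15000));  // in a gap, yet still selected
        System.out.println(select(rangeMap, 50000));  // past the last endpoint
    }
}
```

For hash 15000 the floor entry is the end of the first range (9999) and the ceiling entry is the start of the second (20000). Both belong to the same consumer, so the comparison cannot tell a gap from the inside of a range.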

You can reproduce the issue with the tests added in this PR:

  • The testConsumerSelectWithMultipRanges unit test in this PR.
  • The testCustomStickyRange integration test in this PR.

BTW: v3.0 has the same issue.

Modifications

  • Core Data Structure Refactor: The rangeMap now stores the complete Range object instead of just the range's start key. This provides a foundation for precise containment checks and conflict detection.
  • Conflict Detection: The algorithm for detecting range conflicts between different consumers has been optimized for better accuracy and efficiency.
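
The two modifications above can be illustrated with a small standalone sketch. It is not the code merged in this PR: the class RangeSelectorSketch, its nested Range type, and the addConsumer signature are hypothetical, consumers are represented by Strings, and ranges are assumed to be inclusive at both ends, so two ranges that share a boundary value are treated as conflicting.

```java
import java.util.Map;
import java.util.TreeMap;

final class RangeSelectorSketch {
    // Hypothetical value type: a full inclusive range plus its owner.
    static final class Range {
        final int start;
        final int end;
        final String consumer;

        Range(int start, int end, String consumer) {
            this.start = start;
            this.end = end;
            this.consumer = consumer;
        }

        boolean contains(int hash) {
            return hash >= start && hash <= end;
        }
    }

    // Keyed by range start; stored ranges never overlap each other.
    private final TreeMap<Integer, Range> rangeMap = new TreeMap<>();

    // Because stored ranges are disjoint, only the range with the greatest
    // start <= end can overlap [start, end].
    private static boolean overlaps(TreeMap<Integer, Range> map, int start, int end) {
        Map.Entry<Integer, Range> floor = map.floorEntry(end);
        return floor != null && floor.getValue().end >= start;
    }

    // Adds all ranges or none. Rejects start > end, overlap with ranges of
    // other consumers, and overlap among the new ranges themselves.
    boolean addConsumer(String consumer, int[][] ranges) {
        TreeMap<Integer, Range> candidate = new TreeMap<>(rangeMap);
        for (int[] r : ranges) {
            if (r[0] > r[1] || overlaps(candidate, r[0], r[1])) {
                return false;
            }
            candidate.put(r[0], new Range(r[0], r[1], consumer));
        }
        rangeMap.clear();
        rangeMap.putAll(candidate);
        return true;
    }

    // Containment check: a hash in a gap matches no range and yields null.
    String select(int hash) {
        Map.Entry<Integer, Range> floor = rangeMap.floorEntry(hash);
        if (floor != null && floor.getValue().contains(hash)) {
            return floor.getValue().consumer;
        }
        return null;
    }
}
```

With the three ranges from the Motivation section registered for one consumer, select(15000) in this sketch returns null, and a second consumer can later claim [10000, 19999] without conflict.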

Verifying this change

  • Make sure that the change passes the CI checks.

This change added new unit tests and an integration test and can be verified as follows:

  • HashRangeExclusiveStickyKeyConsumerSelectorTest (Unit Tests):
    • testConsumerSelect: Updated to verify select(hash) returns the correct consumer only when hash is within an explicitly defined range, and null for hashes in gaps.
    • testConsumerSelectWithMultipRanges: Added to confirm a single consumer with multiple distinct sticky ranges correctly selects messages only within those ranges and returns null for hashes in gaps.
    • testOneConsumerRangeConflict: Added to verify that a consumer cannot be added if its own KeySharedMeta contains internally conflicting or invalid (start > end) ranges.
    • testSingleRangeConflict and testMultipleRangeConflict: Updated to correctly assert expected conflicts and non-conflicts based on the new strict range overlap detection logic.
  • KeySharedSubscriptionTest.testCustomStickyRange (Integration Test):
    • A new end-to-end integration test has been added. It simulates the reported scenario with a partitioned topic and two consumers assigned non-overlapping sticky ranges. This test verifies that each consumer only receives messages for keys falling within its explicitly assigned ranges, confirming the fix for the unintended auto-split behavior.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-required and doc-not-needed labels and removed the doc-required label Sep 11, 2025
@shibd shibd self-assigned this Sep 11, 2025
@shibd
Member Author

shibd commented Sep 11, 2025

/pulsarbot rerun-failure-checks

@codecov-commenter
codecov-commenter commented Sep 11, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.17%. Comparing base (f1b66ae) to head (3a34f8f).
⚠️ Report is 29 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #24730      +/-   ##
============================================
- Coverage     74.21%   74.17%   -0.04%     
- Complexity    33463    33604     +141     
============================================
  Files          1895     1900       +5     
  Lines        147954   148401     +447     
  Branches      17130    17206      +76     
============================================
+ Hits         109805   110082     +277     
- Misses        29387    29539     +152     
- Partials       8762     8780      +18     
Flag        Coverage Δ
inttests    26.42% <0.00%> (+0.13%) ⬆️
systests    22.71% <0.00%> (+0.01%) ⬆️
unittests   73.71% <100.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...e/HashRangeExclusiveStickyKeyConsumerSelector.java 100.00% <100.00%> (+5.40%) ⬆️

... and 145 files with indirect coverage changes

@shibd shibd requested a review from Copilot September 11, 2025 14:41
Copilot AI left a comment

Pull Request Overview

This PR fixes a critical bug in the KeyShared sticky mode consumer where it incorrectly consumed messages from hash ranges not explicitly assigned to it. The issue was that non-contiguous sticky ranges would cause consumers to receive messages from gaps between their assigned ranges.

Key changes:

  • Refactored the rangeMap data structure to store complete Range objects instead of just start/end points
  • Enhanced conflict detection algorithm for better accuracy in detecting range overlaps
  • Added comprehensive validation for consumer's own ranges to prevent internal conflicts

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File                                                   Description
HashRangeExclusiveStickyKeyConsumerSelector.java       Core fix - refactored data structure and algorithms for precise range handling
HashRangeExclusiveStickyKeyConsumerSelectorTest.java   Updated unit tests to verify the fix and added new test cases
KeySharedSubscriptionTest.java                         Added integration test to verify end-to-end behavior


@lhotari lhotari left a comment

Great work @shibd! A few minor comments. I guess whether ranges that overlap at the boundary are allowed or not is something to test, to check backwards compatibility. That was already pointed out by @poorbarcode in a previous comment.

Member

@lhotari lhotari left a comment

LGTM, great job @shibd

@shibd shibd merged commit e73532a into apache:master Sep 15, 2025
52 checks passed
shibd added a commit that referenced this pull request Sep 15, 2025
shibd added a commit that referenced this pull request Sep 15, 2025
lhotari pushed a commit that referenced this pull request Sep 16, 2025
lhotari pushed a commit that referenced this pull request Sep 16, 2025
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Sep 16, 2025
@lhotari lhotari added this to the 4.2.0 milestone Sep 17, 2025
ganesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 18, 2025
… ranges (apache#24730)

(cherry picked from commit e73532a)
(cherry picked from commit 86502cf)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 18, 2025
… ranges (apache#24730)

(cherry picked from commit e73532a)
(cherry picked from commit 86502cf)
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 19, 2025
… ranges (apache#24730)

(cherry picked from commit e73532a)
(cherry picked from commit 712b2a8)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 19, 2025
… ranges (apache#24730)

(cherry picked from commit e73532a)
(cherry picked from commit 712b2a8)
KannarFr pushed a commit to CleverCloud/pulsar that referenced this pull request Sep 22, 2025
walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025