-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][client] Avoid recycling the same ConcurrentBitSetRecyclable among different threads #24725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][client] Avoid recycling the same ConcurrentBitSetRecyclable among different threads #24725
Conversation
…oss different threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a concurrency issue where ConcurrentBitSetRecyclable objects were being recycled incorrectly when multiple threads called flush() simultaneously. The fix changes the data structure from ConcurrentHashMap to ConcurrentSkipListMap and uses atomic polling operations to prevent race conditions.
- Replace
ConcurrentHashMapwithConcurrentSkipListMapforpendingIndividualBatchIndexAcks - Use
pollFirstEntry()instead of iterator-based removal to ensure atomic operations - Add null check to handle concurrent removal scenarios
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
...nt/src/main/java/org/apache/pulsar/client/impl/PersistentAcknowledgmentsGroupingTracker.java
Outdated
Show resolved
Hide resolved
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #24725 +/- ##
============================================
- Coverage 74.27% 74.14% -0.14%
+ Complexity 33569 33117 -452
============================================
Files 1896 1896
Lines 148111 148109 -2
Branches 17164 17163 -1
============================================
- Hits 110009 109814 -195
- Misses 29370 29519 +149
- Partials 8732 8776 +44
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
…ng different threads (apache#24725) (cherry picked from commit 14543d3) (cherry picked from commit ba50b60)
…ng different threads (apache#24725) (cherry picked from commit 14543d3) (cherry picked from commit ba50b60)
…ng different threads (apache#24725) (cherry picked from commit 14543d3) (cherry picked from commit 498571d)
…ng different threads (apache#24725) (cherry picked from commit 14543d3) (cherry picked from commit 498571d)
…ng different threads (apache#24725) (cherry picked from commit 14543d3) (cherry picked from commit 498571d)
…ng different threads (apache#24725) (cherry picked from commit 14543d3)
…ng different threads (apache#24725)
…ng different threads (apache#24725)
Fixes #24724
Modifications
Change
pendingIndividualBatchIndexAcksto aConcurrentSkipListMapand callpollFirst()in the loop to ensure concurrentflushcall won't copy the reference to theConcurrentBitSetRecyclabletwice tonewMultiMessageAckCommonin different threads, which recycles the object.Verifying this change
It's hard to write a unit test because this race condition is hard to simulate.
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository:
Additional information
This PR is a simple fix on the issue that avoids refactorings. However, it might not be worth allowing
flushto be called concurrently. It's hard to test the race condition and could sendThe advantage is just to avoid coarse grained lock on all methods that access the following fields:
pendingIndividualAckspendingIndividualBatchIndexAckslastCumulativeAckHowever, from my perspective, it's a pre-mature optimization that makes code error-prone. Because acquiring a lock for all
acknowledgeAsynccalls should not be a bottle neck of consumer side, whose time consuming tasks are mainly the business logic that handles messages andreceivecalls that have many lock acquirements as well.The use of
ConcurrentBitSetRecyclableis also a pre-mature optimization that results to the bug described in #24724.As I've shared in https://lists.apache.org/thread/b5r13oz24y935p6o8tfwf578xk35wwpf, sharing a recyclable object among different threads is not a good practice, which introduces more overhead and bypasses the performance benefits of the thread-local stacks. There might be still other potential issues with the use of
ConcurrentBitSetRecyclable, I might open another PR for the code refactoring.