[fix][ml] Fix race conditions in RangeCache #22789
Codecov Report
@@ Coverage Diff @@
## master #22789 +/- ##
============================================
- Coverage 73.57% 73.27% -0.31%
- Complexity 32624 32648 +24
============================================
Files 1877 1889 +12
Lines 139502 141659 +2157
Branches 15299 15543 +244
============================================
+ Hits 102638 103800 +1162
- Misses 28908 29844 +936
- Partials 7956 8015 +59
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/util/RangeCache.java
merlimat
left a comment
Very nice work
This PR contained a few issues. I have a follow-up PR, #22814, to address them. Please review.
(cherry picked from commit c39f9f8)
Motivation
The RangeCache class contains several race conditions which cause instability.
When one thread removes the entry and another one uses it, that will become a problem.
The `cacheEvictionIntervalMs` setting is 10 ms by default. This results in the `RangeCache.evictLEntriesBeforeTimestamp` method getting called about 100 times per second. The default expiration is set by `managedLedgerCacheEvictionTimeThresholdMillis`, which is 1000 ms by default.
It's also possible that 2 threads remove the entry at the same time.
`ManagedLedgerImpl.invalidateEntriesUpToSlowestReaderPosition` will result in calls to the `RangeCache.removeRange` method. These calls happen independently of the `RangeCache.evictLEntriesBeforeTimestamp` calls, so there's a chance for race conditions.
Modifications
- `.retain()` and `.release()` calls
- `remove(key, value)` method so that removals remove the correct value exactly once
Documentation
- doc
- doc-required
- doc-not-needed
- doc-complete
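The two points under Modifications (guarding `.retain()`/`.release()` and removing with `remove(key, value)`) can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `RangeCacheSketch`, `RefCountedValue`, `tryRetain` and `removeExact` are hypothetical names made up for illustration, and a plain `AtomicInteger` stands in for the real reference counting. Only `ConcurrentSkipListMap.remove(key, value)` is an actual JDK call.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the actual RangeCache implementation.
final class RangeCacheSketch {

    /** Hypothetical stand-in for a reference-counted cache entry. */
    static final class RefCountedValue {
        // Starts at 1: the reference owned by the cache itself.
        private final AtomicInteger refCnt = new AtomicInteger(1);
        final String payload;

        RefCountedValue(String payload) {
            this.payload = payload;
        }

        /** Takes a reference only if the value has not been fully released yet. */
        boolean tryRetain() {
            while (true) {
                int current = refCnt.get();
                if (current == 0) {
                    return false; // already released, must not be used
                }
                if (refCnt.compareAndSet(current, current + 1)) {
                    return true;
                }
            }
        }

        /** Drops one reference; returns true when the last reference was dropped. */
        boolean release() {
            int remaining = refCnt.decrementAndGet();
            if (remaining < 0) {
                throw new IllegalStateException("released more often than retained");
            }
            return remaining == 0;
        }

        int refCnt() {
            return refCnt.get();
        }
    }

    private final ConcurrentNavigableMap<Long, RefCountedValue> entries =
            new ConcurrentSkipListMap<>();

    /** Adds the value unless the key is already present. */
    boolean put(long key, RefCountedValue value) {
        return entries.putIfAbsent(key, value) == null;
    }

    /** Returns a retained value, or null if absent or already released. Caller must release(). */
    RefCountedValue get(long key) {
        RefCountedValue value = entries.get(key);
        return (value != null && value.tryRetain()) ? value : null;
    }

    /**
     * Removes the mapping only if it still points at the expected value.
     * The cache's reference is released only by the caller whose remove succeeded,
     * so two concurrent removers cannot both release it.
     */
    boolean removeExact(long key, RefCountedValue expected) {
        if (entries.remove(key, expected)) {
            expected.release();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RangeCacheSketch cache = new RangeCacheSketch();
        RefCountedValue value = new RefCountedValue("entry-1");
        cache.put(1L, value);

        RefCountedValue inUse = cache.get(1L);            // a reader holds its own reference
        System.out.println(cache.removeExact(1L, value)); // first removal wins
        System.out.println(cache.removeExact(1L, value)); // second removal is a no-op
        System.out.println(inUse.payload);                // still safe to use here
        inUse.release();                                  // reader gives up its reference
    }
}
```

In this sketch, a remover that loses the race sees `remove(key, value)` return `false` and leaves the reference count alone, while a reader that arrives after the final release gets `null` from `get` instead of a released value.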