-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][broker] fix bug caused by optimistic locking #18390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix][broker] fix bug caused by optimistic locking #18390
Conversation
|
@thetumbled Please add the following content to your PR description and select a checkbox: |
|
This seems to be a bug in changes introduced by #14515 . @lordcheng10 Please review |
...n/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentLongLongPairHashMap.java
Outdated
Show resolved
Hide resolved
...n/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentLongLongPairHashMap.java
Outdated
Show resolved
Hide resolved
|
There's a checkstyle failure in https://github.com/thetumbled/pulsar/actions/runs/3418835699/jobs/5691621649#step:8:996 |
|
But I'm still not sure about the following logic // First try optimistic locking
long storedKey1 = table[bucket];
long storedKey2 = table[bucket + 1];
long storedValue1 = table[bucket + 2];
long storedValue2 = table[bucket + 3];
if (!acquiredLock && validate(stamp)) {
// The values we have read are consistent
if (key1 == storedKey1 && key2 == storedKey2) {
return new LongPair(storedValue1, storedValue2);Even if storedKey1 and storeKey2 matched the expected keys, are storedValue1 and storedValue2 guaranteed to be exactly the associated values? This PR covered the case that table is shrinking, which makes the index invalid. However, could it happen that when retrieving |
8290967 to
5c73403
Compare
done |
if |
|
And |
i don't find that |
|
Yes, only |
|
Oh my fault, here is a writeLock in the |
|
The Maybe there is the same bug in bookkeeper, or it only occur in getting stored value? |
...n/src/main/java/org/apache/pulsar/common/util/collections/ConcurrentLongLongPairHashMap.java
Outdated
Show resolved
Hide resolved
@thetumbled It's not true. |
In the For example: |
|
Oh I see, thanks for your explanation. |
it seems that there is the same bug in bookkeeper. |
This inspires me. Can we fix the boolean acquiredLock = false;
// add local variable here, so OutOfBound won't happen
long[] table = this.table;
// calculate table.length / 4 as capacity to avoid rehash changing capacity
int bucket = signSafeMod(keyHash, table.length / 4);
try {
while (true) {
long storedKey1 = table[bucket];
long storedKey2 = table[bucket + 1];
long storedValue1 = table[bucket + 2];
long storedValue2 = table[bucket + 3];
// compare table to ensure not rehash
if (!acquiredLock && validate(stamp) && table == this.table) {
if (key1 == storedKey1 && key2 == storedKey2) {
return new LongPair(storedValue1, storedValue2);
} else if (storedKey1 == EmptyKey) {
return null;
}
} else {
if (!acquiredLock) {
stamp = readLock();
acquiredLock = true;
// update local variable
table = this.table;
bucket = signSafeMod(keyHash, capacity);
storedKey1 = table[bucket];
storedKey2 = table[bucket + 1];
storedValue1 = table[bucket + 2];
storedValue2 = table[bucket + 3];
} |
|
/pulsarbot run-failure-checks |
|
/pulsarbot run-failure-checks |
(cherry picked from commit 2aa8c3b)
(cherry picked from commit 2aa8c3b)


Fixes #18388
Motivation
ConcurrentLongLongPairHashMap use optimistic locking to read data in Section, but if there is another thread that is removing data from the same Section, and trigger the shrink process, then the index calculated basing on dirty capacity exceed the array size, an ArrayIndexOut0fBoundsException will be throw, which is not handled now. Eventually the connection with client will be closed.

There are same problems with
org.apache.pulsar.common.util.collections.ConcurrentLongHashMap, which is referenced in many places.Modifications
use local backup variable, so OutOfBound won't happen.
Verifying this change
This change is already covered by existing tests, such as (please describe tests).
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: thetumbled#5