-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][broker] Fix Broker OOM due to too many waiting cursors and reuse a recycled OpReadEntry incorrectly #24551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/client/api/FailoverSubscriptionTest.java
Outdated
Show resolved
Hide resolved
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Outdated
Show resolved
Hide resolved
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Outdated
Show resolved
Hide resolved
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Outdated
Show resolved
Hide resolved
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Outdated
Show resolved
Hide resolved
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Outdated
Show resolved
Hide resolved
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Show resolved
Hide resolved
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Show resolved
Hide resolved
...broker/src/main/java/org/apache/pulsar/broker/service/persistent/PersistentSubscription.java
Show resolved
Hide resolved
|
/pulsarbot rerun-failure-checks |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #24551 +/- ##
============================================
+ Coverage 73.57% 74.31% +0.74%
- Complexity 32624 32967 +343
============================================
Files 1877 1874 -3
Lines 139502 146282 +6780
Branches 15299 16777 +1478
============================================
+ Hits 102638 108710 +6072
- Misses 28908 28939 +31
- Partials 7956 8633 +677
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
managed-ledger/src/main/java/org/apache/bookkeeper/mledger/impl/ManagedCursorImpl.java
Show resolved
Hide resolved
lhotari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…e a recycled OpReadEntry incorrectly (apache#24551) (cherry picked from commit 48dc9ec)
…e a recycled OpReadEntry incorrectly (apache#24551) (cherry picked from commit 48dc9ec)
…e a recycled OpReadEntry incorrectly (apache#24551) (cherry picked from commit 48dc9ec) (cherry picked from commit f68bf2b)
…e a recycled OpReadEntry incorrectly (apache#24551) (cherry picked from commit 48dc9ec) (cherry picked from commit abf511b)
…e a recycled OpReadEntry incorrectly (apache#24551) (cherry picked from commit 48dc9ec) (cherry picked from commit abf511b)
…e a recycled OpReadEntry incorrectly (apache#24551) (cherry picked from commit 48dc9ec) (cherry picked from commit abf511b)
…e a recycled OpReadEntry incorrectly (apache#24551) (cherry picked from commit 48dc9ec) (cherry picked from commit f68bf2b)
…e a recycled OpReadEntry incorrectly (apache#24551)
…e a recycled OpReadEntry incorrectly (apache#24551)
Motivation
Background
1. Pending read if no entries to read.
managedledger.waitingCursorsif there are no entries to read.cursor.notifyEntriesAvailableonce new entries were added.2, Double-check that it has more entries.
newEntriesCheckDelayInMillisis larger than0[code-1].3. The variable waitingReadOp of cursor
OpReadEntryto trigger a new read after getting notified that new entries were added.4. The variable waitingCursors of Managed Cursor
Issue 1: The same cursor was added to
managedledger.waitingCursorsthousands of times.FailoverC1); it becomes an active consumer.C2); C2 becomes the active consumer after a choice by the dispatcherscheduleReadOnActiveConsumerscheduleReadOnActiveConsumerresets the variablecursor.waitingReadOp, but it forgets to remove the cursor frommanagedledger.waitingCursorstestWaitingCursorsCountAfterSwitchingActiveConsumersIssue 2: Reused a recycled OpReadEntry, which causes repeated delivery and other unexpected issues
OpReadEntryand set tocursor.waitingReadOpcursor.waitingReadOpOpReadEntry, which was created by the taskread entriesOpReadEntryOpReadEntryto read entries, but it has been recycled and reused by other tasksIssue 3: The method
cursor.checkForNewEntriesforgets to remove cursor frommanagedledger.waitingCursors.cursor.checkForNewEntriesimplements the double-check that checks whether there are entries to read.cursor.waitingReadOpand trigger a reading if there are entries to read after the first check.managedledger.waitingCursorsafter clearing the variablecursor.waitingReadOpModifications
3issuesmanagedledger.waitingCursorsandcursor.waitingReadOpeverywhere, Managed Cursor handles the variables itselfDocumentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: x