
Conversation

@poorbarcode (Contributor) commented Feb 20, 2025

Motivation

Background: the normal steps of adding an entry (a minimal client-side sketch follows this list)

  • Acquires the BK client (BKC) IO thread
  • Triggers the callback after acquiring the IO thread
    • Writes the data to the BK servers
  • Triggers PendingAddOp.writeComplete after receiving the responses from the BK servers
  • Triggers all successful callbacks in the pending queue
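
For context, this is roughly what the flow above looks like from the application side. A minimal sketch, assuming an already opened LedgerHandle lh (the surrounding setup is omitted and illustrative, not part of this PR):

import java.nio.charset.StandardCharsets;

import org.apache.bookkeeper.client.AsyncCallback.AddCallback;
import org.apache.bookkeeper.client.BKException;
import org.apache.bookkeeper.client.LedgerHandle;

// Illustrative: the application enqueues an add and expects exactly one callback
// per entry, fired once the ack quorum is reached (or the add fails).
class AddEntryExample {
    static void addOneEntry(LedgerHandle lh) {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);
        lh.asyncAddEntry(payload, new AddCallback() {
            @Override
            public void addComplete(int rc, LedgerHandle handle, long entryId, Object ctx) {
                if (rc == BKException.Code.OK) {
                    // final step above: the success callback is drained from the pending queue
                    System.out.println("entry " + entryId + " acked by the quorum");
                } else {
                    System.out.println("add failed: " + BKException.getMessage(rc));
                }
            }
        }, null);
    }
}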

Background: the steps of a disconnection

  • Acquires the BK client (BKC) IO thread
  • Triggers a failed callback after acquiring the IO thread
    • Triggers a failed PendingAddOp.writeComplete. You can reproduce this flow with the new test testAddEntriesCallbackWithBKClientThread
    • (Highlight) If the write has already completed, this step also triggers all successful callbacks in the pending queue, even though the current write failed [code-1]

Issue-1: write stuck due to pending add callback by multiple threads

  • Settings
    • Ensemble size: 3
    • Write quorum: 3
    • Ack quorum: 2
| steps | write entries | client -> BK1 | client -> BK2 | client -> BK3 |
| --- | --- | --- | --- | --- |
| 1 | write BKs | | | |
| 2 | | start writing | start writing | start writing |
| 3 | | write success, ack: 1/3 | write success, ack: 2/3 | |
| 4 | mark the writing as complete since the ack quorum is 2/3 | | | |
| 5 | | | | connected and disconnected |
| 6 | | | | triggers a failed PendingAddOp.writeComplete |
| 7 | triggers all success callbacks in the pending queue | | | triggers all success callbacks in the pending queue [code-1] |
| 8 | thread: BookKeeper worker | | | thread: client-server IO |

Since multiple threads can trigger all successful callbacks in the pending queue, the following race condition can occur [code-2] (a minimal standalone sketch follows the table):

  • Note: thread-1 and thread-2 may be triggered by different PendingAddOps
| steps | thread-1 | thread-2 |
| --- | --- | --- |
| 1 | peek pending addOp from the queue | peek pending addOp from the queue |
| 2 | check it is the first item: success | check it is the first item: success |
| 3 | call queue.pop | call queue.pop |
| 4 | | Issue: the second op will never get its callback triggered |
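
To make the interleaving concrete, here is a minimal standalone model (illustrative names, not the BookKeeper code) that replays the three steps deterministically on a ConcurrentLinkedQueue, the same kind of queue that backs pendingAddOps:

import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative model of the race: both threads peek the same completed head op,
// both call remove(), and the second remove() silently drops the next op without
// ever firing its callback.
public class PendingQueueRaceSketch {

    static class Op {
        final long entryId;
        boolean completed;
        boolean callbackFired;
        Op(long entryId) { this.entryId = entryId; }
    }

    public static void main(String[] args) {
        ConcurrentLinkedQueue<Op> pendingAddOps = new ConcurrentLinkedQueue<>();
        Op opA = new Op(1);
        opA.completed = true;                       // already acked by 2/3 bookies
        Op opB = new Op(2);                         // still waiting for acks
        pendingAddOps.add(opA);
        pendingAddOps.add(opB);

        // steps 1-2: both threads peek the head and see a completed op
        Op seenByThread1 = pendingAddOps.peek();
        Op seenByThread2 = pendingAddOps.peek();

        // step 3: both threads remove the head; the second call removes opB instead
        pendingAddOps.remove();                     // thread-1: removes opA
        pendingAddOps.remove();                     // thread-2: removes opB (!)

        seenByThread1.callbackFired = true;         // thread-1 fires opA's callback
        seenByThread2.callbackFired = true;         // thread-2 fires opA's callback again

        // step 4: opB has left the queue without its callback ever being triggered
        System.out.println("opB still queued: " + pendingAddOps.contains(opB)); // false
        System.out.println("opB callback fired: " + opB.callbackFired);         // false
    }
}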

[code-1] code link: https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/PendingAddOp.java#L307

// PendingAddOp.writeComplete (excerpt)
public synchronized void writeComplete(int rc, long ledgerId, long entryId, BookieId addr, Object ctx) {
    if (completed) {
        // the op already reached the ack quorum; even a failed response
        // re-triggers the success callbacks in the pending queue
        sendAddSuccessCallbacks();
        maybeRecycle();
        return;
    }
    // ... handling of the not-yet-completed case omitted
}

[code-2] code link: https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2092-L2124

    // LedgerHandle.sendAddSuccessCallbacks
    void sendAddSuccessCallbacks() {
        PendingAddOp pendingAddOp;
        while ((pendingAddOp = pendingAddOps.peek()) != null && !changingEnsemble) {
            if (!pendingAddOp.completed) {
                return;
            }
            pendingAddOps.remove();
            explicitLacFlushPolicy.updatePiggyBackedLac(lastAddConfirmed);
            pendingAddsSequenceHead = pendingAddOp.entryId;
            pendingAddOp.submitCallback(BKException.Code.OK);
        }
    }

Issue-2: the ledger will be closed with an incorrect length

Since the task that triggers all successful callbacks in the pending queue may run on an IO thread, that task and the ledger-close path may execute concurrently:

| steps | worker-thread | io-thread |
| --- | --- | --- |
| 1 | start closing the ledger | peek pending addOp from the queue |
| 2 | | check it is the first item: success |
| 3 | drain pending adds [code-3] | |
| 4 | reduce ledger.length by the entries popped out from the queue [code-3] | |
| 5 | | call queue.pop and pop nothing |
| 6 | | update ledger.LAC |

As a result, ledger.LAC and ledger.length no longer match (a minimal model of this interleaving follows the [code-3] snippet below).

[code-3] code link: https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerHandle.java#L2076-L2084

    synchronized List<PendingAddOp> drainPendingAddsAndAdjustLength() {
        PendingAddOp pendingAddOp;
        List<PendingAddOp> opsDrained = new ArrayList<PendingAddOp>(pendingAddOps.size());
        while ((pendingAddOp = pendingAddOps.poll()) != null) {
            addToLength(-pendingAddOp.entryLength);
            opsDrained.add(pendingAddOp);
        }
        return opsDrained;
    }
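
To make the LAC/length mismatch concrete, here is a minimal standalone model of the interleaving in the table above (simplified field names, following the steps as described rather than the exact LedgerHandle code):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative model: the close path drains the queue and subtracts the entry's
// length, while the callback path, which already peeked the same op, still
// advances the LAC afterwards; the two fields then describe different entries.
public class CloseVsCallbackSketch {

    static class Op {
        final long entryId;
        final long entryLength;
        boolean completed = true;                  // already acked by the quorum
        Op(long id, long len) { entryId = id; entryLength = len; }
    }

    static long length = 0;                        // models ledger.length
    static long lastAddConfirmed = -1;             // models ledger.LAC

    public static void main(String[] args) {
        ConcurrentLinkedQueue<Op> pendingAddOps = new ConcurrentLinkedQueue<>();
        Op op = new Op(0, 50);
        pendingAddOps.add(op);
        length += op.entryLength;                  // bumped when the add was issued

        // io-thread, steps 1-2: peek the head and see a completed op
        Op peeked = pendingAddOps.peek();
        boolean firstAndCompleted = (peeked != null && peeked.completed);  // true

        // worker-thread, steps 3-4: the close path drains pending adds and adjusts length
        List<Op> drained = new ArrayList<>();
        Op p;
        while ((p = pendingAddOps.poll()) != null) {
            length -= p.entryLength;
            drained.add(p);
        }

        // io-thread, steps 5-6: nothing left to pop, but the LAC is still advanced
        // based on the op peeked earlier
        if (firstAndCompleted) {
            pendingAddOps.poll();                  // pops nothing, the queue was drained
            lastAddConfirmed = peeked.entryId;     // LAC now claims entry 0 is confirmed
        }

        // the LAC includes entry 0, yet its bytes were subtracted from length
        System.out.println("LAC = " + lastAddConfirmed + ", length = " + length);
    }
}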

The issue we encountered

A Pulsar topic is stuck in the ClosingLedger state

pulsar topic stats

{
  "entriesAddedCounter" : 12485917,
  "numberOfEntries" : 126383,
  "totalSize" : 53668291,
  "currentLedgerEntries" : 137418,
  "currentLedgerSize" : 58374388,
  "lastLedgerCreatedTimestamp" : "2025-02-06T09:13:45.371Z",
  "waitingCursorsCount" : 1,
  "pendingAddEntriesCount" : 8342123,
  "lastConfirmedEntry" : "41901:126416",
  "state" : "ClosingLedger",
  "ledgers" : [ {
    "ledgerId" : 41901,
    "entries" : 0,
    "size" : 0,
    "offloaded" : false,
    "underReplicated" : false
  } ],
  "cursors" : {
...

logs

2025-02-06T09:14:06,421+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (41901, 126417) to bookie (2, bookie-0:3181): Bookie handle is not available
2025-02-06T09:14:06,421+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (41901, 126412) to bookie (2, bookie-0:3181): Bookie handle is not available
2025-02-06T09:14:06,421+0000 [pulsar-io-3-5] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: bookie-0:3181 [id: 0xbc663b9b, L:/10.247.4.112:59578 - R:bookie-0/10.247.4.38:3181]
2025-02-06T09:14:06,421+0000 [pulsar-io-3-5] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xbc663b9b, L:/10.247.4.112:59578 - R:bookie-0/10.247.4.38:3181] authenticated as BookKeeperPrincipal{ANONYMOUS}
2025-02-06T09:14:06,423+0000 [BookKeeperClientWorker-OrderedExecutor-2-0] INFO  org.apache.bookkeeper.client.LedgerHandle - New Ensemble: [bookie-3:3181, bookie-2:3181, bookie-1:3181] for ledger: 41901
2025-02-06T09:14:06,423+0000 [pulsar-io-3-5] WARN  org.apache.bookkeeper.proto.PerChannelBookieClient - Exception caught on:[id: 0xbc663b9b, L:/10.247.4.112:59578 - R:bookie-0/10.247.4.38:3181] cause: recvAddress(..) failed: Connection reset by peer
2025-02-06T09:14:06,423+0000 [pulsar-io-3-5] INFO  org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xbc663b9b, L:/10.247.4.112:59578 ! R:bookie-0/10.247.4.38:3181]
2025-02-06T09:14:06,423+0000 [pulsar-io-3-6] WARN  org.apache.bookkeeper.mledger.impl.OpAddEntry - [{topic}] The add op is terminal legacy callback for entry 41901-126416 adding.
2025-02-06T09:14:06,423+0000 [pulsar-io-3-6] WARN  org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (41901, 126426) to bookie (2, bookie-1:3181): Bookie handle is not available
2025-02-06T09:14:06,423+0000 [BookKeeperClientWorker-OrderedExecutor-3-0] WARN  org.apache.bookkeeper.client.BookieWatcherImpl - replaceBookie for bookie: bookie-1:3181 in ensemble: [bookie-3:3181, bookie-2:3181, bookie-1:3181] is not adhering to placement policy and chose bookie-0:3181. excludedBookies [bookie-1:3181] and quarantinedBookies [bookie-2:3181]

Changes

Switch the callback handling to the BookKeeper worker thread when the connection is broken, so that the pending-add callbacks for a ledger are only ever triggered from one thread. A minimal sketch of the idea follows.
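
The sketch below is illustrative only, not the actual patch; bkWorker, onChannelBroken, and the Runnable wrapper are stand-in names:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: when the connection breaks, hand the failure callback to the
// BookKeeper worker thread that already owns the ledger's callbacks instead of
// running it on the Netty IO thread, so only one thread ever drains the
// pending-add queue and the races described above cannot happen.
class DisconnectHandlerSketch {

    // stand-in for the per-ledger ordered BookKeeper worker thread
    private final ExecutorService bkWorker = Executors.newSingleThreadExecutor();

    // failedWriteComplete is a thunk that calls PendingAddOp.writeComplete with an error code
    void onChannelBroken(Runnable failedWriteComplete) {
        // before: failedWriteComplete.run() executed directly on the IO thread
        bkWorker.execute(failedWriteComplete);
    }
}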

@lhotari (Member) commented Feb 20, 2025

Great analysis @poorbarcode. IIRC, there might be some other pending PRs in this area where the review isn't complete. I'll check if they are related.

@poorbarcode (Contributor, Author) commented:

rerun failure checks

@poorbarcode closed this Mar 5, 2025
@poorbarcode reopened this Mar 5, 2025
@dlg99 (Contributor) commented Apr 9, 2025

@poorbarcode please take a look at the tests - there are failures/timeouts

@StevenLuMT closed this Apr 10, 2025
@StevenLuMT reopened this Apr 10, 2025
@StevenLuMT (Member) commented:

Reason for reopening: rerun failure checks

@StevenLuMT (Member) commented:


https://github.com/apache/bookkeeper/actions/runs/14370207115/job/40291989272?pr=4557
Error: Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 121.4 s <<< FAILURE! -- in org.apache.bookkeeper.meta.TestLongZkLedgerIdGenerator
Error: org.apache.bookkeeper.meta.TestLongZkLedgerIdGenerator.testGenerateLedgerId -- Time elapsed: 121.4 s <<< FAILURE!
junit.framework.AssertionFailedError: Wait ledger id generation threads to stop timeout :
at junit.framework.Assert.fail(Assert.java:57)
at junit.framework.Assert.assertTrue(Assert.java:22)
at junit.framework.TestCase.assertTrue(TestCase.java:192)
at org.apache.bookkeeper.meta.TestLongZkLedgerIdGenerator.testGenerateLedgerId(TestLongZkLedgerIdGenerator.java:132)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at junit.framework.TestCase.runTest(TestCase.java:177)
at junit.framework.TestCase.runBare(TestCase.java:142)
at junit.framework.TestResult$1.protect(TestResult.java:122)
at junit.framework.TestResult.runProtected(TestResult.java:142)
at junit.framework.TestResult.run(TestResult.java:125)
at junit.framework.TestCase.run(TestCase.java:130)
at junit.framework.TestSuite.runTest(TestSuite.java:241)
at junit.framework.TestSuite.run(TestSuite.java:236)

@poorbarcode force-pushed the fix/add_entry_cb_order branch from 6be195f to 1632cab on May 13, 2025 14:52
@poorbarcode (Contributor, Author) commented:

Rebased master

@StevenLuMT (Member) left a comment

Good job

@StevenLuMT merged commit e47926b into apache:master May 14, 2025
23 checks passed
@hangc0276 (Contributor) left a comment

Great job!

hangc0276 pushed a commit that referenced this pull request Jun 4, 2025

[fix] Write stuck due to pending add callback by multiple threads (#4557)

(cherry picked from commit e47926b)
hangc0276 pushed a commit that referenced this pull request Jun 4, 2025

[fix] Write stuck due to pending add callback by multiple threads (#4557)

(cherry picked from commit e47926b)
priyanshu-ctds pushed a commit to datastax/bookkeeper that referenced this pull request Jul 11, 2025

[fix] Write stuck due to pending add callback by multiple threads (apache#4557)

(cherry picked from commit e47926b)
(cherry picked from commit 143e5a8)
sandeep-ctds pushed a commit to datastax/bookkeeper that referenced this pull request Jul 22, 2025

[fix] Write stuck due to pending add callback by multiple threads (apache#4557)

(cherry picked from commit e47926b)
(cherry picked from commit 143e5a8)