Skip to content

Conversation

@BewareMyPower
Copy link
Contributor

@BewareMyPower BewareMyPower commented Jul 15, 2025

Main Issue: #24497

Motivation

When the OpReadEntry#readEntriesComplete is triggered by RangeEntryCacheImpl#readFromStorage with an empty list of entries, NPE will happen here:

? cursor.getNextAvailablePosition(lastPosition) : lastPosition.getNext();

Then updateReadPosition and checkReadCompletion will be skipped, where the result is very similar to #24497 (comment)

  • The callback of asyncReadEntries will never complete
  • The entriesReadCount will be updated but the readPosition won't
  • pendingReadOps will not be reduced to 0

Unfortunately, from the existing analysis, the issue of 24497 might not be caused by this bug, but this NPE is still possible.

Modifications

Remove the confusing lastPosition argument in internalReadEntriesComplete. It's only set when all entries have been filtered out and it needs to reset the readPosition in ManagedLedgerImpl#internalReadFromLedger. It can be replaced by a simple combination of updateReadPosition and checkReadCompletion.

In internalReadEntriesComplete, just call checkReadCompletion when no entries are read to trigger another round of read.

In addition, catch exceptions when triggering user-provided callback and print logs with better information. Without this change, the exception will be caught by SingleThreadTask#safeRunTask and the log will be Error while running task: that does not have description for a specific cursor or read position.

Verifying this change

Make EntryCache configurable in tests add add testReadNoEntries to verify that the asyncReadEntries will eventually succeed even if one read operation returns an empty entry list.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

Copy link
Member

@dao-jun dao-jun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BewareMyPower
Copy link
Contributor Author

testScan failed after this change, mark it as drafted first

@BewareMyPower BewareMyPower marked this pull request as draft July 16, 2025 02:24
@BewareMyPower BewareMyPower added type/bug The PR fixed a bug or issue reported a bug area/ML labels Jul 16, 2025
@BewareMyPower BewareMyPower marked this pull request as ready for review July 16, 2025 02:58
@codecov-commenter
Copy link

codecov-commenter commented Jul 16, 2025

Codecov Report

❌ Patch coverage is 69.23077% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.31%. Comparing base (bbc6224) to head (5515e1a).
⚠️ Report is 1266 commits behind head on master.

Files with missing lines Patch % Lines
...rg/apache/bookkeeper/mledger/impl/OpReadEntry.java 62.26% 15 Missing and 5 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24515      +/-   ##
============================================
+ Coverage     73.57%   74.31%   +0.74%     
+ Complexity    32624    32552      -72     
============================================
  Files          1877     1869       -8     
  Lines        139502   146089    +6587     
  Branches      15299    16760    +1461     
============================================
+ Hits         102638   108569    +5931     
+ Misses        28908    28900       -8     
- Partials       7956     8620     +664     
Flag Coverage Δ
inttests 26.69% <23.07%> (+2.11%) ⬆️
systests 23.28% <35.38%> (-1.04%) ⬇️
unittests 73.81% <69.23%> (+0.97%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...kkeeper/mledger/impl/ManagedLedgerFactoryImpl.java 59.50% <100.00%> (-21.89%) ⬇️
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 81.47% <100.00%> (+0.80%) ⬆️
...keeper/mledger/impl/cache/RangeEntryCacheImpl.java 63.86% <ø> (+5.11%) ⬆️
...rg/apache/bookkeeper/mledger/impl/OpReadEntry.java 74.81% <62.26%> (-9.50%) ⬇️

... and 1097 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@BewareMyPower
Copy link
Contributor Author

@lhotari Could you take a look again?

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to submit these minor comments earlier.

@BewareMyPower BewareMyPower merged commit eea7c13 into apache:master Jul 19, 2025
185 of 192 checks passed
@BewareMyPower BewareMyPower deleted the bewaremypower/ml-read-stuck branch July 19, 2025 04:26
lhotari pushed a commit that referenced this pull request Jul 22, 2025
lhotari pushed a commit that referenced this pull request Jul 23, 2025
lhotari pushed a commit that referenced this pull request Jul 23, 2025
@lhotari
Copy link
Member

lhotari commented Jul 23, 2025

@BewareMyPower The "EntryCacheCreator" solution added in this PR is obsolete. There's already a way to intercept reads in Pulsar tests with PulsarMockReadHandleInterceptor.

example usage:

bkc.setReadHandleInterceptor(new PulsarMockReadHandleInterceptor() {
@Override
public CompletableFuture<LedgerEntries> interceptReadAsync(long ledgerId, long firstEntry, long lastEntry,
LedgerEntries entries) {
return CompletableFuture.supplyAsync(() -> {
return entries;
}, CompletableFuture.delayedExecutor(3, TimeUnit.SECONDS));
}
});

Since the changes to add the "EntryCacheCreator" collide with #24363 changes, I'd suggest that we remove the "EntryCacheCreator" and update the tests added in this PR to use PulsarMockReadHandleInterceptor. I can do a PR to handle that.

@lhotari
Copy link
Member

lhotari commented Jul 23, 2025

the replacement using ReadHandleInterceptor is

    bkc.setReadHandleInterceptor(new PulsarMockReadHandleInterceptor() {
            @Override
            public CompletableFuture<LedgerEntries> interceptReadAsync(long ledgerId, long firstEntry, long lastEntry,
                                                                       LedgerEntries entries) {
                if (firstRead.compareAndSet(true, false)) {
                    // LedgerEntriesImpl doesn't allow empty entries list.
                    // Implementing a dummy LedgerEntries that returns an empty list.
                    return CompletableFuture.completedFuture(new LedgerEntries() {
                        @Override
                        public org.apache.bookkeeper.client.api.LedgerEntry getEntry(long entryId) {
                            return null;
                        }

                        @Override
                        public Iterator<org.apache.bookkeeper.client.api.LedgerEntry> iterator() {
                            return EmptyIterator.INSTANCE;
                        }

                        @Override
                        public void close() {

                        }
                    });
                } else {
                    return CompletableFuture.completedFuture(entries);
                }
            }
        });

@BewareMyPower
Copy link
Contributor Author

Let me try this approach tomorrow

@lhotari
Copy link
Member

lhotari commented Jul 23, 2025

Let me try this approach tomorrow

I already submitted the PR in #24552 since I'll use that to be able to rebase and test #24363.

priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
…are read from BK (apache#24515)

(cherry picked from commit eea7c13)
(cherry picked from commit c887259)
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
…are read from BK (apache#24515)

(cherry picked from commit eea7c13)
(cherry picked from commit c887259)
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
…are read from BK (apache#24515)

(cherry picked from commit eea7c13)
(cherry picked from commit f1974aa)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
…are read from BK (apache#24515)

(cherry picked from commit eea7c13)
(cherry picked from commit c887259)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 24, 2025
…are read from BK (apache#24515)

(cherry picked from commit eea7c13)
(cherry picked from commit f1974aa)
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Jul 28, 2025
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Jul 28, 2025
KannarFr pushed a commit to CleverCloud/pulsar that referenced this pull request Sep 22, 2025
walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants