Skip to content

Conversation

@dao-jun
Copy link
Member

@dao-jun dao-jun commented Jul 16, 2025

Motivation

In #23683 and #23817, we handled the failure of ML PayloadProcessor.

But it maybe lead to message lost or out-of-order.

You can reproduce the issue via PublishWithMLPayloadProcessorTest:

  1. Publish message-0 ~ message-9 to the topic and consume.
  2. When enable message deduplication, we can consume the following messages:
Received message: message-0
Received message: message-1
Received message: message-2
Received message: message-4
Received message: message-5
Received message: message-6
Received message: message-7
Received message: message-8
Received message: message-9

The message-3 is lost
3. When disable message deduplication, we can consume the following messages:

Received message: message-0
Received message: message-1
Received message: message-2
Received message: message-4
Received message: message-5
Received message: message-6
Received message: message-7
Received message: message-8
Received message: message-9
Received message: message-0
Received message: message-1
Received message: message-2
Received message: message-3
Received message: message-4
Received message: message-5
Received message: message-6
Received message: message-7
Received message: message-8
Received message: message-9

** The message-3 is after message-4, message is out-of-order.

The key-point is we didn't fail all the incoming messages until Topic un-fenced.

Modifications

  1. Fail all the incoming write request until topic un-fenced.
  2. Add tests

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jul 16, 2025
@dao-jun dao-jun self-assigned this Jul 16, 2025
@dao-jun dao-jun added type/bug The PR fixed a bug or issue reported a bug release/blocker Indicate the PR or issue that should block the release until it gets resolved area/ML ready-to-test release/4.0.6 release/3.3.8 release/3.0.13 labels Jul 16, 2025
@dao-jun dao-jun closed this Jul 16, 2025
@dao-jun dao-jun reopened this Jul 16, 2025
@BewareMyPower BewareMyPower changed the title [ml][fix] Fix the possibility of message loss or disorder when ML PayloadProcessor processing fails [fix][ml] Fix the possibility of message loss or disorder when ML PayloadProcessor processing fails Jul 16, 2025
Copy link
Contributor

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should take care of the test time when adding tests.

@dao-jun dao-jun requested a review from BewareMyPower July 17, 2025 05:09
Copy link
Contributor

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Improve the docs via Gemini

@dao-jun dao-jun requested a review from lhotari July 22, 2025 14:08
@dao-jun dao-jun requested a review from codelipenghui July 24, 2025 02:38
@codecov-commenter
Copy link

codecov-commenter commented Jul 24, 2025

Codecov Report

❌ Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 74.28%. Comparing base (bbc6224) to head (553aebb).
⚠️ Report is 1253 commits behind head on master.

Files with missing lines Patch % Lines
...a/org/apache/bookkeeper/mledger/ManagedLedger.java 0.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24522      +/-   ##
============================================
+ Coverage     73.57%   74.28%   +0.71%     
- Complexity    32624    32958     +334     
============================================
  Files          1877     1874       -3     
  Lines        139502   146244    +6742     
  Branches      15299    16773    +1474     
============================================
+ Hits         102638   108640    +6002     
- Misses        28908    28968      +60     
- Partials       7956     8636     +680     
Flag Coverage Δ
inttests 26.95% <18.75%> (+2.37%) ⬆️
systests 23.26% <18.75%> (-1.07%) ⬇️
unittests 73.78% <93.75%> (+0.94%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...che/bookkeeper/mledger/impl/ManagedLedgerImpl.java 81.34% <100.00%> (+0.68%) ⬆️
...org/apache/bookkeeper/mledger/impl/OpAddEntry.java 76.92% <100.00%> (+4.10%) ⬆️
...sar/broker/service/persistent/PersistentTopic.java 80.23% <100.00%> (+1.78%) ⬆️
...a/org/apache/bookkeeper/mledger/ManagedLedger.java 0.00% <0.00%> (ø)

... and 1108 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dao-jun dao-jun merged commit a14794a into apache:master Jul 24, 2025
96 of 98 checks passed
@dao-jun dao-jun deleted the fix/payload_processor_failure branch July 24, 2025 10:17
@dao-jun dao-jun removed the release/blocker Indicate the PR or issue that should block the release until it gets resolved label Jul 24, 2025
lhotari pushed a commit that referenced this pull request Jul 24, 2025
…loadProcessor processing fails (#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit a14794a)
lhotari pushed a commit that referenced this pull request Jul 24, 2025
…loadProcessor processing fails (#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit a14794a)
lhotari pushed a commit that referenced this pull request Jul 24, 2025
…loadProcessor processing fails (#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit a14794a)
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Jul 28, 2025
…loadProcessor processing fails (apache#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit a14794a)
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Jul 28, 2025
…loadProcessor processing fails (apache#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit a14794a)
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 28, 2025
…loadProcessor processing fails (apache#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit a14794a)
(cherry picked from commit 4f156b5)
priyanshu-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 28, 2025
…loadProcessor processing fails (apache#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit a14794a)
(cherry picked from commit a378bdc)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 29, 2025
…loadProcessor processing fails (apache#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit a14794a)
(cherry picked from commit a378bdc)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 30, 2025
…loadProcessor processing fails (apache#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit a14794a)
(cherry picked from commit a378bdc)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 31, 2025
…loadProcessor processing fails (apache#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
(cherry picked from commit a14794a)
(cherry picked from commit 4f156b5)
KannarFr pushed a commit to CleverCloud/pulsar that referenced this pull request Sep 22, 2025
…loadProcessor processing fails (apache#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
…loadProcessor processing fails (apache#24522)

Co-authored-by: Yunze Xu <xyzinfernity@163.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants