Ensemble change may leave pendingAddOps of LedgerHandle unable to be resent, and the Pulsar partition may become unavailable. #4459

@keyboardbobo

BUG REPORT

Describe the bug

An ensemble change occurs when a bookie is restarted or when heavy traffic triggers back pressure. If newEnsemble is exactly the same as origEnsemble, replaced = EnsembleUtils.diffEnsemble(origEnsemble, newEnsemble) returns an empty HashSet. The subsequent call to unsetSuccessAndSendWriteRequest then has no replaced bookies to resend to, so the pending add requests are never resent and all write requests on the ledger stay blocked.
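The failure mode described above can be illustrated with a minimal, self-contained sketch. It does not use BookKeeper's classes: the diffEnsemble method below is a simplified stand-in written for this example, assuming the real EnsembleUtils.diffEnsemble reports the ensemble positions whose bookie changed, and bookies are modeled as plain strings. The address 10.0.0.1:3181 is made up for the contrast case.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EnsembleDiffSketch {

    // Simplified stand-in (assumption), not BookKeeper's implementation:
    // collect the positions at which the two ensembles hold different bookies.
    static Set<Integer> diffEnsemble(List<String> origEnsemble, List<String> newEnsemble) {
        if (origEnsemble.size() != newEnsemble.size()) {
            throw new IllegalArgumentException("Ensembles must be of the same size");
        }
        Set<Integer> replaced = new HashSet<>();
        for (int i = 0; i < origEnsemble.size(); i++) {
            if (!origEnsemble.get(i).equals(newEnsemble.get(i))) {
                replaced.add(i);
            }
        }
        return replaced;
    }

    public static void main(String[] args) {
        // The ensemble logged twice in this report.
        List<String> orig = List.of("10.199.102.18:3181", "10.200.48.84:3181");

        // Case from this report: the "new" ensemble equals the original one.
        Set<Integer> same = diffEnsemble(orig, orig);
        System.out.println("identical ensembles -> replaced = " + same);

        // Contrast case with a hypothetical replacement bookie at index 1.
        List<String> changed = List.of("10.199.102.18:3181", "10.0.0.1:3181");
        Set<Integer> one = diffEnsemble(orig, changed);
        System.out.println("one bookie swapped  -> replaced = " + one);
    }
}
```

With identical ensembles the set of replaced positions is empty, so any resend logic driven only by that set has nothing to act on, which matches the blocked state reported here.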

To Reproduce

Steps to reproduce the behavior:

  1. Restart a bookie.
  2. Multiple ensemble changes occur (I don't know why the bookie lists of the two ensemble changes are exactly the same):

2024-06-26 16:22:52.0453 [BookKeeperClientWorker-OrderedExecutor-28-0] INFO org.apache.bookkeeper.client.LedgerHandle - New Ensemble: [10.199.102.18:3181, 10.200.48.84:3181] for ledger: 320092
2024-06-26 16:22:53.0542 [BookKeeperClientWorker-OrderedExecutor-28-0] INFO org.apache.bookkeeper.client.LedgerHandle - New Ensemble: [10.199.102.18:3181, 10.200.48.84:3181] for ledger: 320092

Expected behavior

Messages can be sent to the partition normally.

Screenshots
IMG_6474
IMG_6475
