[fix][broker] Geo Replication lost messages or frequently fails due to Deduplication is not appropriate for Geo-Replication #23697
b9ae34a to 0d2e235
/pulsarbot rerun-failure-checks
5a2f2c7 to cc44dc3
@poorbarcode It's a good idea to use just the ledger ID and entry ID for message deduplication. In this case, we can also remove the deduplication state after the ledger gets fully replicated. For example:
@poorbarcode @gaoran10 @Technoboy- @codelipenghui Why could this PR be cherry-picked to branch-3.0 and branch-4.0? It changes the …
…o Deduplication is not appropriate for Geo-Replication (apache#23697) (cherry picked from commit 4ac4f3c) (cherry picked from commit 307b5c9)
…o Deduplication is not appropriate for Geo-Replication (apache#23697) (cherry picked from commit 4ac4f3c) (cherry picked from commit 26a211c)
…ls due to Deduplication is not appropriate for Geo-Replication (apache#23697)" This reverts commit 2607a49.
…ntly fails due to Deduplication is not appropriate for Geo-Replication (apache#23697)"" This reverts commit a8a1c4a.
…ls due to Deduplication is not appropriate for Geo-Replication (apache#23697)" This reverts commit d794dd4.
…ls due to Deduplication is not appropriate for Geo-Replication (apache#23697)" This reverts commit 988a099.
That's reasonable @poorbarcode, but we'd better start a discussion in the community to get aligned since the fix has changed the core logic and protocol for geo-replication.
Motivation
Background
How does deduplication work?
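- The producer keeps messages that are awaiting a send response in a queue, {pendingMessages}.
- The broker responds with position -1:-1 if the sequence ID published is lower than that of the previous messages.
- On such a response, the producer checks whether the sequence ID of the next message in {pendingMessages} is larger than the one that was rejected.
  - {next} > {rejected}: ignore the error and continue working.
  - {next} < {rejected}: close channels and reconnect.
Conditions under which the issue happens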
- … {pendingMessages} with message.sequenceId but ignores message.original-producer-name, which may cause the sequence IDs in {pendingMessages} to not be increasing.
- … -1:-1 send response will fail.
Issue-1: lost messages
- … seq: 0), M2(seq: 1)
- … seq: 0), M4(seq: 1)
- {pendingMessages}: [0, 1]
- {pendingMessages}: [0, 1, 0, 1]
- seq 0, position 0:0
- seq 1, position 0:1
- seq 0, position -1:-1
- seq 1, position -1:-1
- {pendingMessages}: [empty]
- … 0 now).
- [M1, M2, M1, M2]
- [M1, M2]
You can reproduce the issue with the test testDeduplicationNotLostMessage.
Issue-2: frequently fails
- 3:0 with sequence-id 10
- 3:1 with sequence-id 1
- 3:2 with sequence-id 2
- Replicator copies messages
- {pendingMessages}: [10, 1, 2]
- … 3:0 successfully
- {pendingMessages}: [1, 2]
- … 3:0 (a duplicated publishing)
- … -1:-1 (new position relates to the latest publishing) for the latest send-response.
- failed-sequenced: 10 > pendingMessages[0].sequenceId: 1
No test for reproducing this issue.
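The producer-side rule from "How does deduplication work?" and the Issue-2 sequence can be sketched as below. This is a minimal model, not Pulsar's actual producer code: the class `PendingQueueSketch`, its methods, and the `Action` enum are hypothetical names, and the handling of an empty queue or of `{next} == {rejected}` is an assumption, since the description only covers the `>` and `<` cases.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of the pending-message check; not Pulsar's real implementation. */
final class PendingQueueSketch {
    enum Action { IGNORE, RECONNECT }

    // Sequence IDs of messages that were sent but not yet acknowledged.
    private final Deque<Long> pendingMessages = new ArrayDeque<>();

    void send(long sequenceId) {
        pendingMessages.addLast(sequenceId);
    }

    /** A successful send response completes the message at the head of the queue. */
    void onSuccess() {
        pendingMessages.removeFirst();
    }

    /** The broker answered -1:-1 for the given sequence ID. */
    Action onRejected(long rejectedSequenceId) {
        Long next = pendingMessages.peekFirst();
        // Assumption: an empty queue is treated like {next} > {rejected}.
        if (next == null || next > rejectedSequenceId) {
            return Action.IGNORE;
        }
        // {next} < {rejected}; equality is also mapped here as an assumption.
        return Action.RECONNECT;
    }

    public static void main(String[] args) {
        PendingQueueSketch producer = new PendingQueueSketch();
        // Issue-2: sequence IDs taken from the original producers are not increasing.
        producer.send(10);
        producer.send(1);
        producer.send(2);
        producer.onSuccess();                         // 10 is acknowledged, queue is [1, 2]
        System.out.println(producer.onRejected(10));  // duplicate publish of 10 is rejected
    }
}
```

With the queue at [1, 2] and a rejected sequence ID of 10, the next pending ID is smaller than the rejected one, so the sketch takes the reconnect branch, which matches the "frequently fails" behaviour described for Issue-2.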
Modifications
Solution: replicators use a specified sequence ID (ledgerId:entryId of the original topic) instead of using the original producers’ sequence IDs.
- 3:0 with sequence-id 10
- 3:1 with sequence-id 1
- 3:2 with sequence-id 2
- … 3:0)
- {pendingMessages}: [3:0, 3:1, 3:2]
- … 3:0 successfully
- {pendingMessages}: [3:1, 3:2]
- … 3:0 (a duplicated publishing)
- … -1:-1 (new position relates to the latest publishing) for the latest send-response.
- failed-sequenced (3:0) < pendingMessages[0].sequenceId (3:2)
Documentation
- doc
- doc-required
- doc-not-needed
- doc-complete
Matching PR in forked repository
PR in forked repository: x
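To illustrate the solution described under Modifications, the sketch below orders pending messages by the source topic's ledgerId:entryId instead of by the original producers' sequence IDs. It is a rough model under stated assumptions: `PositionSequenceSketch`, `Position`, and `canIgnoreRejection` are hypothetical names that do not come from the PR, and treating an empty queue as ignorable is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of position-based sequence IDs; not the PR's actual code. */
final class PositionSequenceSketch {

    /** A sequence ID made of the original topic's ledger ID and entry ID. */
    static final class Position implements Comparable<Position> {
        final long ledgerId;
        final long entryId;

        Position(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(Position other) {
            // Order by ledger first, then by entry within the ledger.
            int byLedger = Long.compare(ledgerId, other.ledgerId);
            return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
        }

        @Override
        public String toString() {
            return ledgerId + ":" + entryId;
        }
    }

    /** True when a -1:-1 response for 'rejected' can be ignored ({next} > {rejected}). */
    static boolean canIgnoreRejection(Deque<Position> pendingMessages, Position rejected) {
        Position next = pendingMessages.peekFirst();
        // Assumption: with nothing pending there is nothing to reorder, so ignore.
        return next == null || next.compareTo(rejected) > 0;
    }

    public static void main(String[] args) {
        Deque<Position> pendingMessages = new ArrayDeque<>();
        pendingMessages.add(new Position(3, 0));
        pendingMessages.add(new Position(3, 1));
        pendingMessages.add(new Position(3, 2));
        pendingMessages.removeFirst();  // 3:0 is acknowledged, queue is [3:1, 3:2]
        // A duplicated publish of 3:0 is rejected with -1:-1.
        System.out.println(canIgnoreRejection(pendingMessages, new Position(3, 0)));
    }
}
```

Because positions in the source topic only grow as the replicator reads forward, every pending entry sorts after an already replicated one, so a rejected duplicate falls into the "ignore and continue" case instead of forcing a reconnect.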