-
Notifications
You must be signed in to change notification settings - Fork 3.7k
Closed
Labels
type/bugThe PR fixed a bug or issue reported a bugThe PR fixed a bug or issue reported a bug
Description
Problem
This exception is repeated on the log when using replicated subscriptions:
07:17:59.770 [bookkeeper-ml-workers-OrderedExecutor-4-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentReplicator - [persistent://georep/default/t1][cluster-a -> cluster-b] Unexpected exception: Field 'replicated_from' is not set
java.lang.IllegalStateException: Field 'replicated_from' is not set
at org.apache.pulsar.common.api.proto.MessageMetadata.getReplicatedFrom(MessageMetadata.java:151) ~[org.apache.pulsar-pulsar-common-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
at org.apache.pulsar.broker.service.persistent.PersistentReplicator.checkReplicatedSubscriptionMarker(PersistentReplicator.java:763) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
at org.apache.pulsar.broker.service.persistent.PersistentReplicator.readEntriesComplete(PersistentReplicator.java:366) ~[org.apache.pulsar-pulsar-broker-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
at org.apache.bookkeeper.mledger.impl.OpReadEntry.lambda$checkReadCompletion$2(OpReadEntry.java:156) ~[org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
at org.apache.bookkeeper.mledger.util.SafeRun$1.safeRun(SafeRun.java:32) [org.apache.pulsar-managed-ledger-2.8.0-SNAPSHOT.jar:2.8.0-SNAPSHOT]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.13.0.jar:4.13.0]
at org.apache.bookkeeper.common.util.OrderedExecutor$TimedRunnable.run(OrderedExecutor.java:203) [org.apache.bookkeeper-bookkeeper-common-4.13.0.jar:4.13.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.60.Final.jar:4.1.60.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
To reproduce
- Create 2-cluster k8s deployment (sample script/helm to do so)
- Create consumer with replicated subscription for topic georep/default/t1 in cluster-a and close it
- Create consumer with replicated subscription for topic georep/default/t1 in cluster-b and close it
- Create producer for topic georep/default/t1 in cluster-a and publish messages to the topic
- Create consumer with replicated subscription for topic georep/default/t1 in cluster-a, consume 1 message and close it
- Create consumer with replicated subscription for topic georep/default/t1 in cluster-b, consume 1 message and close it
It might be possible to reproduce the issue with fewer steps.
Observations
It seems that the code location broker after the switch to LightProto (#9046).
The fix is easy for the exception above. The concern is the lack of test coverage for replicated subscriptions.
Metadata
Metadata
Assignees
Labels
type/bugThe PR fixed a bug or issue reported a bugThe PR fixed a bug or issue reported a bug