Skip to content

BadVersionException , unable to consume topic, failing to reconnect #14779

@cerebrotecnologico

Description

@cerebrotecnologico

Describe the bug
I am using Pulsar 2.8.0.
The topic is partitioned, 3 partitions.
The backlog quota is set to 50h
The TTL is set to 48h.

This is the second time that we observe this error. The application reads from 5 different topics, but only one topic has been affected twice.

This is the topic with the highest traffic processed by the same application. (Yet, our app is not really data intensive when compared to other use cases, it process less than 1 million events per hour)

My application suddenly stopped processing messages, tries to reconnect but keeps getting errors about ZK BadVersion.

[persistent://tenant/namespace/mytopic-partition-2] [platform_persistent://tenant/namespace/mytopic] Could not get connection to broker: java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /managed-ledgers/tenant/namespace/persistent/my-topic-partition-2/platform_persistent%3A%2F%2Ftenant%2Fnamespace%2Fmytopic -- Will try again in 0.1 s

We unloaded the topic and the application restarted processing messages. (The application did not need to be restarted).

Any idea what causes this? How to prevent it.

To Reproduce
Unknown.

Expected behavior
I expected that the topic would be unloaded automatically, causing the clients to be able to reconnect to a new broker.

** Logs
Broker:
09:13:45.822 [pulsar-io-4-2] WARN org.apache.pulsar.broker.service.BrokerService - Namespace is being unloaded, cannot add topic persistent://platform/system/workflow-event-even-partition-0
09:13:45.823 [BookKeeperClientWorker-OrderedExecutor-1-0] WARN org.apache.pulsar.broker.service.AbstractTopic - [persistent://platform/system/workflow-event-even-partition-0] Attempting to add producer to a fenced topic
09:13:45.823 [BookKeeperClientWorker-OrderedExecutor-1-0] ERROR org.apache.pulsar.broker.service.ServerCnx - [/10.42.105.16:49756] Failed to add producer to topic persistent://platform/system/workflow-event-even-partition-0: producerId=289578, org.apache.pulsar.broker.service.BrokerServiceException$TopicFencedException: Topic is temporarily unavailable
09:13:45.844 [ForkJoinPool.commonPool-worker-3] WARN org.apache.pulsar.broker.service.BrokerService - Namespace bundle for topic (persistent://platform/system/workflow-event-even-partition-0) not served by this instance. Please redo the lookup. Request is denied: namespace=platform/system
09:13:45.854 [ForkJoinPool.commonPool-worker-3] WARN org.apache.pulsar.broker.service.BrokerService - Namespace bundle for topic (persistent://platform/system/workflow-event-even-partition-0) not served by this instance. Please redo the lookup. Request is denied: namespace=platform/system
09:13:45.877 [pulsar-2-2] WARN org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate - Broker pulsar-broker-2.pulsar-broker.pulsar.svc.cluster.local:8080 is overloaded: CPU: 96.51538%, MEMORY: 51.04191%, DIRECT MEMORY: 12.5%, BANDWIDTH IN: 2.6368966%, BANDWIDTH OUT: 0.3707455%
09:13:45.877 [pulsar-2-2] WARN org.apache.pulsar.broker.loadbalance.impl.LeastLongTermMessageRate - Broker http://pulsar-broker-2.pulsar-broker.pulsar.svc.cluster.local:8080 is overloaded: max usage=0.9651538133621216

09:28:46.277 [bookkeeper-ml-scheduler-OrderedScheduler-1-0] WARN org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [platform/system/persistent/workflow-event-even-partition-2] Failed to update consumer platform_persistent%3A%2F%2Fplatform%2Fsystem%2Fworkflow-event-even_queue
at org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl$7.operationFailed(ManagedLedgerImpl.java:940) ~[io.streamnative-managed-ledger-2.8.2.0.jar:2.8.2.0]
at org.apache.bookkeeper.util.SafeRunnable$1.safeRun(SafeRunnable.java:43) [org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
09:28:46.281 [bookkeeper-ml-scheduler-OrderedScheduler-1-0] ERROR org.apache.pulsar.broker.service.persistent.PersistentTopic - [persistent://platform/system/workflow-event-even-partition-2] Failed to create subscription: platform_persistent://platform/system/workflow-event-even_queue
java.util.concurrent.CompletionException: org.apache.pulsar.broker.service.BrokerServiceException$PersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException$BadVersionException: org.apache.pulsar.metadata.api.MetadataStoreException$BadVersionException: org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /managed-ledgers/platform/system/persistent/workflow-event-even-partition-2/platform_persistent%3A%2F%2Fplatform%2Fsystem%2Fworkflow-event-even_queue
at java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:704) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) ~[?:?]
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088) ~[?:?]
at org.apache.pulsar.broker.service.persistent.PersistentTopic$3.openCursorFailed(PersistentTopic.java:887) ~[io.streamnative-pulsar-broker-2.8.2.0.jar:2.8.2.0]
at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$2.operationFailed(ManagedCursorImpl.java:567) ~[io.streamnative-managed-ledger-2.8.2.0.jar:2.8.2.0]
at org.apache.bookkeeper.mledger.impl.ManagedCursorImpl$23.operationFailed(ManagedCursorImpl.java:2348) ~[io.streamnative-managed-ledger-2.8.2.0.jar:2.8.2.0]

At the time this occurred, I saw long GC in a ZK node:
image

Metadata

Metadata

Assignees

Labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions