[Bug] Deadlock in PersistentSubscription.close / NamespaceService.unloadNamespaceBundle / PulsarService.closeAsync encountered in tests #23952

@lhotari

Description

Search before asking

  • I searched in the issues and found nothing similar.

Read release policy

  • I understand that unsupported versions don't get bug fixes. I will attempt to reproduce the issue on a supported version of Pulsar client and Pulsar broker.

Version

master branch

Minimal reproduce step

No known steps to reproduce; the deadlock was encountered while running tests.

Found one Java-level deadlock:
=============================
"main":
  waiting to lock monitor 0x00007fd520fe6f10 (object 0x000010003425f350, a org.apache.pulsar.broker.service.persistent.PersistentSubscription),
  which is held by "PulsarTestContext-executor-OrderedExecutor-0-0"

"PulsarTestContext-executor-OrderedExecutor-0-0":
  waiting to lock monitor 0x00007fd4f400dd00 (object 0x000010003425fd70, a org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer),
  which is held by "broker-topic-workers-OrderedExecutor-0-0"

"broker-topic-workers-OrderedExecutor-0-0":
  waiting to lock monitor 0x00007fd7a406f4a0 (object 0x000010003427f678, a org.apache.bookkeeper.mledger.impl.cache.PendingReadsManager$PendingRead),
  which is held by "PulsarTestContext-executor-OrderedExecutor-0-0"

Java stack information for the threads listed above:
===================================================
"main":
	at org.apache.pulsar.broker.service.persistent.PersistentSubscription.close(PersistentSubscription.java)
	- waiting to lock <0x000010003425f350> (a org.apache.pulsar.broker.service.persistent.PersistentSubscription)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$close$56(PersistentTopic.java:1697)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda/0x00007fd54cb397a8.accept(Unknown Source)
	at java.util.concurrent.ConcurrentHashMap.forEach(java.base@21.0.6/ConcurrentHashMap.java:1603)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$close$57(PersistentTopic.java:1697)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda/0x00007fd54cb39358.accept(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniAcceptNow(java.base@21.0.6/CompletableFuture.java:757)
	at java.util.concurrent.CompletableFuture.uniAcceptStage(java.base@21.0.6/CompletableFuture.java:735)
	at java.util.concurrent.CompletableFuture.thenAccept(java.base@21.0.6/CompletableFuture.java:2214)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.close(PersistentTopic.java:1688)
	at org.apache.pulsar.broker.service.BrokerService.lambda$unloadServiceUnit$116(BrokerService.java:2371)
	at org.apache.pulsar.broker.service.BrokerService$$Lambda/0x00007fd54cb76aa8.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniComposeStage(java.base@21.0.6/CompletableFuture.java:1187)
	at java.util.concurrent.CompletableFuture.thenCompose(java.base@21.0.6/CompletableFuture.java:2341)
	at org.apache.pulsar.broker.service.BrokerService.lambda$unloadServiceUnit$118(BrokerService.java:2371)
	at org.apache.pulsar.broker.service.BrokerService$$Lambda/0x00007fd54cb6f800.accept(Unknown Source)
	at java.util.concurrent.ConcurrentHashMap.forEach(java.base@21.0.6/ConcurrentHashMap.java:1603)
	at org.apache.pulsar.broker.service.BrokerService.unloadServiceUnit(BrokerService.java:2341)
	at org.apache.pulsar.broker.service.BrokerService.unloadServiceUnit(BrokerService.java:2314)
	at org.apache.pulsar.broker.namespace.OwnedBundle.lambda$handleUnloadRequest$0(OwnedBundle.java:138)
	at org.apache.pulsar.broker.namespace.OwnedBundle$$Lambda/0x00007fd54cb6f5c8.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniComposeStage(java.base@21.0.6/CompletableFuture.java:1187)
	at java.util.concurrent.CompletableFuture.thenCompose(java.base@21.0.6/CompletableFuture.java:2341)
	at org.apache.pulsar.broker.namespace.OwnedBundle.handleUnloadRequest(OwnedBundle.java:138)
	at org.apache.pulsar.broker.namespace.NamespaceService.unloadNamespaceBundle(NamespaceService.java:848)
...
	at org.apache.pulsar.broker.namespace.NamespaceService.unloadNamespaceBundle(NamespaceService.java:839)
...
	at org.apache.pulsar.broker.namespace.NamespaceService.unloadNamespaceBundle(NamespaceService.java:830)
	at org.apache.pulsar.broker.service.BrokerService.lambda$unloadNamespaceBundlesGracefully$30(BrokerService.java:999)
	at org.apache.pulsar.broker.service.BrokerService$$Lambda/0x00007fd54cb6ef58.accept(Unknown Source)
	at java.lang.Iterable.forEach(java.base@21.0.6/Iterable.java:75)
	at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:992)
	at org.apache.pulsar.broker.service.BrokerService.unloadNamespaceBundlesGracefully(BrokerService.java:962)
	at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:525)
...
	at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:509)
	at org.apache.pulsar.broker.PulsarService.close(PulsarService.java:484)
...


"PulsarTestContext-executor-OrderedExecutor-0-0":
	at org.apache.pulsar.broker.service.AbstractDispatcherSingleActiveConsumer.disconnectActiveConsumers(AbstractDispatcherSingleActiveConsumer.java)
	- waiting to lock <0x000010003425fd70> (a org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer)
	at org.apache.pulsar.broker.service.persistent.PersistentSubscription.resetCursor(PersistentSubscription.java:856)
	- locked <0x000010003425f350> (a org.apache.pulsar.broker.service.persistent.PersistentSubscription)
	at org.apache.pulsar.broker.service.persistent.PersistentSubscription$6.findEntryComplete(PersistentSubscription.java:824)
	at org.apache.pulsar.broker.service.persistent.PersistentMessageFinder.findEntryComplete(PersistentMessageFinder.java:162)
	at org.apache.bookkeeper.mledger.impl.OpFindNewest.readEntryComplete(OpFindNewest.java:133)
	at org.apache.bookkeeper.mledger.impl.cache.RangeEntryCacheImpl$1.readEntriesComplete(RangeEntryCacheImpl.java:241)
	at org.apache.bookkeeper.mledger.impl.cache.PendingReadsManager$PendingRead.readEntriesComplete(PendingReadsManager.java:253)
	- locked <0x000010003427f678> (a org.apache.bookkeeper.mledger.impl.cache.PendingReadsManager$PendingRead)
	at org.apache.bookkeeper.mledger.impl.cache.PendingReadsManager$PendingRead.lambda$attach$0(PendingReadsManager.java:232)
	at org.apache.bookkeeper.mledger.impl.cache.PendingReadsManager$PendingRead$$Lambda/0x00007fd54cb0fc60.run(Unknown Source)
	at org.apache.bookkeeper.common.util.SingleThreadExecutor.safeRunTask(SingleThreadExecutor.java:137)
	at org.apache.bookkeeper.common.util.SingleThreadExecutor.run(SingleThreadExecutor.java:107)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.runWith(java.base@21.0.6/Thread.java:1596)
	at java.lang.Thread.run(java.base@21.0.6/Thread.java:1583)

"broker-topic-workers-OrderedExecutor-0-0":
	at org.apache.bookkeeper.mledger.impl.cache.PendingReadsManager$PendingRead.addListener(PendingReadsManager.java)
	- waiting to lock <0x000010003427f678> (a org.apache.bookkeeper.mledger.impl.cache.PendingReadsManager$PendingRead)
	at org.apache.bookkeeper.mledger.impl.cache.PendingReadsManager.readEntries(PendingReadsManager.java:430)
...
	at org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer.readMoreEntries(PersistentDispatcherSingleActiveConsumer.java:387)
	- locked <0x000010003425fd70> (a org.apache.pulsar.broker.service.persistent.PersistentDispatcherSingleActiveConsumer)
...

full deadlock details: https://gist.github.com/lhotari/135bb1a5a045d00c19cf374fca1ff8f7#file-threaddump15633_2025-02-08_00-txt-L1327-L1542
full thread dump: https://gist.github.com/lhotari/135bb1a5a045d00c19cf374fca1ff8f7
analysis: https://jstack.review/?https://gist.github.com/lhotari/135bb1a5a045d00c19cf374fca1ff8f7
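Read as a lock-ordering problem, the dump above shows the read-callback thread taking the PendingRead monitor, then the PersistentSubscription monitor, and then waiting for the dispatcher monitor. The dispatcher thread already holds the dispatcher monitor and is waiting for the same PendingRead. "main" is queued behind that cycle on the PersistentSubscription monitor. Below is a minimal sketch of that shape, not Pulsar code: the three locks are plain stand-in objects, and the class, method, and thread names are hypothetical. It only illustrates the inverted acquisition order and how `ThreadMXBean` reports it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CountDownLatch;

public class LockCycleSketch {

    // Signal that this thread holds its first lock(s), then wait for the other thread.
    static void awaitQuietly(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static Set<String> findDeadlockedThreadNames() throws InterruptedException {
        // Stand-ins for the three monitors in the dump (assumption: plain objects,
        // not the real PersistentSubscription / dispatcher / PendingRead classes).
        final Object subscription = new Object();
        final Object dispatcher = new Object();
        final Object pendingRead = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Same order as "PulsarTestContext-executor": PendingRead, subscription, dispatcher.
        Thread readCallback = new Thread(() -> {
            synchronized (pendingRead) {
                synchronized (subscription) {
                    awaitQuietly(bothHoldFirstLock);
                    synchronized (dispatcher) {
                        // not reached once the cycle has formed
                    }
                }
            }
        }, "read-callback");

        // Same order as "broker-topic-workers": dispatcher, then PendingRead.
        Thread dispatcherRead = new Thread(() -> {
            synchronized (dispatcher) {
                awaitQuietly(bothHoldFirstLock);
                synchronized (pendingRead) {
                    // not reached once the cycle has formed
                }
            }
        }, "dispatcher-read");

        // Daemon threads so the stuck threads do not keep the JVM alive.
        readCallback.setDaemon(true);
        dispatcherRead.setDaemon(true);
        readCallback.start();
        dispatcherRead.start();

        // Poll the JVM's monitor deadlock detector for up to five seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + 5_000_000_000L;
        long[] ids = null;
        while (ids == null && System.nanoTime() < deadline) {
            Thread.sleep(20);
            ids = mx.findMonitorDeadlockedThreads();
        }

        Set<String> names = new TreeSet<>();
        if (ids != null) {
            for (ThreadInfo info : mx.getThreadInfo(ids)) {
                if (info != null) {
                    names.add(info.getThreadName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Deadlocked threads: " + findDeadlockedThreadNames());
    }
}
```

In the sketch, either giving both paths the same acquisition order or invoking the callback outside the `pendingRead` monitor would break the cycle. Whether either option fits the real PendingReadsManager and dispatcher code is not established here.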

What did you expect to see?

no deadlocks

What did you see instead?

A deadlock occurred while closing PulsarService.

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
