Skip to content

Conversation

@Shawyeok
Copy link
Contributor

Fixes #23982

Motivation

The graceful broker shutdown endpoint (/admin/v2/brokers/shutdown) was experiencing a deadlock when called via the admin API. The issue occurred because:

  1. The HTTP request thread calls the shutdown endpoint
  2. This triggers PulsarService.closeAsync()WebService.close()
  3. The Jetty server attempts graceful shutdown by waiting for all active requests to complete
  4. However, the current thread processing the shutdown request is itself one of those active requests
  5. This creates a circular dependency where the thread waits for itself to complete, causing a deadlock

The stack trace shows the thread stuck in CountDownLatch.await() in FutureCallback.get(), waiting indefinitely for the shutdown to complete.

Click to show full stacktrace
"pulsar-web-35-7" #81 prio=5 os_prio=31 cpu=13.36ms elapsed=10.73s tid=0x0000000125b5e800 nid=0x15203 waiting on condition  [0x0000000552e92000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.10/Native Method)
	- parking to wait for  <0x000020000498b000> (a java.util.concurrent.CountDownLatch$Sync)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.10/LockSupport.java:252)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base@17.0.10/AbstractQueuedSynchronizer.java:717)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@17.0.10/AbstractQueuedSynchronizer.java:1074)
	at java.util.concurrent.CountDownLatch.await(java.base@17.0.10/CountDownLatch.java:276)
	at org.eclipse.jetty.util.FutureCallback.get(FutureCallback.java:129)
	at org.eclipse.jetty.util.FutureCallback.get(FutureCallback.java:30)
	at org.eclipse.jetty.server.handler.AbstractHandlerContainer.doShutdown(AbstractHandlerContainer.java:175)
	at org.eclipse.jetty.server.Server.doStop(Server.java:447)
	at org.eclipse.jetty.util.component.AbstractLifeCycle.stop(AbstractLifeCycle.java:94)
	- locked <0x0000200015e610c0> (a java.lang.Object)
	at org.apache.pulsar.broker.web.WebService.close(WebService.java:426)
	at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:529)
	at java.lang.invoke.DirectMethodHandle$Holder.invokeVirtual(java.base@17.0.10/DirectMethodHandle$Holder)
	at java.lang.invoke.LambdaForm$MH/0x0000007800488000.invoke(java.base@17.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000007800488400.invoke(java.base@17.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000007800444c00.invokeExact_MT(java.base@17.0.10/LambdaForm$MH)
	at java.lang.invoke.MethodHandle.invokeWithArguments(java.base@17.0.10/MethodHandle.java:732)
	at org.mockito.internal.util.reflection.InstrumentationMemberAccessor$Dispatcher$ByteBuddy$6yR64096.invokeWithArguments(Unknown Source)
	at org.mockito.internal.util.reflection.InstrumentationMemberAccessor.invoke(InstrumentationMemberAccessor.java:265)
	at org.mockito.internal.util.reflection.ModuleMemberAccessor.invoke(ModuleMemberAccessor.java:55)
	at org.mockito.internal.creation.bytebuddy.MockMethodAdvice.tryInvoke(MockMethodAdvice.java:316)
	at org.mockito.internal.creation.bytebuddy.MockMethodAdvice$RealMethodCall.invoke(MockMethodAdvice.java:236)
	at org.mockito.internal.invocation.InterceptedInvocation.callRealMethod(InterceptedInvocation.java:142)
	at org.mockito.internal.stubbing.answers.CallsRealMethods.answer(CallsRealMethods.java:45)
	at org.mockito.Answers.answer(Answers.java:90)
	at org.mockito.internal.handler.MockHandlerImpl.handle(MockHandlerImpl.java:111)
	at org.mockito.internal.handler.NullResultGuardian.handle(NullResultGuardian.java:29)
	at org.mockito.internal.handler.InvocationNotifierHandler.handle(InvocationNotifierHandler.java:34)
	at org.mockito.internal.creation.bytebuddy.access.MockMethodInterceptor.doIntercept(MockMethodInterceptor.java:84)
	at org.mockito.internal.creation.bytebuddy.MockMethodAdvice.handle(MockMethodAdvice.java:136)
	at org.apache.pulsar.broker.PulsarService.closeAsync(PulsarService.java:483)
	at org.apache.pulsar.broker.admin.impl.BrokersBase.doShutDownBrokerGracefullyAsync(BrokersBase.java:476)
	at org.apache.pulsar.broker.admin.impl.BrokersBase.lambda$shutDownBrokerGracefully$36(BrokersBase.java:461)
	at org.apache.pulsar.broker.admin.impl.BrokersBase$$Lambda$1528/0x0000007800b57790.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniComposeStage(java.base@17.0.10/CompletableFuture.java:1187)
	at java.util.concurrent.CompletableFuture.thenCompose(java.base@17.0.10/CompletableFuture.java:2309)
	at org.apache.pulsar.broker.admin.impl.BrokersBase.shutDownBrokerGracefully(BrokersBase.java:461)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.10/Native Method)
	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.10/NativeMethodAccessorImpl.java:77)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.10/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@17.0.10/Method.java:568)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$$Lambda$860/0x00000078007c52e0.invoke(Unknown Source)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:359)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:312)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
	at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
	at org.apache.pulsar.broker.web.WebService$FilterInitializer$WaitUntilPulsarServiceIsReadyForIncomingRequestsFilter.doFilter(WebService.java:336)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.eclipse.jetty.servlets.QoSFilter.doFilter(QoSFilter.java:202)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:722)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
	at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:181)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
	at org.eclipse.jetty.server.HttpChannel$$Lambda$1067/0x0000007800a6aa58.dispatch(Unknown Source)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.10/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.10/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.10/Thread.java:842)

Modifications

  1. Modified PulsarService.closeAsync():

    • Added overloaded method closeAsync(boolean waitForStop)
    • Original closeAsync() now calls closeAsync(true) for backward compatibility
    • New parameter allows callers to specify whether to wait for web service shutdown
  2. Fixed BrokersBase.shutDownBrokerGracefully():

    • Changed to call pulsar().closeAsync(false) instead of pulsar().closeAsync()
    • This prevents the HTTP request thread from waiting for itself to complete

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • testShutdownViaAdminApi() in PulsarServiceTest.java

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Sep 11, 2025
Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! just a few minor comments about parameter naming

@lhotari lhotari changed the title Fix cannot shutdown broker gracefully by admin api [fix][broker] Fix cannot shutdown broker gracefully by admin api Sep 11, 2025
@Shawyeok Shawyeok requested a review from lhotari September 12, 2025 02:30
@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 82.14286% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.18%. Comparing base (415c6fa) to head (0c66974).
⚠️ Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
.../java/org/apache/pulsar/broker/web/WebService.java 80.00% 4 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24731      +/-   ##
============================================
- Coverage     74.27%   74.18%   -0.09%     
- Complexity    33569    33595      +26     
============================================
  Files          1896     1900       +4     
  Lines        148111   148377     +266     
  Branches      17164    17204      +40     
============================================
+ Hits         110009   110079      +70     
- Misses        29370    29500     +130     
- Partials       8732     8798      +66     
Flag Coverage Δ
inttests 26.34% <0.00%> (-0.16%) ⬇️
systests 22.75% <0.00%> (+0.08%) ⬆️
unittests 73.70% <82.14%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...n/java/org/apache/pulsar/broker/PulsarService.java 83.70% <100.00%> (-1.21%) ⬇️
...g/apache/pulsar/broker/admin/impl/BrokersBase.java 88.03% <100.00%> (-0.48%) ⬇️
.../java/org/apache/pulsar/broker/web/WebService.java 88.93% <80.00%> (-1.84%) ⬇️

... and 98 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lhotari lhotari added this to the 4.2.0 milestone Sep 12, 2025
@lhotari lhotari added the type/bug The PR fixed a bug or issue reported a bug label Sep 12, 2025
@lhotari lhotari merged commit 4169395 into apache:master Sep 12, 2025
53 checks passed
lhotari pushed a commit that referenced this pull request Sep 12, 2025
lhotari pushed a commit that referenced this pull request Sep 12, 2025
lhotari pushed a commit that referenced this pull request Sep 16, 2025
lhotari pushed a commit that referenced this pull request Sep 16, 2025
nodece pushed a commit to ascentstream/pulsar that referenced this pull request Sep 16, 2025
ganesh-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 18, 2025
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 18, 2025
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 19, 2025
@lhotari
Copy link
Member

lhotari commented Sep 19, 2025

@Shawyeok An existing test BrokerServiceTest.testShutDownWithMaxConcurrentUnload became flaky after this change. Would you mind addressing #24765?

srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Sep 19, 2025
@Shawyeok
Copy link
Contributor Author

@Shawyeok An existing test BrokerServiceTest.testShutDownWithMaxConcurrentUnload became flaky after this change. Would you mind addressing #24765?

Sure, I’ll take a look at it shortly.

Shawyeok added a commit to Shawyeok/pulsar that referenced this pull request Sep 20, 2025
…ntUnload

Revise `BrokerServiceTest.testShutDownWithMaxConcurrentUnload`, since apache#24731 fixed the issue where the broker couldn’t be shut down via Pulsar Admin. The original test contains a race condition in its assertion.
```java
admin.brokers().shutDownBrokerGracefully(1, false);
Awaitility.await().atLeast(bundleNum - 1, TimeUnit.SECONDS).untilAsserted(() -> {
    assertEquals(pulsar.getBrokerService().getTopics().size(), 0);  // race condition: pulsar.getBrokerService() is possibly be null because the broker is shutting down
});
```
KannarFr pushed a commit to CleverCloud/pulsar that referenced this pull request Sep 22, 2025
walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Broker cannot always shutdown gracefully

3 participants