Skip to content

Conversation

@poorbarcode
Copy link
Contributor

Motivation

Issue

  • Brokers acquire namespace-bundle locks with the data OwnershipCache.selfOwnerInfo
  • The initialization of the OwnershipCache.selfOwnerInfo is in the method PulsarService.start() as bellow
    • call startNamespaceService(): this line created the object OwnershipCache, but the variable OwnershipCache.selfOwnerInfo will be set an empty value due to PulsarService.brokerServiceUrl has not been initialized yet.
    • Initialize PulsarService.brokerServiceUrl.
    • start leader election service and load manager, at this moment the variable OwnershipCache.selfOwnerInfo is still empty, so the client may get a null response
    • call nsService.initialize(): reset the variable of OwnershipCache.selfOwnerInfo.

Error logs

2024-11-26T12:47:27,323+0000 [pulsar-io-6-2] WARN  org.apache.pulsar.client.impl.BinaryProtoLookupService - [persistent://{tenant}/{ns}/__change_events-partition-0] invalid url null : Cannot invoke "String.length()" because "this.input" is null
java.lang.NullPointerException: Cannot invoke "String.length()" because "this.input" is null
	at java.base/java.net.URI$Parser.parse(Unknown Source) ~[?:?]
	at java.base/java.net.URI.<init>(Unknown Source) ~[?:?]
	at org.apache.pulsar.client.impl.BinaryProtoLookupService.lambda$findBroker$7(BinaryProtoLookupService.java:198) ~[io.streamnative-pulsar-client-original-3.3.1.8.jar:3.3.1.8]
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
	at java.base/java.util.concurrent.CompletableFuture.complete(Unknown Source) ~[?:?]
	at org.apache.pulsar.client.impl.ClientCnx.handleLookupResponse(ClientCnx.java:649) ~[io.streamnative-pulsar-client-original-3.3.1.8.jar:3.3.1.8]
	at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:154) ~[io.streamnative-pulsar-common-3.3.1.8.jar:3.3.1.8]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318) ~[io.netty-netty-codec-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.handler.flush.FlushConsolidationHandler.channelRead(FlushConsolidationHandler.java:152) ~[io.netty-netty-handler-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868) ~[io.netty-netty-transport-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799) ~[io.netty-netty-transport-classes-epoll-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:501) ~[io.netty-netty-transport-classes-epoll-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:399) ~[io.netty-netty-transport-classes-epoll-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[io.netty-netty-common-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[io.netty-netty-common-4.1.113.Final.jar:4.1.113.Final]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.113.Final.jar:4.1.113.Final]
	at java.base/java.lang.Thread.run(Unknown Source) [?:?]

Heap dump

OwnershipCache.selfOwnerInfo will be reset, but OwnershipCache.selfOwnerInfoDisabled never be set again.

Screenshot 2024-11-26 at 20 58 48

Modifications

  • reset the variable of OwnershipCache.selfOwnerInfo before starting the Load Manager.
  • add a warning log if a system topic lookup will get an empty broker URL.
  • Since it is hard to write a test, I skip writing a test

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@codecov-commenter
Copy link

codecov-commenter commented Nov 26, 2024

Codecov Report

Attention: Patch coverage is 69.23077% with 4 lines in your changes missing coverage. Please review.

Project coverage is 74.32%. Comparing base (bbc6224) to head (c9120b1).
Report is 757 commits behind head on master.

Files with missing lines Patch % Lines
...g/apache/pulsar/broker/lookup/TopicLookupBase.java 55.55% 2 Missing and 2 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23642      +/-   ##
============================================
+ Coverage     73.57%   74.32%   +0.75%     
- Complexity    32624    34986    +2362     
============================================
  Files          1877     1944      +67     
  Lines        139502   147236    +7734     
  Branches      15299    16248     +949     
============================================
+ Hits         102638   109439    +6801     
- Misses        28908    29340     +432     
- Partials       7956     8457     +501     
Flag Coverage Δ
inttests 27.42% <69.23%> (+2.84%) ⬆️
systests 24.34% <69.23%> (+0.02%) ⬆️
unittests 73.70% <69.23%> (+0.85%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...n/java/org/apache/pulsar/broker/PulsarService.java 83.62% <100.00%> (+1.25%) ⬆️
...apache/pulsar/broker/namespace/OwnershipCache.java 80.18% <100.00%> (-5.08%) ⬇️
...g/apache/pulsar/broker/lookup/TopicLookupBase.java 68.58% <55.55%> (+1.17%) ⬆️

... and 660 files with indirect coverage changes

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, added minor comments about improving added log messages.

@poorbarcode I have one question: One of the goals of #22977 was to prevent serving requests before the broker is initialized. Did the broker version that you investigated contain this change?

@poorbarcode
Copy link
Contributor Author

@lhotari

I have one question: One of the goals of #22977 was to prevent serving requests before the broker is initialized. Did the broker version that you investigated contain this change?

Yes, it contains. We build the package with the latest version of branch-3.3.

BTW, I think #22977 should have already solved the current issue, let's keep tracing the issue by the new log this PR added.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some typos to fix in the log messages.

poorbarcode and others added 2 commits November 27, 2024 23:54
…picLookupBase.java

Co-authored-by: Lari Hotari <lhotari@users.noreply.github.com>
…picLookupBase.java

Co-authored-by: Lari Hotari <lhotari@users.noreply.github.com>
@poorbarcode
Copy link
Contributor Author

@lhotari

Some typos to fix in the log messages.

Sorry for that. Fixed

@lhotari lhotari changed the title [fix] [broker] fix null lookup result when brokers are starting [fix][broker] fix null lookup result when brokers are starting Nov 27, 2024
@lhotari lhotari merged commit bd3b3b8 into apache:master Nov 27, 2024
54 checks passed
lhotari pushed a commit that referenced this pull request Nov 27, 2024
lhotari pushed a commit that referenced this pull request Nov 27, 2024
lhotari pushed a commit that referenced this pull request Nov 27, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 28, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 28, 2024
Comment on lines +354 to +357
boolean tlsEnabled = pulsar.getConfig().isBrokerClientTlsEnabled();
if (!tlsEnabled && StringUtils.isBlank(lookupData.getBrokerUrl())) {
log.warn("[{}] Unexpected lookup result: brokerUrl is required when TLS isn't enabled. options: {},"
+ " result {}", topic, options, lookupData);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems not make sense. brokerClientTlsEnabled only represents if TLS is enabled for built-in client.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants