KAFKA-19585: Avoid noisy NPE logs when closing consumer after constructor failures#20491
Conversation
lianetm
left a comment
There was a problem hiding this comment.
Thanks for the patch! Just a minor comment for consideration
| @Test | ||
| public void testFailConstructor() { | ||
| final Properties props = requiredConsumerConfig(); | ||
| props.put(ConsumerConfig.GROUP_ID_CONFIG, "group-id"); |
There was a problem hiding this comment.
Interesting, trying to understand why we needed this, I realized that it was probably to make sure there is a non-null groupMetadata, so you can ensure that the test pass because it hits they newly added null check on appEventHandler, and not because of the existing early return on groupMetadata.isEmpty, correct? But then I realized that the groupMetadata creation happens after the metrics anyways, so it will end up always being null for this test anyways, right? (meaning I expect this test would pass, no noisy NPE even without this PR)
That being said, I think the sanity checks make sense, and the test will surely help catch any regression (a simple change or order in the actions of the constructor would make the noisy NPEs appear I expect).
So no changes I would say, just thinking out loud here and sharing for the record.
There was a problem hiding this comment.
groupMetadata is always initialized (ref), so my understanding was what you said, i.e. "you can ensure that the test pass because it hits they newly added null check on appEventHandler, and not because of the existing early return on groupMetadata.isEmpty".
creation happens after the metrics anyways, so it will end up always being null for this test anyways, right?
Good point. The only place we could add something after the initialization of groupMetadata but before the initialization of the other handlers/invokers that are checked for nullity is in RequestManagers.supplier, but that doesn't throw any exception. Now I'm wondering how these NPEs were obtained in the first place 🤔? Only place groupMetadata isn't checked for emptiness is in awaitPendingAsyncCommitsAndExecuteCommitCallbacks, and lastPendingAsyncCommit is always null unless there's already been an asyncCommit, which I assume is impossible if the consumer isn't yet built?
There was a problem hiding this comment.
I have just dived into the constructor and found that reproducing the noisy close scenario is pretty tricky.
In the context of Kafka, most cases will return early due to the groupMetadata.isEmpty.
I suppose the scenario might only occur at the JVM level, such as an OOM during construction.
| props.put(ConsumerConfig.METRIC_REPORTER_CLASSES_CONFIG, "an.invalid.class"); | ||
| final ConsumerConfig config = new ConsumerConfig(props); | ||
|
|
||
| try (LogCaptureAppender appender = LogCaptureAppender.createAndRegister()) { |
There was a problem hiding this comment.
It seems that code was copied from ShareConsumerImplTest#testFailConstructor with some improvements -- for example, the appender now gets closed :)
@gensericghiro Do you have a free cycle to file a MINOR patch to fix it in ShareConsumerImplTest#testFailConstructor?
There was a problem hiding this comment.
Yes, will do it
…ctor failures (apache#20491) If there's a failure in the kafka consumer constructor, we attempt to close it https://github.com/lianetm/kafka/blob/2329def2ff9ca4f7b9426af159b6fa19a839dc4d/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L540 In that case, it could be the case that some components may have not been created, so we should consider some null checks to avoid noisy logs about NPE. This noisy logs have been reported with the console share consumer in a similar scenario, so this task is to review and do a similar fix for the Async if needed. The fix is to check if handlers/invokers are null before trying to close them. Similar to what was done here apache#20290 Reviewers: TengYao Chi <frankvicky@apache.org>, Lianet Magrans <lmagrans@confluent.io>
…va::testFailConstructor (apache#20514) Similarly to what was done for AsyncKafkaConsumerTest::testFailConstructor, [here](apache#20491) Reviewers: Lianet Magrans <lmagrans@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
…ctor failures (apache#20491) If there's a failure in the kafka consumer constructor, we attempt to close it https://github.com/lianetm/kafka/blob/2329def2ff9ca4f7b9426af159b6fa19a839dc4d/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L540 In that case, it could be the case that some components may have not been created, so we should consider some null checks to avoid noisy logs about NPE. This noisy logs have been reported with the console share consumer in a similar scenario, so this task is to review and do a similar fix for the Async if needed. The fix is to check if handlers/invokers are null before trying to close them. Similar to what was done here apache#20290 Reviewers: TengYao Chi <frankvicky@apache.org>, Lianet Magrans <lmagrans@confluent.io>
…va::testFailConstructor (apache#20514) Similarly to what was done for AsyncKafkaConsumerTest::testFailConstructor, [here](apache#20491) Reviewers: Lianet Magrans <lmagrans@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
If there's a failure in the kafka consumer constructor, we attempt to
close it
https://github.com/lianetm/kafka/blob/2329def2ff9ca4f7b9426af159b6fa19a839dc4d/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java#L540
In that case, it could be the case that some components may have not
been created, so we should consider some null checks to avoid noisy logs
about NPE.
This noisy logs have been reported with the console share consumer in a
similar scenario, so this task is to review and do a similar fix for the
Async if needed.
The fix is to check if handlers/invokers are null before trying to close
them. Similar to what was done here
#20290
Reviewers: TengYao Chi frankvicky@apache.org, Lianet Magrans
lmagrans@confluent.io