Skip to content

KAFKA-19384: The passing of BrokerRegistrationRequestTest is a false positive#4

Closed
apalan60 wants to merge 13 commits into
trunkfrom
Kafka-19384
Closed

KAFKA-19384: The passing of BrokerRegistrationRequestTest is a false positive#4
apalan60 wants to merge 13 commits into
trunkfrom
Kafka-19384

Conversation

@apalan60

Copy link
Copy Markdown
Owner

No description provided.

apoorvmittal10 and others added 13 commits August 7, 2025 13:11
…ache#20315)

The PR removes unnecessary updates for find next fetch offset. When the
state is in transition and not yet completed then anyways respective
offsets should not be considered for acquisition. The find next fetch
offset is updated finally when transition is completed.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit
<adixit@confluent.io>
)

Minor PR to move persister call outside of the lock. The lock is not
required while making the persister call.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit <adixit@confluent.io>
Now that Kafka support Java 17, this PR makes some changes in `trogdor`
module. The changes mostly include:

- Collections.emptyList(), Collections.singletonList() and
Arrays.asList() are replaced with List.of()
- Collections.emptyMap() and Collections.singletonMap() are replaced
with Map.of()
- Collections.singleton() is replaced with Set.of()

Some minor cleanups around use of enhanced switch blocks and conversion
of classes to record classes.

Reviewers: Ken Huang <s7133700@gmail.com>, Vincent Jiang
 <vpotucek@me.com>, Chia-Ping Tsai <chia7712@gmail.com>
…timeout (apache#20310)

Fixing max delivery check on acquisition lock timeout and write state
RPC failure.

When acquisition lock is already timed out and write state RPC failure
occurs then we need to check if records need to be archived. However
with the fix we do not persist the information, which is relevant as
some records may be  archived or delivery count is bumped. The
information will be persisted eventually.

The persister call has failed already hence issuing another persister
call due to a failed persister call earlier is not correct. Rather let
the data persist in future persister calls.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Abhinav Dixit
 <adixit@confluent.io>
…_ENABLE_CONFIG (apache#20322)

Document deprecation of PARTITIONER_ADPATIVE_PARTITIONING_ENABLE_CONFIG
in `upgrade.html`, which was missed in apache#20317

Reviewers: Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
 <chia7712@gmail.com>
…ose (apache#20290)

*What*
https://issues.apache.org/jira/browse/KAFKA-19572

- If a `ShareConsumer` constructor failed due to any exception, then we
call `close()` in the catch block.

- If there were uninitialized members accessed during `close()`, then it
would throw a NPE. Currently there are no null checks, hence we were
attempting to use these fields during `close()` execution.

- To avoid this, PR adds null checks in the `close()` function before we
access fields which possibly could be null.

Reviewers: Apoorv Mittal <amittal@confluent.io>, Lianet Magrans
 <lmagrans@confluent.io>
Improve RLMM doc:
1. Distinguish RLMM configs from other tiered storage configs, all RLMM
configs need to start with a specific prefix, but the original
documentation miss description.
2. Added description of additional configs for client, which is required
when configuring authentication information. This can confuse users, for
example: Aiven-Open/tiered-storage-for-apache-kafka#681

Reviewers: Luke Chen <showuon@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
The previous URL http://lambda-architecture.net/ seems to now be controlled by spammers

Co-authored-by: Shashank <hsshashank.grad@gmail.com>
Reviewers: Mickael Maison <mickael.maison@gmail.com>
…r in delayedShareFetch tryComplete method (apache#20280)

In the current implementation, some delayed share fetch operations get
trapped in the delayed share fetch purgatory when the partition
leaderships change during share consumption. This is because there is no
check in code to make sure the current broker is still the partition
leader corresponding to the share partitions. So, when leadership
changes, the share partitions cannot be acquired, because they have
already been fenced, and tryComplete returns false. Although the
operatio does get completed when the timer expires for it, but it is too
late by then, and the operation get stuck in the watchers list waiting
for it to get purged when estimated operations increase to more than
1000. This Pr resolves this by adding the required check so that if
partition leadership changes, then the delayed share fetches waiting on
it gets completed instantaneously.

Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield
 <aschofield@confluent.io>
Use Java 24 for the spotbugs checks, now that Spotbugs works  on Java
24.

Added some more warning exclusions for warnings that are new to 4.9.4.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
…N format (apache#20336)

This PR do following:
1. Use correct json format to test.
2. make `PartitionReassignmentState` and `VerifyAssignmentResult.java`
become record

Reviewers: TengYao Chi <frankvicky@apache.org>, Ken Huang
 <s7133700@gmail.com>, PoAn Yang <payang@apache.org>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @apalan60, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily addresses a false positive in the BrokerRegistrationRequestTest by migrating the test from Scala to Java and enhancing its validation logic. It also includes several other improvements across the Kafka codebase, focusing on null-safety in consumer operations, refining share partition state management, and modernizing various utility and test classes by adopting newer Java language features.

Highlights

  • Fixing False Positive Test: The BrokerRegistrationRequestTest was refactored from Scala to Java, introducing more robust validation for broker registration scenarios, particularly concerning metadata version compatibility and ZK broker migrations, which previously led to a false positive test result.
  • Enhanced Consumer Stability: Added comprehensive null-safety checks within ShareConsumerImpl to prevent NullPointerExceptions during consumer close operations and acknowledgement handling, improving the consumer's robustness.
  • Improved Share Partition State Management: Refined the logic for acquiring and acknowledging records in SharePartition, ensuring correct state transitions and offset management, especially when dealing with acquisition lock timeouts and write state RPC failures.
  • Codebase Modernization: Updated various utility and test classes across the project to leverage modern Java features, such as records and enhanced collection factory methods, contributing to cleaner and more concise code.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@apalan60 apalan60 closed this Aug 11, 2025

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a variety of improvements across the codebase. It fixes a false positive in BrokerRegistrationRequestTest by replacing the Scala test with a more accurate Java version. It also enhances robustness in the ShareConsumerImpl by adding defensive null checks, which are validated by a new test case. Several bug fixes and logical improvements have been made to the share consumer and delayed fetch handling, particularly around state transitions and leadership changes. Additionally, a significant portion of the trogdor and tools modules has been modernized, adopting Java records, switch expressions, and immutable collections, which improves code clarity and safety. The upgrade of the SpotBugs tool and removal of related workarounds is also a welcome maintenance update. Overall, these are solid changes that improve test quality, code robustness, and maintainability.

Comment on lines 67 to 78
public void testPartitionNumbers() {
List<Integer> partsANumbers = PARTSA.partitionNumbers();
assertEquals(Integer.valueOf(0), partsANumbers.get(0));
assertEquals(Integer.valueOf(1), partsANumbers.get(1));
assertEquals(Integer.valueOf(2), partsANumbers.get(2));
assertTrue(partsANumbers.contains(0));
assertTrue(partsANumbers.contains(1));
assertTrue(partsANumbers.contains(2));
assertEquals(3, partsANumbers.size());

List<Integer> partsBNumbers = PARTSB.partitionNumbers();
assertEquals(Integer.valueOf(0), partsBNumbers.get(0));
assertEquals(Integer.valueOf(1), partsBNumbers.get(1));
assertTrue(partsBNumbers.contains(0));
assertTrue(partsBNumbers.contains(1));
assertEquals(2, partsBNumbers.size());
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The change from assertEquals(Integer.valueOf(0), partsANumbers.get(0)) to assertTrue(partsANumbers.contains(0)) is a good improvement. The original test was depending on the iteration order of a HashMap's key set, which is not guaranteed. The new assertion correctly checks for the presence of the partition numbers without being sensitive to their order, making the test more robust.

Comment on lines +259 to +264
return switch (state) {
case PENDING -> new TaskPending(spec);
case RUNNING -> new TaskRunning(spec, startedMs, getCombinedStatus());
case STOPPING -> new TaskStopping(spec, startedMs, getCombinedStatus());
case DONE -> new TaskDone(spec, startedMs, doneMs, error, cancelled, getCombinedStatus());
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using a switch expression here is a great modernization. It's more concise and, being exhaustive, removes the need for a final throw new RuntimeException("unreachable"), making the code safer at compile time.

Comment on lines +132 to +135
return switch (spec.action()) {
case CONNECT -> new ConnectStressor(spec);
case FETCH_METADATA -> new FetchMetadataStressor(spec);
};

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This refactoring from a traditional switch statement to a more modern switch expression is a good improvement. It makes the code more concise and readable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants