Skip to content

Conversation

@vinkal-chudgar
Copy link
Contributor

@vinkal-chudgar vinkal-chudgar commented Nov 10, 2025

Fixes #24963

Motivation

The in-flight deduplication for BinaryProtoLookupService#getPartitionedTopicMetadata used only TopicName. Calls that differ in metadataAutoCreationEnabled or useFallbackForNonPIP344Brokers were coalesced, which could return a future that does not match the requested behavior.

Concrete example:

  • Thread A calls getPartitionedTopicMetadata(topic, true, false) and creates a future
  • Thread B calls getPartitionedTopicMetadata(topic, false, true) and receives the same future from A
  • Thread B gets a result that reflects metadataAutoCreationEnabled=true, which is incorrect

This PR fixes deduplication by keying in-flight requests on all method parameters.

Modifications

  • Implemented composite key class PartitionedTopicMetadataKey (static inner class) to uniquely identify requests by topicName, metadataAutoCreationEnabled, useFallbackForNonPIP344Brokers
  • Changed partitionedMetadataInProgress to use PartitionedTopicMetadataKey as the map key instead of TopicName
  • Used computeIfAbsent with the new key and removed the entry on completion with remove(key, future)
  • Tests
    • Added BinaryProtoLookupServiceTest#testPartitionedMetadataDeduplicationAndCleanup
    • Added BinaryProtoLookupServiceTest#testPartitionedMetadataDeduplicationDifferentParameterCombinations

Verifying this change

  • Make sure that the change passes the CI checks.

This change added tests and can be verified as follows:

  • Added BinaryProtoLookupServiceTest#testPartitionedMetadataDeduplicationAndCleanup
  • Added BinaryProtoLookupServiceTest#testPartitionedMetadataDeduplicationDifferentParameterCombinations

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: vinkal-chudgar#5

…clude method parameters"

Key partitionedMetadataInProgress by TopicName, metadataAutoCreationEnabled, and useFallbackForNonPIP344Brokers.
This prevents coalescing of requests that should be distinct.

Fixes apache#24963

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Nov 10, 2025
@vinkal-chudgar vinkal-chudgar marked this pull request as ready for review November 10, 2025 21:31
@Technoboy-
Copy link
Contributor

Why could take the method params into account ? There will be more request to the broker, but the result is the same.

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lhotari
Copy link
Member

lhotari commented Nov 11, 2025

Why could take the method params into account ? There will be more request to the broker, but the result is the same.

@Technoboy- I think that this is more about correctness than the possibility that the broker response might be the same in most cases. It's strictly not the same in all cases. Therefore I think that such bugs should be fixed.

@Technoboy-
Copy link
Contributor

Why could take the method params into account ? There will be more request to the broker, but the result is the same.

@Technoboy- I think that this is more about correctness than the possibility that the broker response might be the same in most cases. It's strictly not the same in all cases. Therefore I think that such bugs should be fixed.

This is not a bug

Copy link
Contributor

@Technoboy- Technoboy- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is ok for the topicName, please give more possible issue for this bug

@lhotari
Copy link
Member

lhotari commented Nov 11, 2025

Why could take the method params into account ? There will be more request to the broker, but the result is the same.

@Technoboy- I think that this is more about correctness than the possibility that the broker response might be the same in most cases. It's strictly not the same in all cases. Therefore I think that such bugs should be fixed.

This is not a bug

@Technoboy- I think that depends on how you define a bug. The impact of the issue is that boolean metadataAutoCreationEnabled, boolean useFallbackForNonPIP344Brokers parameters for the getPartitionedTopicMetadata method are ignored for any other requests which are made while there's a request in progress. Let's say that the first request is made with metadataAutoCreationEnabled=false and while it's in progress a request with metadataAutoCreationEnabled=true is made. This will definitely result in an inconsistent situation. However, I agree that the likelihood that this would happen is low. Regardless of that, such issues should be fixed and I would consider them bugs. Your definition of a bug could vary and I think it's fine.

@vinkal-chudgar
Copy link
Contributor Author

Why could take the method params into account ? There will be more request to the broker, but the result is the same.

@Technoboy- The reason we must include these parameters in the key is that they can lead to fundamentally different outcomes for the caller, breaking the API contract if not handled separately. The logic within getPartitionedTopicMetadataAsync demonstrates this.

if (!metadataAutoCreationEnabled && !clientCnx.isSupportsGetPartitionedMetadataWithoutAutoCreation()) {
    if (useFallbackForNonPIP344Brokers) {
        // Fall back to old behavior
        finalAutoCreationEnabled = true;
    } else {
        // This path fails immediately
        partitionFuture.completeExceptionally(new PulsarClientException.FeatureNotSupportedException(...));
        return;
    }
}
// This path sends a request to the broker
ByteBuf request = Commands.newPartitionMetadataRequest(topicName.toString(), requestId, finalAutoCreationEnabled);

Here is a practical example of why caching them separately is critical, as two calls with the same TopicName but different parameters can produce completely different results:

  • When the broker does not support PIP-344, the outcome depends entirely on the useFallbackForNonPIP344Brokers flag:

    • One call might fail immediately with a FeatureNotSupportedException.
    • Another might fall back to the old behavior and succeed.

    If we only deduped by TopicName, a caller that expects a failure could incorrectly receive a successful result from the other caller. This breaks the method contract for that caller.

  • Even when the broker supports PIP-344, the metadataAutoCreationEnabled flag changes the request sent. Coalescing requests with different values for this flag hides the caller’s chosen behavior and side effects (like topic auto-creation).

Regarding your point about increasing request volume: this change still correctly deduplicates all concurrent calls that use the exact same parameters. A new request is only sent when the parameters are different, ensuring we respect the caller's intent while still preventing redundant traffic for identical calls.

@Technoboy- Technoboy- added this to the 4.2.0 milestone Nov 11, 2025
@Technoboy-
Copy link
Contributor

Why could take the method params into account ? There will be more request to the broker, but the result is the same.

@Technoboy- I think that this is more about correctness than the possibility that the broker response might be the same in most cases. It's strictly not the same in all cases. Therefore I think that such bugs should be fixed.

This is not a bug

@Technoboy- I think that depends on how you define a bug. The impact of the issue is that boolean metadataAutoCreationEnabled, boolean useFallbackForNonPIP344Brokers parameters for the getPartitionedTopicMetadata method are ignored for any other requests which are made while there's a request in progress. Let's say that the first request is made with metadataAutoCreationEnabled=false and while it's in progress a request with metadataAutoCreationEnabled=true is made. This will definitely result in an inconsistent situation. However, I agree that the likelihood that this would happen is low. Regardless of that, such issues should be fixed and I would consider them bugs. Your definition of a bug could vary and I think it's fine.

Make sense

@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 61.11111% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.30%. Comparing base (0896c0a) to head (ae68bbd).
⚠️ Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
...e/pulsar/client/impl/BinaryProtoLookupService.java 61.11% 3 Missing and 4 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24965      +/-   ##
============================================
- Coverage     74.44%   74.30%   -0.14%     
+ Complexity    34071    33520     -551     
============================================
  Files          1920     1920              
  Lines        150062   150077      +15     
  Branches      17404    17408       +4     
============================================
- Hits         111708   111515     -193     
- Misses        29495    29672     +177     
- Partials       8859     8890      +31     
Flag Coverage Δ
inttests 26.32% <55.55%> (-0.32%) ⬇️
systests 22.81% <55.55%> (-0.09%) ⬇️
unittests 73.85% <61.11%> (-0.12%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...e/pulsar/client/impl/BinaryProtoLookupService.java 79.49% <61.11%> (-1.76%) ⬇️

... and 97 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lhotari lhotari merged commit 0cdab92 into apache:master Nov 11, 2025
100 of 107 checks passed
lhotari pushed a commit that referenced this pull request Nov 11, 2025
…clude method parameters (#24965)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 0cdab92)
lhotari pushed a commit that referenced this pull request Nov 11, 2025
…clude method parameters (#24965)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 0cdab92)
lhotari pushed a commit that referenced this pull request Nov 11, 2025
…clude method parameters (#24965)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 0cdab92)
lhotari pushed a commit that referenced this pull request Nov 11, 2025
…clude method parameters (#24965)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 0cdab92)
nodece pushed a commit to nodece/pulsar that referenced this pull request Nov 12, 2025
…clude method parameters (apache#24965)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 0cdab92)
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 13, 2025
…clude method parameters (apache#24965)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 0cdab92)
(cherry picked from commit fac8f15)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 13, 2025
…clude method parameters (apache#24965)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 0cdab92)
(cherry picked from commit fac8f15)
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 13, 2025
…clude method parameters (apache#24965)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 0cdab92)
(cherry picked from commit d0ff77e)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 14, 2025
…clude method parameters (apache#24965)

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
(cherry picked from commit 0cdab92)
(cherry picked from commit d0ff77e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Deduplication logic for getPartitionedTopicMetadata doesn't take method parameters into account

4 participants