-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][client] Fix consumer leak when thread is interrupted before subscribe completes #24100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
0c22173 to
e7922a3
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #24100 +/- ##
============================================
+ Coverage 73.57% 74.24% +0.66%
+ Complexity 32624 32452 -172
============================================
Files 1877 1863 -14
Lines 139502 144327 +4825
Branches 15299 16458 +1159
============================================
+ Hits 102638 107150 +4512
+ Misses 28908 28710 -198
- Partials 7956 8467 +511
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
@shibd Good work! It would be useful to have a failing test case that reproduces this problem included in this PR. Could you please add it too? It seems that it would be a slight variation of the test case that you have added. The "can't get consumer object" part isn't currently tested so that after fixing the issue the "consumer object can be get" (the test would clarify what this actually means). |
In this case, regardless of this fix. the user wont get the consumer object, which is expected. Because the subscribe throw an exception, that's not the focus. The focus is that we need to close this consumer. |
@shibd My point was that once there's a bug, a good practice would be to add a failing test case which reproduces the issue and clarifies it. In this case "the user wont get the consumer object, which is expected" could be understood in many ways. It might be clear to yourself, but not for code reviewers. That's why "failing test cases" reproducing the exact problem are always desired when it's possible to add one. Obviously for many Pulsar bugs that's really hard. In this case, it most likely isn't hard. |
|
You can run |
@shibd That's not reproducing this flow what you described in the PR description
The step "Cannot close the consumer since user will get InterruptedException and can't get consumer object" is missing. Adding a separate test case that reproduces the user's flow would be useful as I have explained before. Just simply put it in code in a test case which clarifies what "Cannot close the consumer since user will get InterruptedException and can't get consumer object" means. That's the most simplest way without wasting time in writing comments. :) |
|
I don't want to waste your time. I believe I've already added tests to cover it. You'll see that I've covered in the tests that the consumer will be removed(closed) after this PR fix. That's the issue I'm addressing. In this scenario, not getting the consumer object is expected. Why would we add a test case for it? A method throws an exception, and you want to cover that it returns null? If you can review the code directly and let me know where improvements are needed, I'd be happy to optimize it. |
That's true, that there's a test, but the test is not from the viewpoint of the user of our software. The flow described in the description isn't performed in the test case. That would be an optimal "failing test case". In many software teams, this is a well known concept. However, in the Pulsar code base, most of our tests are not following usual best practices and don't appear to be good examples of how to write good tests.
That's not "Cannot close the consumer since user will get InterruptedException and can't get consumer object", what was in the description. I agree that the internal behavior change is covered in the test, but not the original bug that appeared.
Simply test the scenario "Cannot close the consumer since user will get InterruptedException and can't get consumer object" and assume that it results in whatever is the expected result. It's not more harder than that.
As long as the mentioned scenario is handled in a separate test case, I'm fine with this PR. I know that it's unusual to knit pick on such details in our PRs, but unless attention is paid to details we won't be able to learn from each other and improve as a whole. |
First, I clearly asserted here that when the user calls Isn't that enough? |
|
hi, @lhotari , I respect all your opinions, but I really didn't understand what you meant. Could you comment directly on the code about what needs to be changed? Or could you show the test you want directly in the code? |
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConsumerBuilderImpl.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This issue is somewhat peripheral, but it poses a potential risk. In Java, thread interruptions are often ignored, which can lead to resource leaks. Additionally, we have many similar methods that do not properly handle interruptions.
To ensure correctness, users should be aware that they must handle thread interruptions appropriately.
When using the Pulsar Client/Admin, the users should not call the thread interrupt.
|
hi, @nodece For operation type of api, such as: However, we cannot consider the consumer leak described in this PR as expected behavior; we should fix it. |
Thanks @shibd. It is simply about "Cannot close the consumer since user will get InterruptedException and in this case user can't get consumer object". I don't understand the part "and in this case user can't get consumer object" and was expecting that a test case would have clarified it. Now, reading your comments again, it seems that "user can't get consumer object" is expected behavior and this sentence is just misleading and that's not something to be tested in the first place as you have described. |
|
@shibd this is the unedited DeepSeek suggestion: https://gist.github.com/lhotari/c76c08a59c6c17b27425cea04b7fa215 |
lhotari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, good work @shibd
…scribe completes (apache#24100) (cherry picked from commit e51a639) (cherry picked from commit 6bc4f8f)
…scribe completes (apache#24100) (cherry picked from commit e51a639)
…scribe completes (apache#24100) (cherry picked from commit e51a639) (cherry picked from commit 6bc4f8f)
…scribe completes (apache#24100) (cherry picked from commit e51a639) (cherry picked from commit 6bc4f8f)
…scribe completes (apache#24100) (cherry picked from commit e51a639) (cherry picked from commit 9d8e385)
…scribe completes (apache#24100) (cherry picked from commit e51a639) (cherry picked from commit 9d8e385)
…scribe completes (apache#24100) (cherry picked from commit e51a639) (cherry picked from commit 9d8e385)
Motivation
When the user has a thread try to subscribe(create a consumer) to a topic.
But the thread get interrupted before getting a response for the consumer creation.
For example:
Thread Afor aborting a job.Modifications
interruptedBeforeConsumerCreation. When this variable is true, the consumer needs to be closed.Verifying this change
testInterruptedWhenCreateConsumerto reproduce this issue.Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: