-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix][test] Stabilize testMsgDropStat by reliably triggering non-persistent publisher drop #24929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…istent publisher drop Reliably trigger and verify connection-level non-persistent drop while preserving the original scenario and API. - Set maxConcurrentNonPersistentMessagePerConnection=0 in the test and restart the broker. Since ServerCnx drops when inFlight > max, setting the limit to 0 causes any second overlapping send on the same connection to be dropped (entryId == -1) and recorded. - Replace the previous large task loop that used Thread.sleep and a 20% latch with a small, synchronized batch of producer.send() calls started together using a CyclicBarrier. This explicitly creates the required overlap without sleeps or ad hoc timing. No production code changes. Test only. Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
lhotari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR, I added some review comments to address.
pulsar-broker/src/test/java/org/apache/pulsar/client/api/NonPersistentTopicTest.java
Outdated
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/client/api/NonPersistentTopicTest.java
Show resolved
Hide resolved
pulsar-broker/src/test/java/org/apache/pulsar/client/api/NonPersistentTopicTest.java
Outdated
Show resolved
Hide resolved
- Replace completionLatch.await() with assertTrue(completionLatch.await(20, TimeUnit.SECONDS)) - Replace manual throwable check with assertNull(error.get(), "Concurrent send encountered an exception") Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
|
@lhotari - Thank you for the review. I’ve addressed the comments. Could you please take a look when convenient? |
|
/pulsarbot rerun-failure-checks |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #24929 +/- ##
=============================================
+ Coverage 38.74% 74.25% +35.51%
- Complexity 13265 33511 +20246
=============================================
Files 1856 1913 +57
Lines 145165 149446 +4281
Branches 16848 17362 +514
=============================================
+ Hits 56238 110967 +54729
+ Misses 81385 29625 -51760
- Partials 7542 8854 +1312
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
…istent publisher drop (apache#24929) Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com> (cherry picked from commit 60acfba) (cherry picked from commit 1a2e12f)
…istent publisher drop (apache#24929) Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com> (cherry picked from commit 60acfba) (cherry picked from commit 1a2e12f)
…istent publisher drop (apache#24929) Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com> (cherry picked from commit 60acfba)
…istent publisher drop (apache#24929) Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com> (cherry picked from commit 60acfba) (cherry picked from commit d66e5ee)
…istent publisher drop (apache#24929) Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com> (cherry picked from commit 60acfba) (cherry picked from commit d66e5ee)
Fixes #24908
Motivation
NonPersistentTopicTest.testMsgDropStatfails intermittently on CI:Failure point:
assertTrue(latch.await(5, TimeUnit.SECONDS))times out because no publisher drop is observed within 5 seconds (noMessageIdImplwithentryId == -1).Root cause (in the original test loop)
pulsar/pulsar-broker/src/test/java/org/apache/pulsar/client/api/NonPersistentTopicTest.java
Lines 882 to 905 in fbc50b0
Why it flakes:
maxConcurrentNonPersistentMessagePerConnection = 1. InServerCnx#handleSend, the broker drops only whennonPersistentPendingMessages > maxNonPersistentMessagePerConnection. WithmaxConcurrentNonPersistentMessagePerConnection = 1, the first concurrent send increments the counter from 0 to 1 and is accepted. The second concurrent send checks 1 > 1 (false), then increments the counter to 2 and is accepted. Only a third overlapping send sees 2 > 1 and is dropped. Therefore, at least three overlapping sends are required to trigger any drop. The original loop does not guarantee that level of overlap.pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ServerCnx.java
Lines 1999 to 2013 in fbc50b0
producer.send()are not coordinated to start together, so overlap ofsend()calls on the same connection is left to scheduler timing and is therefore probabilistic.Thread.sleep(10), introducing timing assumptions that vary under CI load.This PR makes the publisher-drop trigger reliable without changing the test scenario or API usage.
Modifications
Test-only changes in
NonPersistentTopicTest.testMsgDropStat. No production code or docs changed.Replace the unsynchronized, sleep-based loop with a small synchronized batch of
send()calls.Use a
CyclicBarrierto start a fixed number ofproducer.send()calls at the same time, explicitly creating the required overlap without sleeps or ad-hoc timing. Execute the batch inside a boundedAwaitilitywait until at least onesend()returns aMessageIdImplwithentryId == -1, then assert that condition.Changed
maxConcurrentNonPersistentMessagePerConnectionfrom 1 to 0conf.setMaxConcurrentNonPersistentMessagePerConnection(0);Rational: In
ServerCnx#handleSendfor non-persistent topics, a publish is dropped only whennonPersistentPendingMessages > maxNonPersistentMessagePerConnection. Lowering the limit from 1 to 0 reduces the overlap requirement from three in-flight sends to two, which, combined with the synchronized start, makes the publisher drop reliably detectable in this test.Verifying this change
Personal CI Results
Tested in Personal CI fork: vinkal-chudgar#3
Status: All checks have passed (48 successful checks, 3 skipped)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: vinkal-chudgar#3