[fix][test] Stabilize PublishRateLimiterOverconsumingTest by aligning measurement and using adjacent 2-sec averages #24864

Denovo1998 · 2025-10-17T04:15:31Z

Main Issue: #xyz

PIP: #xyz

Motivation

The test PublishRateLimiterOverconsumingTest.testOverconsumingTokensWithBrokerPublishRateLimiter was intermittently failing under concurrent + batching scenarios. Failures were caused by:

Warm-up and window misalignment: the first effective “second” often spans partial time across two real seconds, producing a very low first bucket and a very high second bucket (e.g., 24 and 1212).
Bursty traffic: multiple independent producers with batching flushes lead to token bursts that concentrate into a single second.
Average rate computed from total elapsed time: when bursts make the overall processing time shorter, avg = totalMessages/timeSinceStart becomes artificially high (e.g., 750 vs 500) even though the rate limiter is fine.

These issues are the same class as addressed for Dispatch in PR #24012. The test needs to measure rate more robustly to avoid false negatives while still detecting real regressions.

org.apache.pulsar.broker.service.DispatchRateLimiterOverconsumingTest#testOverconsumingTokensWithBrokerDispatchRateLimiter

Modifications

This change makes the test resilient by aligning measurement and smoothing per-second variability, following the approach used for Dispatch in PR #24012:

Align measurement start to the first received message:
- Start counting only after the first message arrives (set startTimeNanos in the messageListener). This removes the initial misalignment in the first bucket.
Use windowed average for “overall average”:
- Compute the test’s “average rate” from the collected per-second windows (skip the first second and take up to durationSeconds windows), instead of using totalMessages/elapsedTime, which is sensitive to overall time shortening due to bursts.
- Assertion: the window-based average must be within ±40% of the configured rate.
Use adjacent 2-second averages for per-second stability:
- For each pair of adjacent seconds (skipping the head and tail pairs), assert that the mean of the two is within ±55% of the configured rate. This compensates for a low bucket followed by a high bucket caused by bursts or warm-up misalignment.
- If there are too few windows (< 4), fall back to a single-second check (skip head and tail) with a ±50% tolerance to avoid spurious failures.
Make wait condition robust:
- Wait for all messages to be consumed rather than requiring a particular number of windows. This avoids waiting on windows in fast runs.
Include trailing partial window:
- Add the last partial bucket (if any) to the snapshot to not silently drop the tail.
Non-functional/logging:
- Keep the 500 ms polling for the rate tracker but only emit a per-second rate on full-second boundaries using lastCalculatedSecond.
- Additional comments and logs for clarity.

Verifying this change

Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

Added integration tests for end-to-end deployment with large payloads (10MB)
Extended integration test for recovery after broker failure

Documentation

doc
doc-required
doc-not-needed
doc-complete

Matching PR in forked repository

PR in forked repository: Denovo1998#13

… measurement and using adjacent 2-sec averages

codecov-commenter · 2025-10-17T09:42:42Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.27%. Comparing base (ee33c99) to head (0dd64d9).
⚠️ Report is 128 commits behind head on master.

Additional details and impacted files

@@             Coverage Diff              @@
##             master   #24864      +/-   ##
============================================
- Coverage     74.45%   74.27%   -0.18%     
+ Complexity    33531    33461      -70     
============================================
  Files          1913     1913              
  Lines        149280   149315      +35     
  Branches      17324    17331       +7     
============================================
- Hits         111140   110907     -233     
- Misses        29341    29559     +218     
- Partials       8799     8849      +50

Flag	Coverage Δ
inttests	`26.67% <ø> (-0.26%)`	⬇️
systests	`22.78% <ø> (-0.11%)`	⬇️
unittests	`73.79% <ø> (-0.18%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 98 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

… measurement and using adjacent 2-sec averages (#24864) (cherry picked from commit 562d446)

… measurement and using adjacent 2-sec averages (apache#24864) (cherry picked from commit 562d446) (cherry picked from commit 2d7ce26)

[fix][test] Stabilize PublishRateLimiterOverconsumingTest by aligning…

0dd64d9

… measurement and using adjacent 2-sec averages

github-actions bot added the doc-not-needed Your PR changes do not impact docs label Oct 17, 2025

This was referenced Oct 17, 2025

[fix][test] Stabilize PublishRateLimiterOverconsumingTest by aligning measurement and using adjacent 2-sec averages Denovo1998/pulsar#13

Closed

Flaky-test: PublishRateLimiterOverconsumingTest.testOverconsumingTokensWithBrokerPublishRateLimiter #24846

Closed

lhotari approved these changes Oct 17, 2025

View reviewed changes

lhotari merged commit 562d446 into apache:master Oct 17, 2025
99 of 102 checks passed

lhotari added this to the 4.2.0 milestone Oct 28, 2025

lhotari added release/4.1.2 release/4.0.8 labels Oct 28, 2025

lhotari pushed a commit that referenced this pull request Oct 28, 2025

[fix][test] Stabilize PublishRateLimiterOverconsumingTest by aligning…

2d7ce26

… measurement and using adjacent 2-sec averages (#24864) (cherry picked from commit 562d446)

lhotari added the cherry-picked/branch-4.0 label Oct 28, 2025

lhotari pushed a commit that referenced this pull request Oct 28, 2025

[fix][test] Stabilize PublishRateLimiterOverconsumingTest by aligning…

25aba4d

… measurement and using adjacent 2-sec averages (#24864) (cherry picked from commit 562d446)

lhotari added the cherry-picked/branch-4.1 label Oct 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[fix][test] Stabilize PublishRateLimiterOverconsumingTest by aligning measurement and using adjacent 2-sec averages #24864

[fix][test] Stabilize PublishRateLimiterOverconsumingTest by aligning measurement and using adjacent 2-sec averages #24864

Uh oh!

Denovo1998 commented Oct 17, 2025 •

edited

Loading

Uh oh!

codecov-commenter commented Oct 17, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[fix][test] Stabilize PublishRateLimiterOverconsumingTest by aligning measurement and using adjacent 2-sec averages #24864

[fix][test] Stabilize PublishRateLimiterOverconsumingTest by aligning measurement and using adjacent 2-sec averages #24864

Uh oh!

Conversation

Denovo1998 commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Verifying this change

Documentation

Matching PR in forked repository

Uh oh!

codecov-commenter commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Denovo1998 commented Oct 17, 2025 •

edited

Loading

codecov-commenter commented Oct 17, 2025 •

edited

Loading