Skip to content

Conversation

@Denovo1998
Copy link
Contributor

@Denovo1998 Denovo1998 commented Oct 17, 2025

Fixes #24846

Main Issue: #xyz

PIP: #xyz

Motivation

The test PublishRateLimiterOverconsumingTest.testOverconsumingTokensWithBrokerPublishRateLimiter was intermittently failing under concurrent + batching scenarios. Failures were caused by:

  • Warm-up and window misalignment: the first effective “second” often spans partial time across two real seconds, producing a very low first bucket and a very high second bucket (e.g., 24 and 1212).
  • Bursty traffic: multiple independent producers with batching flushes lead to token bursts that concentrate into a single second.
  • Average rate computed from total elapsed time: when bursts make the overall processing time shorter, avg = totalMessages/timeSinceStart becomes artificially high (e.g., 750 vs 500) even though the rate limiter is fine.

These issues are the same class as addressed for Dispatch in PR #24012. The test needs to measure rate more robustly to avoid false negatives while still detecting real regressions.

org.apache.pulsar.broker.service.DispatchRateLimiterOverconsumingTest#testOverconsumingTokensWithBrokerDispatchRateLimiter

Modifications

This change makes the test resilient by aligning measurement and smoothing per-second variability, following the approach used for Dispatch in PR #24012:

  • Align measurement start to the first received message:

    • Start counting only after the first message arrives (set startTimeNanos in the messageListener). This removes the initial misalignment in the first bucket.
  • Use windowed average for “overall average”:

    • Compute the test’s “average rate” from the collected per-second windows (skip the first second and take up to durationSeconds windows), instead of using totalMessages/elapsedTime, which is sensitive to overall time shortening due to bursts.
    • Assertion: the window-based average must be within ±40% of the configured rate.
  • Use adjacent 2-second averages for per-second stability:

    • For each pair of adjacent seconds (skipping the head and tail pairs), assert that the mean of the two is within ±55% of the configured rate. This compensates for a low bucket followed by a high bucket caused by bursts or warm-up misalignment.
    • If there are too few windows (< 4), fall back to a single-second check (skip head and tail) with a ±50% tolerance to avoid spurious failures.
  • Make wait condition robust:

    • Wait for all messages to be consumed rather than requiring a particular number of windows. This avoids waiting on windows in fast runs.
  • Include trailing partial window:

    • Add the last partial bucket (if any) to the snapshot to not silently drop the tail.
  • Non-functional/logging:

    • Keep the 500 ms polling for the rate tracker but only emit a per-second rate on full-second boundaries using lastCalculatedSecond.
    • Additional comments and logs for clarity.

Verifying this change

  • Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: Denovo1998#13

… measurement and using adjacent 2-sec averages
@codecov-commenter
Copy link

codecov-commenter commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.27%. Comparing base (ee33c99) to head (0dd64d9).
⚠️ Report is 128 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24864      +/-   ##
============================================
- Coverage     74.45%   74.27%   -0.18%     
+ Complexity    33531    33461      -70     
============================================
  Files          1913     1913              
  Lines        149280   149315      +35     
  Branches      17324    17331       +7     
============================================
- Hits         111140   110907     -233     
- Misses        29341    29559     +218     
- Partials       8799     8849      +50     
Flag Coverage Δ
inttests 26.67% <ø> (-0.26%) ⬇️
systests 22.78% <ø> (-0.11%) ⬇️
unittests 73.79% <ø> (-0.18%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 98 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lhotari lhotari merged commit 562d446 into apache:master Oct 17, 2025
99 of 102 checks passed
@lhotari lhotari added this to the 4.2.0 milestone Oct 28, 2025
lhotari pushed a commit that referenced this pull request Oct 28, 2025
… measurement and using adjacent 2-sec averages (#24864)

(cherry picked from commit 562d446)
lhotari pushed a commit that referenced this pull request Oct 28, 2025
… measurement and using adjacent 2-sec averages (#24864)

(cherry picked from commit 562d446)
manas-ctds pushed a commit to datastax/pulsar that referenced this pull request Oct 29, 2025
… measurement and using adjacent 2-sec averages (apache#24864)

(cherry picked from commit 562d446)
(cherry picked from commit 2d7ce26)
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Nov 6, 2025
… measurement and using adjacent 2-sec averages (apache#24864)

(cherry picked from commit 562d446)
(cherry picked from commit 2d7ce26)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky-test: PublishRateLimiterOverconsumingTest.testOverconsumingTokensWithBrokerPublishRateLimiter

3 participants