[fix][broker] Make InflightReadsLimiter asynchronous and apply it for replay queue reads #23901
Could we confirm that we are not over-engineering this rate limiter?
It looks like new configurations have been added and the code has been refactored. I think we need a PIP and should avoid cherry-picking back to previous versions.
@shibd This PR is addressing bugs in the existing implementation, #23506 and #23504. I can always create a PIP to cover things; however, I'd rather first have this PR reviewed. Avoiding cherry-picking back to previous versions doesn't make sense, since this PR addresses clear issues that Pulsar 3.0.x users are also facing.
Good questions, @heesung-sn.
That's a valid point. In the current solution, there's also a queue when permits cannot be acquired immediately.
There will always be a need for a queue unless excessive traffic is shed by rejecting requests and making them fail. It's a valid question whether the implementation should queue requests up in a completely separate queue or fail them. The current solution already rejects requests when the limit is reached. That's a problem, especially for enabling the managedLedgerMaxReadsInFlightSizeInMB limit. The scope of this PR is to apply that limit to the replay queue read logic in ManagedCursorImpl.java (lines 1628 to 1658 at commit 331a997).
(I have other plans to improve this logic, but that's not the scope of this PR.)

That's why failing individual read operations isn't a proper way to trigger backpressure in Pulsar. It will add even more load to the system when retries happen. A better approach is to slow down processing (queue) and let all other backpressure solutions propagate and eventually slow down incoming work into the system.

That's how backpressure should work, but there are currently gaps in Pulsar. There's a broader design problem: backpressure isn't designed end-to-end so that this would actually happen and be validated with analysis and tests. There was a discussion about backpressure on the dev mailing list in 2022 without much participation from the community (backpressure was also covered briefly in the rate limiting discussions).

In general, there are ways to keep queues minimal in backpressure solutions. When all operations are blocking, it's a simple problem to solve; with asynchronous operations, it becomes a hard one. In the JVM space, the Reactive Streams specification and libraries are explicitly focused on solving asynchronous non-blocking backpressure with minimal queues (buffers) in processing pipelines. In the case of the InflightReadsLimiter, …
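As an illustration of that demand-driven style (not Pulsar code; just a minimal sketch using the JDK's java.util.concurrent.Flow API, which follows the Reactive Streams contract), the subscriber below requests more work only after it finishes the previous item, so the producer is slowed down instead of individual operations being failed and retried:

```java
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class BackpressureSketch {
    public static void main(String[] args) throws InterruptedException {
        try (SubmissionPublisher<Integer> publisher = new SubmissionPublisher<>()) {
            publisher.subscribe(new Flow.Subscriber<Integer>() {
                private Flow.Subscription subscription;

                @Override
                public void onSubscribe(Flow.Subscription subscription) {
                    this.subscription = subscription;
                    subscription.request(1); // initial demand: one item
                }

                @Override
                public void onNext(Integer item) {
                    System.out.println("processed " + item);
                    subscription.request(1); // ask for more only after finishing
                }

                @Override
                public void onError(Throwable t) {
                    t.printStackTrace();
                }

                @Override
                public void onComplete() {
                    System.out.println("done");
                }
            });
            for (int i = 0; i < 5; i++) {
                // submit() blocks when the subscriber's bounded buffer is full,
                // which is how backpressure propagates to the producer.
                publisher.submit(i);
            }
        }
        Thread.sleep(500); // crude wait for the async subscriber to drain (sketch only)
    }
}
```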
Essentially, InflightReadsLimiter is like Java's java.util.concurrent.Semaphore. It's not a rate limiter where the rate is defined in terms of requests per second. The InflightReadsLimiter class is already very simple.
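For illustration only, here is a hypothetical sketch of the idea (not the actual InflightReadsLimiter code): permits are accounted in bytes of reads currently in flight, and acquisitions that don't fit under the limit are queued FIFO up to a bound instead of being rejected or busy-retried:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch, not Pulsar's InflightReadsLimiter. Timeout handling
// (what managedLedgerMaxReadsInFlightPermitsAcquireTimeoutMillis configures)
// is omitted for brevity.
public class InflightBytesLimiterSketch {
    private record Waiter(long bytes, CompletableFuture<Void> future) {}

    private final long maxBytes;          // cf. managedLedgerMaxReadsInFlightSizeInMB (in bytes)
    private final int maxQueuedAcquires;  // cf. managedLedgerMaxReadsInFlightPermitsAcquireQueueSize
    private final Queue<Waiter> queue = new ArrayDeque<>();
    private long inflightBytes;

    public InflightBytesLimiterSketch(long maxBytes, int maxQueuedAcquires) {
        this.maxBytes = maxBytes;
        this.maxQueuedAcquires = maxQueuedAcquires;
    }

    /** Completes immediately if the bytes fit under the limit, otherwise queues the acquisition. */
    public synchronized CompletableFuture<Void> acquire(long bytes) {
        if (queue.isEmpty() && inflightBytes + bytes <= maxBytes) {
            inflightBytes += bytes;
            return CompletableFuture.completedFuture(null);
        }
        if (queue.size() >= maxQueuedAcquires) {
            return CompletableFuture.failedFuture(new IllegalStateException("acquire queue is full"));
        }
        CompletableFuture<Void> future = new CompletableFuture<>();
        queue.add(new Waiter(bytes, future));
        return future;
    }

    /** Returns permits when a read completes and hands them to queued waiters in FIFO order. */
    public synchronized void release(long bytes) {
        inflightBytes -= bytes;
        while (!queue.isEmpty() && inflightBytes + queue.peek().bytes() <= maxBytes) {
            Waiter next = queue.poll();
            inflightBytes += next.bytes();
            next.future().complete(null);
        }
    }
}
```

Like a Semaphore, such a limiter only tracks what is currently in flight; no time-based rate is involved.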
While there might be room for further simplification in the InflightReadsLimiter, I believe the current implementation strikes a good balance between functionality and simplicity. The core logic is quite straightforward, and the additional code is mostly for metrics and monitoring, which are important for production systems. How would you suggest simplifying the InflightReadsLimiter?
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@ Coverage Diff @@
## master #23901 +/- ##
============================================
+ Coverage 73.57% 74.18% +0.60%
+ Complexity 32624 31858 -766
============================================
Files 1877 1853 -24
Lines 139502 143733 +4231
Branches 15299 16337 +1038
============================================
+ Hits 102638 106627 +3989
+ Misses 28908 28694 -214
- Partials 7956 8412 +456
… replay queue reads (apache#23901) (cherry picked from commit c5173d5) (cherry picked from commit 37f0bc2)
@BewareMyPower Please see issues #23504 and #23506. The original solution has never worked properly.
Where is this issue reported?
Sure, everyone has their right to independent opinions. If you look at the changes in this PR, they are all relevant for solving the issues.
@BewareMyPower I have replied to the questions in #23901 (comment). I hope that sheds light on why this PR is relevant and necessary. Please let me know if you have any questions.
I'm currently reviewing this PR so I can have a better understanding of it. Generally, we need a careful review of such a huge PR, but it's still hard. From my experience, it's very easy for bugs to go unnoticed in a huge PR.
It's in SN's private repo; let me ping you in that issue. BTW, @shibd I think we should report it in the Apache repo as well.
@BewareMyPower I don't disagree with you about the size of a PR. Large PRs are harder to review, and there's a greater chance that the changes cause regressions. It's also worth noting that if bugs pass CI and all regression test suites, that's also a bug in the test suites, which should be addressed. Some problems require larger PRs than others. If we start bashing contributors who fix broader issues, it's going to kill this project. Instead of complaining about large PRs and regressions, I hope we put the effort into improving test suites to catch regressions and into reviews when they're needed. That's simply more productive for everyone.
Memory leaks could also be seen as a failure of our test suites to catch them. To address memory leaks, it would be possible to run unit tests with Netty's paranoid leak detection level and fail the tests if any leaks are detected. This would be an effective way to catch such problems as early as possible.
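As a sketch of how that could look (assuming Netty 4.1's ResourceLeakDetector API; the helper class below is hypothetical, not existing Pulsar test code), a custom detector records every reported leak and the run fails if any were recorded:

```java
import io.netty.util.ResourceLeakDetector;
import io.netty.util.ResourceLeakDetectorFactory;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical test helper: collects Netty leak reports so a test run can fail on them.
public final class FailOnLeakDetector<T> extends ResourceLeakDetector<T> {
    static final Queue<String> LEAKS = new ConcurrentLinkedQueue<>();

    public FailOnLeakDetector(Class<?> resourceType, int samplingInterval) {
        super(resourceType, samplingInterval);
    }

    @Override
    protected void reportTracedLeak(String resourceType, String records) {
        LEAKS.add(resourceType + ": " + records);
    }

    @Override
    protected void reportUntracedLeak(String resourceType) {
        LEAKS.add(resourceType);
    }

    /** Switch to paranoid sampling and install this detector for all resource types. */
    public static void install() {
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
        ResourceLeakDetectorFactory.setResourceLeakDetectorFactory(new ResourceLeakDetectorFactory() {
            @Override
            public <T> ResourceLeakDetector<T> newResourceLeakDetector(
                    Class<T> resource, int samplingInterval, long maxActive) {
                return new FailOnLeakDetector<>(resource, samplingInterval);
            }
        });
    }

    /** Fail the run if any leaks were reported. */
    public static void assertNoLeaks() {
        if (!LEAKS.isEmpty()) {
            throw new AssertionError("Netty buffer leaks detected: " + LEAKS);
        }
    }
}
```

install() would need to run before any pooled buffers are allocated (for example from a test framework listener), and assertNoLeaks() after a final System.gc() at the end of the run, since leaks are only reported when the leaked buffers are garbage collected.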
To avoid misleading anyone: the memory leak issue is not caused by this PR; see #23955.
Fix for the PendingReadsManager deadlock related to this change: #23958
… replay queue reads (apache#23901)
… LTS
- Remove `managedLedgerOffloadReadThreads` from apache#24025
- Remove `managedLedgerMaxReadsInFlightPermitsAcquireTimeoutMillis` and `managedLedgerMaxReadsInFlightPermitsAcquireQueueSize` from apache#23901
- Remove `managedLedgerCursorResetLedgerCloseTimestampMaxClockSkewMillis` from apache#22792

The configs above only increase the complexity and are hard to configure. Add more comments to `managedLedgerPersistIndividualAckAsLongArray` from apache#23759. This config was added to keep compatibility with 3.x or earlier, so it has value to retain. 3.x users must configure it with false when upgrading to 4.0.
Fixes #23504
Fixes #23506
Motivation
In the current implementation before this PR, there are multiple problems:
- Redelivery (replay) queue reads don't get deduplicated or limited by managedLedgerMaxReadsInFlightSizeInMB, reported as #23504.
- The managedLedgerMaxReadsInFlightSizeInMB limit doesn't work properly if managedLedgerReadEntryTimeoutSeconds isn't set, reported as #23506.

In addition to this, even if managedLedgerReadEntryTimeoutSeconds is set, when the limit is reached there's no queuing and all threads will be retrying in a tight loop to acquire permits. This tight loop causes unnecessary CPU consumption and contention.

Modifications
Enhanced permit acquisition:
- Permits are now acquired asynchronously; waiting acquisitions are bounded by the new managedLedgerMaxReadsInFlightPermitsAcquireQueueSize and managedLedgerMaxReadsInFlightPermitsAcquireTimeoutMillis settings (see the configuration sketch after this list).

Improved size estimation:
- The managedLedgerMaxReadsInFlightSizeInMB limit is now applied to replay queue reads.
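For reference, a broker.conf sketch of the settings involved; the values below are illustrative examples, not necessarily the shipped defaults:

```properties
# Existing setting: cap (in MB) on the total size of entries that may be in flight
# for reads from BookKeeper; with this PR it also covers replay queue reads.
managedLedgerMaxReadsInFlightSizeInMB=100
# New settings from this PR (example values): bound how many reads may wait in the
# permit acquisition queue, and how long a queued read may wait before it fails.
managedLedgerMaxReadsInFlightPermitsAcquireQueueSize=10000
managedLedgerMaxReadsInFlightPermitsAcquireTimeoutMillis=60000
```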
Documentation
- doc
- doc-required
- doc-not-needed
- doc-complete