Skip to content

Conversation

@zymap
Copy link
Member

@zymap zymap commented Mar 25, 2022


Master issue: #3231

Motivation

When there has a bookie is slow to respond the add entry request,
and the AQ is smaller than WQ, the client will hole the entry buffer
and that will cause the memory won't be released.

More context:
apache/pulsar#14861

Modifications

Add the memory limit for the client to the send add request.

@zymap zymap self-assigned this Mar 25, 2022
@zymap
Copy link
Member Author

zymap commented Mar 25, 2022

The test come with next commit

@zymap zymap changed the title Add memory limiter for the add entry request to avoid OOM [WIP] Add memory limiter for the add entry request to avoid OOM Mar 25, 2022
@dlg99
Copy link
Contributor

dlg99 commented Mar 28, 2022

@zymap Have you tried configuring backpressure for the BK client and server? I think it covers your case.

@zymap
Copy link
Member Author

zymap commented Mar 28, 2022

@dlg99 Thanks for your information. But looks like the maxAddsInProgressLimit only applied on the v3 request?

@zymap
Copy link
Member Author

zymap commented Apr 21, 2022

I tried the backpressure with V2 protocol. It doesn't work. Because it is a server limitation, when there has a server slow to respond, the other server will respond successfully, but the client will hold the entry buffer until the slowest response is returned.

Same issue: https://lists.apache.org/thread/r2yh6gxgnhzzz35o63db040wt2t6o108

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have left some feedback.

Please add tests as well.

@merlimat FYI

private final BookieAddressResolver bookieAddressResolver;

private final long bookieErrorThresholdPerInterval;
private Optional<MemoryLimitController> memoryLimitController;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final?

}
}

private static class WriteAndFlushCallbackImpl implements WriteAndFlushCallback {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be probably recycled

private Optional<WriteAndFlushCallback> setMemoryLimit(final long entryId, final long entrySize) throws InterruptedException {
if (getMemoryLimitController().isPresent()) {
MemoryLimitController mlc = getMemoryLimitController().get();
mlc.reserveMemory(entrySize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should handle this with a timeout otherwise we may block forever?

"clientConnectBookieUnavailableLogThrottling";

// client memory limit options
protected static final String CLIENT_MEMORY_LIMIT_ENABLED = "clientMemoryLimitEnabled";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is only for writes.
We should reflect this in the name

@hangc0276
Copy link
Contributor

This issue has been resolved by #3321 , #3323 , #3324 , we can close this PR.

@hangc0276 hangc0276 closed this Jul 25, 2022
@eolivelli
Copy link
Contributor

@hangc0276 I don't think that the issue is resolved.
We still have the problem that you may have a unbounded number of pending writes

@zymap what do you think ?

@eolivelli eolivelli reopened this Sep 27, 2022
@zymap zymap marked this pull request as draft September 28, 2022 07:12
zymap added 2 commits October 11, 2022 13:05
---

*Motivation*

When there has a bookie is slowly to respond the add entry request,
and the AQ is smaller than WQ, the client will hole the entry buffer
and that will cause the memory won't be released.

More context:
apache/pulsar#14861

*Modifications*

Add the memory limit for the client to send add request.
@zymap zymap force-pushed the client-memory-limit branch from fa285bf to a55b91b Compare October 11, 2022 08:44
@zymap
Copy link
Member Author

zymap commented Oct 11, 2022

The test output looks like works as expected, I will try to add more tests and clean the code.

logs
2022-10-11T16:56:19,449 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$1@68] - Write state changed to false
2022-10-11T16:56:19,449 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:19,449 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 0
2022-10-11T16:56:20,400 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 1024
2022-10-11T16:56:20,400 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 2048
2022-10-11T16:56:20,400 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 3072
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 4096
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 5120
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 6144
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 7168
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 8192
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 9216
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:BookieClientMemoryCounterTest$1@68] - Write state changed to true
2022-10-11T16:56:20,401 - INFO  - [Time-limited test:BookieClientMemoryCounterTest@78] - wait for the memory released
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 8192
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 7168
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 6144
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 5120
2022-10-11T16:56:20,450 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 4096
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 3072
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 2048
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 1024
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$1@68] - Write state changed to false
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:20,451 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 0
2022-10-11T16:56:21,403 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 1024
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 2048
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 3072
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 4096
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 5120
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 6144
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 7168
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 8192
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:WriteMemoryCounter@49] - increment the size to 9216
2022-10-11T16:56:21,404 - INFO  - [Time-limited test:BookieClientMemoryCounterTest$1@68] - Write state changed to true
2022-10-11T16:56:21,405 - INFO  - [Time-limited test:BookieClientMemoryCounterTest@78] - wait for the memory released
2022-10-11T16:56:21,454 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:BookieClientMemoryCounterTest$2@84] - Add complete with rc 0
2022-10-11T16:56:21,455 - INFO  - [BookKeeperClientWorker-OrderedExecutor-0-0:WriteMemoryCounter@57] - decrement the size to 8192

@zymap zymap changed the title [WIP] Add memory limiter for the add entry request to avoid OOM Add memory limiter for the add entry request to avoid OOM Oct 18, 2022
@zymap zymap marked this pull request as ready for review October 18, 2022 12:15
}

public long getWriteMemoryLowWaterMark() {
return getInt(WRITE_MEMORY_LOW_WATER_MARK, 64 * 1024 * 1024);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getLong?

}

public long getWriteMemoryHighWaterMark() {
return getInt(WRITE_MEMORY_HIGH_WATER_MARK, 256 * 1024 * 1024);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getLong?

*/
public class WriteWaterMark {
private static final long DEFAULT_LOW_WATER_MARK = 1610612736L; // 1.5GB
private static final long DEFAULT_HIGH_WATER_MARK = 2147483648L; // 2GB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the default lowWaterMark=64MB and HighWaterMark=256MB in ClientConfiguration, but default lowWaterMark=1.5GB and HighWaterMark=2GB in WriteWaterMark?

@Slf4j
public class WriteMemoryCounter {
private final WriteWaterMark writeWaterMark;
private AtomicLong sizeCounter = new AtomicLong(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final type?


public void decrementPendingWriteBytes(long size) {
long usage = sizeCounter.addAndGet(-size);
log.info("decrement the size to {}", usage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to debug level?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants