Skip to content

Conversation

@merlimat
Copy link
Contributor

@merlimat merlimat commented May 7, 2021

Motivation

Currently, the journal has a configurable max queue size, with default to 10,000 entries. This can be used as a proxy measure to control the max amount of memory that has to be retained from the entries that are in the journal queue.

This is not very flexible and there are few potential problems that arise:

  1. If using multiple journals, each one of them gets 10,000 entries
  2. entries can be of very different sizes, from few bytes to several MBs, so it can be difficult to estimate the memory usage
  3. if entries are big, it can cause the bookie to go OOM when the queue gets filled up

Instead (or additionally) of number of entries, we should use the total amount of memory as the metric for enabling the backpressure.

Changes

Added MemoryLimitController class, already used in Pulsar (apache/pulsar#8965) and use it to block the thread adding to the journal queue when memory is exhausted.

Defaulting to use 5% of max direct memory (eg: 100MB out of 2GB)

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to disable this feature in case there is some problem ?

like configuring -1 as memory limit in order to disable it

@merlimat
Copy link
Contributor Author

merlimat commented May 7, 2021

like configuring -1 as memory limit in order to disable it

As documented in the bk_server.conf setting to 0, disables the limit

@merlimat
Copy link
Contributor Author

merlimat commented May 7, 2021

@eolivelli PTAL again

@dlg99 dlg99 modified the milestones: 4.14.0, 4.15.0 May 7, 2021
@merlimat
Copy link
Contributor Author

merlimat commented May 7, 2021

@dlg99 I marked this on purpose for 4.14

# Set the max amount of memory that can be used by the journal.
# If empty, this will be set to use 5% of available direct memory
# Setting it to 0, it will disable the max memory control for the journal.
# journalMaxMemorySizeMb=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So looks like this will be additional limit to t he max Journal Q size?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. that would still be in place

@dlg99 dlg99 modified the milestones: 4.15.0, 4.14.0 May 7, 2021
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm
@dlg99 please merge before packaging 4.14

@merlimat merlimat merged commit 63867a9 into apache:master May 8, 2021
@merlimat merlimat deleted the journal-mem-limit branch May 8, 2021 06:07
Copy link
Contributor

@hsaputra hsaputra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

merlimat pushed a commit that referenced this pull request May 24, 2021
### Motivation
After add label for prometheus metric by #2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow.
```
replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN
```

### Modification
1. add label empty check for `PrometheusTextFormatUtil`
2. add label scope check test cover
3. add prometheus metric regex pattern check in test case

Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com>

This closes #2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits:

8590704 [hangc0276] format code
a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check
bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/**
732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration
b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14
8dc108b [Matteo Merli] y
73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable
034ef85 [Shoothzj] Fix logger member not correct; (#2605)
b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (#2658)
683ad45 [Matteo Merli] Allow to attach labels to metrics (#2650)
8091096 [Matteo Merli] Allow to bypass journal for writes (#2401)
63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (#2710)
87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (#2707)
merlimat pushed a commit that referenced this pull request May 29, 2021
### Motivation
After add label for prometheus metric by #2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow.
```
replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN
```

### Modification
1. add label empty check for `PrometheusTextFormatUtil`
2. add label scope check test cover
3. add prometheus metric regex pattern check in test case

Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com>

This closes #2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits:

8590704 [hangc0276] format code
a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check
bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/**
732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration
b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14
8dc108b [Matteo Merli] y
73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable
034ef85 [Shoothzj] Fix logger member not correct; (#2605)
b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (#2658)
683ad45 [Matteo Merli] Allow to attach labels to metrics (#2650)
8091096 [Matteo Merli] Allow to bypass journal for writes (#2401)
63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (#2710)
87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (#2707)
Sunny-Island pushed a commit to Sunny-Island/bookkeeper that referenced this pull request Jun 2, 2021
### Motivation
After add label for prometheus metric by apache#2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow.
```
replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN
```

### Modification
1. add label empty check for `PrometheusTextFormatUtil`
2. add label scope check test cover
3. add prometheus metric regex pattern check in test case

Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits:

8590704 [hangc0276] format code
a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check
bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/**
732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration
b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14
8dc108b [Matteo Merli] y
73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable
034ef85 [Shoothzj] Fix logger member not correct; (apache#2605)
b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (apache#2658)
683ad45 [Matteo Merli] Allow to attach labels to metrics (apache#2650)
8091096 [Matteo Merli] Allow to bypass journal for writes (apache#2401)
63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (apache#2710)
87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (apache#2707)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants