Skip to content

Conversation

@merlimat
Copy link
Contributor

Motivation

There are certain metrics reported by bookies or clients that are including the bookie name in the metric name. This is typically an anti-pattern in many metric collections systems. For example, this can create a big number of metrics in Prometheus and it will severely impact the its performance.

In general, metric collection systems support the concept of labels that can be attached to metrics.

Modifications

  • Added the labels in the BK stats API
  • Prometheus implementation is using the labels instead of putting that as part of the metric name
  • Other stats providers will keep by default using the label in the metric name for backward compatibility.

@merlimat merlimat added this to the 4.14.0 milestone Mar 17, 2021
@merlimat merlimat self-assigned this Mar 17, 2021
Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@nicoloboschi please take a look

Copy link
Contributor

@nicoloboschi nicoloboschi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@eolivelli
Copy link
Contributor

@merlimat
we have a failing test, related to this patch

Error:  Tests run: 9, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 0.545 s <<< FAILURE! - in org.apache.bookkeeper.stats.prometheus.TestPrometheusMetricsProvider
Error:  org.apache.bookkeeper.stats.prometheus.TestPrometheusMetricsProvider.testCache  Time elapsed: 0.117 s  <<< FAILURE!
java.lang.AssertionError: expected same:<org.apache.bookkeeper.stats.prometheus.PrometheusStatsLogger@5f341870> was not:<org.apache.bookkeeper.stats.prometheus.PrometheusStatsLogger@553f17c>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotSame(Assert.java:828)
	at org.junit.Assert.assertSame(Assert.java:771)
	at org.junit.Assert.assertSame(Assert.java:782)
	at org.apache.bookkeeper.stats.prometheus.TestPrometheusMetricsProvider.testCache(TestPrometheusMetricsProvider.java:54)

@sursingh
Copy link
Contributor

@merlimat : With this change we are not using caching metrics provider. So caching tests are not longer valid. So the test should be removed

@merlimat
Copy link
Contributor Author

@merlimat : With this change we are not using caching metrics provider. So caching tests are not longer valid. So the test should be removed

Good point. I'll get this PR back in shape.

@dlg99 dlg99 modified the milestones: 4.14.0, 4.15.0 May 7, 2021
@merlimat merlimat modified the milestones: 4.15.0, 4.14.0 May 8, 2021
@merlimat merlimat merged commit 683ad45 into apache:master May 8, 2021
@merlimat merlimat deleted the metric-labels branch May 8, 2021 13:28
merlimat pushed a commit that referenced this pull request May 24, 2021
### Motivation
After add label for prometheus metric by #2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow.
```
replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN
```

### Modification
1. add label empty check for `PrometheusTextFormatUtil`
2. add label scope check test cover
3. add prometheus metric regex pattern check in test case

Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com>

This closes #2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits:

8590704 [hangc0276] format code
a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check
bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/**
732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration
b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14
8dc108b [Matteo Merli] y
73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable
034ef85 [Shoothzj] Fix logger member not correct; (#2605)
b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (#2658)
683ad45 [Matteo Merli] Allow to attach labels to metrics (#2650)
8091096 [Matteo Merli] Allow to bypass journal for writes (#2401)
63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (#2710)
87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (#2707)
merlimat pushed a commit that referenced this pull request May 29, 2021
### Motivation
After add label for prometheus metric by #2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow.
```
replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN
```

### Modification
1. add label empty check for `PrometheusTextFormatUtil`
2. add label scope check test cover
3. add prometheus metric regex pattern check in test case

Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com>

This closes #2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits:

8590704 [hangc0276] format code
a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check
bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/**
732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration
b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14
8dc108b [Matteo Merli] y
73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable
034ef85 [Shoothzj] Fix logger member not correct; (#2605)
b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (#2658)
683ad45 [Matteo Merli] Allow to attach labels to metrics (#2650)
8091096 [Matteo Merli] Allow to bypass journal for writes (#2401)
63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (#2710)
87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (#2707)
Sunny-Island pushed a commit to Sunny-Island/bookkeeper that referenced this pull request Jun 2, 2021
### Motivation
After add label for prometheus metric by apache#2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow.
```
replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN
```

### Modification
1. add label empty check for `PrometheusTextFormatUtil`
2. add label scope check test cover
3. add prometheus metric regex pattern check in test case

Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com>

This closes apache#2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits:

8590704 [hangc0276] format code
a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check
bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/**
732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration
b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14
8dc108b [Matteo Merli] y
73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable
034ef85 [Shoothzj] Fix logger member not correct; (apache#2605)
b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (apache#2658)
683ad45 [Matteo Merli] Allow to attach labels to metrics (apache#2650)
8091096 [Matteo Merli] Allow to bypass journal for writes (apache#2401)
63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (apache#2710)
87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (apache#2707)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants