-
Notifications
You must be signed in to change notification settings - Fork 962
fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy #2658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy #2658
Conversation
|
rerun failure checks |
...er-server/src/main/java/org/apache/bookkeeper/client/RegionAwareEnsemblePlacementPolicy.java
Outdated
Show resolved
Hide resolved
| List<BookieId> ensemble2 = repp.newEnsemble(3, 3, 2, | ||
| null, new HashSet<>()).getResult(); | ||
| ensemble1.retainAll(ensemble2); | ||
| assert(ensemble1.size() >= 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we depend on a Random number, is this test going to be flaky ? probably not, but I would like to double check your thoughts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the lastRegionIndex initialized to specific value, all bookkeeper clients will start with the same region, which may lead to hotpot for regions. However those bookkeeper clients create ledger speed won't keep sync, starting will the same region won't make much difference. I changed the initialized value to 0.
|
rerun failure checks |
1 similar comment
|
rerun failure checks |
|
@eolivelli @merlimat @sijie Please take a look, thanks. |
eolivelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
### Motivation After add label for prometheus metric by #2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow. ``` replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN ``` ### Modification 1. add label empty check for `PrometheusTextFormatUtil` 2. add label scope check test cover 3. add prometheus metric regex pattern check in test case Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com> This closes #2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits: 8590704 [hangc0276] format code a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/** 732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14 8dc108b [Matteo Merli] y 73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable 034ef85 [Shoothzj] Fix logger member not correct; (#2605) b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (#2658) 683ad45 [Matteo Merli] Allow to attach labels to metrics (#2650) 8091096 [Matteo Merli] Allow to bypass journal for writes (#2401) 63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (#2710) 87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (#2707)
### Motivation After add label for prometheus metric by #2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow. ``` replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN ``` ### Modification 1. add label empty check for `PrometheusTextFormatUtil` 2. add label scope check test cover 3. add prometheus metric regex pattern check in test case Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com> This closes #2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits: 8590704 [hangc0276] format code a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/** 732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14 8dc108b [Matteo Merli] y 73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable 034ef85 [Shoothzj] Fix logger member not correct; (#2605) b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (#2658) 683ad45 [Matteo Merli] Allow to attach labels to metrics (#2650) 8091096 [Matteo Merli] Allow to bypass journal for writes (#2401) 63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (#2710) 87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (#2707)
### Motivation After add label for prometheus metric by apache#2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow. ``` replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN ``` ### Modification 1. add label empty check for `PrometheusTextFormatUtil` 2. add label scope check test cover 3. add prometheus metric regex pattern check in test case Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com> This closes apache#2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits: 8590704 [hangc0276] format code a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/** 732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14 8dc108b [Matteo Merli] y 73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable 034ef85 [Shoothzj] Fix logger member not correct; (apache#2605) b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (apache#2658) 683ad45 [Matteo Merli] Allow to attach labels to metrics (apache#2650) 8091096 [Matteo Merli] Allow to bypass journal for writes (apache#2401) 63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (apache#2710) 87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (apache#2707)
Motivation
When use
RegionAwareEnsemblePlacementPolicyto select ensemble set, it always select the same region set for all ledger creation.The reason is that it traverse the availableRegions set in the same order for ensemble selection and select bookies in the same region set, which lead the load not balance between regions.
Changes
regionIndexfield to mark next select index for available region list, avoiding traverse from the same position for each ensemble selection.I will add the test latter.