-
Notifications
You must be signed in to change notification settings - Fork 962
Allow to bypass journal for writes #2401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@merlimat Are we betting that the replication factor of data and low MTTR of bookies in public cloud would be good enough that we would never loose all copies of data? Or are you okay with all the data going away in case of catastrophe? What is the standard configuration that you use? |
eolivelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is good and we already discussed about it with @jvrao and @sijie
I see a few limitations with this proposal:
- we are going to violate fencing and LAC in case of bookie restart
- we are not coordinating with checkpoint, lastMark mechanism....(the position in the journal does not advance?)
- the client is not aware of what's going on
I would accept this patch if we make that the bookie is not able to restart, this way as soon as the bookie is up and running every guarantee is safe, when you reboot the bookie it is like you lost the bookie forever.
Otherwise I was thinking with my colleague @nicoloboschi about a more complex solution, with a new WriteFlag.
But we are still working on BP41 and we had not the chance to share our ideas with the community.
Drafting a BP would also be a good step as this will be a significant feature for BK
(We should also add tests that cover the change)
|
@eolivelli we have another change that handles the situation on reboot. The problem here isn't so much about losing the replica of an entry. The problem is losing a replica which we have claimed to persist. The solution we have is an integrity checker, which on checks on boot (after a non-clean shutdown), whether the ledgers on the bookie have been closed. If they have not, they are marked a in limbo, which means that they will not be able to respond "I don't have this entry" because that could cause data loss during ledger recovery operations. Equally, an in limbo ledger will never accept a write, as we don't know if that ledger has been fenced. |
|
@ivankelly so we are on the same page. Can you share the second part of the story as a PR? |
|
@eolivelli there are a bunch of dependent changes to get that change in, such as some work to make Bookie an interface, dependency injection, changes in cookie handling, etc. I do have a backlog item to start merging these, but time as always is my greatest foe. I won't get around to looking at this until mid September at least. IMO, there's no rush in getting the change in this PR in. I think we should go a bit further and completely remove the journal in the case that it is disabled, but again, time hasn't permitted. |
|
Yes I agree Let's keep this PR open as a reminder. I am leaving my 'request changes' vote I minimal design document about this important change would be very awesome. |
|
@merlimat @ivankelly do you have plans to port all of this precious work to the community repository? I am very interested in this work and I would like not to duplicate the work and especially diverge on the implementation. From my side I would like to have a WriteFlag on the client side in order to control this behaviour, because in EmailSuccess.com we are sharing the BK cluster among lots of different components and only one (https://github.com/diennea/blobit) is very interested in this change, because it helps a lot in reducing the overall write amplification. |
|
@eolivelli How would |
|
It would be good to have journal writes controlled by the client on a per ledger basis. From Pulsar, then this can be controlled on a topic/application level (like all the qourum settings that can be set at the client), and would be useful in any environment, even where disks are not ephemeral. |
|
@joefk it looks like that @ivankelly and @merlimat are going to contribute this fix with a cluster wide configuration. |
|
@merlimat @ivankelly |
|
@eolivelli I think it's important to get this PR in, even though the integrity check is not ready. This is a valid mode of running that uses different tradeoffs compared to the default behavior. Nonetheless it's a very useful mode for several use cases. This does not preclude a more fine graned approach later on. But in many cases, a cluster wide settings is more than appropriate. |
eolivelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@merlimat can we add a unit test (no need to start a bookie)?
|
What do you mean exactly with "no need to start a bookie" ? And what kind of test? |
|
I mean that we can mock the components and just verify the sequence of calls internally to the bookie. Btw add the test the way you prefer |
|
The test is about verifying that we are not calling the Journal with that configuration option |
|
@merlimat if you want I can pickup this branch and add a test case if you do not have cycles |
|
@rdhabalia , @ivankelly - any comments or concerns about this PR? |
|
@eolivelli @hsaputra added test |
eolivelli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm
### Motivation After add label for prometheus metric by #2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow. ``` replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN ``` ### Modification 1. add label empty check for `PrometheusTextFormatUtil` 2. add label scope check test cover 3. add prometheus metric regex pattern check in test case Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com> This closes #2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits: 8590704 [hangc0276] format code a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/** 732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14 8dc108b [Matteo Merli] y 73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable 034ef85 [Shoothzj] Fix logger member not correct; (#2605) b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (#2658) 683ad45 [Matteo Merli] Allow to attach labels to metrics (#2650) 8091096 [Matteo Merli] Allow to bypass journal for writes (#2401) 63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (#2710) 87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (#2707)
### Motivation After add label for prometheus metric by #2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow. ``` replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN ``` ### Modification 1. add label empty check for `PrometheusTextFormatUtil` 2. add label scope check test cover 3. add prometheus metric regex pattern check in test case Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com> This closes #2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits: 8590704 [hangc0276] format code a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/** 732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14 8dc108b [Matteo Merli] y 73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable 034ef85 [Shoothzj] Fix logger member not correct; (#2605) b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (#2658) 683ad45 [Matteo Merli] Allow to attach labels to metrics (#2650) 8091096 [Matteo Merli] Allow to bypass journal for writes (#2401) 63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (#2710) 87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (#2707)
### Motivation After add label for prometheus metric by apache#2650, it will cause prometheus metric format check failed when no label specified for a statsLogger. The metric list as follow. ``` replication_bookkeeper_client_bookkeeper_client_bookie_watcher_NEW_ENSEMBLE_TIME{success="false",quantile="0.9999", } NaN ``` ### Modification 1. add label empty check for `PrometheusTextFormatUtil` 2. add label scope check test cover 3. add prometheus metric regex pattern check in test case Reviewers: lipenghui <penghui@apache.org>, Andrey Yegorov <None>, Matteo Merli <mmerli@apache.org>, Jia Zhai <zhaijia@apache.org>, Addison Higham <None>, Enrico Olivelli <eolivelli@gmail.com> This closes apache#2718 from hangc0276/chenhang/fix_bookeeper_metric_bug and squashes the following commits: 8590704 [hangc0276] format code a6942d4 [chenhang] fix prometheus metric provider bug and add test to cover label scope and metric format check bb8b1e0 [Andrey Yegorov] Include gradle files into the source artifact for releases, exclude site2/** 732b6cf [Andrey Yegorov] [maven-release-plugin] prepare for next development iteration b0d9f10 [Andrey Yegorov] [maven-release-plugin] prepare branch branch-4.14 8dc108b [Matteo Merli] y 73e22ca [Don Inghram] ISSUE2620: RocksDB log path configurable 034ef85 [Shoothzj] Fix logger member not correct; (apache#2605) b824a60 [hangc0276] fix always select the same region set bug for RegionAwareEnsemblePlacementPolicy (apache#2658) 683ad45 [Matteo Merli] Allow to attach labels to metrics (apache#2650) 8091096 [Matteo Merli] Allow to bypass journal for writes (apache#2401) 63867a9 [Matteo Merli] Impose a memory limit on the bookie journal (apache#2710) 87579b0 [Matteo Merli] Read entry error should print lastAddConfirmed in the log (apache#2707)
Motivation
In some deployment scenarios, applications might want to skip the journal, avoiding the double writing of data to disks.
One such examples is a deployment in a cloud environment with writes on locally attached disks, where absolute throughput is more important than strict data durability. In this case a journal is not providing a lot of value, considering that if the VM is terminated, the local disk is gone.
In such deployments, it's better to just rely on application level replication of data instead of the properties of the journal.
Changes
Added option to bypass the journal at the bookie level.