osd: optimize PG removal (parts 1 & 2 of 2) #37496
Conversation
Here is an overview of the issue and the proposed fixes: https://docs.google.com/presentation/d/1Qid__UuHmE5PhVmFT8aviZADuiLp32zzbhq7dNwaGsA/edit?usp=sharing

jenkins test classic perf
Force-pushed from b0e1adc to ce14fcd

@ifed01 needs rebase
Force-pushed from ce14fcd to c42dc51

Force-pushed from cb70eb6 to bd092aa
jenkins test api

Force-pushed from 10d797c to 36159ca
tchaikov left a comment:

I reran the failed tests from https://pulpito.ceph.com/kchai-2021-08-03_00:34:30-rados-wip-kefu-testing-2021-08-02-1223-1-distro-basic-smithi/ at https://pulpito.ceph.com/kchai-2021-08-03_10:14:51-rados-wip-kefu-testing2-2021-08-03-1601-distro-basic-smithi/. The latter was compiled from master + this PR, so I think this changeset is suspicious.
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved.

Pulpito Run Rados Suite Results: only this PR was included in the following run. All runs & failures tracked here: https://pad.ceph.com/p/master_runs_tracking. Related failures summary (job IDs):
It seems that the optimization in this PR targets only scenarios in which a non-rotational device is used as the backend, is this correct? If so, why not optimize the rotational-device scenarios? It looks to me that the same approach should also work for those. Thanks :-)

Right, this isn't supported for rotational drives for now.
Technically this should be doable, but we haven't tested it on those drives. And I'm hesitant to support more complex scenarios for RocksDB on top of spinning drives. RocksDB is mainly aimed at working with fast(!) devices, and IMO it makes sense to discourage its usage on spinning drives as much as possible. We've seen a lot of bad user experience in those scenarios...
    if (_use_rotational_settings()) {
      dout(10) << __func__
               << " bulk removal not supported for KV on a spinner drive"
               << dendl;
      return -ENOTSUP;
    }

It looks like a typo: it should use _use_db_rotational_settings() instead.
@ifed01 is this ready for another round of teuthology testing?

Yeah, planning to give it a try once shaman completes the build.
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
The rationale is to have a faster space-reclaiming operation for BlueStore which performs as few DB operations as possible. Additionally, the onode is trimmed from the cache afterwards to ensure smoother collection reaping, if any. Signed-off-by: Igor Fedotov <ifedotov@suse.com>
This includes the ability to read (single-key mode / prefix bulk mode) and clear omaps. Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Signed-off-by: Igor Fedotov <ifedotov@suse.com>
Due to the improved PG removal performance, one can get false-positive error detection on the above keyphrase when running the ec-inconsistent-hinfo task. Signed-off-by: Igor Fedotov <igor.fedotov@croit.io>
hi
@ShimTanny this PR is probably not the right place for discussing existing release code. Maybe try the users mailing list?
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
This patch reworks bulk PG removal by utilizing the fact that both the onode and OMAP naming schemes are "PG-grouped" now. Hence one can apply a single-shot ranged delete op (well, two ranged delete ops in fact) to remove everything related to a PG from RocksDB.

The tricky part is that we should release the disk space occupied by the deleted onodes, so the bulk removal is preceded by a space-reclamation stage which enumerates all the onodes and releases their extents. Due to the RocksDB implementation this looks more beneficial, as we replace multiple DB deletes with multiple DB updates; the former are known to cause DB performance degradation when applied in bulk. Unfortunately, the above-mentioned ranged deletes aren't ideal in this respect either, so they are followed by async ranged compactions to eliminate potential DB degradation.

Finally, the PG removal scenario is as follows:
Performance numbers are available at: https://docs.google.com/spreadsheets/d/17V2mXUDEMAFVmSC67o1rQtrWNnAm_itBgzdxh1vRJMY/edit?usp=sharing
Fixes: https://tracker.ceph.com/issues/47174 and related
Signed-off-by: Igor Fedotov <ifedotov@suse.com>