bluestore: do not _colletion_list when _remove_collection#49617
We see very high latencies in Bluestore::_collection_list() operations during PG removal. These latencies regularly exceed the OSD suicide timeout and cause the OSDs to thrash. We can remove the check here, since PG::do_delete_work already calls collection_list to ensure that no objects remain in the DB for the collection before remove_collection. Fixes: https://tracker.ceph.com/issues/58274 Signed-off-by: haoyixing <haoyixing@kuaishou.com>
|
I do agree that it would be prudent to avoid iterating over the PG namespace twice during PG deletion, since we know that iterating over the tombstones can be so painful, however I would argue that it makes more sense to remove the iteration within |
|
Although, on closer inspection of the code, I do acknowledge that even the bluestore-level check does not provide a "definitive" guarantee that no objects are left behind, since the check does not happen transactionally. |
Based on this, it seems that we can still remove the check in bluestore: it does not provide guarantees, there is already a check in the OSD, and the OSD is the main user of bluestore, so bluestore does not need to check by itself. |
|
Honestly I have conflicting feelings about this patch. |
In PG::do_delete_work, the max parameter of collection_list is set to osd_target_transaction_size (default 30), but in bluestore it is set to nonexistent_count (the number of deleted objects), which can be far larger than 30. I think (personally) that this is why the OSD asserts during bluestore's _collection_list: the former does several small checks, while the latter does one big check. @ifed01 |
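To make the "several small checks versus one big check" point concrete, here is a minimal toy sketch. It is not Ceph code: `ToyCollection`, `toy_collection_list`, and `scan_in_batches` are hypothetical names invented for illustration, and the model assumes that the work per call is simply proportional to the number of entries returned. It only shows how the `max` argument bounds the size of each individual listing call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a collection: just a list of object names.
struct ToyCollection {
  std::vector<std::string> objects;
};

struct ListResult {
  std::vector<std::string> names;  // entries returned by this call
  std::size_t next;                // cursor to resume from
};

// Toy analogue of "list up to `max` entries starting at `start`".
ListResult toy_collection_list(const ToyCollection& c, std::size_t start,
                               std::size_t max) {
  assert(start <= c.objects.size());
  const std::size_t end = std::min(c.objects.size(), start + max);
  ListResult r;
  r.names.assign(c.objects.begin() + start, c.objects.begin() + end);
  r.next = end;
  return r;
}

struct ScanStats {
  std::size_t calls = 0;         // number of listing calls issued
  std::size_t largest_call = 0;  // most entries returned by any single call
  std::size_t total = 0;         // entries seen overall
};

// Walk the whole collection, never asking for more than max_per_call at once.
ScanStats scan_in_batches(const ToyCollection& c, std::size_t max_per_call) {
  assert(max_per_call > 0);
  ScanStats s;
  std::size_t cursor = 0;
  do {
    ListResult r = toy_collection_list(c, cursor, max_per_call);
    ++s.calls;
    s.largest_call = std::max(s.largest_call, r.names.size());
    s.total += r.names.size();
    cursor = r.next;
  } while (cursor < c.objects.size());
  return s;
}

// Helper that builds a collection with n synthetic object names.
ToyCollection make_collection(std::size_t n) {
  ToyCollection c;
  for (std::size_t i = 0; i < n; ++i) {
    c.objects.push_back("obj" + std::to_string(i));
  }
  return c;
}
```

With 1000 objects, a limit of 30 yields many calls that each return at most 30 entries, while a limit equal to the object count does all the work in a single call. The toy ignores tombstones, so it says nothing about how expensive each real call is.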
Not all So removing one of those two iterations will be significant, but we still need further optimization for the other remaining iteration if we're going to really solve the issue. |
|
My latest thought on this problem is that we might consider adding an optional deadline parameter to If we were to go that route, then neither of these two |
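As a rough illustration of what an optional deadline on a listing call could look like, here is a toy sketch. It is not the BlueStore API: `list_with_deadline`, `DeadlineListResult`, and the in-memory vector are all hypothetical, and the choice to always return at least one entry so that callers make progress is an assumption of this sketch.

```cpp
#include <chrono>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

struct DeadlineListResult {
  std::vector<std::string> names;  // entries gathered before stopping
  std::size_t next = 0;            // cursor to resume from on the next call
  bool hit_deadline = false;       // true if we stopped because time ran out
};

// Hypothetical listing call: returns up to `max` entries starting at `start`,
// and stops early once `deadline` has passed. When no deadline is given, it
// behaves like a plain bounded listing.
DeadlineListResult list_with_deadline(
    const std::vector<std::string>& objects, std::size_t start,
    std::size_t max,
    std::optional<Clock::time_point> deadline = std::nullopt) {
  DeadlineListResult r;
  r.next = start;
  while (r.next < objects.size() && r.names.size() < max) {
    // Only check the clock after the first entry, so that every call
    // makes some progress even with an already expired deadline.
    if (deadline && !r.names.empty() && Clock::now() >= *deadline) {
      r.hit_deadline = true;
      break;
    }
    r.names.push_back(objects[r.next]);
    ++r.next;
  }
  return r;
}
```

A caller would loop, passing `next` back in as `start`, and could yield or reset its heartbeat whenever `hit_deadline` is set. Whether the real code should return partial results or an error in that case is an open design question that this sketch does not settle.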
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |