os/bluestore: set upper and lower bounds on rocksdb omap iterators#45904
os/bluestore: set upper and lower bounds on rocksdb omap iterators#45904yuriw merged 3 commits intoceph:masterfrom
Conversation
48aa568 to
5cce1d3
Compare
|
jenkins test api |
|
jenkins test api |
| * If the specified IteratorBounds arg has both an upper and a lower bound defined, and they have equal placement hash | ||
| * strings, we can be sure that the entire iteration range exists in a single CF. In that case, we return the relevant | ||
| * CF handle. In all other cases, we return a nullptr to indicate that the specified bounds cannot necessarily be mapped | ||
| * to a single CF. |
There was a problem hiding this comment.
This is indeed elegant! Are there specific things that users can do in their config to ensure this is the case most of the time?
There was a problem hiding this comment.
It is always the case for omap, as long as the bluestore_rocksdb_cfs is set to the default value (or is using an alternative hash constraint that still maps all keys for an individual omap object to the same CF).
There was a problem hiding this comment.
OK, great - just wondering if it's worth throwing notes in the config docs so that users know what the optimized path is.
| * If the specified IteratorBounds arg has both an upper and a lower bound defined, and they have equal placement hash | ||
| * strings, we can be sure that the entire iteration range exists in a single CF. In that case, we return the relevant | ||
| * CF handle. In all other cases, we return a nullptr to indicate that the specified bounds cannot necessarily be mapped | ||
| * to a single CF. |
There was a problem hiding this comment.
This is indeed elegant! Are there specific things that users can do in their config to ensure this is the case most of the time?
5cce1d3 to
0839807
Compare
cbodley
left a comment
There was a problem hiding this comment.
otherwise looks great. i'll defer to bluestore folks for final approval
Limits RocksDB omap Seek operations to the relevant key range of the object's omap. This prevents RocksDB from unnecessarily iterating over delete range tombstones in irrelevant omap CF shards. Avoids extreme performance degradation commonly caused by tombstones generated from RGW bucket resharding cleanup. Also prefer CFIteratorImpl over ShardMergeIteratorImpl when we can determine that all keys within specified IteratorBounds must be in a single CF. Fixes: https://tracker.ceph.com/issues/55324 Signed-off-by: Cory Snyder <csnyder@iland.com>
0839807 to
850c16c
Compare
|
@cbodley should I tag someone specific from the bluestore side for review? |
|
Generally this looks good to me. I have a concern though: may be it's better to make this feature's enablement configurable given that we're planning to backport it ASAP and hence can provide quite a limited testing. |
|
@ifed01 It's probably a valid concern given how touchy this stuff can be. We've been bitten by strange corner cases in the past. |
|
@yuriw ^ |
| rocksdb::ColumnFamilyHandle *get_key_cf(const prefix_shards& shards, const char* key, const size_t keylen); | ||
| rocksdb::ColumnFamilyHandle *get_cf_handle(const std::string& prefix, const std::string& key); | ||
| rocksdb::ColumnFamilyHandle *get_cf_handle(const std::string& prefix, const char* key, size_t keylen); | ||
| rocksdb::ColumnFamilyHandle *get_cf_handle(const std::string& prefix, const IteratorBounds& bounds); |
There was a problem hiding this comment.
Love that this new function is made similar to existing ones.
…isabled Add osd_rocksdb_iterator_bounds_enabled config option to allow rocksdb iterator bounds to be disabled. Also includes minor refactoring to shorten code associated with IteratorBounds initialization in bluestore. Signed-off-by: Cory Snyder <csnyder@iland.com>
ab4141c to
ca3ccd9
Compare
|
@yuriw extra commit has been added ^ |
…rBounds) Adds a precondition to RocksDBStore::get_cf_handle(string, IteratorBounds) to avoid duplicating logic of the only caller (RocksDBStore::get_iterator). Assertions will fail if preconditions are not met. Signed-off-by: Cory Snyder <csnyder@iland.com>
| } | ||
| auto iter = cf_handles.find(prefix); | ||
| if (iter == cf_handles.end() || iter->second.hash_l != 0) { | ||
| return nullptr; |
There was a problem hiding this comment.
I see here a check for hash_l==0.
This makes the implementation valid.
|
jenkins test make check |
|
@neha-ojha anything else that I can do to get this one merged? |
|
@cfsnyder it's going through some final testing. We plan to get this merged today once the testing is validated. |
|
http://pulpito.front.sepia.ceph.com/yuriw-2022-04-22_13:56:48-rados-wip-yuri2-testing-2022-04-22-0500-distro-default-smithi/ Failures, unrelated: Details: |
|
jenkins test make check |
|
jenkins test make check |
1 similar comment
|
jenkins test make check |
|
This change broke upgrades, from octopus to pacific at least. Reverting #45963 backport from pacific restores omap functionality. See https://tracker.ceph.com/issues/55444 for details. |
Limits RocksDB omap Seek operations to the relevant key range of the object's omap.
This prevents RocksDB from unnecessarily iterating over delete range tombstones in
irrelevant omap CF shards. Avoids extreme performance degradation commonly caused
by tombstones generated from RGW bucket resharding cleanup. Also prefer CFIteratorImpl
over ShardMergeIteratorImpl when we can determine that all keys within specified
IteratorBounds must be in a single CF.
Fixes: https://tracker.ceph.com/issues/55324
Signed-off-by: Cory Snyder csnyder@iland.com
Checklist