pacific: revival and backport of fix for RocksDB optimized iterators #46096
Conversation
Limits RocksDB omap Seek operations to the relevant key range of the object's omap. This prevents RocksDB from unnecessarily iterating over delete-range tombstones in irrelevant omap CF shards, and avoids the extreme performance degradation commonly caused by tombstones generated during RGW bucket resharding cleanup. Also prefers CFIteratorImpl over ShardMergeIteratorImpl when it can be determined that all keys within the specified IteratorBounds must lie in a single CF. Fixes: https://tracker.ceph.com/issues/55324 Signed-off-by: Cory Snyder <csnyder@iland.com> (cherry picked from commit 850c16c)
…isabled Add an osd_rocksdb_iterator_bounds_enabled config option to allow RocksDB iterator bounds to be disabled. Also includes minor refactoring to shorten the code associated with IteratorBounds initialization in BlueStore. Signed-off-by: Cory Snyder <csnyder@iland.com> (cherry picked from commit ca3ccd9)
Conflicts: src/common/options/osd.yaml.in
Cherry-pick notes: conflicts arise because the option is defined in common/options.cc in Pacific vs. common/options/osd.yaml.in in later releases.
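Going by the option name in the commit message, opting out of the new behavior would look like this in ceph.conf (a sketch; the section placement is an assumption, and the default leaves bounds enabled):

```ini
[osd]
# Disable the RocksDB iterator bounds introduced by this backport,
# falling back to the previous unbounded-iterator behavior.
osd_rocksdb_iterator_bounds_enabled = false
```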
…rBounds) Adds a precondition to RocksDBStore::get_cf_handle(string, IteratorBounds) to avoid duplicating the logic of its only caller (RocksDBStore::get_iterator). Assertions fail if the preconditions are not met. Signed-off-by: Cory Snyder <csnyder@iland.com> (cherry picked from commit 55ef16f)
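The precondition pattern described in this commit message can be sketched as follows. The types and function below are simplified stand-ins, not the real signatures from src/kv/RocksDBStore.cc:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the real IteratorBounds type.
struct IteratorBounds {
  std::optional<std::string> lower_bound;
  std::optional<std::string> upper_bound;
};

// Sketch of a get_cf_handle-style helper: instead of re-checking whether
// the bounds are usable (duplicating logic already performed by the sole
// caller, get_iterator), it asserts the precondition and fails loudly
// if a future caller violates the contract.
std::string get_cf_shard(const std::string& prefix, const IteratorBounds& b) {
  // Precondition: both bounds must be populated by the caller.
  assert(b.lower_bound && b.upper_bound);
  // Illustrative only: pretend all in-bounds keys map to one shard.
  return prefix + "-0";
}
```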
…ounded iterator The iterator-bounding feature limits RocksDB iterators so that they are less likely to traverse tombstones. It is used when listing keys in a fixed range, for example the omaps of a specific object. Extending this logic to WholeSpaceIterator is problematic, since the prefix must be taken into account. Fixes: https://tracker.ceph.com/issues/55444 Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
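Why the prefix matters for a whole-space iterator can be sketched in a few lines. The separator byte and key layout here are assumptions for illustration, not the exact encoding Ceph's KeyValueDB uses:

```cpp
#include <string>

// Illustrative: full keys are formed from a prefix plus the user key.
// (The '\0' separator is an assumption made for this sketch.)
std::string combine(const std::string& prefix, const std::string& key) {
  return prefix + '\0' + key;
}

// In a whole-space iterator, bounds supplied as bare keys would compare
// against the wrong part of the keyspace; they must be rebased onto the
// prefixed namespace before being handed to RocksDB.
bool in_bounds(const std::string& full_key, const std::string& prefix,
               const std::string& lower, const std::string& upper) {
  return full_key >= combine(prefix, lower) &&
         full_key <  combine(prefix, upper);
}
```

A key under prefix "P" falls inside bounds rebased onto "P", while the same user key under prefix "Q" does not, which is the distinction a naive (un-prefixed) bound would miss.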
This looks great! We're heavily impacted by https://tracker.ceph.com/issues/55324, so I really hope this PR can make it into 16.2.8! ❤️
I've tested this PR (applied to 16.2.7) on one of our test clusters with a lot of OMAP data. Without this PR, any rebalancing on the OMAP OSDs would lead to terrible performance until we compacted RocksDB to get rid of the tombstones, but with this PR applied, we don't get any performance hit (or one small enough that we don't even notice).
Thanks for testing this PR! It will be included in 16.2.8. |
neha-ojha left a comment:
Test failures are unrelated: https://pulpito.ceph.com/?branch=wip-yuri4-testing-2022-04-29-1830-pacific
Revival of:
#45963
Backport of:
#46095
Backport trackers: https://tracker.ceph.com/issues/55518 and https://tracker.ceph.com/issues/55442