crimson/os/seastore: add fine-grained cache#50421
Conversation
Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
construct BufferSpace only for ObjectDataBlock. Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
get unload interval in extent and load them in parallel. Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
full extent got from transaction if not fully loaded Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
d0309da to
65f7d8d
Compare
| version(other.version), | ||
| poffset(other.poffset) {} | ||
| CachedExtent(extent_len_t length) : ptr(ceph::bufferptr( | ||
| buffer::create_page_aligned(length))) {} |
There was a problem hiding this comment.
Is this new Constructor necessary? It seems a little bit similar to CachedExtent::(ceph::bufferptr &&), maybe only one of them needs to be preserved?
There was a problem hiding this comment.
Who is the user of this new constructor?
There was a problem hiding this comment.
It is prepared for all types of CachedExtent other than ObjectBlock. Since I modify the constructor in get_extent, I add the new constructor in case. The old one is still used in some places. If we ensure CachedExtent(extent_len_t length) is used only for ObjectBlock or modify the CachedExtent(ceph::bufferptr &&ptr) in other places, only one of them is needed. (I haven't checked them)
There was a problem hiding this comment.
If we ensure CachedExtent(extent_len_t length) is used only for ObjectBlock or modify the CachedExtent(ceph::bufferptr &&ptr) in other places, only one of them is needed.
There is no user of CachedExtent(extent_len_t) because it is newly introduced in this PR.
Looks ObjectDataBlock is using another constructor CachedExtent(extent_len_t length, build_space_t) below.
Probably we only need explicit CachedExtent(extent_len_t length) : buffer_space(length) {}, right?
There was a problem hiding this comment.
Just realized where CachedExtent(extent_len_t) is used, refer to the comment #50421 (comment)
| if (ret->is_ptr()) { | ||
| return seastar::make_ready_future<CachedExtentRef>(ret); | ||
| } else { | ||
| touch_extent(*ret); |
There was a problem hiding this comment.
Seems that we should touch_extent no matter whether ret->is_ptr(), am I right?
There was a problem hiding this comment.
BTW, this seems to be a duplicated return clause
There was a problem hiding this comment.
Seems that we should
touch_extentno matter whetherret->is_ptr(), am I right?
I have no idea since the old implementation did not. Should we touch_extent when we find it in a transaction? @cyx1231st
There was a problem hiding this comment.
I have no idea since the old implementation did not. Should we
touch_extentwhen we find it in a transaction? @cyx1231st
Ah, sorry, I didn't notice it's retrieved from the transaction, just ignore this comment, please.
There was a problem hiding this comment.
Should we touch_extent when we find it in a transaction?
Following the existing logic, touch_extent is only called when an extent is firstly added to the transaction, or it is added to the cache.
So looks it is not necessary to touch here.
| @@ -572,6 +572,16 @@ class Cache { | |||
| return ret->wait_io().then([ret] { | |||
| return seastar::make_ready_future<CachedExtentRef>(ret); | |||
| }); | |||
There was a problem hiding this comment.
Should this return be removed?
There was a problem hiding this comment.
Sure, I forgot it. It's a mistake.
cyx1231st
left a comment
There was a problem hiding this comment.
Still looking.
Mostly code style suggestions to make the PR easier to read.
The comments are examples, please scan the PR and fix the others.
| get_extent_ertr::ready_future_marker{}, | ||
| std::move(ret)); | ||
| } | ||
| else { |
There was a problem hiding this comment.
Should be in the same line.
} else {
| return seastar::do_with( | ||
| alloc_cache_buf(aligned_length), | ||
| [extent, this, start, aligned_length](auto &ptr){ | ||
| return epm.read( |
There was a problem hiding this comment.
Let's reduce the unnecessary indention: make sure that the nested logic only has 2 extra spaces, not 4.
For example:
ceph/src/crimson/os/seastore/async_cleaner.cc
Lines 1068 to 1079 in 6e67bf8
| if (ret->is_interval_contained(e_off, e_len)) { | ||
| // ret may be invalid, caller must check | ||
| return get_extent_ret<T>( | ||
| get_extent_ertr::ready_future_marker{}, |
There was a problem hiding this comment.
indention needs to be increased by 2.
| } | ||
| } | ||
|
|
||
| friend class CachedExtent; |
There was a problem hiding this comment.
Drop the friend declaration, and define public interfaces instead. And don't expose unnecessary interfaces as public.
Organize the public interfaces before the private interfaces.
No protected section is needed if there is no inheritance.
| /// Actual data contents | ||
| ceph::bufferptr ptr; | ||
| /// Actual data contents of extent if present | ||
| std::optional<ceph::bufferptr> ptr; |
There was a problem hiding this comment.
Fix the extra space before ptr.
| poffset(other.poffset) { | ||
| if (other.ptr.has_value()) { | ||
| ptr = std::make_optional<ceph::bufferptr> | ||
| (other.ptr->c_str(), other.ptr->length()); |
There was a problem hiding this comment.
Normally:
ptr = std::make_optional<ceph::bufferptr>(
other.ptr->c_str(), other.ptr->length());
| if(offset == 0 && bptr.length() == get_length()) { | ||
| ptr = std::move(bptr); | ||
| } | ||
| else { buffer_space._add_buffer(offset, std::move(bptr)); } |
There was a problem hiding this comment.
Start with a new line:
} else {
buffer_space._add_buffer(offset, std::move(bptr));
}
| LOG_PREFIX(TransactionManager::pin_to_extent); | ||
| SUBTRACET(seastore_tm, "getting extent {}", t, *pin); | ||
| SUBTRACET(seastore_tm, "getting extent {}, data offset {} length {}", | ||
| t, *pin, offset, length); |
| if (!cached) { | ||
| auto ret = CachedExtent::make_cached_extent_ref<T>( | ||
| alloc_cache_buf(length)); | ||
| length); |
There was a problem hiding this comment.
Looks this is where (also line 313 below) the new constructor CachedExtent(extent_len_t length) is used. Why the original CachedExtent::make_cached_extent_ref<T>(alloc_cache_buf(length)) cannot work?
There was a problem hiding this comment.
Looks to me there is no need to introduce 2 new constructors to CachedExtent for the purpose.
CachedExtent(extent_len_t length) should be sufficient to reserve the length in this path.
cyx1231st
left a comment
There was a problem hiding this comment.
The general logic looks correct to me.
| } | ||
|
|
||
| /// Returns length of valid extent | ||
| extent_len_t get_valid_length() const { |
There was a problem hiding this comment.
get_valid_length() -> get_loaded_length()
| return ret; | ||
| } | ||
|
|
||
| bool is_ptr() { |
There was a problem hiding this comment.
bool is_ptr()
->
bool is_fully_loaded() const
| ).si_then([=, this, &t](auto extent) | ||
| -> get_extents_if_live_ret { | ||
| if (extent && extent->get_length() == len) { | ||
| if (extent && extent->get_valid_length() == len) { |
There was a problem hiding this comment.
Looks not accurate.
How about:
if (extent && extent->get_length() == len && extent->is_fully_loaded()) {
?
| return ptr.has_value(); | ||
| } | ||
|
|
||
| bool is_interval_contained(extent_len_t offset, extent_len_t length) { |
There was a problem hiding this comment.
bool is_range_loaded(offset, length) const
| if (!cached) { | ||
| auto ret = CachedExtent::make_cached_extent_ref<T>( | ||
| alloc_cache_buf(length)); | ||
| length); |
There was a problem hiding this comment.
Looks to me there is no need to introduce 2 new constructors to CachedExtent for the purpose.
CachedExtent(extent_len_t length) should be sufficient to reserve the length in this path.
| } | ||
|
|
||
| /// add buffer to bufferspace if fragmented | ||
| void add_buffer(extent_len_t offset, ceph::bufferptr&& bptr) { |
There was a problem hiding this comment.
Rename to load_range() to be consistent.
| } | ||
|
|
||
| /// get buffer by given offset and length. | ||
| ceph::bufferlist get_buffer(extent_len_t offset, extent_len_t length) { |
There was a problem hiding this comment.
rename to get_range(), seems can be const.
| extent->set_io_wait(); | ||
| auto old_length = extent->get_valid_length(); | ||
| return seastar::do_with( | ||
| extent->read_buffer(offset, length), |
There was a problem hiding this comment.
Even for the complete read, the following path of read_buffer() -> add_buffer() -> update_lru_size() can still be reused correctly with minor adjustions, right?
| epm.get_block_size()); | ||
| extent_len_t aligned_length = end - start; | ||
| return seastar::do_with( | ||
| alloc_cache_buf(aligned_length), |
There was a problem hiding this comment.
I think alloc_cache_buf() (i.e. allocating page aligned buffer) is not necessary for partial reads that are fragmented, block aligned should be fine and has better performance.
| } | ||
|
|
||
| /// convert bufferspace to ptr if full | ||
| void check_and_rebuild() { |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
|
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution! |
|
Refering to #55449 (comment), |
I think we still need |
|
You're correct @xxhdx1985126 , I have adjusted my comments, see above. |

The fine-grained cache is introduced to improve the performance and reduce read amplification when reading small chunks of data from large extents. Overall, we add a structure named "BufferSpace" to control the small buffers in CachedExtent so that it can load the exact required data from disk. Now the ObjectDataBlock can be incomplete. Some paths are modified to adapt to this change.
Signed-off-by: Jianxin Li jianxin1.li@intel.com
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "pacific"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
Checklist
Show available Jenkins commands
jenkins retest this pleasejenkins test classic perfjenkins test crimson perfjenkins test signedjenkins test make checkjenkins test make check arm64jenkins test submodulesjenkins test dashboardjenkins test dashboard cephadmjenkins test apijenkins test docsjenkins render docsjenkins test ceph-volume alljenkins test ceph-volume toxjenkins test windows