Skip to content

crimson/os/seastore: add fine-grained cache#50421

Closed
ljx023 wants to merge 6 commits intoceph:mainfrom
ljx023:wip-seastore-fine-grained-cache
Closed

crimson/os/seastore: add fine-grained cache#50421
ljx023 wants to merge 6 commits intoceph:mainfrom
ljx023:wip-seastore-fine-grained-cache

Conversation

@ljx023
Copy link

@ljx023 ljx023 commented Mar 7, 2023

The fine-grained cache is introduced to improve the performance and reduce read amplification when reading small chunks of data from large extents. Overall, we add a structure named "BufferSpace" to control the small buffers in CachedExtent so that it can load the exact required data from disk. Now the ObjectDataBlock can be incomplete. Some paths are modified to adapt to this change.

Signed-off-by: Jianxin Li jianxin1.li@intel.com

Contribution Guidelines

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

Jianxin Li added 2 commits March 1, 2023 11:42
Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
construct BufferSpace only for ObjectDataBlock.

Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
@ljx023 ljx023 requested a review from a team as a code owner March 7, 2023 11:35
@ljx023
Copy link
Author

ljx023 commented Mar 7, 2023

image
The performance of randread from large extents does increase after adding this patch.

Jianxin Li added 4 commits March 7, 2023 13:34
get unload interval in extent and load them in parallel.

Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
full extent got from transaction if not fully loaded

Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
@ljx023 ljx023 force-pushed the wip-seastore-fine-grained-cache branch from d0309da to 65f7d8d Compare March 7, 2023 13:48
version(other.version),
poffset(other.poffset) {}
CachedExtent(extent_len_t length) : ptr(ceph::bufferptr(
buffer::create_page_aligned(length))) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this new Constructor necessary? It seems a little bit similar to CachedExtent::(ceph::bufferptr &&), maybe only one of them needs to be preserved?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who is the user of this new constructor?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is prepared for all types of CachedExtent other than ObjectBlock. Since I modify the constructor in get_extent, I add the new constructor in case. The old one is still used in some places. If we ensure CachedExtent(extent_len_t length) is used only for ObjectBlock or modify the CachedExtent(ceph::bufferptr &&ptr) in other places, only one of them is needed. (I haven't checked them)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we ensure CachedExtent(extent_len_t length) is used only for ObjectBlock or modify the CachedExtent(ceph::bufferptr &&ptr) in other places, only one of them is needed.

There is no user of CachedExtent(extent_len_t) because it is newly introduced in this PR.

Looks ObjectDataBlock is using another constructor CachedExtent(extent_len_t length, build_space_t) below.

Probably we only need explicit CachedExtent(extent_len_t length) : buffer_space(length) {}, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just realized where CachedExtent(extent_len_t) is used, refer to the comment #50421 (comment)

if (ret->is_ptr()) {
return seastar::make_ready_future<CachedExtentRef>(ret);
} else {
touch_extent(*ret);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that we should touch_extent no matter whether ret->is_ptr(), am I right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, this seems to be a duplicated return clause

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that we should touch_extent no matter whether ret->is_ptr(), am I right?

I have no idea since the old implementation did not. Should we touch_extent when we find it in a transaction? @cyx1231st

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea since the old implementation did not. Should we touch_extent when we find it in a transaction? @cyx1231st

Ah, sorry, I didn't notice it's retrieved from the transaction, just ignore this comment, please.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we touch_extent when we find it in a transaction?

Following the existing logic, touch_extent is only called when an extent is firstly added to the transaction, or it is added to the cache.

So looks it is not necessary to touch here.

@@ -572,6 +572,16 @@ class Cache {
return ret->wait_io().then([ret] {
return seastar::make_ready_future<CachedExtentRef>(ret);
});
Copy link
Contributor

@xxhdx1985126 xxhdx1985126 Mar 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this return be removed?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I forgot it. It's a mistake.

Copy link
Member

@cyx1231st cyx1231st left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking.

Mostly code style suggestions to make the PR easier to read.

The comments are examples, please scan the PR and fix the others.

get_extent_ertr::ready_future_marker{},
std::move(ret));
}
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in the same line.

} else {

return seastar::do_with(
alloc_cache_buf(aligned_length),
[extent, this, start, aligned_length](auto &ptr){
return epm.read(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's reduce the unnecessary indention: make sure that the nested logic only has 2 extra spaces, not 4.

For example:

return extent_callback->with_transaction_intr(
src,
"clean_reclaim_space",
[this, &backref_extents, &pin_list, &reclaimed](auto &t)
{
return seastar::do_with(
std::vector<CachedExtentRef>(backref_extents),
[this, &t, &reclaimed, &pin_list](auto &extents)
{
LOG_PREFIX(SegmentCleaner::do_reclaim_space);
// calculate live extents
auto cached_backref_entries =

if (ret->is_interval_contained(e_off, e_len)) {
// ret may be invalid, caller must check
return get_extent_ret<T>(
get_extent_ertr::ready_future_marker{},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indention needs to be increased by 2.

}
}

friend class CachedExtent;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop the friend declaration, and define public interfaces instead. And don't expose unnecessary interfaces as public.

Organize the public interfaces before the private interfaces.

No protected section is needed if there is no inheritance.

/// Actual data contents
ceph::bufferptr ptr;
/// Actual data contents of extent if present
std::optional<ceph::bufferptr> ptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the extra space before ptr.

poffset(other.poffset) {
if (other.ptr.has_value()) {
ptr = std::make_optional<ceph::bufferptr>
(other.ptr->c_str(), other.ptr->length());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally:

ptr = std::make_optional<ceph::bufferptr>(
  other.ptr->c_str(), other.ptr->length());

if(offset == 0 && bptr.length() == get_length()) {
ptr = std::move(bptr);
}
else { buffer_space._add_buffer(offset, std::move(bptr)); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start with a new line:

} else {
  buffer_space._add_buffer(offset, std::move(bptr));
}

LOG_PREFIX(TransactionManager::pin_to_extent);
SUBTRACET(seastore_tm, "getting extent {}", t, *pin);
SUBTRACET(seastore_tm, "getting extent {}, data offset {} length {}",
t, *pin, offset, length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong indention

if (!cached) {
auto ret = CachedExtent::make_cached_extent_ref<T>(
alloc_cache_buf(length));
length);
Copy link
Member

@cyx1231st cyx1231st Mar 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks this is where (also line 313 below) the new constructor CachedExtent(extent_len_t length) is used. Why the original CachedExtent::make_cached_extent_ref<T>(alloc_cache_buf(length)) cannot work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me there is no need to introduce 2 new constructors to CachedExtent for the purpose.

CachedExtent(extent_len_t length) should be sufficient to reserve the length in this path.

Copy link
Member

@cyx1231st cyx1231st left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general logic looks correct to me.

}

/// Returns length of valid extent
extent_len_t get_valid_length() const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_valid_length() -> get_loaded_length()

return ret;
}

bool is_ptr() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool is_ptr()
->
bool is_fully_loaded() const

).si_then([=, this, &t](auto extent)
-> get_extents_if_live_ret {
if (extent && extent->get_length() == len) {
if (extent && extent->get_valid_length() == len) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks not accurate.

How about:
if (extent && extent->get_length() == len && extent->is_fully_loaded()) {
?

return ptr.has_value();
}

bool is_interval_contained(extent_len_t offset, extent_len_t length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bool is_range_loaded(offset, length) const

if (!cached) {
auto ret = CachedExtent::make_cached_extent_ref<T>(
alloc_cache_buf(length));
length);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me there is no need to introduce 2 new constructors to CachedExtent for the purpose.

CachedExtent(extent_len_t length) should be sufficient to reserve the length in this path.

}

/// add buffer to bufferspace if fragmented
void add_buffer(extent_len_t offset, ceph::bufferptr&& bptr) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to load_range() to be consistent.

}

/// get buffer by given offset and length.
ceph::bufferlist get_buffer(extent_len_t offset, extent_len_t length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to get_range(), seems can be const.

extent->set_io_wait();
auto old_length = extent->get_valid_length();
return seastar::do_with(
extent->read_buffer(offset, length),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even for the complete read, the following path of read_buffer() -> add_buffer() -> update_lru_size() can still be reused correctly with minor adjustions, right?

epm.get_block_size());
extent_len_t aligned_length = end - start;
return seastar::do_with(
alloc_cache_buf(aligned_length),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think alloc_cache_buf() (i.e. allocating page aligned buffer) is not necessary for partial reads that are fragmented, block aligned should be fine and has better performance.

}

/// convert bufferspace to ptr if full
void check_and_rebuild() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to maybe_build_bptr()

@github-actions
Copy link

github-actions bot commented May 9, 2023

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@github-actions
Copy link

github-actions bot commented Sep 7, 2023

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Sep 7, 2023
@github-actions
Copy link

github-actions bot commented Oct 7, 2023

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@cyx1231st
Copy link
Member

cyx1231st commented May 13, 2024

I'm going to implement it based on the latest code, starting from cleanups e.g. #57432

Let me know if anyone is also working on this.

Update: @ljx023 will start to work on this.

@cyx1231st
Copy link
Member

cyx1231st commented May 13, 2024

Refering to #55449 (comment), seastore_max_data_allocation_size might be considerred to be disabled by default /removed after this feature.

@xxhdx1985126
Copy link
Contributor

xxhdx1985126 commented May 13, 2024

Refering to #55449 (comment), seastore_max_data_allocation_size might be considerred to be disabled/removed after this feature.

I think we still need seastore_max_data_allocation_size when full integrity check enabled, so that the read amplification won't be so large. I think we can drop seastore_max_data_allocation_size once we have fine-grained checksums.

@cyx1231st
Copy link
Member

You're correct @xxhdx1985126 , I have adjusted my comments, see above.

@ljx023 ljx023 deleted the wip-seastore-fine-grained-cache branch May 23, 2024 05:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants