bluestore: Elastic Shared Blobs 3 - main part#51441

Merged
rzarzynski merged 34 commits into ceph:main from
aclamk:wip-aclamk-bs-esb-3-extentmap-dup
Sep 13, 2023

@aclamk aclamk commented May 11, 2023

This is part 3/4 of ESB work.
It modifies ExtentMap::dup() and adds all the sub-functions required to reuse shared blobs.
This part can be enabled or disabled at runtime.
#51439
#51440
#51441
#51442

aclamk and others added 18 commits May 10, 2023 11:47
After introduction of lazy statfs updates and mechanism to store them at exit,
some tests required tune-up.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Fix it, so it can be enabled and work.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Add more checks on consistency.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Upgrade local foreach_shared_blob into _fsck_foreach_shared_blob
that can be used on entire BlueStore scope.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
When we do fsck with non-repair mode, we do not get any info about shared blobs
that actually were corrupted. Now we print them.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Exit faster when sharding is not enabled.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
When reshard is applied for the first time, expand the reshard range to encompass the whole object.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
maybe_reshard is created to filter out unnecessary calls to request_reshard.
The intended use is for callers to simply request maybe_reshard and leave the check
of whether the action is really necessary to the implementation.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
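The maybe_reshard idea from the commit message above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `ExtentMapSketch`, its fields, and the two skip conditions (sharding disabled, range already covered by a pending request) are hypothetical and are not taken from the BlueStore sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for BlueStore's ExtentMap.
struct ExtentMapSketch {
  bool sharding_enabled = true;  // assumption: one possible reason to skip
  bool pending = false;          // a reshard has already been requested
  uint32_t pending_begin = 0;
  uint32_t pending_end = 0;
  int requests = 0;              // counts calls that reached request_reshard

  // Unconditionally records (or widens) the range that needs resharding.
  void request_reshard(uint32_t begin, uint32_t end) {
    if (pending) {
      pending_begin = std::min(pending_begin, begin);
      pending_end = std::max(pending_end, end);
    } else {
      pending_begin = begin;
      pending_end = end;
      pending = true;
    }
    ++requests;
  }

  // Callers always ask; the decision whether work is needed lives here.
  void maybe_reshard(uint32_t begin, uint32_t end) {
    if (!sharding_enabled) return;
    if (pending && begin >= pending_begin && end <= pending_end) return;
    request_reshard(begin, end);
  }
};
```

With this shape, call sites stay free of "is it needed?" logic: a second call for a range already covered by the pending request is dropped inside maybe_reshard.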
When encode_some fails twice ceph aborts.
Now we print object details just before.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Modified bluestore_blob_t to include current size of csum_data.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Add printing of len to operator<< for const bluestore_blob_t.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Modified TransContext.
Changed
std::set<SharedBlobRef> shared_blobs_written
to
std::set<BlobRef> blobs_written

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Move finish_write from SharedBlob to Blob.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
This is necessary to enable adding more Buffers to Blobs that are shared.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Adapt split_cache to new situation.

Now buffers are attached to Blob, and we always need to move them,
even if we have already moved the relevant SharedBlob.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Fix blobs having the same empty shared blob.

Each blob on creation gets its own unique (empty) SharedBlob object.
ExtentMap::dup() sometimes merges blobs together, so 2 different blobs
get the same SharedBlob object.

Function _do_remove() tries to convert shared blobs into regular ones.
If it succeeds we could get 2 blobs having the same EMPTY SharedBlob object.

The solution is to create detached SharedBlob if necessary.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
With BufferSpace now attached to Blob (was SharedBlob inside it),
on-the-fly 'writing' buffers must be copied to clones.
Otherwise those objects will read data from disk before it is written there.

Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
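A minimal sketch of the copy-on-clone step described in the commit message above: only buffers still in a 'writing' state are copied to the clone, so the clone does not read from disk before the data lands there. The types and the function `dup_writing_sketch` are hypothetical simplifications, not BlueStore's BufferSpace API, and locking is omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class BufferState { Clean, Writing };

// Hypothetical simplified buffer: blob-relative offset plus cached bytes.
struct BufferSketch {
  uint32_t offset;
  std::string data;
  BufferState state;
};

struct BufferSpaceSketch {
  std::vector<BufferSketch> buffers;

  bool has_buffer_at(uint32_t offset) const {
    return std::any_of(buffers.begin(), buffers.end(),
                       [offset](const BufferSketch& b) { return b.offset == offset; });
  }
};

// Copies in-flight ('writing') buffers from the source blob's cache to the
// clone's cache, skipping offsets the clone already holds.
// Returns the number of buffers copied.
std::size_t dup_writing_sketch(const BufferSpaceSketch& from, BufferSpaceSketch& to) {
  std::size_t copied = 0;
  for (const auto& b : from.buffers) {
    if (b.state != BufferState::Writing) continue;  // clean data is already on disk
    if (to.has_buffer_at(b.offset)) continue;       // already copied earlier
    to.buffers.push_back(b);
    ++copied;
  }
  return copied;
}
```

Calling the function a second time on the same pair copies nothing, which mirrors the later "only copy Buffers that were not copied before" refinement in spirit.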

@ifed01 ifed01 left a comment

I have two major questions regarding this part:

  1. make_range_shared_maybe_merge implementation looks suspicious to me - see comments inline
  2. do we really need this "ifdef WITH_ESB" stuff? Wouldn't a runtime switch between the original and new dup functionality be sufficient?

if (copy_used_in_blob) {
used_in_blob = from.used_in_blob;
} else {
ceph_assert(from.blob.is_compressed());
Contributor

nit: IMO this and the following assertions make little sense here. The function has a copy_used_in_blob parameter which defines the desired behavior. Other external factors (e.g. whether the blob is compressed or not) which don't prevent the mutation shouldn't be considered at all - we can init used_in_blob in any case, so let's do that.

Contributor

any feedback here?

Contributor Author

The assertion here is important. It guards against future attempts to reuse Blob::dup() in an improper state.
It is true that when called from dup_esb() the assert does not make sense.
But in the future someone could attempt to call it on compressed blobs.

std::multimap<uint64_t /*blob_start*/, Blob*> candidates;
scan_shared_blobs(c, oldo, srcoff, length, candidates);

for (auto ep = oldo->extent_map.seek_lextent(srcoff);
Contributor

nit: if you get ep_start from scan_shared_blobs you might get rid of the seek_lextent call here.

_dup_writing requires locking of BufferCacheShard

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
@aclamk aclamk force-pushed the wip-aclamk-bs-esb-3-extentmap-dup branch from f01ab9e to dab04d0 June 3, 2023 08:38
@aclamk aclamk force-pushed the wip-aclamk-bs-esb-3-extentmap-dup branch from dab04d0 to 29d22e7 June 27, 2023 18:20
aclamk and others added 14 commits July 6, 2023 15:26
Moving BufferSpace from SharedBlob to Blob broke the tracking of num_blobs.
Fixed that and reinforced it by adding asserts to the BufferCacheShard destructor.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
The reason was historical - it was to give one access method to bc regardless of whether it was defined in SharedBlob or Blob.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Signed-off-by: Adam Kupczyk <akupczyk@redhat.com>
Add functions that can copy parts of blobs.
It is necessary for merging blobs together,
which happens on cloning (ExtentMap::dup).

Fixed:
Blob::copy_extents_over_empty was faulty when insertion was targeting
the last extent and that extent was invalid (empty).

Add dup() for bluestore_blob_t and bluestore_blob_use_tracker.

Changed:
Modified Blob::copy_from for better readability.
Added bluestore_blob_t::adjust_to initialization that conforms to
other blob specifics.

Move assert for is_mutable() out of bluestore_blob_t::add_tail,
so it can be used in blobs that are shared.

Modify bluestore_blob_use_tracker_t::get to automatically expand
when accessing more AUs than originally declared.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
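The auto-expanding behaviour of the use tracker mentioned above can be sketched as follows. This is a simplified model under stated assumptions: `UseTrackerSketch` keeps one byte counter per allocation unit (AU) in a vector, which is not how bluestore_blob_use_tracker_t is actually laid out, and the names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-AU reference tracker; one byte counter per allocation unit.
struct UseTrackerSketch {
  uint32_t au_size;
  std::vector<uint32_t> bytes_per_au;

  UseTrackerSketch(uint32_t au, uint32_t num_au)
      : au_size(au), bytes_per_au(num_au, 0) {}

  // Accounts [offset, offset + length) as referenced. If the range reaches
  // past the AUs declared at construction, the tracker grows instead of failing.
  void get(uint32_t offset, uint32_t length) {
    if (length == 0) return;
    const uint32_t end = offset + length;
    const uint32_t last_au = (end - 1) / au_size;
    if (last_au >= bytes_per_au.size()) {
      bytes_per_au.resize(last_au + 1, 0);  // automatic expansion
    }
    uint32_t pos = offset;
    while (pos < end) {
      const uint32_t au = pos / au_size;
      const uint32_t au_end = (au + 1) * au_size;
      const uint32_t chunk = std::min(end, au_end) - pos;
      bytes_per_au[au] += chunk;
      pos += chunk;
    }
  }
};
```

Growing on access is convenient when blobs are merged, because the destination blob may end up covering more AUs than it was created with.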
Two new functions:
1) can_merge_blob checks if 2 blobs are compatible to be merged together
2) merge_blob merges 2 blobs into 1, emptying source and putting all to destination

Modify merge_blob() to return logical length of produced blob.

Clear "unused" bitmap

When making a blob shared or merging it with another blob, clear unused.
This drops some potential optimizations for writing into large blobs,
but they are unlikely to be useful.
Such info can be useful only after the blob reverts from shared to regular.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
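A rough sketch of the check-then-merge pair described above. The compatibility rules used here (same AU size, same checksum type, no AU populated in both blobs) are assumptions chosen for illustration, and `BlobSketch` is a hypothetical model, not bluestore_blob_t. As in the commit message, the merge empties the source and returns the logical length of the resulting blob.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t kInvalid = UINT64_MAX;  // marks an empty (unallocated) AU slot

// Hypothetical simplified blob: one disk offset per allocation unit.
struct BlobSketch {
  uint32_t au_size = 4096;
  uint8_t csum_type = 0;
  std::vector<uint64_t> au_disk_offset;  // kInvalid means the slot is empty
};

// Assumed rule set: same geometry and checksum type, and no slot that is
// populated in both blobs.
bool can_merge_blob_sketch(const BlobSketch& dst, const BlobSketch& src) {
  if (dst.au_size != src.au_size || dst.csum_type != src.csum_type) return false;
  const std::size_t common =
      std::min(dst.au_disk_offset.size(), src.au_disk_offset.size());
  for (std::size_t i = 0; i < common; ++i) {
    if (dst.au_disk_offset[i] != kInvalid && src.au_disk_offset[i] != kInvalid) {
      return false;  // both blobs hold data for the same AU
    }
  }
  return true;
}

// Moves every populated slot of src into dst, empties src, and returns the
// logical length of the merged blob in bytes.
uint32_t merge_blob_sketch(BlobSketch& dst, BlobSketch& src) {
  assert(can_merge_blob_sketch(dst, src));
  if (src.au_disk_offset.size() > dst.au_disk_offset.size()) {
    dst.au_disk_offset.resize(src.au_disk_offset.size(), kInvalid);
  }
  for (std::size_t i = 0; i < src.au_disk_offset.size(); ++i) {
    if (src.au_disk_offset[i] != kInvalid) {
      dst.au_disk_offset[i] = src.au_disk_offset[i];
    }
  }
  src.au_disk_offset.clear();
  return static_cast<uint32_t>(dst.au_disk_offset.size()) * dst.au_size;
}
```

Keeping the check separate from the merge lets a caller scan several candidate blobs cheaply and only mutate the one it picks.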
When we punch_hole in blobs we leave Buffers unchanged.
Normally it is not a problem, but when we merge blobs there is a collision.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Introduce function that will scan through relevant range that is to be cloned,
and function that will find best matching blob to attach to.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Func reblob_extents is used to modify extents.
It is final step of melding 2 blobs together.
It removes reference to blob that is to be phased out,
and replaces it with reference to the blob that is the sum of them.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
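The re-pointing step that reblob_extents performs, as described above, might look roughly like this. The types and the function `reblob_extents_sketch` are hypothetical; the real function operates on BlueStore's extent map rather than a plain vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

struct BlobSketch {
  int id = 0;
};
using BlobRef = std::shared_ptr<BlobSketch>;

// Hypothetical logical extent: a range of the object backed by some blob.
struct ExtentSketch {
  uint32_t logical_offset;
  uint32_t length;
  BlobRef blob;
};

// Within [begin, end), replaces every reference to `from` (the blob being
// phased out) with `to` (the merged blob). Returns how many extents changed.
std::size_t reblob_extents_sketch(std::vector<ExtentSketch>& extents,
                                  uint32_t begin, uint32_t end,
                                  const BlobRef& from, const BlobRef& to) {
  std::size_t changed = 0;
  for (auto& e : extents) {
    const bool outside =
        e.logical_offset >= end || e.logical_offset + e.length <= begin;
    if (outside) continue;
    if (e.blob == from) {
      e.blob = to;
      ++changed;
    }
  }
  return changed;
}
```

Once no extent references the old blob any more, its reference count drops and it can be released, which is what makes this the final step of melding two blobs.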
Adds function that converts a specified range in object to shared blobs,
possibly merging them with other shared blobs.

Modify make_range_shared_maybe_merge to allow for merging blobs.
It now uses can_merge_blobs and merge_blobs.

Add discard_unused_buffers() to make_range_shared.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Make ExtentMap::dup able to reuse some already existing shared blobs,
when a regular blob has to be transformed to shared blob.

Make ExtentMap::dup() use make_range_shared_maybe_merge as the primary tool for cloning.

Modify ExtentMap::dup to make a diligent merge of blobs.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Only copy Buffers that were not copied before.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
1. Rename ExtentMap::dup to ExtentMap::dup_esb
2. Resurrect ExtentMap::dup

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Having a new ExtentMap::dup that introduces heavy changes to blob processing
seems risky. We will enable it only on demand. In the future, once the feature
is tested in production, the choice should be removed (and the feature always on).

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
Modify _do_clone_range to select variant of ExtentMap::dup depending on elastic_shared_blob.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
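The runtime selection described in the commit above could be sketched as below. Only the names dup, dup_esb and elastic_shared_blob come from the commit messages; the structs, the return values, and how the flag is stored are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical configuration holder; the field name follows the commit message.
struct ConfigSketch {
  bool elastic_shared_blob = false;
};

// Hypothetical extent map exposing both cloning variants.
struct ExtentMapSketch {
  std::string dup() { return "dup"; }          // original cloning path
  std::string dup_esb() { return "dup_esb"; }  // elastic-shared-blob path
};

// Picks the cloning variant at runtime instead of at compile time.
std::string do_clone_range_sketch(const ConfigSketch& conf, ExtentMapSketch& em) {
  return conf.elastic_shared_blob ? em.dup_esb() : em.dup();
}
```

A runtime flag keeps both code paths compiled and testable in the same binary, which fits the stated plan of enabling the feature on demand first and removing the choice later.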
@aclamk aclamk force-pushed the wip-aclamk-bs-esb-3-extentmap-dup branch from 29d22e7 to d01c308 July 18, 2023 12:40
@aclamk aclamk requested a review from ifed01 July 20, 2023 08:21

aclamk commented Jul 25, 2023

jenkins test api

Remove unnecessary arguments from functions.
Reduce unneeded indirections for accessing extent_map.

Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

ifed01 commented Aug 10, 2023

jenkins test make check

@rzarzynski rzarzynski merged commit fd009f0 into ceph:main Sep 13, 2023
@rzarzynski

Merged as a part of #53178.
