
os/bluestore: Add data segmentation #53794

Closed
aclamk wants to merge 1 commit into ceph:main from aclamk:wip-aclamk-bs-segmented-data

Conversation

@aclamk (Contributor) commented Oct 3, 2023

Split object data into segments of conf.bluestore_segment_data_size bytes,
so that no blob ever spans two segments at the same time.
Modified the reshard function to prefer segment separation lines;
as a result, no spanning blobs are created.

This was originally part of the recompression-improvement effort,
but it also enables tuning the conf.bluestore_extent_map_shard_target_size configurable
even in compressed random-write workloads without a significant performance hit.
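
A minimal sketch of the splitting rule described above (illustrative only: for_each_segment is a hypothetical helper, not code from this PR, and a plain integer stands in for the conf.bluestore_segment_data_size option):

#include <algorithm>
#include <cstdint>
#include <functional>
#include <iostream>

// Hypothetical helper: invoke fn(off, len) once per segment-clamped piece,
// so no piece (and hence no blob written from it) crosses a segment line.
void for_each_segment(uint64_t offset, uint64_t length, uint64_t seg_size,
                      const std::function<void(uint64_t, uint64_t)>& fn) {
  uint64_t end = offset + length;
  while (offset < end) {
    uint64_t seg_end = (offset / seg_size + 1) * seg_size; // next segment line
    uint64_t piece_end = std::min(seg_end, end);
    fn(offset, piece_end - offset);
    offset = piece_end;
  }
}

int main() {
  // With a hypothetical 1 MiB segment size, a 3 MiB write starting at
  // 512 KiB becomes four pieces: 512K, 1M, 1M, 512K.
  for_each_segment(512 * 1024, 3 * 1024 * 1024, 1024 * 1024,
                   [](uint64_t off, uint64_t len) {
                     std::cout << off << " +" << len << "\n";
                   });
}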


Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>

  desc: How long cleaner should sleep before re-checking utilization
  default: 5
  with_legacy: true
- name: bluestore_segment_data_size
Contributor: Maybe bluestore_onode_segment_size?

p2roundup<uint64_t>(write_offset + segment_size, segment_size),
middle_offset + middle_length);
_do_write_big(txc, c, o, write_offset, segment_end - write_offset, p, wctx);
write_offset = segment_end;
Contributor: Missing decrement of middle_length?

uint64_t write_offset = middle_offset;
while (write_offset < middle_offset + middle_length) {
uint64_t segment_end = std::min(
p2roundup<uint64_t>(write_offset + segment_size, segment_size),
Contributor: IIUC you might get more than a single segment in a blob with this implementation.
Shouldn't that be p2roundup(write_offset, segment_size) instead?
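
A self-contained sketch of the arithmetic under discussion, with local stand-ins for Ceph's p2align/p2roundup helpers (the real ones live in include/intarith.h); the offsets and segment size are made-up values. It also shows why middle_length needs no decrement: the loop guard compares write_offset against the fixed end of the range. Note that the suggested p2roundup(write_offset, segment_size) returns write_offset unchanged when it is already segment-aligned, so the sketch steps from the aligned-down offset instead; this is purely an illustration of the corner cases, not the change that eventually landed.

#include <algorithm>
#include <cstdint>
#include <iostream>

// Local stand-ins for Ceph's power-of-2 helpers from include/intarith.h.
template <typename T> T p2align(T x, T align) { return x & ~(align - 1); }
template <typename T> T p2roundup(T x, T align) { return (x + align - 1) & ~(align - 1); }

int main() {
  const uint64_t segment_size = 0x100000;  // 1 MiB (made-up value)
  const uint64_t middle_offset = 0x80000;  // unaligned start at 512 KiB
  const uint64_t middle_length = 0x200000; // 2 MiB
  const uint64_t end = middle_offset + middle_length;

  // Loop shape from the PR: the guard uses the fixed range end, so
  // middle_length itself is never decremented.
  uint64_t write_offset = middle_offset;
  while (write_offset < end) {
    // As quoted above: rounds write_offset + segment_size up, which for an
    // unaligned write_offset overshoots the next segment line...
    uint64_t quoted = p2roundup<uint64_t>(write_offset + segment_size, segment_size);
    // ...whereas aligning down and stepping one segment never does:
    uint64_t boundary = p2align(write_offset, segment_size) + segment_size;
    uint64_t segment_end = std::min(boundary, end);
    std::cout << std::hex << "chunk [0x" << write_offset << ", 0x" << segment_end
              << "); quoted form would have ended at 0x" << quoted << "\n";
    write_offset = segment_end;
  }
  // First iteration: chunk [0x80000, 0x100000), but the quoted form yields
  // 0x200000, i.e. a blob spanning the segment line at 0x100000.
}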

/*we will be too large if we wait for next segment*/) {
make_shard_here = true;
}
next_boundary = p2align(e->blob_end() + data_segment_size, data_segment_size);
Contributor: What if blob_end() points exactly to the next segment (e.g. blob_start = 0, blob_end() = 1M, and segment size = 1M)? Then next_boundary comes out as 2M, which is apparently not what we expect (1M).

Contributor Author: Fixed. Changed to p2roundup.
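
A quick check of both forms on the case raised above, using toy values and local copies of the helpers (the assertions are mine, not from the PR):

#include <cassert>
#include <cstdint>

// Local stand-ins for Ceph's power-of-2 helpers from include/intarith.h.
template <typename T> T p2align(T x, T a) { return x & ~(a - 1); }
template <typename T> T p2roundup(T x, T a) { return (x + a - 1) & ~(a - 1); }

int main() {
  const uint64_t seg = 0x100000; // 1 MiB segment size

  // blob_end() exactly on a segment line (blob_start = 0, blob_end() = 1M):
  const uint64_t end_aligned = 0x100000;
  assert(p2align(end_aligned + seg, seg) == 0x200000); // old form: 2M, too far
  assert(p2roundup(end_aligned, seg) == 0x100000);     // fixed form: 1M, as expected

  // Unaligned blob_end(): both forms agree on the next line at 2M.
  const uint64_t end_unaligned = 0x180000;
  assert(p2align(end_unaligned + seg, seg) == 0x200000);
  assert(p2roundup(end_unaligned, seg) == 0x200000);
  return 0;
}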

if (onode_data_has_boundaries) {
if (e->blob_start() >= next_boundary) {
// this is the place we want to have a shard boundary
if ((estimate >= target /*we have enough already*/) ||
Contributor: As we could neglect shard size growing above, wouldn't it be better to apply the segment size boundary unconditionally here as well? I.e. always enforce a shard at this boundary irrespective of the estimate value?

Contributor Author:

The deal is that the size of the encoding varies hugely with the presence of checksums.
For 4K checksums I get 5.5K of metadata per 4MB.
For 2K checksums I get 3.5K of metadata per 4MB.
For no checksums I get 1.5K of metadata per 4MB.
Maybe the preferred segment size should depend on the checksum type, but sharding should be fixed?
I am actually for a fixed shard size.
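
A back-of-envelope check of the 4K figure (my own arithmetic, not from the thread): crc32c stores a 4-byte value per checksummed block, so 4 MiB / 4 KiB = 1024 blocks × 4 B = 4 KiB of raw checksum payload, which on top of the ~1.5 KiB no-checksum baseline matches the reported 5.5 KiB.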

Contributor Author:

UPDATE:
All tests on random write.
split = 4k/16:8k/10:12k/9:16k/8:20k/7:24k/7:28k/6:32k/6:36k/5:40k/5:44k/4:48k/4:52k/4:56k/3:60k/3:64k/3
Compressed 50%, no csum = 4443 bytes / 4MB
Compressed 50%, crc32c = 8776 bytes / 4MB
No-compress, no csum = 2686 bytes / 4MB
No-compress, crc32c = 7082 bytes / 4MB

github-actions bot commented Jan 3, 2024

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Jan 3, 2024
github-actions bot commented Feb 2, 2024

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Feb 2, 2024
@aclamk aclamk mentioned this pull request Mar 5, 2024
