Split object data into segments of conf.bluestore_segment_data_size bytes. This means that no blob will be in two segments at the same time. Modified reshard function to prefer segment separation lines. As a result no spanning blobs are created. Signed-off-by: Adam Kupczyk <akupczyk@ibm.com>
    desc: How long cleaner should sleep before re-checking utilization
    default: 5
    with_legacy: true
    - name: bluestore_segment_data_size
Maybe name it bluestore_onode_segment_size?
    p2roundup<uint64_t>(write_offset + segment_size, segment_size),
    middle_offset + middle_length);
    _do_write_big(txc, c, o, write_offset, segment_end - write_offset, p, wctx);
    write_offset = segment_end;
Missing decrement of middle_length?
    uint64_t write_offset = middle_offset;
    while (write_offset < middle_offset + middle_length) {
    uint64_t segment_end = std::min(
    p2roundup<uint64_t>(write_offset + segment_size, segment_size),
IIUC you might get more than a single segment in a blob with this implementation.
Shouldn't that be p2roundup(write_offset, segment_size) instead?
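To make this concern concrete, here is a minimal standalone sketch (not code from this PR). `p2align` and `p2roundup` are local stand-ins for Ceph's power-of-two helpers, and `split_by_segment` is a hypothetical name. With an unaligned `write_offset` such as 0x1000 and a 0x10000 segment, `p2roundup(write_offset + segment_size, segment_size)` yields 0x20000, so the piece crosses the boundary at 0x10000. Plain `p2roundup(write_offset, segment_size)` returns `write_offset` itself when it is already aligned, which would produce a zero-length piece. One expression that avoids both, assuming `segment_size` is a power of two, is `p2align(write_offset, segment_size) + segment_size`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Local stand-ins for Ceph's helpers; align must be a power of two.
constexpr uint64_t p2align(uint64_t x, uint64_t align) {
  return x & ~(align - 1);
}
constexpr uint64_t p2roundup(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

// Hypothetical helper: split [offset, offset + length) into
// (piece_offset, piece_length) pairs that never cross a segment boundary.
std::vector<std::pair<uint64_t, uint64_t>>
split_by_segment(uint64_t offset, uint64_t length, uint64_t segment_size) {
  std::vector<std::pair<uint64_t, uint64_t>> pieces;
  const uint64_t end = offset + length;
  uint64_t write_offset = offset;
  while (write_offset < end) {
    // First segment boundary strictly greater than write_offset.
    const uint64_t next_boundary =
        p2align(write_offset, segment_size) + segment_size;
    const uint64_t segment_end = std::min(next_boundary, end);
    pieces.emplace_back(write_offset, segment_end - write_offset);
    write_offset = segment_end;
  }
  return pieces;
}
```

For example, `split_by_segment(0x1000, 0x20000, 0x10000)` yields three pieces: (0x1000, 0xF000), (0x10000, 0x10000) and (0x20000, 0x1000).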
    /*we will be too large if we wait for next segment*/) {
    make_shard_here = true;
    }
    next_boundary = p2align(e->blob_end() + data_segment_size, data_segment_size);
What if blob_end() falls exactly on a segment boundary (e.g. blob_start() = 0, blob_end() = 1M, segment size = 1M)? Then next_boundary comes out as 2M, which is apparently not what we expect (1M).
Fixed. Changed to p2roundup.
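A small standalone check of the two expressions, as a sketch only. The exact replacement expression is not shown in this thread, so `p2roundup(blob_end, data_segment_size)` is an assumption; `p2align` and `p2roundup` are local stand-ins for Ceph's helpers, and the `next_boundary_*` names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins for Ceph's helpers; align must be a power of two.
constexpr uint64_t p2align(uint64_t x, uint64_t align) {
  return x & ~(align - 1);
}
constexpr uint64_t p2roundup(uint64_t x, uint64_t align) {
  return (x + align - 1) & ~(align - 1);
}

constexpr uint64_t MiB = 1024ULL * 1024ULL;

// Expression quoted in the diff above.
constexpr uint64_t next_boundary_p2align(uint64_t blob_end, uint64_t seg) {
  return p2align(blob_end + seg, seg);
}
// Assumed form of the p2roundup-based replacement.
constexpr uint64_t next_boundary_p2roundup(uint64_t blob_end, uint64_t seg) {
  return p2roundup(blob_end, seg);
}

// blob_end exactly on a boundary: p2align skips a whole segment.
static_assert(next_boundary_p2align(1 * MiB, MiB) == 2 * MiB, "skips to 2M");
static_assert(next_boundary_p2roundup(1 * MiB, MiB) == 1 * MiB, "stays at 1M");

// blob_end inside a segment: both expressions agree.
static_assert(next_boundary_p2align(MiB + 4096, MiB) == 2 * MiB, "2M");
static_assert(next_boundary_p2roundup(MiB + 4096, MiB) == 2 * MiB, "2M");
```

The two forms differ only when `blob_end()` is already segment-aligned, which is exactly the case raised in the comment.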
    if (onode_data_has_boundaries) {
    if (e->blob_start() >= next_boundary) {
    // this it the place we want to have shard boundary
    if ((estimate >= target /*we have enough already*/) ||
As we could neglect shard size growing above - wouldn't it be better to apply segment size boundary unconditionally here as well? I.e. always enforce shard at this boundary irrespective of estimate value?
The issue is that the size of the encoding varies hugely depending on the presence of checksums.
For 4K checksums I get 5.5K of metadata per 4MB.
For 2K checksums I get 3.5K of metadata per 4MB.
For no checksums I get 1.5K of metadata per 4MB.
Maybe the preferred segment size should be dependent on checksum type, but sharding should be fixed?
I am actually for fixed shard size.
UPDATE:
All tests on random write.
split = 4k/16:8k/10:12k/9:16k/8:20k/7:24k/7:28k/6:32k/6:36k/5:40k/5:44k/4:48k/4:52k/4:56k/3:60k/3:64k/3
Compressed 50%, no csum = 4443 bytes / 4MB
Compressed 50%, crc32c = 8776 bytes / 4MB
No-compress, no csum = 2686 bytes / 4MB
No-compress, crc32c = 7082 bytes / 4MB
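To relate these numbers to shard sizing, here is a rough back-of-the-envelope sketch. It assumes extent-map metadata is spread evenly over the object data, which is a simplification, and `data_span_per_shard` is a hypothetical helper, not part of BlueStore. The shard target is passed in as a parameter; the value 500 in the comment is only an illustrative input:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: given metadata bytes measured per data window,
// estimate how many bytes of object data fit under one shard of the
// given target size, assuming metadata density is uniform.
constexpr uint64_t data_span_per_shard(uint64_t meta_bytes_per_window,
                                       uint64_t window_bytes,
                                       uint64_t shard_target_bytes) {
  return window_bytes * shard_target_bytes / meta_bytes_per_window;
}

constexpr uint64_t kWindow = 4ULL * 1024 * 1024;  // the 4MB window used above

// Illustrative input: 7082 bytes / 4MB (no-compress, crc32c) with a
// 500-byte shard target covers a little under 300 KiB of data per shard.
static_assert(data_span_per_shard(7082, kWindow, 500) == 296124,
              "integer division of 4MiB * 500 by 7082");
```

Under the same assumption, the lower-metadata cases in the list cover proportionally more data per shard, which is why a fixed data-segment size and a fixed shard target size do not line up equally well across checksum and compression settings.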
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
Split object data into segments of conf.bluestore_segment_data_size bytes.
This means that no blob will be in two segments at the same time.
Modified reshard function to prefer segment separation lines.
As a result no spanning blobs are created.
This was originally part of the recompression improvement effort, but it also makes it possible to tune the conf.bluestore_extent_map_shard_target_size configurable, even in compressed random-write environments, without a significant performance hit.