crimson/os/seastore: support partial read for data extents #60654
6b0a2fc to bf70e77
jenkins test make check
bf70e77 to b413c70
Changeset: fixed mistakes causing test errors.
Updated the performance impact analysis: #60654 (comment)
jenkins test docs
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved
desc: Max size in bytes that an extent can be, 0 to disable
default: 0
No reason to keep this option beside nice-to-have, right?
We still need it, otherwise the read amplification can be very large in scenarios where full extent integrity check is needed.
Right, CRC is still validated at the extent granularity, not 4K-grained. So when 4KB is loaded from a 4MB extent, CRC can't be checked.
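To put rough numbers on the point above, here is a minimal sketch of the read amplification under a simplified model. It is not SeaStore code: the function names are hypothetical, and alignment and metadata reads are ignored. The only assumption taken from the discussion is that an extent-granular CRC check needs the whole extent, while a partial read without it can fetch just the requested range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a SeaStore API): bytes that must be fetched from
// disk to serve `read_len` bytes that live inside a single extent.
inline uint64_t bytes_fetched(uint64_t extent_len, uint64_t read_len,
                              bool full_integrity_check) {
  assert(read_len > 0 && read_len <= extent_len);
  // Assumption: the CRC covers the whole extent, so verifying it requires
  // loading the whole extent; otherwise only the requested range is read.
  return full_integrity_check ? extent_len : read_len;
}

// Read amplification = bytes fetched / bytes requested.
inline uint64_t read_amplification(uint64_t extent_len, uint64_t read_len,
                                   bool full_integrity_check) {
  return bytes_fetched(extent_len, read_len, full_integrity_check) / read_len;
}
```

Under this model, a 4 KB read from a 4 MB extent with the full integrity check fetches 1024x the requested bytes, a 32 KB extent cap brings that down to 8x, and a partial read without the extent-level check stays at 1x.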
^ Fixing: see commit "implement and use maybe_indirect_extent_t::get_bl()" |
8cb3b52 to b1b90d6
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
…ents Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Mostly convert length to the hex format. Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
… to get absent extent Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Make it easier for TM::read_pin() users to consume extent without worrying about the indirections. This basically reverts 9cdcd06 Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
…reads Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com> Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
…ject_data_handler Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com> Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com> Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
…ault Supposing that fine-grained-cache should address the read amplification issue, disable seastore_max_data_allocation_size by default with fine-grained-cache, since seastore_full_integrity_check is disabled by default. Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Jianxin Li <jianxin1.li@intel.com>
…se of read Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
…rect_extent_t::get_bl() Return bufferlist because the extent may be partially loaded under indirection. Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
… rewriting it Signed-off-by: Yingxin Cheng <yingxin.cheng@intel.com>
39cca24 to e4efceb
Rebased to see if the make check can pass.
jenkins test make check
jenkins test docs
jenkins test make check
jenkins test make check
This PR supersedes #57787
The first 10 commits are cleanups and preparations; the remaining commits are the implementation:
- In `CachedExtent`, introduce a `BufferSpace` to maintain/index partially loaded buffers. Once the extent is fully loaded, convert it to a page-aligned bptr.
- `Cache` must tolerate that extents in the LRU may be partially loaded and may be read by concurrent transactions. This is also possible for EXIST_CLEAN extents.

This PR is expected to minimize read amplification, down to 1x, when the data extent sizes do not match the read sizes.
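For readers new to the idea, the partially loaded state described above can be pictured with a small standalone sketch. It is only an illustration under stated assumptions: `PartialBufferIndex` and its methods are hypothetical names, not the `BufferSpace` added by this PR, and it uses plain `std::vector<char>` buffers instead of Ceph bufferlists. It indexes loaded ranges by offset, reports which sub-ranges of a read still need disk I/O, and detects when the whole extent has been loaded (the point at which a single contiguous buffer could replace the pieces).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a partially loaded extent; illustrative only.
class PartialBufferIndex {
public:
  using Range = std::pair<uint64_t, uint64_t>;  // {offset, length}

  // Return the sub-ranges of [off, off + len) that are not loaded yet.
  std::vector<Range> missing(uint64_t off, uint64_t len) const {
    std::vector<Range> gaps;
    uint64_t cur = off;
    const uint64_t end = off + len;
    auto it = loaded_.upper_bound(off);
    if (it != loaded_.begin()) {
      --it;  // the preceding buffer may cover the start of the request
    }
    for (; it != loaded_.end() && cur < end; ++it) {
      const uint64_t b_off = it->first;
      const uint64_t b_end = b_off + it->second.size();
      if (b_end <= cur) continue;  // buffer ends before the cursor
      if (b_off >= end) break;     // buffer starts after the request
      if (b_off > cur) gaps.emplace_back(cur, b_off - cur);
      cur = b_end;                 // skip over the loaded bytes
    }
    if (cur < end) gaps.emplace_back(cur, end - cur);
    return gaps;
  }

  // Record a buffer read from disk; the range must be entirely missing.
  void insert(uint64_t off, std::vector<char> data) {
    assert(!data.empty());
    const auto gaps = missing(off, data.size());
    assert(gaps.size() == 1 && gaps[0] == Range(off, data.size()));
    (void)gaps;
    loaded_bytes_ += data.size();
    loaded_.emplace(off, std::move(data));
  }

  // True once every byte of an extent of `extent_len` bytes is present.
  bool is_fully_loaded(uint64_t extent_len) const {
    return loaded_bytes_ == extent_len;
  }

private:
  std::map<uint64_t, std::vector<char>> loaded_;  // offset -> loaded bytes
  uint64_t loaded_bytes_ = 0;
};
```

A reader would call `missing()` for the requested range, issue disk reads only for the returned gaps, and `insert()` each result; concurrent transactions touching the same extent would then see those ranges as already loaded.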
CC @ljx023 @zhscn
Performance impacts
To roughly understand the impacts, the tests were simplified.
Test scenarios
In the same local environment, based on main (d6dfc1c):
A. baseline_limit_32K: seastore max-alloc-size=32KB
B. baseline_unlimited: seastore max-alloc-size disabled
C. fine-grained-cache: seastore max-alloc-size disabled
D. classic+bluestore: osd_op_num_shards = 32 (no further tuning applied)
Constraints
3.1. rampup: fill the image with 4KB sequential writes
3.2. workload: for each client, 60 seconds, depth=128, 4KB random reads
Results
A. baseline_limit_32K:
B. baseline_unlimited:
C. fine-grained-cache:
D. classic+bluestore:
Short/rough conclusions
In this specific setting, after this PR: