osd/ReplicatedBackend: read data portion of pushing object in batch#29588
xiexingguo wants to merge 1 commit into ceph:master
For sparse objects, this allows us to read the whole fiemap content at once by leveraging BlueStore's asynchronous read support, which is 2x-4x faster when the objects being pushed are heavily fragmented.
Signed-off-by: xie xingguo <xie.xingguo@zte.com.cn>
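The batched read described above can be sketched as follows. This is a minimal, hypothetical illustration, not the code in this PR: an object is modelled as a byte string whose holes hold zeros, and the extent map uses the fiemap convention of offset to length (assumed non-overlapping). `read_per_extent` issues one read per extent; `read_batched` reads the whole covering range once, holes included, and then keeps only the extent bytes. All names here are made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in: an object is a byte string whose holes hold '\0'.
// Extent maps use the fiemap convention offset -> length, assumed non-overlapping.
using ExtentMap = std::map<uint64_t, uint64_t>;

// One read per extent: each substr() models a separate, synchronous read.
std::string read_per_extent(const std::string& obj, const ExtentMap& extents) {
  std::string out;
  for (const auto& [off, len] : extents) {
    assert(off + len <= obj.size());
    out += obj.substr(off, len);
  }
  return out;
}

// Batched: read the whole covering range once (holes included),
// then keep only the bytes that fall inside the extents.
std::string read_batched(const std::string& obj, const ExtentMap& extents) {
  if (extents.empty()) return {};
  const uint64_t start = extents.begin()->first;
  const uint64_t end = extents.rbegin()->first + extents.rbegin()->second;
  assert(end <= obj.size());
  const std::string whole = obj.substr(start, end - start);  // single read
  std::string out;
  for (const auto& [off, len] : extents) {
    out += whole.substr(off - start, len);  // hole bytes are dropped here
  }
  return out;
}
```

Both functions return the same bytes. The batched variant issues a single read, but that read also transfers every hole byte between the first and last extent.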
Updated from acd8eb0 to f624bf3.
tchaikov left a comment:
I think this change depends on some assumptions:
- the scatter/gather I/O can offset the overhead introduced by reading the holes in data_included;
- the memory fragmentation can be neglected.
I cannot find the analysis in the commit message, nor can I find any test results.
Could you help address the concerns above?
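The first assumption can be made concrete with a toy cost model. This is a hypothetical sketch, not a measurement of Ceph or BlueStore: it assumes each read costs a fixed latency plus bytes divided by bandwidth, and it ignores memory fragmentation (the second assumption) and any parallelism inside the store. `analyze` counts the useful bytes, the bytes a single covering read would fetch, and the number of per-extent reads; `batched_read_wins` compares the two strategies under that model.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using ExtentMap = std::map<uint64_t, uint64_t>;  // offset -> length (non-overlapping)

struct ReadCost {
  uint64_t useful_bytes;  // sum of extent lengths
  uint64_t span_bytes;    // bytes one covering read would fetch, holes included
  uint64_t extent_count;  // reads the per-extent approach would issue
};

ReadCost analyze(const ExtentMap& extents) {
  ReadCost c{0, 0, static_cast<uint64_t>(extents.size())};
  if (extents.empty()) return c;
  for (const auto& [off, len] : extents) c.useful_bytes += len;
  const uint64_t start = extents.begin()->first;
  const uint64_t end = extents.rbegin()->first + extents.rbegin()->second;
  c.span_bytes = end - start;
  return c;
}

// Toy model (assumption): cost of one read = latency + bytes / bandwidth.
// Returns true if a single covering read is predicted to beat one read per extent.
bool batched_read_wins(const ReadCost& c, double per_read_latency_us,
                       double bytes_per_us) {
  assert(bytes_per_us > 0);
  const double per_extent =
      c.extent_count * per_read_latency_us + c.useful_bytes / bytes_per_us;
  const double batched = per_read_latency_us + c.span_bytes / bytes_per_us;
  return batched < per_extent;
}
```

Under this model, batching pays off when per-read latency dominates and the holes are small; with large holes or cheap reads, transferring the hole bytes costs more than the round trips it saves.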
My main concern is the performance of small random reads/writes, so I only measured the time spent reading the whole fiemap content off the disk, namely in build_push_op, since we barely use omap in our testbed. After this change, the time decreased to 0.2s (note that the objects were logically less fragmented, but physically more fragmented, since we kept writing more data in between).
@liewegas Care to comment?
When I looked before, I had the same concern as @tchaikov that there was an implicit tradeoff here. It doesn't seem right to read the zeros in the holes only to muck around tossing them out. I think a more elegant solution would be to add a read_sparse() method to ObjectStore that allows BlueStore to do the read in parallel and efficiently. A wrapper can be put in ObjectStore.h that does basically what this PR does (either fiemap + read the pieces, or read + fiemap + chop up the result), and then BlueStore can implement it efficiently. What do you think?
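The proposed split between a generic wrapper and a backend-specific implementation might look roughly like this. It is a hypothetical, heavily simplified sketch: `SparseStore`, `MemStore`, and these `fiemap`/`read`/`read_sparse` signatures are invented for illustration and are not Ceph's real ObjectStore API. The default `read_sparse` takes the "fiemap + read the pieces" route, so hole bytes are never read; `MemStore` is a toy in-memory backend that relies on that default.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

using ExtentMap = std::map<uint64_t, uint64_t>;  // offset -> length

// Hypothetical, heavily simplified interface; NOT Ceph's real ObjectStore API.
class SparseStore {
 public:
  virtual ~SparseStore() = default;
  virtual ExtentMap fiemap(uint64_t off, uint64_t len) const = 0;
  virtual std::string read(uint64_t off, uint64_t len) const = 0;

  // Generic fallback: ask for the allocated extents, then read each piece.
  // Returns only the extent bytes (holes skipped); the map says where they go.
  virtual std::string read_sparse(uint64_t off, uint64_t len,
                                  ExtentMap* out_map) const {
    ExtentMap m = fiemap(off, len);
    std::string data;
    for (const auto& [eoff, elen] : m) data += read(eoff, elen);
    if (out_map) *out_map = std::move(m);
    return data;
  }
};

// Toy in-memory backend that only uses the generic fallback.
class MemStore : public SparseStore {
 public:
  MemStore(std::string data, ExtentMap allocated)
      : data_(std::move(data)), allocated_(std::move(allocated)) {}

  ExtentMap fiemap(uint64_t off, uint64_t len) const override {
    ExtentMap out;
    const uint64_t end = off + len;
    for (const auto& [eoff, elen] : allocated_) {
      const uint64_t s = std::max(off, eoff);
      const uint64_t e = std::min(end, eoff + elen);
      if (s < e) out[s] = e - s;  // clip each allocated extent to the request
    }
    return out;
  }

  std::string read(uint64_t off, uint64_t len) const override {
    assert(off + len <= data_.size());
    return data_.substr(off, len);
  }

 private:
  std::string data_;
  ExtentMap allocated_;
};
```

A backend such as BlueStore would override read_sparse() to submit the piece reads concurrently rather than one at a time; that override is not shown here.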
Yeah, that was my first version. But since I was planning to backport this fix to Luminous, I ended up posting a minimal (and hence more reliable) fix at the last minute. I will come back to this later.
#30061 merged. |