os: simplify os::Transaction -- get rid of the Transaction::decode_bp()#59979
SrinivasaBharath merged 1 commit into ceph:main
ronen-fr
left a comment
LGTM (apart from one minor note)
3abba15 to cf2d9de
jenkins retest this please
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!
@rzarzynski do we still want this?
@Matt1360, oops, yes! Thanks for pinging. Resurrecting the PR. @SrinivasaBharath, @ljflores: what's the QA status?
Hey folks, while evaluating the QA run that included this PR, I found a couple of errors that might be related to this PR:
The functions modified in this PR appear in the backtraces of both of the above failures, so I suspect they might be relevant. Could we please check whether the above errors are indeed related to this PR? Meanwhile, I've asked @SrinivasaBharath to re-run the batch excluding this PR as well.
Adding DNM since this PR had problems.
src/os/bluestore/BlueStore.cc
Outdated
    } else {
      auto& b = o->onode.attrs[name.c_str()] = val;
      b.reassign_to_mempool(mempool::mempool_bluestore_cache_meta);
      if (!val.is_contiguous()) {
What about the empty bufferlist case?
bool buffer::list::is_contiguous() const
{
  return _num <= 1;
}
rebuild() doesn't help with that at all: a call to front() would still be invalid.
void buffer::list::rebuild()
{
  if (_len == 0) {
    _carriage = &always_empty_bptr;
    _buffers.clear_and_dispose();
    _num = 0;
    return;
  }
  if ((_len & ~CEPH_PAGE_MASK) == 0)
    rebuild(ptr_node::create(buffer::create_page_aligned(_len)));
  else
    rebuild(ptr_node::create(buffer::create(_len)));
}
Before the change, emptiness was handled by creating a bufferptr with zero-length data:
buffer::ptr::ptr(const char *d, unsigned l) : _off(0), _len(l) // ditto.
{
  _raw = buffer::copy(d, l).release();
  _raw->nref.store(1, std::memory_order_release);
  bdout << "ptr " << this << " get " << _raw << bendl;
}
ceph::unique_leakable_ptr<buffer::raw> buffer::copy(const char *c, unsigned len) {
  auto r = buffer::create_aligned(len, sizeof(size_t));
  memcpy(r->get_data(), c, len);
  return r;
}
ceph::unique_leakable_ptr<buffer::raw> buffer::create_aligned(
  unsigned len, unsigned align) {
  return create_aligned_in_mempool(len, align,
                                   mempool::mempool_buffer_anon);
}
...
ceph::unique_leakable_ptr<buffer::raw> buffer::create_aligned_in_mempool(
  unsigned len, unsigned align, int mempool)
{
  // If alignment is a page multiple, use a separate buffer::raw to
  // avoid fragmenting the heap.
  //
  // Somewhat unexpectedly, I see consistently better performance
  // from raw_combined than from raw even when the allocation size is
  // a page multiple (but alignment is not).
  //
  // I also see better performance from a separate buffer::raw once the
  // size passes 8KB.
  if ((align & ~CEPH_PAGE_MASK) == 0 ||
      len >= CEPH_PAGE_SIZE * 2) {
#ifndef __CYGWIN__
    return ceph::unique_leakable_ptr<buffer::raw>(new raw_posix_aligned(len, align));
#else
    return ceph::unique_leakable_ptr<buffer::raw>(new raw_hack_aligned(len, align));
#endif
  }
  return raw_combined::create(len, align, mempool);
}
`os::Transaction::decode_bp()` has only one user: `_setattrs()` of `BlueStore`. It uses it as an optimization: keeping the value in contiguous space instead of a potentially fragmented `bufferlist` that would require a rectifying memcpy later.
The problem is that `_setattrs()` also needs to avoid keeping large raw buffers of which only a small subset is referenced. It achieves this by copying the data if `bufferptr::is_partial()` returns `true`. However, this means the memcpy happens virtually always, as it is hard to imagine that `val`, decoded from the wire, can fulfill the zero-waste requirement.
Therefore the optimization doesn't make sense; it only imposes costs in terms of complexity, breaking the symmetry between encode and decode in `os::Transaction` (there is no `encode_bp()`).
This commit kills the optimization and simplifies `os::Transaction`.
Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
cf2d9de to e3a6680
Addressed the empty bufferlist case. Thanks for unveiling it!
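For readers following the empty-bufferlist discussion, here is a minimal sketch of the pitfall under stated assumptions. `MiniBufferList` and `to_contiguous()` are hypothetical stand-ins written for illustration; they are not Ceph's `buffer::list`, and the thread does not show the code of the actual fix. The sketch only mirrors the two quoted behaviours: an empty list counts as contiguous, and rebuilding an empty list leaves it without any segment, so `front()` must not be called on it.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a segmented buffer list; NOT Ceph's buffer::list.
class MiniBufferList {
  std::list<std::vector<char>> segments_;

public:
  // Append a non-empty string as a new segment.
  void append(const std::string& s) {
    if (!s.empty()) segments_.emplace_back(s.begin(), s.end());
  }

  std::size_t num_segments() const { return segments_.size(); }

  std::size_t length() const {
    std::size_t n = 0;
    for (const auto& seg : segments_) n += seg.size();
    return n;
  }

  // Mirrors the quoted `_num <= 1`: an empty list also counts as contiguous.
  bool is_contiguous() const { return segments_.size() <= 1; }

  // Coalesce all segments into one. An empty list stays without segments,
  // like the quoted rebuild() branch for `_len == 0`.
  void rebuild() {
    if (length() == 0) {
      segments_.clear();
      return;
    }
    std::vector<char> merged;
    merged.reserve(length());
    for (const auto& seg : segments_)
      merged.insert(merged.end(), seg.begin(), seg.end());
    segments_.clear();
    segments_.push_back(std::move(merged));
  }

  // Only valid when at least one segment exists.
  const std::vector<char>& front() const {
    assert(!segments_.empty());
    return segments_.front();
  }
};

// Hypothetical helper: return the value as one contiguous string,
// handling the empty case before front() is ever touched.
std::string to_contiguous(MiniBufferList bl) {
  if (bl.length() == 0) return std::string();  // guard for the empty list
  if (!bl.is_contiguous()) bl.rebuild();
  const auto& seg = bl.front();
  return std::string(seg.begin(), seg.end());
}
```

Without the `length() == 0` guard, an empty list would pass the `is_contiguous()` check, skip `rebuild()`, and reach `front()` with no segment to return.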
jenkins test api
Although this was merged prematurely, it passed QA. Rados approved: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrackercephcomissues69793
`os::Transaction::decode_bp()` has only one user: `_setattrs()` of `BlueStore`. It uses it as an optimization: keeping the value in contiguous space instead of a potentially fragmented `bufferlist` that would require a rectifying memcpy later.
The problem is that `_setattrs()` also needs to avoid keeping large raw buffers of which only a small subset is referenced. It achieves this by copying the data if `bufferptr::is_partial()` returns `true`. However, this means the memcpy happens virtually always, as it is hard to imagine that `val`, decoded from the wire, can fulfill the zero-waste requirement.
Therefore the optimization doesn't make sense; it only imposes costs in terms of complexity, breaking the symmetry between encode and decode in `os::Transaction` (there is no `encode_bp()`).
This commit kills the optimization and simplifies `os::Transaction`.
This commit has been dissected from a bigger branch of EC-related optimizations. In particular, it helps 7858030, which in turn is useful for @bill-scales's optimization of alignment in rep & ec ops.
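The argument that the copy happens virtually always can be illustrated with a small sketch. `MiniPtr` and `trim_for_storage()` are hypothetical names invented for this example; they only model the idea of a view into a larger backing allocation and are not Ceph's `bufferptr` or the `_setattrs()` code. The assumption is that a view is "partial" whenever it does not cover its whole backing buffer, which is what a value sliced out of a larger wire message would look like.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a pointer into a shared raw buffer;
// NOT Ceph's bufferptr.
struct MiniPtr {
  std::shared_ptr<std::vector<char>> raw;  // backing allocation
  std::size_t off;                         // start of the view
  std::size_t len;                         // length of the view

  // True when the view covers only part of the backing allocation.
  bool is_partial() const {
    return raw && (off > 0 || off + len < raw->size());
  }
};

// Hypothetical helper: decide what to keep for an attribute value.
// A partial view is copied into a right-sized buffer so that a small
// reference does not keep a large backing allocation alive.
MiniPtr trim_for_storage(const MiniPtr& p) {
  if (!p.is_partial()) return p;  // zero waste: keep the buffer as is
  assert(p.off + p.len <= p.raw->size());
  auto exact = std::make_shared<std::vector<char>>(
      p.raw->begin() + p.off, p.raw->begin() + p.off + p.len);
  return MiniPtr{exact, 0, p.len};
}
```

In this model, a 16-byte value sitting at offset 128 of a 4096-byte message buffer is partial and gets copied. Only a view spanning the entire allocation avoids the copy, which is the rare case the description refers to.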