
os: simplify os::Transaction -- get rid of the Transaction::decode_bp()#59979

Merged
SrinivasaBharath merged 1 commit intoceph:mainfrom
rzarzynski:wip-os-simplify-ostxn
Feb 10, 2025

Conversation

@rzarzynski
Contributor

os::Transaction::decode_bp() has only one user: _setattrs() of BlueStore. It is used there as an optimization: keeping the value in contiguous space instead of a potentially fragmented bufferlist that would later require a rectifying memcpy.
The problem is that _setattrs() also needs to avoid keeping large raw buffers alive when only a small subset is referenced. It achieves this by copying the data if bufferptr::is_partial() returns true. However, this means the memcpy happens virtually always, as it is hard to imagine that val, decoded from the wire, could fulfill the zero-waste requirement.
Therefore the optimization doesn't make sense; it only imposes costs in terms of complexity, breaking the symmetry between encode and decode in os::Transaction (there is no encode_bp()).

This commit kills the optimization and simplifies os::Transaction.
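
To illustrate why the copy is almost unavoidable, here is a minimal, self-contained sketch. It does not use the Ceph headers: `Raw`, `Ptr` and `compact()` are hypothetical stand-ins for `buffer::raw`, `bufferptr` and the copy-if-partial step, and the `is_partial()` semantics (the window does not cover the whole raw buffer) are an assumption modeled on the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for buffer::raw: one allocation holding a whole message.
struct Raw {
  std::vector<char> data;
};

// Hypothetical stand-in for bufferptr: an (offset, length) window into a Raw.
struct Ptr {
  std::shared_ptr<Raw> raw;
  std::size_t off;
  std::size_t len;

  // Assumed semantics: partial means the window does not cover the whole raw.
  bool is_partial() const {
    return raw && (off > 0 || off + len < raw->data.size());
  }
};

// Keep the value without pinning unrelated bytes of the original allocation:
// a partial window is copied into a fresh, exactly sized Raw.
Ptr compact(const Ptr& p) {
  if (!p.is_partial()) {
    return p;  // already zero waste, share the existing allocation
  }
  auto fresh = std::make_shared<Raw>();
  fresh->data.assign(p.raw->data.begin() + p.off,
                     p.raw->data.begin() + p.off + p.len);
  return Ptr{fresh, 0, p.len};
}
```

A value decoded from a larger message is a small window into that message's allocation, so in this model `is_partial()` is true and `compact()` copies; only a window that spans its entire allocation is shared as-is.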


This commit has been split out of a bigger branch of EC-related optimizations. It particularly helps 7858030, which in turn is useful for @bill-scales's optimization of alignment in rep & ec ops.


Contributor

@ronen-fr ronen-fr left a comment


LGTM (apart from one minor note)

@rzarzynski rzarzynski force-pushed the wip-os-simplify-ostxn branch from 3abba15 to cf2d9de Compare September 26, 2024 12:22
@rzarzynski
Contributor Author

jenkins retest this please

@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Dec 22, 2024
@github-actions

This pull request has been automatically closed because there has been no activity for 90 days. Please feel free to reopen this pull request (or open a new one) if the proposed change is still appropriate. Thank you for your contribution!

@github-actions github-actions bot closed this Jan 21, 2025
@Matt1360
Member

@rzarzynski do we still want this?

@rzarzynski
Contributor Author

@Matt1360, oops, yes! Thanks for pinging. Resurrecting the PR.

@SrinivasaBharath, @ljflores: what's the QA status?

@rzarzynski rzarzynski reopened this Jan 24, 2025
@github-actions github-actions bot removed the stale label Jan 24, 2025
@shraddhaag
Contributor

Hey folks, while evaluating the QA run that included this PR, I found a couple of errors that might be related to this PR:

  1. In the job /a/skanta-2024-10-24_23:59:35-rados-wip-bharath3-testing-2024-10-23-1509-distro-default-smithi/7965769, we see the following error:
2024-10-25T00:21:31.491 INFO:tasks.workunit.client.0.smithi105.stderr:*** Caught signal (Segmentation fault) **
2024-10-25T00:21:31.491 INFO:tasks.workunit.client.0.smithi105.stderr: in thread 7f6e93ebe640 thread_name:ceph-objectstor
2024-10-25T00:21:31.491 INFO:tasks.workunit.client.0.smithi105.stderr: ceph version 19.3.0-5721-g0cc901e6 (0cc901e6c3bf85cea9dcd864be72472a5942b7dd) squid (dev)
2024-10-25T00:21:31.491 INFO:tasks.workunit.client.0.smithi105.stderr: 1: /lib64/libc.so.6(+0x3e6f0) [0x7f6e9923e6f0]
2024-10-25T00:21:31.491 INFO:tasks.workunit.client.0.smithi105.stderr: 2: (mempool::pool_t::adjust_count(long, long)+0x52) [0x7f6e9a2fe762]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 3: (ceph::buffer::v15_2_0::ptr::reassign_to_mempool(int)+0x37) [0x7f6e9a4a94c7]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 4: (BlueStore::_setattr(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list&)+0x27e) [0x564460a4c15e]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x16d9) [0x564460a3c4a9]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 6: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2ef) [0x564460a3d07f]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 7: ceph-objectstore-tool(+0x38eb18) [0x5644604f8b18]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 8: ceph-objectstore-tool(+0x936bbd) [0x564460aa0bbd]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 9: fuse_fs_create()
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 10: /lib64/libfuse.so.2(+0x15812) [0x7f6e9afae812]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 11: /lib64/libfuse.so.2(+0x13d7c) [0x7f6e9afacd7c]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 12: /lib64/libfuse.so.2(+0x1f9ac) [0x7f6e9afb89ac]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 13: /lib64/libfuse.so.2(+0x109ad) [0x7f6e9afa99ad]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 14: /lib64/libc.so.6(+0x89c52) [0x7f6e99289c52]
2024-10-25T00:21:31.492 INFO:tasks.workunit.client.0.smithi105.stderr: 15: /lib64/libc.so.6(+0x10ec80) [0x7f6e9930ec80]
2024-10-25T00:21:31.541 INFO:tasks.workunit.client.0.smithi105.stderr:/home/ubuntu/cephtest/clone.client.0/qa/workunits/objectstore/test_fuse.sh: line 71: store_test_fuse_mnt/meta/all/#-1:7b3f43c4:::osd_superblock:0#/attr/keya: Software caused connection abort
2024-10-25T00:21:31.543 DEBUG:teuthology.orchestra.run:got remote process result: 1
  2. In the job /a/skanta-2024-10-24_23:59:35-rados-wip-bharath3-testing-2024-10-23-1509-distro-default-smithi/7965782, we see a similar error:
2024-10-25T00:31:57.596 INFO:teuthology.orchestra.run.smithi114.stderr:*** Caught signal (Segmentation fault) **
2024-10-25T00:31:57.596 INFO:teuthology.orchestra.run.smithi114.stderr: in thread 7f5c7c747cc0 thread_name:ceph_test_objec
2024-10-25T00:31:57.600 INFO:teuthology.orchestra.run.smithi114.stderr: ceph version 19.3.0-5721-g0cc901e6 (0cc901e6c3bf85cea9dcd864be72472a5942b7dd) squid (dev)
2024-10-25T00:31:57.600 INFO:teuthology.orchestra.run.smithi114.stderr: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f5c7d94c520]
2024-10-25T00:31:57.601 INFO:teuthology.orchestra.run.smithi114.stderr: 2: (mempool::pool_t::adjust_count(long, long)+0x52) [0x7f5c7e156e42]
2024-10-25T00:31:57.601 INFO:teuthology.orchestra.run.smithi114.stderr: 3: (ceph::buffer::v15_2_0::ptr::reassign_to_mempool(int)+0x37) [0x7f5c7e2f49f7]
2024-10-25T00:31:57.601 INFO:teuthology.orchestra.run.smithi114.stderr: 4: (BlueStore::_setattr(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::buffer::v15_2_0::list&)+0x273) [0x55bd3ade8dc3]
2024-10-25T00:31:57.601 INFO:teuthology.orchestra.run.smithi114.stderr: 5: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x1366) [0x55bd3add7eb6]
2024-10-25T00:31:57.601 INFO:teuthology.orchestra.run.smithi114.stderr: 6: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2e2) [0x55bd3add8d62]
2024-10-25T00:31:57.601 INFO:teuthology.orchestra.run.smithi114.stderr: 7: ceph_test_objectstore(+0x9a1375) [0x55bd3b26b375]
2024-10-25T00:31:57.601 INFO:teuthology.orchestra.run.smithi114.stderr: 8: ceph_test_objectstore(+0x37b77c) [0x55bd3ac4577c]
2024-10-25T00:31:57.601 INFO:teuthology.orchestra.run.smithi114.stderr: 9: (StoreTest_SimpleAttrTest_Test::TestBody()+0x6a0) [0x55bd3aba2220]
2024-10-25T00:31:57.601 INFO:teuthology.orchestra.run.smithi114.stderr: 10: (testing::Test::Run()+0xe3) [0x55bd3ae83393]
2024-10-25T00:31:57.602 INFO:teuthology.orchestra.run.smithi114.stderr: 11: ceph_test_objectstore(+0x5db115) [0x55bd3aea5115]
2024-10-25T00:31:57.602 INFO:teuthology.orchestra.run.smithi114.stderr: 12: ceph_test_objectstore(+0x5db2e5) [0x55bd3aea52e5]
2024-10-25T00:31:57.602 INFO:teuthology.orchestra.run.smithi114.stderr: 13: (testing::internal::UnitTestImpl::RunAllTests()+0x805) [0x55bd3ae900d5]
2024-10-25T00:31:57.602 INFO:teuthology.orchestra.run.smithi114.stderr: 14: (testing::UnitTest::Run()+0xa3) [0x55bd3ae8c603]
2024-10-25T00:31:57.602 INFO:teuthology.orchestra.run.smithi114.stderr: 15: main()
2024-10-25T00:31:57.602 INFO:teuthology.orchestra.run.smithi114.stderr: 16: /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f5c7d933d90]
2024-10-25T00:31:57.602 INFO:teuthology.orchestra.run.smithi114.stderr: 17: __libc_start_main()
2024-10-25T00:31:57.602 INFO:teuthology.orchestra.run.smithi114.stderr: 18: _start()
2024-10-25T00:31:57.763 DEBUG:teuthology.orchestra.run:got remote process result: None
2024-10-25T00:31:57.764 ERROR:teuthology.run_tasks:Saw exception from tasks.

The functions modified in this PR are shown in the backtrace of both the above failures, so I suspect they might be relevant. Could we please check if the above errors are indeed related to this PR? Meanwhile, I've requested @SrinivasaBharath to re-run the batch excluding this PR as well.

@ljflores
Member

ljflores commented Feb 7, 2025

Adding DNM since this PR had problems.

} else {
auto& b = o->onode.attrs[name.c_str()] = val;
b.reassign_to_mempool(mempool::mempool_bluestore_cache_meta);
if (!val.is_contiguous()) {
Contributor Author


What about the empty bufferlist case?

  bool buffer::list::is_contiguous() const
  {
    return _num <= 1;
  }

rebuild() doesn't help with that at all – a call to front() still would be invalid.

  void buffer::list::rebuild()
  {
    if (_len == 0) {
      _carriage = &always_empty_bptr;
      _buffers.clear_and_dispose();
      _num = 0;
      return;
    }
    if ((_len & ~CEPH_PAGE_MASK) == 0)
      rebuild(ptr_node::create(buffer::create_page_aligned(_len)));
    else
      rebuild(ptr_node::create(buffer::create(_len)));
  }

Before the change, the emptiness was handled by creating a bufferptr with zero-length data:

  buffer::ptr::ptr(const char *d, unsigned l) : _off(0), _len(l)    // ditto.
  {
    _raw = buffer::copy(d, l).release();
    _raw->nref.store(1, std::memory_order_release);
    bdout << "ptr " << this << " get " << _raw << bendl;
  }
  ceph::unique_leakable_ptr<buffer::raw> buffer::copy(const char *c, unsigned len) {
    auto r = buffer::create_aligned(len, sizeof(size_t));
    memcpy(r->get_data(), c, len);
    return r;
  }
  ceph::unique_leakable_ptr<buffer::raw> buffer::create_aligned(
    unsigned len, unsigned align) {
    return create_aligned_in_mempool(len, align,
                                     mempool::mempool_buffer_anon);
  }
...
  ceph::unique_leakable_ptr<buffer::raw> buffer::create_aligned_in_mempool(
    unsigned len, unsigned align, int mempool)
  {
    // If alignment is a page multiple, use a separate buffer::raw to
    // avoid fragmenting the heap.
    //
    // Somewhat unexpectedly, I see consistently better performance
    // from raw_combined than from raw even when the allocation size is
    // a page multiple (but alignment is not).
    //
    // I also see better performance from a separate buffer::raw once the
    // size passes 8KB.
    if ((align & ~CEPH_PAGE_MASK) == 0 ||
        len >= CEPH_PAGE_SIZE * 2) {
#ifndef __CYGWIN__
      return ceph::unique_leakable_ptr<buffer::raw>(new raw_posix_aligned(len, align));
#else
      return ceph::unique_leakable_ptr<buffer::raw>(new raw_hack_aligned(len, align));
#endif
    }
    return raw_combined::create(len, align, mempool);
  }
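
The empty-list concern raised in this comment can be shown with a small, self-contained sketch. It does not use the Ceph headers and is not the fix that was eventually pushed: `SegList` and `flatten()` are hypothetical stand-ins for `bufferlist` and for the code that turns `val` into a single contiguous value, with `is_contiguous()` and `rebuild()` modeled on the snippets quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>

// Hypothetical stand-in for bufferlist: a sequence of byte segments.
struct SegList {
  std::list<std::string> segs;

  // Mirrors the quoted check: zero or one segment counts as contiguous.
  bool is_contiguous() const { return segs.size() <= 1; }

  // Total number of bytes across all segments.
  std::size_t length() const {
    std::size_t n = 0;
    for (const auto& s : segs) n += s.size();
    return n;
  }

  // Coalesce all segments into one; like the quoted rebuild(), a
  // zero-length list ends up with no segments at all.
  void rebuild() {
    std::string all;
    for (const auto& s : segs) all += s;
    segs.clear();
    if (!all.empty()) segs.push_back(std::move(all));
  }
};

// Return the value as one contiguous string without ever calling
// front() on a list that has no segments.
std::string flatten(SegList l) {
  if (l.length() == 0) {
    return std::string();  // empty case handled explicitly
  }
  if (!l.is_contiguous()) {
    l.rebuild();
  }
  return l.segs.front();  // safe: at least one non-empty segment exists
}
```

In this model an empty list passes the `is_contiguous()` check yet has no front segment, and `rebuild()` leaves it that way, which is why the explicit length check comes first.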

`os::Transaction::decode_bp()` has only one user: `_setattrs()`
of `BlueStore`. It is used there as an optimization: keeping the
value in contiguous space instead of a potentially fragmented
`bufferlist` that would later require a rectifying memcpy.
The problem is that `_setattrs()` also needs to avoid keeping large
raw buffers alive when only a small subset is referenced. It achieves
this by copying the data if `bufferptr::is_partial()` returns
`true`. However, this means the memcpy happens virtually always,
as it is hard to imagine that `val`, decoded from the wire,
could fulfill the zero-waste requirement.
Therefore the optimization doesn't make sense; it only imposes
costs in terms of complexity, breaking the symmetry between encode
and decode in `os::Transaction` (there is no `encode_bp()`).

This commit kills the optimization and simplifies `os::Transaction`.

Signed-off-by: Radoslaw Zarzynski <rzarzyns@redhat.com>
@rzarzynski rzarzynski force-pushed the wip-os-simplify-ostxn branch from cf2d9de to e3a6680 Compare February 8, 2025 15:10
@rzarzynski
Contributor Author

Addressed the empty bufferlist case. Thanks for unveiling it!
Let's retrigger the QA.

@rzarzynski rzarzynski added needs-qa and removed DNM labels Feb 8, 2025
@rzarzynski
Contributor Author

jenkins test api

@SrinivasaBharath SrinivasaBharath merged commit 0f73780 into ceph:main Feb 10, 2025
12 checks passed
@ljflores
Member

Although this was merged prematurely, it passed QA.

Rados approved: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrackercephcomissues69793


8 participants