Skip to content

common/async: fix stack-use-after-scope in co_waiter#67568

Open
tchaikov wants to merge 1 commit intoceph:mainfrom
tchaikov:wip-rgw-fix-co_waiter-stack-use-after-scope
Open

common/async: fix stack-use-after-scope in co_waiter#67568
tchaikov wants to merge 1 commit intoceph:mainfrom
tchaikov:wip-rgw-fix-co_waiter-stack-use-after-scope

Conversation

@tchaikov
Copy link
Contributor

@tchaikov tchaikov commented Feb 27, 2026

When co_waiter is destroyed, the cancellation slot may still hold a reference to the op_cancellation callback which captures 'this'. If the cancellation signal is emitted after co_waiter is destroyed (e.g., during co_throttle shutdown), it results in a stack-use-after-scope error.

Fix by adding a destructor that retrieves the cancellation slot from the handler (if still active) and clears it before destruction. This ensures the cancellation callback is removed before the co_waiter object goes out of scope, preventing use-after-scope errors.

The cancellation slot cannot be stored as a member variable because it becomes invalid after the handler is moved out in complete(). Instead, we retrieve it on demand from the handler in the destructor, which is
the only place we need it.

This issue was identified by ASan:

==21453==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7a1364f050c8 at pc 0x603d79ff0d51 bp 0x7ffc1edf78c0 sp 0x7ffc1edf78b8
READ of size 1 at 0x7a1364f050c8 thread T0
    #0 0x603d79ff0d50 in std::_Optional_base_impl<boost::asio::detail::awaitable_handler<boost::asio::any_io_executor, std::__exception_ptr::exception_ptr>, std::_Optional_base<boost::asio::detail::awaitable_handler<boost::asio::any_io_executor, std::__ex
ception_ptr::exception_ptr>, false, false>>::_M_is_engaged() const /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/optional:471:58
    #1 0x603d79ff8874 in std::optional<boost::asio::detail::awaitable_handler<boost::asio::any_io_executor, std::__exception_ptr::exception_ptr>>::operator bool() const /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/optional:985:22
    #2 0x603d79ff9d5f in ceph::async::co_waiter<void, boost::asio::any_io_executor>::cancel() /ceph/src/common/async/co_waiter.h:153:9
    #3 0x603d79ff9c32 in ceph::async::co_waiter<void, boost::asio::any_io_executor>::op_cancellation::operator()(boost::asio::cancellation_type) /ceph/src/common/async/co_waiter.h:112:15
    #4 0x603d79ff9a6e in boost::asio::detail::cancellation_handler<ceph::async::co_waiter<void, boost::asio::any_io_executor>::op_cancellation>::call(boost::asio::cancellation_type) /opt/ceph/include/boost/asio/cancellation_signal.hpp:56:5
    #5 0x603d79fb9125 in boost::asio::cancellation_signal::emit(boost::asio::cancellation_type) /opt/ceph/include/boost/asio/cancellation_signal.hpp:99:17
    #6 0x603d79fd6c31 in boost::asio::cancellation_state::impl<boost::asio::cancellation_filter<(boost::asio::cancellation_type)1>, boost::asio::cancellation_filter<(boost::asio::cancellation_type)1>>::operator()(boost::asio::cancellation_type) /opt/ceph/include/boost/asio/cancellation_state.hpp:222:23
    #7 0x603d79fd696e in boost::asio::detail::cancellation_handler<boost::asio::cancellation_state::impl<boost::asio::cancellation_filter<(boost::asio::cancellation_type)1>, boost::asio::cancellation_filter<(boost::asio::cancellation_type)1>>>::call(boost::asio::cancellation_type) /opt/ceph/include/boost/asio/cancellation_signal.hpp:56:5
    #8 0x603d79fb9125 in boost::asio::cancellation_signal::emit(boost::asio::cancellation_type) /opt/ceph/include/boost/asio/cancellation_signal.hpp:99:17
    #9 0x603d79fee03a in boost::asio::detail::co_spawn_cancellation_handler<boost::asio::cancellation_slot_binder<ceph::async::detail::co_throttle_impl<boost::asio::any_io_executor>::child_completion, boost::asio::cancellation_slot>, boost::asio::any_io_executor, void>::operator()(boost::asio::cancellation_type) /opt/ceph/include/boost/asio/impl/co_spawn.hpp:296:13
    #10 0x603d79fede9e in boost::asio::detail::cancellation_handler<boost::asio::detail::co_spawn_cancellation_handler<boost::asio::cancellation_slot_binder<ceph::async::detail::co_throttle_impl<boost::asio::any_io_executor>::child_completion, boost::asio::cancellation_slot>, boost::asio::any_io_executor, void>>::call(boost::asio::cancellation_type) /opt/ceph/include/boost/asio/cancellation_signal.hpp:56:5
    #11 0x603d79fb9125 in boost::asio::cancellation_signal::emit(boost::asio::cancellation_type) /opt/ceph/include/boost/asio/cancellation_signal.hpp:99:17
    #12 0x603d79fe7135 in ceph::async::detail::co_throttle_impl<boost::asio::any_io_executor>::cancel() /ceph/src/common/async/detail/co_throttle_impl.h:122:17
    #13 0x603d79fe701c in ceph::async::co_throttle<boost::asio::any_io_executor>::cancel() /ceph/src/common/async/co_throttle.h:110:11
    #14 0x603d79fe27a8 in ceph::async::co_throttle<boost::asio::any_io_executor>::~co_throttle() /ceph/src/common/async/co_throttle.h:76:5
    #15 0x603d79f98dce in ceph::async::co_throttle_spawn_shutdown_Test::TestBody()::$_0::operator()() const (.destroy) /ceph/src/test/common/test_async_co_throttle.cc:264:3
    #16 0x603d79fe25ec in std::__n4861::coroutine_handle<void>::destroy() const /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/coroutine:137:30
    #17 0x603d79fe2584 in boost::asio::detail::awaitable_frame_base<boost::asio::any_io_executor>::destroy() /opt/ceph/include/boost/asio/impl/awaitable.hpp:512:11
    #18 0x603d79fb79a9 in boost::asio::awaitable<void, boost::asio::any_io_executor>::~awaitable() /opt/ceph/include/boost/asio/awaitable.hpp:77:15
    #19 0x603d79f7fb0a in boost::asio::awaitable<boost::asio::detail::awaitable_thread_entry_point, boost::asio::any_io_executor> boost::asio::detail::co_spawn_entry_point<ceph::async::capture(std::optional<std::__exception_ptr::exception_ptr>&)::$_0, boost::asio::any_io_executor, boost::asio::detail::awaitable_as_function<void, boost::asio::any_io_executor>>(boost::asio::awaitable<void, boost::asio::any_io_executor>*, boost::asio::detail::co_spawn_state<ceph::async::capture(std::optional<std::__exception_ptr::exception_ptr>&)::$_0, boost::asio::any_io_executor, boost::asio::detail::awaitable_as_function<void, boost::asio::any_io_executor>, void>) (.destroy) /opt/ceph/include/boost/asio/impl/co_spawn.hpp:205:5
    #20 0x603d79fe25ec in std::__n4861::coroutine_handle<void>::destroy() const /usr/lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/coroutine:137:30
    #21 0x603d79fe2584 in boost::asio::detail::awaitable_frame_base<boost::asio::any_io_executor>::destroy() /opt/ceph/include/boost/asio/impl/awaitable.hpp:512:11
    #22 0x603d79fd4fc9 in boost::asio::awaitable<boost::asio::detail::awaitable_thread_entry_point, boost::asio::any_io_executor>::~awaitable() /opt/ceph/include/boost/asio/awaitable.hpp:77:15
    #23 0x603d79fde3d4 in boost::asio::detail::awaitable_thread<boost::asio::any_io_executor>::~awaitable_thread()::'lambda'()::~() /opt/ceph/include/boost/asio/impl/awaitable.hpp:692:11
    #24 0x603d79fdf034 in boost::asio::detail::binder0<boost::asio::detail::awaitable_thread<boost::asio::any_io_executor>::~awaitable_thread()::'lambda'()>::~binder0() /opt/ceph/include/boost/asio/detail/bind_handler.hpp:30:7
    #25 0x603d79fe0501 in void boost::asio::detail::executor_function::complete<boost::asio::detail::binder0<boost::asio::detail::awaitable_thread<boost::asio::any_io_executor>::~awaitable_thread()::'lambda'()>, std::allocator<void>>(boost::asio::detail::executor_function::impl_base*, bool) /opt/ceph/include/boost/asio/detail/executor_function.hpp:115:3
    #26 0x603d79fdc152 in boost::asio::detail::executor_function::~executor_function() /opt/ceph/include/boost/asio/detail/executor_function.hpp:52:7
    #27 0x603d79ffcea8 in boost::asio::detail::executor_op<boost::asio::detail::executor_function, std::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) /opt/ceph/include/boost/asio/detail/executor_op.hpp:73:3
    #28 0x603d79fce17c in boost::asio::detail::scheduler_operation::destroy() /opt/ceph/include/boost/asio/detail/scheduler_operation.hpp:45:5
    #29 0x603d79fd0380 in boost::asio::detail::scheduler::shutdown() /opt/ceph/include/boost/asio/detail/impl/scheduler.ipp:174:10
    #30 0x603d79fd483c in boost::asio::detail::service_registry::shutdown_services() /opt/ceph/include/boost/asio/detail/impl/service_registry.ipp:44:14
    #31 0x603d79fd4735 in boost::asio::execution_context::shutdown() /opt/ceph/include/boost/asio/impl/execution_context.ipp:48:22
    #32 0x603d79fb8c08 in boost::asio::io_context::~io_context() /opt/ceph/include/boost/asio/impl/io_context.ipp:65:3
    #33 0x603d79f4a284 in ceph::async::co_throttle_spawn_shutdown_Test::TestBody() /ceph/src/test/common/test_async_co_throttle.cc:274:1
    #34 0x603d7a0fdd8d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /ceph/src/googletest/googletest/src/gtest.cc:2653:10
    #35 0x603d7a0b49e5 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /ceph/src/googletest/googletest/src/gtest.cc:2689:14
    #36 0x603d7a06f0bd in testing::Test::Run() /ceph/src/googletest/googletest/src/gtest.cc:2728:5

Fixes: https://tracker.ceph.com/issues/75231

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Show available Jenkins commands

You must only issue one Jenkins command per-comment. Jenkins does not understand
comments with more than one command.

@tchaikov tchaikov requested a review from cbodley February 27, 2026 09:43
@tchaikov
Copy link
Contributor Author

tchaikov commented Feb 27, 2026

@cbody Hey Casey, could you help review this change? I created a tracker ticket to track this bug, as this issue triggers during shutdown/cancellation scenarios when:

  • spawn_group is destroyed while child coroutines are still running
  • A cancellation signal arrives after destruction

this happens in production.

@tchaikov
Copy link
Contributor Author

jenkins test make check arm64

@tchaikov tchaikov mentioned this pull request Feb 27, 2026
14 tasks
When co_waiter is destroyed, the cancellation slot may still hold a
reference to the op_cancellation callback which captures 'this'. If
the cancellation signal is emitted after co_waiter is destroyed (e.g.,
during co_throttle shutdown), it results in a stack-use-after-scope
error.

Fix by adding a destructor that retrieves the cancellation slot from
the handler (if still active) and clears it before destruction. This
ensures the cancellation callback is removed before the co_waiter
object goes out of scope, preventing use-after-scope errors.

The cancellation slot cannot be stored as a member variable because
it becomes invalid after the handler is moved out in complete(). Instead,
we retrieve it on demand from the handler in the destructor, which is
the only place we need it.

Signed-off-by: Kefu Chai <k.chai@proxmox.com>
@tchaikov tchaikov force-pushed the wip-rgw-fix-co_waiter-stack-use-after-scope branch from 8571adb to b760a94 Compare February 27, 2026 11:26
@tchaikov tchaikov added the rgw label Feb 27, 2026
@cbodley
Copy link
Contributor

cbodley commented Mar 2, 2026

jenkins test make check arm64

@tchaikov
Copy link
Contributor Author

@cbodley Hi Casey, thank you for reviewing and approving this change. If I want to add some QA coverage for it, do you have a recommendation for which RGW suite would be the most appropriate to run? My current thought is that, for minimal RGW-specific coverage, qa/workunits/rgw/test_rgw_datalog.sh may be the best fit, since it is small, directly runs ceph_test_datalog, and seems to exercise the RGW datalog async path that is closest to the co_waiter/co_throttle code involved here. Does that sound reasonable to you?

@cbodley
Copy link
Contributor

cbodley commented Mar 19, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants