Bug #73750
closed
rados/basic: Segmentation fault during neorados tests
Description
Description: rados:basic/{ceph clusters/{fixed-2} mon_election/connectivity msgr-failures/many msgr/async-v2only objectstore/{bluestore/{alloc$/{avl} base mem$/{low} onode-segment$/{none} write$/{random/{compr$/{yes$/{zlib}} random}}}} rados supported-random-distro$/{rpm_latest} tasks/rados_api_tests}
/a/yaarit-2025-11-06_20:06:52-rados:basic-wip-rocky10-branch-of-the-day-2025-11-05-1762369819-distro-default-smithi/8587283
2025-11-06T20:26:12.710 INFO:tasks.workunit.client.0.smithi032.stdout: snapshots: [ RUN ] NeoRadosSelfManagedSnaps.Rollback
2025-11-06T20:26:12.786 INFO:tasks.workunit.client.0.smithi032.stderr:bash: line 1: 41451 Segmentation fault (core dumped) ceph_test_neorados_snapshots --gtest_output=xml:/home/ubuntu/cephtest/archive/unit_test_xml_report/neorados_snapshots.xml 2>&1
2025-11-06T20:26:12.786 INFO:tasks.workunit.client.0.smithi032.stderr: 41452 Done | tee ceph_test_neorados_snapshots.log
2025-11-06T20:26:12.786 INFO:tasks.workunit.client.0.smithi032.stderr: 41453 Done | sed "s/^/ snapshots: /"
2025-11-06T20:26:14.943 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: Running main() from gmock_main.cc
2025-11-06T20:26:14.943 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [==========] Running 15 tests from 1 test suite.
2025-11-06T20:26:14.943 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [----------] Global test environment set-up.
2025-11-06T20:26:14.943 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [----------] 15 tests from NeoRadosReadOps
2025-11-06T20:26:14.943 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.SetOpFlags
2025-11-06T20:26:14.943 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.SetOpFlags (2311 ms)
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.AssertExists
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.AssertExists (3034 ms)
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.AssertVersion
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.AssertVersion (3011 ms)
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.CmpXattr
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.CmpXattr (2879 ms)
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.Read
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.Read (3047 ms)
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.Checksum
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.Checksum (2964 ms)
2025-11-06T20:26:14.944 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.RWOrderedRead
2025-11-06T20:26:14.945 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.RWOrderedRead (3064 ms)
2025-11-06T20:26:14.945 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.ShortRead
2025-11-06T20:26:14.945 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.ShortRead (3170 ms)
2025-11-06T20:26:14.945 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.Exec
2025-11-06T20:26:14.945 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.Exec (2833 ms)
2025-11-06T20:26:14.945 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.Stat
2025-11-06T20:26:14.945 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.Stat (3127 ms)
2025-11-06T20:26:14.945 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.Omap
2025-11-06T20:26:14.945 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ OK ] NeoRadosReadOps.Omap (2954 ms)
2025-11-06T20:26:14.945 INFO:tasks.workunit.client.0.smithi032.stdout: read_operations: [ RUN ] NeoRadosReadOps.OmapNuls
2025-11-06T20:26:15.112 INFO:tasks.workunit.client.0.smithi032.stderr:bash: line 1: 41434 Segmentation fault (core dumped) ceph_test_neorados_read_operations --gtest_output=xml:/home/ubuntu/cephtest/archive/unit_test_xml_report/neorados_read_operations.xml 2>&1
2025-11-06T20:26:15.113 INFO:tasks.workunit.client.0.smithi032.stderr: 41435 Done | tee ceph_test_neorados_read_operations.log
2025-11-06T20:26:15.113 INFO:tasks.workunit.client.0.smithi032.stderr: 41436 Done | sed "s/^/ read_operations: /"
This occurred during initial rocky 10 testing, which has not yet been officially added to the suite. This test does use rocky 10 packages, so it could be related.
Two coredumps are available at /a/yaarit-2025-11-06_20:06:52-rados:basic-wip-rocky10-branch-of-the-day-2025-11-05-1762369819-distro-default-smithi/8587283/remote/smithi032/coredump.
Updated by Laura Flores 4 months ago
- Related to Bug #50371: Segmentation fault (core dumped) ceph_test_rados_api_watch_notify_pp added
Updated by Laura Flores 4 months ago
Added a similar bug we solved in the past; perhaps the same type of analysis can be used here.
Updated by Laura Flores 4 months ago · Edited
Steps to create an environment to analyze the coredump:
1. Visit https://quay.ceph.io/repository/ceph-ci/ceph
2. Click "Tags"
3. Search for the branch name "wip-rocky10-branch-of-the-day-2025-11-05-1762369819" in "Filter Tags..."
4. Open the "Fetch Tags" dropdown and copy the podman/docker command (I used podman)
5. Go to a machine that has docker/podman and access to the core file (use "scp" to copy it onto the machine if needed)
6. Run:
$ podman pull quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:wip-rocky10-branch-of-the-day-2025-11-05-1762369819-rockylinux-10
$ podman run -it quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:wip-rocky10-branch-of-the-day-2025-11-05-1762369819-rockylinux-10
7. Create a repo for the packages:
$ vi /etc/yum.repos.d/ceph-dev.repo
8. Copy the following into the file:
[ceph]
name=ceph packages for $basearch
baseurl=https://1.chacra.ceph.com/r/ceph/wip-rocky10-branch-of-the-day-2025-11-05-1762369819/e96cbb7d09b133c651085a973a70c7d75650b6a0/rocky/10/flavors/default/$basearch
enabled=1
gpgcheck=0
type=rpm-md

[ceph-noarch]
name=ceph noarch packages
baseurl=https://1.chacra.ceph.com/r/ceph/wip-rocky10-branch-of-the-day-2025-11-05-1762369819/e96cbb7d09b133c651085a973a70c7d75650b6a0/rocky/10/flavors/default/noarch
enabled=1
gpgcheck=0
type=rpm-md

[ceph-source]
name=ceph source packages
baseurl=https://1.chacra.ceph.com/r/ceph/wip-rocky10-branch-of-the-day-2025-11-05-1762369819/e96cbb7d09b133c651085a973a70c7d75650b6a0/rocky/10/flavors/default/SRPMS
enabled=1
gpgcheck=0
type=rpm-md
9. Update package manager:
$ dnf update
10. Install executable and debuginfo:
$ dnf install ceph-test ceph-test-debuginfo
11. Run gdb:
$ gdb /usr/bin/ceph_test_neorados_snapshots -c 1762460772.41451.core -d /root/rpmbuild/BUILD/ceph-20.3.0-3904-ge96cbb7d
For the first core file, I got the following backtrace. The local "ss" appears to contain corrupted memory:
$ gdb /usr/bin/ceph_test_neorados_snapshots -c 1762460772.41451.core -d /root/rpmbuild/BUILD/ceph-20.3.0-3904-ge96cbb7d
...
...
Core was generated by `ceph_test_neorados_snapshots --gtest_output=xml:/home/ubuntu/cephtest/archive/u'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000558710f219b4 in NeoRadosSelfManagedSnaps_Rollback_Test::CoTestBody(_ZN38NeoRadosSelfManagedSnaps_Rollback_Test10CoTestBodyEv.Frame *) (
frame_ptr=0x55874cb9d470) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/test/neorados/snapshots.cc:193
193 neorados::SnapSet ss;
[Current thread is 1 (LWP 41451)]
(gdb) bt
#0 0x0000558710f219b4 in NeoRadosSelfManagedSnaps_Rollback_Test::CoTestBody(_ZN38NeoRadosSelfManagedSnaps_Rollback_Test10CoTestBodyEv.Frame *) (
frame_ptr=0x55874cb9d470) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/test/neorados/snapshots.cc:193
#1 0x0000558710f408e7 in std::__n4861::coroutine_handle<void>::resume (this=<optimized out>) at /usr/include/c++/14/coroutine:137
#2 boost::asio::detail::awaitable_frame_base<boost::asio::any_io_executor>::resume (this=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/awaitable.hpp:499
#3 boost::asio::detail::awaitable_thread<boost::asio::any_io_executor>::pump (this=0x7fffca0ea7d0)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/awaitable.hpp:769
#4 boost::asio::detail::awaitable_handler<boost::asio::any_io_executor, boost::system::error_code>::operator()<boost::system::error_code> (this=0x7fffca0ea7d0,
arg=...) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/use_awaitable.hpp:103
#5 0x0000558710f345c9 in boost::asio::detail::consign_handler<boost::asio::detail::awaitable_handler<boost::asio::any_io_executor, boost::system::error_code>, std::pair<boost::asio::executor_work_guard<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul>, void, void>, std::shared_ptr<neorados::detail::Client> > >::operator()<boost::system::error_code> (this=0x7fffca0ea7d0)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/consign.hpp:49
#6 boost::asio::detail::any_completion_handler_impl<boost::asio::detail::consign_handler<boost::asio::detail::awaitable_handler<boost::asio::any_io_executor, boost::system::error_code>, std::pair<boost::asio::executor_work_guard<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul>, void, void>, std::shared_ptr<neorados::detail::Client> > > >::call<boost::system::error_code> (this=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/any_completion_handler.hpp:190
#7 boost::asio::detail::any_completion_handler_call_fn<void (boost::system::error_code)>::impl<boost::asio::detail::consign_handler<boost::asio::detail::awaitable_handler<boost::asio::any_io_executor, boost::system::error_code>, std::pair<boost::asio::executor_work_guard<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul>, void, void>, std::shared_ptr<neorados::detail::Client> > > >(boost::asio::detail::any_completion_handler_impl_base*, boost::system::error_code) (
impl=<optimized out>, args#0=...)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/any_completion_handler.hpp:220
#8 0x00007f17b4cb4fac in void boost::asio::detail::executor_function::complete<boost::asio::detail::binder0<boost::asio::detail::append_handler<boost::asio::any_completion_handler<void (boost::system::error_code)>, boost::system::error_code> >, std::allocator<void> >(boost::asio::detail::executor_function::impl_base*, bool) ()
from /usr/lib64/ceph/libceph-common.so.2
#9 0x0000558710f34798 in boost::asio::detail::executor_function::operator() (this=<synthetic pointer>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/detail/executor_function.hpp:61
#10 boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul>::execute<boost::asio::detail::executor_function> (this=0x55874ca67520, f=...)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/io_context.hpp:192
#11 0x00007f17b4cb1e88 in boost::asio::detail::work_dispatcher<boost::asio::detail::append_handler<boost::asio::any_completion_handler<void (boost::system::error_code)>, boost::system::error_code>, boost::asio::any_completion_executor, void>::operator()() () from /usr/lib64/ceph/libceph-common.so.2
#12 0x00007f17b4cb6caa in boost::asio::detail::executor_op<boost::asio::detail::work_dispatcher<boost::asio::detail::append_handler<boost::asio::any_completion_handler<void (boost::system::error_code)>, boost::system::error_code>, boost::asio::any_completion_executor, void>, boost::asio::any_completion_handler_allocator<void, void (boost::system::error_code)>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from /usr/lib64/ceph/libceph-common.so.2
#13 0x0000558710fcc38d in boost::asio::detail::scheduler_operation::complete (this=0x7f1714039b90, owner=0x55874b551bf0, ec=..., bytes_transferred=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/detail/scheduler_operation.hpp:40
#14 boost::asio::detail::scheduler::do_run_one (this=0x55874b551bf0, lock=..., this_thread=..., ec=...)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/scheduler.ipp:492
#15 boost::asio::detail::scheduler::run(boost::system::error_code&) [clone .constprop.0] (this=0x55874b551bf0, ec=...)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/scheduler.ipp:208
#16 0x0000558710f19277 in boost::asio::io_context::run (this=0x55874ca43598)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/io_context.ipp:71
#17 CoroTest::TestBody (this=0x55874ca43580) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/test/neorados/common_tests.h:226
#18 0x0000558710fc4082 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x558710fe3a9d "the test body",
object=0x55874ca43580, method=<optimized out>) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2653
#19 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) [clone .constprop.0] (
object=0x55874ca43580, method=<optimized out>, location=0x558710fe3a9d "the test body")
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2689
#20 0x0000558710fb5163 in testing::Test::Run (this=0x55874ca43580)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2728
#21 testing::Test::Run (this=0x55874ca43580) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2718
#22 0x0000558710fb541d in testing::TestInfo::Run (this=0x55874b5d50c0)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2874
#23 0x0000558710fc3e1a in testing::TestSuite::Run (this=0x55874b564700)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:3052
#24 0x0000558710fbe89a in testing::TestSuite::Run (this=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:3007
#25 testing::internal::UnitTestImpl::RunAllTests (this=this@entry=0x55874b585be0)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:6004
#26 0x0000558710fbeee0 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
location=0x558710fecd90 "auxiliary test code (environments or event listeners)", object=0x55874b585be0, method=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2642
#27 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
location=0x558710fecd90 "auxiliary test code (environments or event listeners)", object=0x55874b585be0, method=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2689
#28 testing::UnitTest::Run (this=<optimized out>) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:5583
#29 0x0000558710f1326d in RUN_ALL_TESTS () at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/include/gtest/gtest.h:2334
#30 main (argc=<optimized out>, argv=0x7fffca0eb058) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googlemock/src/gmock_main.cc:71
...
...
(gdb) list 185,200
185 readioc.set_read_snap(neorados::snap_dir);
186
187 co_await new_selfmanaged_snap(rados(), my_snaps, ioc);
188 const auto bl1 = filled_buffer_list(0xcc, len);
189 co_await execute(oid, WriteOp{}.write(0, bl1), ioc);
190 co_await execute(oid, WriteOp{}.write(len, bl1), ioc);
191 co_await execute(oid, WriteOp{}.write(len * 2, bl1), ioc);
192
193 neorados::SnapSet ss;
194 co_await execute(oid, ReadOp{}.list_snaps(&ss), readioc);
195 EXPECT_EQ(1u, ss.clones.size());
196 EXPECT_EQ(neorados::snap_head, ss.clones[0].cloneid);
197 EXPECT_EQ(0u, ss.clones[0].snaps.size());
198 EXPECT_EQ(0u, ss.clones[0].overlap.size());
199 EXPECT_EQ(len * 3, ss.clones[0].size);
200
(gdb) info locals
my_snaps = std::vector of length 1, capacity 1 = {2}
ioc = {static impl_size = 128, impl = {data = {149, 0, 0, 0, 0, 0, 0, 0, 8, 213, 185, 76, 135, 85, 0 <repeats 18 times>, 107, 101, 121, 0, 95, 111, 115, 100, 40,
213, 185, 76, 135, 85, 0 <repeats 18 times>, 40, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 254, 255, 255, 255, 255, 255, 255, 255, 2, 0, 0,
0, 0, 0, 0, 0, 224, 182, 187, 76, 135, 85, 0, 0, 232, 182, 187, 76, 135, 85, 0, 0, 232, 182, 187, 76, 135, 85, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}}}
readioc = {static impl_size = 128, impl = {data = {149, 0, 0, 0, 0, 0, 0, 0, 136, 213, 185, 76, 135, 85, 0 <repeats 18 times>, 160, 213, 185, 76, 135, 85, 0, 0, 168,
213, 185, 76, 135, 85, 0 <repeats 26 times>, 255 <repeats 16 times>, 0 <repeats 40 times>}}}
bl1 = {_buffers = {_root = {next = 0x55874ca622b0}, _tail = 0x55874ca622b0}, _carriage = 0x55871102e760 <ceph::buffer::v15_2_0::list::always_empty_bptr>, _len = 128,
_num = 1, static always_empty_bptr = {<ceph::buffer::v15_2_0::ptr_hook> = {next = 0x0}, <ceph::buffer::v15_2_0::ptr> = {_raw = 0x0, _off = 0,
_len = 0}, <No data fields>}}
ss = {clones = std::vector of length 350204646, capacity 350204656 = {<error reading variable: Cannot access memory at address 0x558214cd1c0d>
bl2 = {_buffers = {_root = {next = 0x55874cb9d640}, _tail = 0x8}, _carriage = 0x72676d5f73706163, _len = 0, _num = 0,
static always_empty_bptr = {<ceph::buffer::v15_2_0::ptr_hook> = {next = 0x0}, <ceph::buffer::v15_2_0::ptr> = {_raw = 0x0, _off = 0, _len = 0}, <No data fields>}}
resbl = {_buffers = {_root = {next = 0x55874cb9d660}, _tail = 0x7}, _carriage = 0x7220776f6c6c61, _len = 0, _num = 0,
static always_empty_bptr = {<ceph::buffer::v15_2_0::ptr_hook> = {next = 0x0}, <ceph::buffer::v15_2_0::ptr> = {_raw = 0x0, _off = 0, _len = 0}, <No data fields>}}
_Coro_resume_fn = 0x558710f20c20 <NeoRadosSelfManagedSnaps_Rollback_Test::CoTestBody(_ZN38NeoRadosSelfManagedSnaps_Rollback_Test10CoTestBodyEv.Frame *)>
_Coro_destroy_fn = 0x558710f24e00 <NeoRadosSelfManagedSnaps_Rollback_Test::CoTestBody(_ZN38NeoRadosSelfManagedSnaps_Rollback_Test10CoTestBodyEv.Frame *)>
this = 0x55874ca43580
_Coro_promise = {<boost::asio::detail::awaitable_frame_base<boost::asio::any_io_executor>> = {coro_ = {_M_fr_ptr = 0x55874cb9d470},
attached_thread_ = 0x7fffca0ea7d0, caller_ = 0x7f172805ff80, pending_exception_ = {_M_exception_object = 0x0},
resume_context_ = 0x7fffca0ea740}, <No data fields>}
_Coro_self_handle = {_M_fr_ptr = 0x55874cb9d470}
...
...
(gdb) p ss
$1 = {clones = std::vector of length 350204646, capacity 350204656 = {<error reading variable: Cannot access memory at address 0x558214cd1c0d>
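For context on the `ss` output above: gdb's libstdc++ pretty-printer reports a `std::vector`'s length and capacity by subtracting the vector's internal start pointer from its finish and end-of-storage pointers. The huge length therefore only says that those three words hold arbitrary values. A minimal sketch of that arithmetic follows; `VectorWords`, `derived_length`, and `derived_capacity` are hypothetical names used for illustration, not Ceph or libstdc++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the three pointers a libstdc++ std::vector<T> stores.
// Raw integers are used so that arbitrary (possibly invalid) addresses can be
// inspected without dereferencing them or subtracting unrelated pointers.
struct VectorWords {
  std::uintptr_t start;           // address of the first element
  std::uintptr_t finish;          // one past the last constructed element
  std::uintptr_t end_of_storage;  // one past the end of the allocation
};

// Element count as a debugger would derive it from the raw words.
inline std::size_t derived_length(const VectorWords& v, std::size_t elem_size) {
  assert(elem_size != 0);
  return static_cast<std::size_t>(v.finish - v.start) / elem_size;
}

// Capacity derived the same way, from the end-of-storage word.
inline std::size_t derived_capacity(const VectorWords& v, std::size_t elem_size) {
  assert(elem_size != 0);
  return static_cast<std::size_t>(v.end_of_storage - v.start) / elem_size;
}
```

With made-up addresses, `derived_length({0x1000, 0x1040, 0x1080}, 16)` gives 4 and the matching capacity is 8. Frame #0 stops on the declaration of `ss` itself (line 193), so one possible reading, an assumption not confirmed in this thread, is that `ss` had not finished construction and these words are leftover frame contents rather than a damaged vector.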
I got this backtrace for the second core file:
$ gdb /usr/bin/ceph_test_neorados_read_operations -c 1762460774.41434.core -d /root/rpmbuild/BUILD/ceph-20.3.0-3904-ge96cbb7d
...
...
Core was generated by `ceph_test_neorados_read_operations --gtest_output=xml:/home/ubuntu/cephtest/arc'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 NeoRadosReadOps_OmapNuls_Test::CoTestBody(_ZN29NeoRadosReadOps_OmapNuls_Test10CoTestBodyEv.Frame *) (frame_ptr=0x55b425a2a750)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/test/neorados/read_operations.cc:546
546 while (truncated) {
[Current thread is 1 (LWP 41434)]
(gdb) bt
#0 NeoRadosReadOps_OmapNuls_Test::CoTestBody(_ZN29NeoRadosReadOps_OmapNuls_Test10CoTestBodyEv.Frame *) (frame_ptr=0x55b425a2a750)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/test/neorados/read_operations.cc:546
#1 0x000055b3edfa7216 in std::__n4861::coroutine_handle<void>::resume (this=<optimized out>) at /usr/include/c++/14/coroutine:137
#2 boost::asio::detail::awaitable_frame_base<boost::asio::any_io_executor>::resume (this=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/awaitable.hpp:499
#3 boost::asio::detail::awaitable_thread<boost::asio::any_io_executor>::pump (this=0x7ffed47af930)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/awaitable.hpp:769
#4 0x000055b3edfa714b in boost::asio::detail::consign_handler<boost::asio::detail::awaitable_handler<boost::asio::any_io_executor, boost::system::error_code>, std::pair<boost::asio::executor_work_guard<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul>, void, void>, std::shared_ptr<neorados::detail::Client> > >::operator()<boost::system::error_code> (this=0x7ffed47af930)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/consign.hpp:49
#5 boost::asio::detail::any_completion_handler_impl<boost::asio::detail::consign_handler<boost::asio::detail::awaitable_handler<boost::asio::any_io_executor, boost::system::error_code>, std::pair<boost::asio::executor_work_guard<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul>, void, void>, std::shared_ptr<neorados::detail::Client> > > >::call<boost::system::error_code> (this=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/any_completion_handler.hpp:190
#6 boost::asio::detail::any_completion_handler_call_fn<void (boost::system::error_code)>::impl<boost::asio::detail::consign_handler<boost::asio::detail::awaitable_handler<boost::asio::any_io_executor, boost::system::error_code>, std::pair<boost::asio::executor_work_guard<boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul>, void, void>, std::shared_ptr<neorados::detail::Client> > > >(boost::asio::detail::any_completion_handler_impl_base*, boost::system::error_code) (
impl=<optimized out>, args#0=...)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/any_completion_handler.hpp:220
#7 0x00007fdaab6b4fac in void boost::asio::detail::executor_function::complete<boost::asio::detail::binder0<boost::asio::detail::append_handler<boost::asio::any_completion_handler<void (boost::system::error_code)>, boost::system::error_code> >, std::allocator<void> >(boost::asio::detail::executor_function::impl_base*, bool) ()
from /usr/lib64/ceph/libceph-common.so.2
#8 0x000055b3edfa7f38 in boost::asio::detail::executor_function::operator() (this=<synthetic pointer>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/detail/executor_function.hpp:61
#9 boost::asio::io_context::basic_executor_type<std::allocator<void>, 0ul>::execute<boost::asio::detail::executor_function> (this=0x7fd99003c380, f=...)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/io_context.hpp:192
#10 0x00007fdaab6b1e88 in boost::asio::detail::work_dispatcher<boost::asio::detail::append_handler<boost::asio::any_completion_handler<void (boost::system::error_code)>, boost::system::error_code>, boost::asio::any_completion_executor, void>::operator()() () from /usr/lib64/ceph/libceph-common.so.2
#11 0x00007fdaab6b6caa in boost::asio::detail::executor_op<boost::asio::detail::work_dispatcher<boost::asio::detail::append_handler<boost::asio::any_completion_handler<void (boost::system::error_code)>, boost::system::error_code>, boost::asio::any_completion_executor, void>, boost::asio::any_completion_handler_allocator<void, void (boost::system::error_code)>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () from /usr/lib64/ceph/libceph-common.so.2
#12 0x000055b3ee05873d in boost::asio::detail::scheduler_operation::complete (this=0x7fd99009a180, owner=0x55b423b8c0e0, ec=..., bytes_transferred=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/detail/scheduler_operation.hpp:40
#13 boost::asio::detail::scheduler::do_run_one (this=0x55b423b8c0e0, lock=..., this_thread=..., ec=...)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/scheduler.ipp:492
#14 boost::asio::detail::scheduler::run(boost::system::error_code&) [clone .constprop.0] (this=0x55b423b8c0e0, ec=...)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/scheduler.ipp:208
#15 0x000055b3edf8772c in boost::asio::io_context::run (this=0x55b4258d7918)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/redhat-linux-build/boost/include/boost/asio/impl/io_context.ipp:71
#16 CoroTest::TestBody (this=0x55b4258d7900) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/test/neorados/common_tests.h:226
#17 0x000055b3ee038162 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (location=0x55b3ee09b90c "the test body",
object=0x55b4258d7900, method=<optimized out>) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2653
#18 testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) [clone .constprop.0] (
object=0x55b4258d7900, method=<optimized out>, location=0x55b3ee09b90c "the test body")
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2689
#19 0x000055b3ee0293a3 in testing::Test::Run (this=0x55b4258d7900)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2728
#20 testing::Test::Run (this=0x55b4258d7900) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2718
#21 0x000055b3ee02965d in testing::TestInfo::Run (this=0x55b423b7cbf0)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2874
#22 0x000055b3ee037efa in testing::TestSuite::Run (this=0x55b423bb5700)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:3052
#23 0x000055b3ee032c9a in testing::TestSuite::Run (this=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:3007
#24 testing::internal::UnitTestImpl::RunAllTests (this=this@entry=0x55b423bb0be0)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:6004
#25 0x000055b3ee0332e0 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
location=0x55b3ee0a53c8 "auxiliary test code (environments or event listeners)", object=0x55b423bb0be0, method=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2642
#26 testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (
location=0x55b3ee0a53c8 "auxiliary test code (environments or event listeners)", object=0x55b423bb0be0, method=<optimized out>)
at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:2689
#27 testing::UnitTest::Run (this=<optimized out>) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/src/gtest.cc:5583
#28 0x000055b3edf8176d in RUN_ALL_TESTS () at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googletest/include/gtest/gtest.h:2334
#29 main (argc=<optimized out>, argv=0x7ffed47b01b8) at /usr/src/debug/ceph-20.3.0-3904.ge96cbb7d.el10.x86_64/src/googletest/googlemock/src/gmock_main.cc:71
...
...
(gdb) list 535,555
535 }));
536 }
537
538 // Check iteration and truncation
539 {
540 std::unordered_set<std::string> keys;
541 for (const auto& [key, value] : omap) {
542 keys.insert(key);
543 }
544 bool truncated = true;
545 std::optional<std::string> lastkey;
546 while (truncated) {
547 ctnr::flat_set<std::string> keys2;
548 ctnr::flat_map<std::string, buffer::list> omap2;
549 bool truncated2;
550 ReadOp op;
551 op.get_omap_vals(lastkey, {}, 1, &omap2, &truncated);
552 op.get_omap_keys(lastkey, 1, &keys2, &truncated2);
553 co_await execute(oid, std::move(op));
554 EXPECT_EQ(1, std::ssize(keys2));
555 EXPECT_EQ(1, std::ssize(omap2));
...
...
(gdb) info locals
keys = std::unordered_set with 3 elements = {[0] = "3baa\000rr", [1] = "2baar", [2] = "1\000bar"}
truncated = true
lastkey = std::optional [no contained value]
omap = {m_flat_tree = {
m_data = {<boost::container::dtl::flat_tree_value_compare<std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list>, boost::container::dtl::select1st<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >> = {<std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >> = {<std::binary_function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool>> = {<No data fields>}, <No data fields>}, <No data fields>}, m_seq = {
m_holder = {<boost::container::new_allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, ceph::buffer::v15_2_0::list> >> = {<No data fields>}, m_start = 0x7fda800fd110, m_size = 3, m_capacity = 3}}}, static has_stored_allocator_type = true}}
_Coro_resume_fn = 0x55b3edf9a060 <NeoRadosReadOps_OmapNuls_Test::CoTestBody(_ZN29NeoRadosReadOps_OmapNuls_Test10CoTestBodyEv.Frame *)>
_Coro_destroy_fn = 0x55b3edf9ee50 <NeoRadosReadOps_OmapNuls_Test::CoTestBody(_ZN29NeoRadosReadOps_OmapNuls_Test10CoTestBodyEv.Frame *)>
this = 0x55b4258d7900
_Coro_promise = {<boost::asio::detail::awaitable_frame_base<boost::asio::any_io_executor>> = {coro_ = {_M_fr_ptr = 0x55b425a2a750},
attached_thread_ = 0x7ffed47af930, caller_ = 0x7fd9a80a3ab0, pending_exception_ = {_M_exception_object = 0x0},
resume_context_ = 0x7ffed47af8e0}, <No data fields>}
_Coro_self_handle = {_M_fr_ptr = 0x55b425a2a750}
_Coro_resume_index = 10
_Coro_frame_needs_free = true
_Coro_initial_await_resume_called = true
Updated by Laura Flores 4 months ago
It looks like both tests are using `co_await`, so it's possible that something in the C library changed between centos/ubuntu and rocky10.
@Adam Emerson WDYT?
Updated by Laura Flores 4 months ago
- Status changed from New to In Progress
- Assignee set to Adam Emerson
@Adam Emerson assigning to you.
Updated by Adam Emerson 4 months ago
Just as an update I am currently investigating this and may have found a local reproducer that I'm hammering on.
Updated by Yaarit Hatuka 4 months ago
- Blocks Bug #73930: ceph-mgr modules rely on deprecated python subinterpreters added
Updated by Casey Bodley 4 months ago
i wasn't able to reproduce the crashes until i added the rpm hardening flags used by our shaman package builds:
CXXFLAGS="-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Wno-complain-wrong-lang -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -m64 -march=x86-64-v3 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2"
of these, i narrowed the culprit down to -march=x86-64-v3. with that present, i see the same crashes on fedora 43 with gcc 15.2.1
cbodley@fedora ~/ceph/build $ gdb --args bin/ceph_test_neorados_read_operations --gtest_repeat=100 --gtest_filter=NeoRadosReadOps.Omap
Thread 1 "ceph_test_neora" received signal SIGSEGV, Segmentation fault.
0x000055555558700d in NeoRadosReadOps_Omap_Test::CoTestBody (frame_ptr=0x55555584e650)
at /home/cbodley/ceph/src/test/neorados/read_operations.cc:412
412 std::optional<std::string> lastkey;
(gdb) disassemble /s
Dump of assembler code for function NeoRadosReadOps_Omap_Test::CoTestBody():
...
/home/cbodley/ceph/src/test/neorados/read_operations.cc:
411 bool truncated = true;
0x0000555555586fcc <+4204>: lea 0x20b8(%r15),%rax
0x0000555555586fd3 <+4211>: movb $0x1,0x20b8(%r15)
412 std::optional<std::string> lastkey;
0x0000555555586fdb <+4219>: vpxor %xmm0,%xmm0,%xmm0
0x0000555555586fdf <+4223>: lea 0x2400(%r15),%rbx
0x0000555555586fe6 <+4230>: mov %rax,-0x4f0(%rbp)
0x0000555555586fed <+4237>: lea 0x2130(%r15),%rax
0x0000555555586ff4 <+4244>: lea 0x2118(%r15),%r13
0x0000555555586ffb <+4251>: movq $0x0,0x20e0(%r15)
413 while (truncated) {
0x0000555555587006 <+4262>: mov %rax,-0x4c0(%rbp)
412 std::optional<std::string> lastkey;
=> 0x000055555558700d <+4269>: vmovdqa %ymm0,0x20c0(%r15)
0x0000555555587016 <+4278>: vzeroupper
...
(gdb) p $r15
$1 = 93824995354192
(gdb) p/a $r15
$2 = 0x55555584e650
(gdb) p lastkey
$3 = std::optional [no contained value]
(gdb) p &lastkey
$4 = (std::optional<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > *) 0x555555850710
from vmovdqa %ymm0,0x20c0(%r15), the r15 register corresponds to NeoRadosReadOps_Omap_Test::CoTestBody (frame_ptr=0x55555584e650), and 0x20c0 is the offset to the lastkey variable. the size of optional<string> here is 40 bytes, so its end offset is 0x20e8. so this instruction is copying the 32-byte ymm0 register into the low 32 bytes of memory for lastkey
vpxor %xmm0,%xmm0,%xmm0 above is zeroing the 16 byte xmm0 register which corresponds to the low 16 bytes of ymm0, but the high 16 bytes of ymm0 may be uninitialized? the vzeroupper instruction would zero those, but it comes after vmovdqa. regardless, that instruction shouldn't crash on uninitialized bits in the register. it's just copying those into lastkey's memory address, which should be a valid pointer to stack memory
movq $0x0,0x20e0(%r15) is what zeroes the final 8 bytes of lastkey
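The offset arithmetic in this comment can be checked at compile time. The sketch below only restates numbers quoted from the disassembly and gdb session; the constant and function names are made up for illustration, and the 40-byte size is taken from the comment rather than measured.

```cpp
#include <cstdint>

// Values copied from the disassembly above; names are illustrative only.
constexpr std::uint64_t lastkey_offset = 0x20c0;  // vmovdqa %ymm0,0x20c0(%r15)
constexpr std::uint64_t lastkey_size   = 40;      // size of optional<string> as reported
constexpr std::uint64_t ymm_store_size = 32;      // width of a ymm register in bytes
constexpr std::uint64_t movq_offset    = 0x20e0;  // movq $0x0,0x20e0(%r15)
constexpr std::uint64_t movq_size      = 8;       // movq writes 8 bytes

// End offset of lastkey inside the coroutine frame.
constexpr std::uint64_t lastkey_end() {
  return lastkey_offset + lastkey_size;
}

// True when the 32-byte store followed by the 8-byte store covers
// [lastkey_offset, lastkey_end) exactly, with no gap and no overlap.
constexpr bool stores_cover_lastkey() {
  return lastkey_offset + ymm_store_size == movq_offset &&
         movq_offset + movq_size == lastkey_end();
}

static_assert(lastkey_end() == 0x20e8, "end offset matches the comment");
static_assert(stores_cover_lastkey(), "vmovdqa + movq zero all 40 bytes");
```

So the two stores together zero-initialize the whole object, which matches the reading that the crash is about how the store is issued, not what it writes.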
the same test without -march specified doesn't crash, with code gen:
/home/cbodley/ceph/src/test/neorados/read_operations.cc:
411 bool truncated = true;
0x00005555555a8f30 <+7808>: lea 0x2130(%r14),%rax
0x00005555555a8f37 <+7815>: movb $0x1,0x20b8(%r14)
412 std::optional<std::string> lastkey;
=> 0x00005555555a8f3f <+7823>: pxor %xmm0,%xmm0
0x00005555555a8f43 <+7827>: lea 0x2400(%r14),%rbx
0x00005555555a8f4a <+7834>: mov %rax,-0x388(%rbp)
0x00005555555a8f51 <+7841>: lea 0x2118(%r14),%r12
0x00005555555a8f58 <+7848>: movq $0x0,0x20e0(%r14)
413 while (truncated) {
0x00005555555a8f63 <+7859>: movaps %xmm0,0x20c0(%r14)
0x00005555555a8f6b <+7867>: movaps %xmm0,0x20d0(%r14)
the same test on the tentacle branch does not crash with -march=x86-64-v3, so i'm working to bisect
Updated by Casey Bodley 4 months ago
i started bisect at commit 83a82c51682caafaea5cd9ccf8e77b7250448c81, just before an early March pr https://github.com/ceph/ceph/pull/61084 which bumped boost to 1.87:
$ git reset --hard origin/main
$ git bisect start
$ git bisect bad
$ git bisect good 83a82c51682caafaea5cd9ccf8e77b7250448c81
Bisecting: 2893 revisions left to test after this (roughly 12 steps)
[bc7600417e9a8e60225f66fa4b3fca14f6e8af3f] Merge pull request #64069 from phlogistonjohn/jjm-bwc-test-tweak
$ git bisect bad
Bisecting: 1440 revisions left to test after this (roughly 11 steps)
warning: unable to rmdir 'src/breakpad': Directory not empty
warning: unable to rmdir 'src/lss': Directory not empty
[84a42f7c76b19f9532136b22d4a64b2aad8b3257] Merge pull request #56336 from pritha-srivastava/wip-rgw-d4n-next
$ git bisect good
Bisecting: 720 revisions left to test after this (roughly 10 steps)
[ef03debd4c6fc1e3866f6daed1cf449050fbb191] Merge pull request #63629 from zdover23/wip-doc-2025-06-02-mgr-localpool-63419-followup
$ git bisect good
Bisecting: 359 revisions left to test after this (roughly 9 steps)
[94478005e72c0f9ca496b828c439b4843c88f3a0] Merge pull request #58881 from cbodley/wip-gcc-13-lto
$ git bisect bad
Bisecting: 180 revisions left to test after this (roughly 8 steps)
[406a8e8e7643fa1735d6d354ce25c33a3a5ffe7a] Merge pull request #63160 from yuvalif/wip-yuval-71219
$ git bisect good
Bisecting: 90 revisions left to test after this (roughly 7 steps)
[ed2b694e4d9479eb7a9bef84863d5e9b253f4099] Merge pull request #63975 from tchaikov/wip-cmake-find_program
$ git bisect good
Bisecting: 45 revisions left to test after this (roughly 6 steps)
[6aa6c77961548a14b058e81507f3b93205955095] Merge pull request #64007 from tchaikov/wip-update-ceph-object-corpus
$ git bisect good
Bisecting: 19 revisions left to test after this (roughly 5 steps)
[db2a4a672e6622481e75e7a9d96744e9ab9faec5] Merge pull request #63149 from ajarr/wip-ajarr-fix-mirror-image-get-mode
$ git bisect good
Bisecting: 9 revisions left to test after this (roughly 3 steps)
[18c98576fc2cb67e9ecaaa38ca1b935fa2f92fc3] Merge pull request #62568 from Matan-B/wip-matanb-fmt-11.1.4
$ git bisect good
Bisecting: 5 revisions left to test after this (roughly 2 steps)
[2cddd4d09178b69babbd2412672d0620932d786b] Merge pull request #64042 from tchaikov/wip-rgw-no-aligned_storage
$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[ebceb95ffc1907014ad8d22344fac6de15eba3d2] Merge pull request #64037 from tchaikov/wip-neorados-alignedas
$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[ba7b42983cc6e1966f9149cc6160a4ae6154f9e0] neorados: avoid using std::aligned_storage_t
which came from https://github.com/ceph/ceph/pull/64037 for c++23 support
Updated by Casey Bodley 4 months ago · Edited
[ba7b42983cc6e1966f9149cc6160a4ae6154f9e0] neorados: avoid using std::aligned_storage_t
after this change, alignof(neorados::Op) is significantly different. the aligned_storage template introduced by Kefu defaults to Alignment = std::bit_ceil(S), which gives 1024 for S = 680
but the original std::aligned_storage_t<680> defaults to 16-byte alignment:
ceph/src/include/neorados/RADOS.hpp:393:37: warning: ‘using std::aligned_storage_t = struct std::aligned_storage<680, 16>::type’ is deprecated [-Wdeprecated-declarations]
393 | std::aligned_storage_t<impl_size> impl;
| ^~~~
the crash goes away if i specify the alignment as 16 (which also matches alignof(OpImpl)):
static constexpr std::size_t impl_size = 85 * 8;
- detail::aligned_storage<impl_size> impl;
+ detail::aligned_storage<impl_size, 16> impl;
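To make the alignment difference concrete, here is a minimal sketch of a storage template whose alignment defaults to the next power of two of its size. It is a hypothetical reconstruction from the description above, not the actual detail::aligned_storage from the Ceph tree; bit_ceil_sketch is a hand-written stand-in for std::bit_ceil so the example does not depend on C++20.

```cpp
#include <cstddef>

// Stand-in for std::bit_ceil (C++20 <bit>): smallest power of two >= n.
constexpr std::size_t bit_ceil_sketch(std::size_t n) {
  std::size_t p = 1;
  while (p < n) {
    p <<= 1;
  }
  return p;
}

// Hypothetical storage type: alignment defaults to bit_ceil of the size,
// as described for the replacement template in the comment above.
template <std::size_t S, std::size_t Alignment = bit_ceil_sketch(S)>
struct aligned_storage_sketch {
  alignas(Alignment) unsigned char data[S];
};

constexpr std::size_t impl_size = 85 * 8;  // 680, as quoted from RADOS.hpp

// The default pushes a 680-byte buffer to 1024-byte alignment ...
static_assert(bit_ceil_sketch(impl_size) == 1024, "bit_ceil(680) is 1024");
static_assert(alignof(aligned_storage_sketch<impl_size>) == 1024,
              "defaulted alignment is 1024");
// ... while an explicit 16 matches what std::aligned_storage_t<680> used.
static_assert(alignof(aligned_storage_sketch<impl_size, 16>) == 16,
              "explicit alignment is 16");
```

Any object that embeds the defaulted variant inherits the 1024-byte requirement, including a coroutine frame that holds one as a local.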
@Shilpa MJ pointed out that vmovdqa on a 32-byte register probably expects the memory to have 32-byte alignment, but the address of lastkey (0x555555850710) only has 16-byte alignment
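The alignment observation can be verified directly from the addresses in the gdb session. The sketch below uses only those values; is_aligned and the constant names are illustrative. It also suggests, as an interpretation rather than something gdb states, that the offset inside the frame is 32-byte aligned while the frame base itself is not.

```cpp
#include <cstdint>

// Addresses and offsets copied from the gdb session earlier in this thread.
constexpr std::uint64_t frame_ptr      = 0x55555584e650;  // p/a $r15
constexpr std::uint64_t lastkey_offset = 0x20c0;          // operand 0x20c0(%r15)
constexpr std::uint64_t lastkey_addr   = 0x555555850710;  // p &lastkey

// True when addr is a multiple of alignment.
constexpr bool is_aligned(std::uint64_t addr, std::uint64_t alignment) {
  return addr % alignment == 0;
}

// The operand address is frame base plus offset.
static_assert(frame_ptr + lastkey_offset == lastkey_addr, "address matches gdb");

// The offset within the frame is a multiple of 32 ...
static_assert(is_aligned(lastkey_offset, 32), "offset is 32-byte aligned");
// ... but the frame base is only 16-byte aligned,
static_assert(is_aligned(frame_ptr, 16), "frame is 16-byte aligned");
static_assert(!is_aligned(frame_ptr, 32), "frame is not 32-byte aligned");
// so the 32-byte vmovdqa target is misaligned,
static_assert(!is_aligned(lastkey_addr, 32), "vmovdqa target is misaligned");
// while the two 16-byte movaps stores from the build without -march are fine.
static_assert(is_aligned(lastkey_addr, 16) && is_aligned(lastkey_addr + 0x10, 16),
              "movaps targets are 16-byte aligned");
```

That pattern is consistent with a frame laid out for a stricter alignment than the allocation actually provided.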
@Matt Benjamin found https://www.cs.ubbcluj.ro/~vancea/asc/practic/html/MOVDQA.html which says for VMOVDQA,
When the source or destination operand is a memory operand, the operand must be aligned on a 32-byte boundary or a general-protection exception (#GP) will be generated. To move integer data to and from unaligned memory locations, use the VMOVDQU instruction.
so i assume the use of VMOVDQA over VMOVDQU is due to a compiler bug?
edit: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104177 looks potentially related, specific to coroutine frames
edit2: that bug talks about "extended alignment" which is greater than alignof(std::max_align_t) = 16
Updated by Laura Flores 4 months ago
Scrub note: Approved, but needs to be tested.
Updated by Laura Flores 4 months ago
- Related to QA Run #73749: wip-lflores-testing-4-2025-12-01-1527 (old wip-rocky10-branch-of-the-day-2025-11-05-1762369819) added
Updated by Casey Bodley 4 months ago
Casey Bodley wrote in #note-15:
edit: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104177 looks potentially related, specific to coroutine frames
edit2: that bug talks about "extended alignment" which is greater than alignof(std::max_align_t) = 16
digging further, it sounds like gcc is conforming to the c++ standard which mandates the use of a specific (unaligned) overload of operator new for coroutine frames. https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2014r2.html proposes the use of std::align_val_t overloads of operator new/delete for coroutine frames that require extended alignment. however, there's been no movement on that paper since 2024 and according to https://github.com/cplusplus/papers/issues/750#issuecomment-2657897866,
Author says the paper is no longer pursued.
i guess we just need to be very careful about inventing/using types with "extended alignment"
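One way to be careful is a compile-time guard next to types that may end up as coroutine locals. This is a sketch under assumptions, not something the fix is known to include: needs_aligned_new is a hypothetical helper, the two structs are stand-ins, and the expected results assume a platform such as x86-64 Linux where __STDCPP_DEFAULT_NEW_ALIGNMENT__ is 16. The macro itself is predefined in C++17 and later.

```cpp
#include <cstddef>

// Largest alignment that plain operator new (without std::align_val_t)
// guarantees. Predefined by the compiler in C++17 and later.
constexpr std::size_t default_new_alignment = __STDCPP_DEFAULT_NEW_ALIGNMENT__;

// Hypothetical helper: true when T asks for more alignment than plain
// operator new provides (what the standard calls new-extended alignment).
template <typename T>
constexpr bool needs_aligned_new() {
  return alignof(T) > default_new_alignment;
}

// Stand-ins for a 680-byte buffer with the two alignments discussed above.
struct alignas(16) FitsDefault {
  unsigned char data[680];
};
struct alignas(1024) OverAligned {
  unsigned char data[680];
};

// Assumes default_new_alignment is 16, as on x86-64 Linux.
static_assert(!needs_aligned_new<FitsDefault>(),
              "16-byte aligned type is safe with plain operator new");
static_assert(needs_aligned_new<OverAligned>(),
              "1024-byte aligned type needs an aligned allocation");
```

A type that trips this check could still be used elsewhere; the guard only flags that placing it in a coroutine frame relies on alignment the frame allocation may not provide.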
Updated by Radoslaw Zarzynski 3 months ago
- Status changed from In Progress to Fix Under Review
Updated by Laura Flores 3 months ago
Scrub note: Checking with Nitzan about test results for this.
Updated by Yaarit Hatuka 3 months ago
QA run results in https://tracker.ceph.com/issues/74070 might be related, need to take a look.
Updated by Laura Flores 2 months ago
Scrub note: QA evaluation delayed due to lab migration.
Updated by Yaarit Hatuka about 1 month ago
- Has duplicate Bug #73758: rocky 10: test_rgw_datalog.sh fails with segfault added
Updated by Upkeep Bot about 6 hours ago
- Status changed from Fix Under Review to Resolved
- Merge Commit set to 1330473ff40fa90dbbd11ff75a20bf27cc262e4c
- Fixed In set to v20.3.0-6282-g1330473ff4
- Upkeep Timestamp set to 2026-03-20T21:51:33+00:00