Bug #63682
closed: "rbd migration prepare" crashes when importing from http stream
Description
Seen on 22.04 and 9.stream so far:
2023-11-28T19:19:24.045 DEBUG:teuthology.orchestra.run.smithi106:> sudo TESTDIR=/home/ubuntu/cephtest bash -c 'echo '"'"'{"type":"qcow","stream":{"type":"http","url":"http://download.ceph.com/qa/ubuntu-12.04.qcow2"}}'"'"' | rbd migration prepare --import-only --source-spec-path - client.0.0'
2023-11-28T19:19:24.105 INFO:teuthology.orchestra.run.smithi106.stderr:*** Caught signal (Segmentation fault) **
2023-11-28T19:19:24.106 INFO:teuthology.orchestra.run.smithi106.stderr: in thread 7f26e1941640 thread_name:io_context_pool
2023-11-28T19:19:24.126 DEBUG:teuthology.orchestra.run:got remote process result: 139
2023-11-28T19:19:24.126 INFO:teuthology.orchestra.run.smithi106.stderr: ceph version 18.0.0-7498-g369173db (369173db14b6995b2bd07c60ec5f63d01cf21631) squid (dev)
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f26e5987520]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 2: /lib/librbd.so.1(+0x2e1a81) [0x7f26e6d08a81]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 3: /lib/librados.so.2(+0x111e2e) [0x7f26e69ade2e]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 4: /lib/librados.so.2(+0xc268f) [0x7f26e695e68f]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc2b3) [0x7f26e5d522b3]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 6: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f26e59d9b43]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 7: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f26e5a6ba00]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr:2023-11-28T19:19:24.105+0000 7f26e1941640 -1 *** Caught signal (Segmentation fault) **
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: in thread 7f26e1941640 thread_name:io_context_pool
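For anyone reading the log above, a minimal Python sketch of two details may help: how the source spec piped into `rbd migration prepare` is built, and why the remote process result is 139. The helper names `build_source_spec` and `describe_exit_status` are made up for illustration. The decoding assumes the usual shell convention that a status above 128 means "128 + signal number".

```python
import json
import signal


def build_source_spec(url: str) -> str:
    """Return a qcow-over-HTTP source spec like the one echoed in the log."""
    spec = {"type": "qcow", "stream": {"type": "http", "url": url}}
    # Compact separators reproduce the exact string seen in the log.
    return json.dumps(spec, separators=(",", ":"))


def describe_exit_status(status: int) -> str:
    """Decode a shell-style exit status (assumes the 128 + signal convention)."""
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            pass  # not a known signal number on this platform
    return f"exit code {status}"


spec = build_source_spec("http://download.ceph.com/qa/ubuntu-12.04.qcow2")
# The spec would be fed on stdin to the command from the log:
#   rbd migration prepare --import-only --source-spec-path - client.0.0
print(spec)
print(describe_exit_status(139))
```

On Linux, 139 - 128 = 11, which is SIGSEGV, matching the "Caught signal (Segmentation fault)" line.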
Updated by Ilya Dryomov over 2 years ago
The crash is in the bowels of ASIO while trying to connect:
2023-12-03T13:38:28.424+0000 7f44ebd49e00 10 librbd::Migration: prepare_import: {"type":"qcow","stream":{"type":"http","url":"http://download.ceph.com/qa/ubuntu-12.04.qcow2"}} -> test1/image1, opts=[]
2023-12-03T13:38:28.424+0000 7f44ebd49e00 10 librbd::migration::OpenSourceImageRequest: 0x55be3843c590 OpenSourceImageRequest:
2023-12-03T13:38:28.424+0000 7f44ebd49e00 10 librbd::migration::OpenSourceImageRequest: 0x55be3843c590 open_source:
2023-12-03T13:38:28.424+0000 7f44ebd49e00 20 librbd::asio::ContextWQ: 0x55be38695880 ContextWQ:
2023-12-03T13:38:28.424+0000 7f44ebd49e00 20 librbd::AsioEngine: 0x55be38695840 AsioEngine:
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::ImageCtx: 0x55be38645fb0 ImageCtx: image_name=, image_id=
2023-12-03T13:38:28.428+0000 7f44ebd49e00 5 librbd::io::Dispatcher: 0x55be38697470 register_dispatch: dispatch_layer=9
2023-12-03T13:38:28.428+0000 7f44ebd49e00 5 librbd::io::QueueImageDispatch: 0x55be38697530 QueueImageDispatch: ictx=0x55be38645fb0
2023-12-03T13:38:28.428+0000 7f44ebd49e00 5 librbd::io::Dispatcher: 0x55be38697470 register_dispatch: dispatch_layer=1
2023-12-03T13:38:28.428+0000 7f44ebd49e00 5 librbd::io::QosImageDispatch: 0x55be38695eb0 QosImageDispatch: ictx=0x55be38645fb0
2023-12-03T13:38:28.428+0000 7f44ebd49e00 5 librbd::io::Dispatcher: 0x55be38697470 register_dispatch: dispatch_layer=2
2023-12-03T13:38:28.428+0000 7f44ebd49e00 5 librbd::io::RefreshImageDispatch: 0x55be38697ad0 RefreshImageDispatch: ictx=0x55be38645fb0
2023-12-03T13:38:28.428+0000 7f44ebd49e00 5 librbd::io::Dispatcher: 0x55be38697470 register_dispatch: dispatch_layer=4
2023-12-03T13:38:28.428+0000 7f44ebd49e00 5 librbd::io::WriteBlockImageDispatch: 0x55be38415120 WriteBlockImageDispatch: ictx=0x55be38645fb0
2023-12-03T13:38:28.428+0000 7f44ebd49e00 5 librbd::io::Dispatcher: 0x55be38697470 register_dispatch: dispatch_layer=7
2023-12-03T13:38:28.428+0000 7f44ebd49e00 5 librbd::io::Dispatcher: 0x55be386959b0 register_dispatch: dispatch_layer=6
2023-12-03T13:38:28.428+0000 7f44ebd49e00 15 librbd::migration::OpenSourceImageRequest: 0x55be3843c590 open_source: source_spec={"type":"qcow","stream":{"type":"http","url":"http://download.ceph.com/qa/ubuntu-12.04.qcow2"}}, source_snap_id=18446744073709551614, import_only=1
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::SourceSpecBuilder: 0x7ffc3b729e98 parse_source_spec:
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::SourceSpecBuilder: 0x7ffc3b729e98 build_format:
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::QCOWFormat: 0x55be38481620 open:
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::SourceSpecBuilder: 0x7ffc3b729e98 build_stream:
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::HttpStream: 0x55be386950d0 open: url=http://download.ceph.com/qa/ubuntu-12.04.qcow2
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::HttpClient: 0x55be3847e050 open: url=http://download.ceph.com/qa/ubuntu-12.04.qcow2
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::util::parse_url: url=http://download.ceph.com/qa/ubuntu-12.04.qcow2
2023-12-03T13:38:28.428+0000 7f44e8844640 15 librbd::migration::HttpClient: 0x55be3847e050 create_http_session:
2023-12-03T13:38:28.428+0000 7f44e8844640 15 librbd::migration::HttpClient::HttpSession 0x7f44c4001d40 init:
2023-12-03T13:38:28.428+0000 7f44e8844640 15 librbd::migration::HttpClient::HttpSession 0x7f44c4001d40 resolve_host:
2023-12-03T13:38:28.428+0000 7f44e8844640 15 librbd::migration::HttpClient::HttpSession 0x7f44c4001d40 handle_resolve_host: r=0
2023-12-03T13:38:28.428+0000 7f44e8844640 15 librbd::migration::HttpClient::PlainHttpSession 0x7f44c4001d40 connect:
Thread 10 "io_context_pool" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff27bc640 (LWP 32368)]
boost::asio::detail::scheduler::compensating_work_started (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp:332
332 ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp: No such file or directory.
(gdb) bt
#0 boost::asio::detail::scheduler::compensating_work_started (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp:332
#1 boost::asio::detail::epoll_reactor::perform_io_cleanup_on_block_exit::~perform_io_cleanup_on_block_exit (this=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/epoll_reactor.ipp:751
#2 boost::asio::detail::epoll_reactor::descriptor_state::perform_io (events=<optimized out>, this=0x7fffd4001da0) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/epoll_reactor.ipp:803
#3 boost::asio::detail::epoll_reactor::descriptor_state::do_complete (bytes_transferred=<optimized out>, ec=..., base=0x7fffd4001da0, owner=0x5555559dc6c0) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/epoll_reactor.ipp:813
#4 boost::asio::detail::epoll_reactor::descriptor_state::do_complete (owner=0x5555559dc6c0, base=0x7fffd4001da0, ec=..., bytes_transferred=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/epoll_reactor.ipp:805
#5 0x00007ffff7827e2e in boost::asio::detail::scheduler_operation::complete (bytes_transferred=16, ec=..., owner=0x5555559dc6c0, this=0x7fffd4001da0) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/scheduler_operation.hpp:40
#6 boost::asio::detail::scheduler::do_run_one (ec=..., this_thread=..., lock=<synthetic pointer>..., this=0x5555559dc6c0) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp:493
#7 boost::asio::detail::scheduler::run(boost::system::error_code&) [clone .constprop.0] [clone .isra.0] (this=0x5555559dc6c0, ec=...) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp:210
#8 0x00007ffff77d868f in boost::asio::io_context::run (this=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/impl/io_context.ipp:61
#9 ceph::async::io_context_pool::start(short)::{lambda()#1}::operator()() const (__closure=0x555555af83b8) at ./src/common/async/context_pool.h:63
#10 std::__invoke_impl<void, ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::__invoke_other, ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#11 std::__invoke<ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#12 std::invoke<ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/functional:97
#13 _ZZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEESt6threadSt17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_ENKUlSA_SD_E_clIS3_JEEEDaSA_SD_ (fun=..., __closure=0x555555af83c0) at ./src/common/Thread.h:79
#14 _ZSt13__invoke_implIvZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEESt6threadSt17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_EUlSB_SE_E_JS4_EESA_St14__invoke_otherOT0_DpOT1_ (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#15 _ZSt8__invokeIZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEESt6threadSt17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_EUlSB_SE_E_JS4_EENSt15__invoke_resultISA_JDpSC_EE4typeESB_SE_ (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#16 _ZNSt6thread8_InvokerISt5tupleIJZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEES_St17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_EUlSC_SF_E_S6_EEE9_M_invokeIJLm0ELm1EEEEvSt12_Index_tupleIJXspT_EEE (this=0x555555af83b8) at /usr/include/c++/11/bits/std_thread.h:253
#17 _ZNSt6thread8_InvokerISt5tupleIJZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEES_St17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_EUlSC_SF_E_S6_EEEclEv (this=0x555555af83b8) at /usr/include/c++/11/bits/std_thread.h:260
#18 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEES_St17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_EUlSD_SG_E_S7_EEEEE6_M_runEv (this=0x555555af83b0) at /usr/include/c++/11/bits/std_thread.h:211
#19 0x00007ffff6bcd2b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007ffff6854b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#21 0x00007ffff68e6a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Updated by Ilya Dryomov over 2 years ago
First observed in https://pulpito.ceph.com/yuriw-2023-11-10_19:55:46-rbd-wip-yuri5-testing-2023-11-10-0828-distro-default-smithi/ on 9.stream.
Updated by Ilya Dryomov over 2 years ago
There were no changes in librbd in this area. I suspect https://github.com/ceph/ceph/pull/50821.
Updated by Ilya Dryomov over 2 years ago
Ilya Dryomov wrote:
I suspect https://github.com/ceph/ceph/pull/50821.
Confirmed: https://github.com/ceph/ceph/pull/50821#issuecomment-1838184215
Updated by Casey Bodley over 2 years ago
trying to retrace Jason's steps from https://github.com/ceph/ceph/pull/38000#pullrequestreview-526802351:
~/ceph/build $ nm -C lib/librbd.so | grep -w top_
00000000007ff4b0 b guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
00000000007ff4b8 b guard variable for boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
0000000000800760 b guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_executor_service::strand_impl, unsigned char>::top_
00000000007ff4c4 b boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
00000000007ff4c8 b boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
0000000000800768 b boost::asio::detail::call_stack<boost::asio::detail::strand_executor_service::strand_impl, unsigned char>::top_
~/ceph/build $ nm -C lib/librados.so | grep -w top_
00000000001847d8 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
00000000001847e0 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
00000000001847ec u boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
00000000001847f0 u boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
~/ceph/build $ nm -C lib/libceph-common.so | grep -w top_
0000000000f79668 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
0000000000f49c78 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
0000000000f79674 u boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
0000000000f49c84 u boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
i now see librbd's symbols with 'b' instead of 'u'
with the commits you mentioned (05c341b30deab327444eac464e24a840dae25083 and f479bbc4bc118989849fb663b5dcf5f46f05097d) reverted, i see them as 'u' again:
~/ceph/build $ nm -C lib/librbd.so | grep -w top_
0000000000811258 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
0000000000811260 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
0000000000811248 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_executor_service::strand_impl, unsigned char>::top_
0000000000811274 u boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
0000000000811278 u boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
000000000081126c u boost::asio::detail::call_stack<boost::asio::detail::strand_executor_service::strand_impl, unsigned char>::top_
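The 'b' versus 'u' comparison in the nm listings above can be automated. In GNU nm output, a lowercase 'b' marks a local symbol in the BSS section, while 'u' marks a unique global symbol (a GNU extension for which the dynamic linker keeps a single instance per process). The sketch below is only an illustration: the function names are hypothetical, and it assumes the three-column "address type name" layout that `nm -C` prints for defined symbols.

```python
import string
from collections import Counter


def symbol_types(nm_output: str) -> Counter:
    """Count nm symbol-type letters, skipping prompts and malformed lines."""
    counts = Counter()
    for line in nm_output.splitlines():
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            continue
        address, type_letter, _name = parts
        # First column must be a hex address, second a single type letter.
        if not all(c in string.hexdigits for c in address):
            continue
        if len(type_letter) == 1 and type_letter.isalpha():
            counts[type_letter] += 1
    return counts


def has_private_copies(nm_output: str) -> bool:
    """True if any matching symbol is a local BSS symbol ('b')."""
    return symbol_types(nm_output)["b"] > 0


sample = """\
~/ceph/build $ nm -C lib/librbd.so | grep -w top_
00000000007ff4c8 b boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
00000000001847f0 u boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
"""
print(symbol_types(sample))
print(has_private_copies(sample))
```

A check like this could run against each built library, for example on the output of `nm -C lib/librbd.so | grep -w top_`, to flag the case where `top_` symbols turn up as 'b'.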
Updated by Casey Bodley over 2 years ago
i'm not familiar with the --version-script stuff outside of what we did in librados.map. i tried adding a similar librbd.map in https://github.com/ceph/ceph/pull/54788, but couldn't manage to turn those 'b's into 'u's
Updated by Ilya Dryomov over 2 years ago
Casey Bodley wrote:
i tried adding a similar librbd.map in https://github.com/ceph/ceph/pull/54788, but couldn't manage to turn those 'b's into 'u's
I think the issue might be that static initialization is now split between librados and librbd, with one of these problematic variables going one way and others the other way. I would look in the direction of restoring some of the ASIO "heaviness" in librados, so that all of it continues to happen there and gets covered by librados.map.
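One way to picture why it could matter whether each library ends up with its own copy of these variables is the toy model below. It is not Boost.Asio's implementation and does not claim to reproduce the actual crash path. `CallStack` and `work_started` are hypothetical names, and a Python dictionary stands in for the per-thread `top_` registry. It only shows the general hazard: state registered through one copy is invisible to a lookup through another copy.

```python
class CallStack:
    """Toy stand-in for a 'top_'-style registry mapping a key to thread info."""

    def __init__(self):
        self._entries = {}

    def push(self, key, info):
        self._entries[key] = info

    def contains(self, key):
        # Returns None when the key was never pushed on *this* copy.
        return self._entries.get(key)


def work_started(stack, scheduler):
    """Look up the current thread's info and bump its work counter."""
    info = stack.contains(scheduler)
    # With a missing entry this fails; in C++ the analogous failure would be
    # a null-pointer dereference rather than a Python exception.
    info["outstanding_work"] += 1
    return info["outstanding_work"]


# Case 1: a single shared registry (analogous to one unique global symbol).
shared = CallStack()
shared.push("scheduler", {"outstanding_work": 0})
shared_result = work_started(shared, "scheduler")

# Case 2: two private registries (analogous to each library having its own copy).
copy_a, copy_b = CallStack(), CallStack()
copy_a.push("scheduler", {"outstanding_work": 0})
try:
    work_started(copy_b, "scheduler")
    lookup_failed = False
except TypeError:
    lookup_failed = True

print(shared_result, lookup_failed)
```

In the shared case the lookup succeeds. In the split case the entry exists only in the copy that registered it, so the lookup through the other copy comes back empty.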
Updated by Casey Bodley over 2 years ago
Ilya Dryomov wrote:
I think the issue might be that static initialization is now split between librados and librbd, with one of these problematic variables going one way and others the other way.
i had assumed that the 'guard variable for ...' was related to that static initialization. if those were globally unique, would it matter which library accesses them first?
Updated by Casey Bodley over 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 54839
Updated by Ilya Dryomov over 2 years ago
- Status changed from Fix Under Review to Resolved
Updated by Upkeep Bot 8 months ago
- Merge Commit set to d5122b9c793e599ce82481cc926563c06964d397
- Fixed In set to v19.0.0-62-gd5122b9c793
- Released In set to v19.2.0~1108
- Upkeep Timestamp set to 2025-07-11T18:11:24+00:00