Bug #63682

closed

"rbd migration prepare" crashes when importing from http stream

Added by Ilya Dryomov over 2 years ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
% Done: 0%
Source:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
Fixed In: v19.0.0-62-gd5122b9c793
Released In: v19.2.0~1108
Upkeep Timestamp: 2025-07-11T18:11:24+00:00

Description

Seen on Ubuntu 22.04 and CentOS 9.stream so far:

2023-11-28T19:19:24.045 DEBUG:teuthology.orchestra.run.smithi106:> sudo TESTDIR=/home/ubuntu/cephtest bash -c 'echo '"'"'{"type":"qcow","stream":{"type":"http","url":"http://download.ceph.com/qa/ubuntu-12.04.qcow2"}}'"'"' | rbd migration prepare --import-only --source-spec-path - client.0.0'
2023-11-28T19:19:24.105 INFO:teuthology.orchestra.run.smithi106.stderr:*** Caught signal (Segmentation fault) **
2023-11-28T19:19:24.106 INFO:teuthology.orchestra.run.smithi106.stderr: in thread 7f26e1941640 thread_name:io_context_pool
2023-11-28T19:19:24.126 DEBUG:teuthology.orchestra.run:got remote process result: 139
2023-11-28T19:19:24.126 INFO:teuthology.orchestra.run.smithi106.stderr: ceph version 18.0.0-7498-g369173db (369173db14b6995b2bd07c60ec5f63d01cf21631) squid (dev)
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 1: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f26e5987520]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 2: /lib/librbd.so.1(+0x2e1a81) [0x7f26e6d08a81]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 3: /lib/librados.so.2(+0x111e2e) [0x7f26e69ade2e]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 4: /lib/librados.so.2(+0xc268f) [0x7f26e695e68f]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 5: /lib/x86_64-linux-gnu/libstdc++.so.6(+0xdc2b3) [0x7f26e5d522b3]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 6: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f26e59d9b43]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: 7: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f26e5a6ba00]
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr:2023-11-28T19:19:24.105+0000 7f26e1941640 -1 *** Caught signal (Segmentation fault) **
2023-11-28T19:19:24.127 INFO:teuthology.orchestra.run.smithi106.stderr: in thread 7f26e1941640 thread_name:io_context_pool

https://pulpito.ceph.com/yuriw-2023-11-28_18:52:19-rbd-wip-yuri10-testing-2023-11-22-1112-distro-default-smithi/
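The nested shell quoting in the teuthology command above makes the actual invocation hard to read. As a minimal sketch, the same source spec and `rbd` arguments can be assembled as follows. The helper name `build_import_invocation` is hypothetical; the flags, the spec fields, and the destination image name are copied from the log.

```python
import json
import shlex


def build_import_invocation(url, dest_image):
    """Return (argv, stdin_text) for an import-only migration from an HTTP stream.

    Hypothetical helper: it only assembles the command seen in the log,
    it does not run anything.
    """
    # Source spec as logged: a QCOW image read through an HTTP stream.
    spec = {"type": "qcow", "stream": {"type": "http", "url": url}}
    # "--source-spec-path -" makes rbd read the spec from stdin.
    argv = ["rbd", "migration", "prepare", "--import-only",
            "--source-spec-path", "-", dest_image]
    return argv, json.dumps(spec, separators=(",", ":"))


argv, spec_json = build_import_invocation(
    "http://download.ceph.com/qa/ubuntu-12.04.qcow2", "client.0.0")

# Print an equivalent shell pipeline with ordinary single-level quoting.
print(f"echo {shlex.quote(spec_json)} | {shlex.join(argv)}")
```

On a host with a Ceph client configured, passing `argv` to `subprocess.run(argv, input=spec_json, text=True)` would be one way to execute it; that step is left out here because it needs a running cluster.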

Actions #1

Updated by Ilya Dryomov over 2 years ago

The crash is in the bowels of ASIO while trying to connect:

2023-12-03T13:38:28.424+0000 7f44ebd49e00 10 librbd::Migration: prepare_import: {"type":"qcow","stream":{"type":"http","url":"http://download.ceph.com/qa/ubuntu-12.04.qcow2"}} -> test1/image1, opts=[]
2023-12-03T13:38:28.424+0000 7f44ebd49e00 10 librbd::migration::OpenSourceImageRequest: 0x55be3843c590 OpenSourceImageRequest:
2023-12-03T13:38:28.424+0000 7f44ebd49e00 10 librbd::migration::OpenSourceImageRequest: 0x55be3843c590 open_source:
2023-12-03T13:38:28.424+0000 7f44ebd49e00 20 librbd::asio::ContextWQ: 0x55be38695880 ContextWQ:
2023-12-03T13:38:28.424+0000 7f44ebd49e00 20 librbd::AsioEngine: 0x55be38695840 AsioEngine:
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::ImageCtx: 0x55be38645fb0 ImageCtx: image_name=, image_id=
2023-12-03T13:38:28.428+0000 7f44ebd49e00  5 librbd::io::Dispatcher: 0x55be38697470 register_dispatch: dispatch_layer=9
2023-12-03T13:38:28.428+0000 7f44ebd49e00  5 librbd::io::QueueImageDispatch: 0x55be38697530 QueueImageDispatch: ictx=0x55be38645fb0
2023-12-03T13:38:28.428+0000 7f44ebd49e00  5 librbd::io::Dispatcher: 0x55be38697470 register_dispatch: dispatch_layer=1
2023-12-03T13:38:28.428+0000 7f44ebd49e00  5 librbd::io::QosImageDispatch: 0x55be38695eb0 QosImageDispatch: ictx=0x55be38645fb0
2023-12-03T13:38:28.428+0000 7f44ebd49e00  5 librbd::io::Dispatcher: 0x55be38697470 register_dispatch: dispatch_layer=2
2023-12-03T13:38:28.428+0000 7f44ebd49e00  5 librbd::io::RefreshImageDispatch: 0x55be38697ad0 RefreshImageDispatch: ictx=0x55be38645fb0
2023-12-03T13:38:28.428+0000 7f44ebd49e00  5 librbd::io::Dispatcher: 0x55be38697470 register_dispatch: dispatch_layer=4
2023-12-03T13:38:28.428+0000 7f44ebd49e00  5 librbd::io::WriteBlockImageDispatch: 0x55be38415120 WriteBlockImageDispatch: ictx=0x55be38645fb0
2023-12-03T13:38:28.428+0000 7f44ebd49e00  5 librbd::io::Dispatcher: 0x55be38697470 register_dispatch: dispatch_layer=7
2023-12-03T13:38:28.428+0000 7f44ebd49e00  5 librbd::io::Dispatcher: 0x55be386959b0 register_dispatch: dispatch_layer=6
2023-12-03T13:38:28.428+0000 7f44ebd49e00 15 librbd::migration::OpenSourceImageRequest: 0x55be3843c590 open_source: source_spec={"type":"qcow","stream":{"type":"http","url":"http://download.ceph.com/qa/ubuntu-12.04.qcow2"}}, source_snap_id=18446744073709551614, import_only=1
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::SourceSpecBuilder: 0x7ffc3b729e98 parse_source_spec:
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::SourceSpecBuilder: 0x7ffc3b729e98 build_format:
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::QCOWFormat: 0x55be38481620 open:
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::SourceSpecBuilder: 0x7ffc3b729e98 build_stream:
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::HttpStream: 0x55be386950d0 open: url=http://download.ceph.com/qa/ubuntu-12.04.qcow2
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::HttpClient: 0x55be3847e050 open: url=http://download.ceph.com/qa/ubuntu-12.04.qcow2
2023-12-03T13:38:28.428+0000 7f44ebd49e00 10 librbd::migration::util::parse_url: url=http://download.ceph.com/qa/ubuntu-12.04.qcow2
2023-12-03T13:38:28.428+0000 7f44e8844640 15 librbd::migration::HttpClient: 0x55be3847e050 create_http_session:
2023-12-03T13:38:28.428+0000 7f44e8844640 15 librbd::migration::HttpClient::HttpSession 0x7f44c4001d40 init:
2023-12-03T13:38:28.428+0000 7f44e8844640 15 librbd::migration::HttpClient::HttpSession 0x7f44c4001d40 resolve_host:
2023-12-03T13:38:28.428+0000 7f44e8844640 15 librbd::migration::HttpClient::HttpSession 0x7f44c4001d40 handle_resolve_host: r=0
2023-12-03T13:38:28.428+0000 7f44e8844640 15 librbd::migration::HttpClient::PlainHttpSession 0x7f44c4001d40 connect:

Thread 10 "io_context_pool" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff27bc640 (LWP 32368)]
boost::asio::detail::scheduler::compensating_work_started (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp:332
332    ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp: No such file or directory.
(gdb) bt
#0  boost::asio::detail::scheduler::compensating_work_started (this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp:332
#1  boost::asio::detail::epoll_reactor::perform_io_cleanup_on_block_exit::~perform_io_cleanup_on_block_exit (this=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/epoll_reactor.ipp:751
#2  boost::asio::detail::epoll_reactor::descriptor_state::perform_io (events=<optimized out>, this=0x7fffd4001da0) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/epoll_reactor.ipp:803
#3  boost::asio::detail::epoll_reactor::descriptor_state::do_complete (bytes_transferred=<optimized out>, ec=..., base=0x7fffd4001da0, owner=0x5555559dc6c0) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/epoll_reactor.ipp:813
#4  boost::asio::detail::epoll_reactor::descriptor_state::do_complete (owner=0x5555559dc6c0, base=0x7fffd4001da0, ec=..., bytes_transferred=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/epoll_reactor.ipp:805
#5  0x00007ffff7827e2e in boost::asio::detail::scheduler_operation::complete (bytes_transferred=16, ec=..., owner=0x5555559dc6c0, this=0x7fffd4001da0) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/scheduler_operation.hpp:40
#6  boost::asio::detail::scheduler::do_run_one (ec=..., this_thread=..., lock=<synthetic pointer>..., this=0x5555559dc6c0) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp:493
#7  boost::asio::detail::scheduler::run(boost::system::error_code&) [clone .constprop.0] [clone .isra.0] (this=0x5555559dc6c0, ec=...) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/detail/impl/scheduler.ipp:210
#8  0x00007ffff77d868f in boost::asio::io_context::run (this=<optimized out>, this=<optimized out>) at ./obj-x86_64-linux-gnu/boost/include/boost/asio/impl/io_context.ipp:61
#9  ceph::async::io_context_pool::start(short)::{lambda()#1}::operator()() const (__closure=0x555555af83b8) at ./src/common/async/context_pool.h:63
#10 std::__invoke_impl<void, ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::__invoke_other, ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#11 std::__invoke<ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#12 std::invoke<ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/functional:97
#13 _ZZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEESt6threadSt17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_ENKUlSA_SD_E_clIS3_JEEEDaSA_SD_ (fun=..., __closure=0x555555af83c0) at ./src/common/Thread.h:79
#14 _ZSt13__invoke_implIvZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEESt6threadSt17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_EUlSB_SE_E_JS4_EESA_St14__invoke_otherOT0_DpOT1_ (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#15 _ZSt8__invokeIZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEESt6threadSt17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_EUlSB_SE_E_JS4_EENSt15__invoke_resultISA_JDpSC_EE4typeESB_SE_ (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#16 _ZNSt6thread8_InvokerISt5tupleIJZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEES_St17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_EUlSC_SF_E_S6_EEE9_M_invokeIJLm0ELm1EEEEvSt12_Index_tupleIJXspT_EEE (this=0x555555af83b8) at /usr/include/c++/11/bits/std_thread.h:253
#17 _ZNSt6thread8_InvokerISt5tupleIJZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEES_St17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_EUlSC_SF_E_S6_EEEclEv (this=0x555555af83b8) at /usr/include/c++/11/bits/std_thread.h:260
#18 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZ17make_named_threadIZN4ceph5async15io_context_pool5startEsEUlvE_JEES_St17basic_string_viewIcSt11char_traitsIcEEOT_DpOT0_EUlSD_SG_E_S7_EEEEE6_M_runEv (this=0x555555af83b0) at /usr/include/c++/11/bits/std_thread.h:211
#19 0x00007ffff6bcd2b3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007ffff6854b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#21 0x00007ffff68e6a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
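The teuthology log in the description reports `got remote process result: 139`, which is consistent with the SIGSEGV that gdb shows here. A small sketch of that arithmetic, assuming the usual shell convention that a process killed by signal N is reported as status 128 + N (the function name `describe_exit_status` is hypothetical):

```python
import signal


def describe_exit_status(status):
    """Translate a shell-style exit status into a readable description."""
    if status > 128:
        # Assumed convention: 128 + signal number for signal-terminated processes.
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with status {status}"


print(describe_exit_status(139))
```

Signal numbers are platform-specific; on Linux, signal 11 is SIGSEGV.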

Actions #3

Updated by Ilya Dryomov over 2 years ago

There were no changes in librbd in this area. I suspect https://github.com/ceph/ceph/pull/50821.

Actions #5

Updated by Casey Bodley over 2 years ago

Trying to retrace Jason's steps from https://github.com/ceph/ceph/pull/38000#pullrequestreview-526802351:

~/ceph/build $ nm -C lib/librbd.so | grep -w top_
00000000007ff4b0 b guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
00000000007ff4b8 b guard variable for boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
0000000000800760 b guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_executor_service::strand_impl, unsigned char>::top_
00000000007ff4c4 b boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
00000000007ff4c8 b boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
0000000000800768 b boost::asio::detail::call_stack<boost::asio::detail::strand_executor_service::strand_impl, unsigned char>::top_
~/ceph/build $ nm -C lib/librados.so | grep -w top_
00000000001847d8 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
00000000001847e0 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
00000000001847ec u boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
00000000001847f0 u boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
~/ceph/build $ nm -C lib/libceph-common.so | grep -w top_
0000000000f79668 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
0000000000f49c78 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
0000000000f79674 u boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
0000000000f49c84 u boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_

I now see librbd's symbols with 'b' instead of 'u'.

With the commits you mentioned (05c341b30deab327444eac464e24a840dae25083 and f479bbc4bc118989849fb663b5dcf5f46f05097d) reverted, I see them as 'u' again:

~/ceph/build $ nm -C lib/librbd.so | grep -w top_
0000000000811258 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
0000000000811260 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
0000000000811248 u guard variable for boost::asio::detail::call_stack<boost::asio::detail::strand_executor_service::strand_impl, unsigned char>::top_
0000000000811274 u boost::asio::detail::call_stack<boost::asio::detail::strand_service::strand_impl, unsigned char>::top_
0000000000811278 u boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
000000000081126c u boost::asio::detail::call_stack<boost::asio::detail::strand_executor_service::strand_impl, unsigned char>::top_
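In `nm` output, a lowercase `b` marks a local symbol in the BSS section, while `u` marks a unique global symbol (a GNU extension to the ELF symbol bindings). The comparison done by eye above can be sketched as a small check over `nm -C` output. The function names are hypothetical and the sample lines are copied from the librbd listing earlier in this comment.

```python
def parse_nm_line(line):
    """Split one 'nm -C' line into (address, type letter, demangled name)."""
    address, sym_type, name = line.split(maxsplit=2)
    return address, sym_type, name.strip()


def non_unique_top_symbols(nm_output):
    """Return (type, name) for every '...::top_' symbol that is not type 'u'."""
    flagged = []
    for line in nm_output.splitlines():
        if not line.strip():
            continue
        _, sym_type, name = parse_nm_line(line)
        if name.endswith("::top_") and sym_type != "u":
            flagged.append((sym_type, name))
    return flagged


sample = """\
00000000007ff4c8 b boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
00000000001847f0 u boost::asio::detail::call_stack<boost::asio::detail::thread_context, boost::asio::detail::thread_info_base>::top_
"""

for sym_type, name in non_unique_top_symbols(sample):
    print(f"not unique ({sym_type}): {name}")
```

Feeding it the output of `nm -C lib/librbd.so | grep -w top_` would list the same symbols that show up with `b` in the first listing.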

Actions #6

Updated by Casey Bodley over 2 years ago

I'm not familiar with the --version-script stuff outside of what we did in librados.map. I tried adding a similar librbd.map in https://github.com/ceph/ceph/pull/54788, but couldn't manage to turn those 'b's into 'u's.

Actions #7

Updated by Ilya Dryomov over 2 years ago

Casey Bodley wrote:

I tried adding a similar librbd.map in https://github.com/ceph/ceph/pull/54788, but couldn't manage to turn those 'b's into 'u's.

I think the issue might be that static initialization is now split between librados and librbd, with one of these problematic variables going one way and others the other way. I would look in the direction of restoring some of the ASIO "heaviness" in librados, so that all of it continues to happen there and gets covered by librados.map.

Actions #8

Updated by Casey Bodley over 2 years ago

Ilya Dryomov wrote:

I think the issue might be that static initialization is now split between librados and librbd, with one of these problematic variables going one way and others the other way.

I had assumed that the 'guard variable for ...' was related to that static initialization. If those were globally unique, would it matter which library accesses them first?
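Why the 'b' versus 'u' distinction could matter can be illustrated with a toy model. This is only a sketch under assumptions: `CallStack` and `run_handler` are hypothetical stand-ins, not ASIO code, and the ticket does not confirm that this is the exact failure path. The idea is that one library records "this thread is running scheduler S" on its own `top_` list, and another library later looks S up on a different, private `top_` list and finds nothing.

```python
class CallStack:
    """Toy stand-in for one copy of a 'top_' marker list (hypothetical)."""

    def __init__(self):
        self.top = []  # (key, value) frames pushed by whoever runs the loop

    def push(self, key, value):
        self.top.append((key, value))

    def contains(self, key):
        # Search from the most recent frame, returning its value or None.
        for k, v in reversed(self.top):
            if k is key:
                return v
        return None


def run_handler(loop_stack, handler_stack):
    """The loop registers itself on loop_stack; the handler looks on handler_stack."""
    scheduler = object()
    thread_info = {"private_outstanding_work": 0}
    loop_stack.push(scheduler, thread_info)

    info = handler_stack.contains(scheduler)
    if info is None:
        # In native code, using this result unchecked would be a null dereference.
        return "lookup failed"
    info["private_outstanding_work"] += 1
    return "ok"


shared = CallStack()
print("one shared copy:  ", run_handler(shared, shared))
print("two private copies:", run_handler(CallStack(), CallStack()))
```

With a single shared copy, the order in which the libraries touch it does not change the result in this model; with two private copies, the lookup fails regardless of order.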

Actions #9

Updated by Casey Bodley over 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 54839

Actions #10

Updated by Ilya Dryomov over 2 years ago

  • Assignee set to Casey Bodley

Actions #11

Updated by Ilya Dryomov over 2 years ago

  • Status changed from Fix Under Review to Resolved

Actions #12

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to d5122b9c793e599ce82481cc926563c06964d397
  • Fixed In set to v19.0.0-62-gd5122b9c793
  • Released In set to v19.2.0~1108
  • Upkeep Timestamp set to 2025-07-11T18:11:24+00:00