Bug #42884
OSDMapTest.CleanPGUpmaps failure
Status: Closed
Description
During a make check build for a PR, I got this crash during OSD testing:
https://jenkins.ceph.com/job/ceph-pull-requests/38964/console
[ OK ] OSDMapTest.PrimaryAffinity (882 ms)
[ RUN ] OSDMapTest.get_osd_crush_node_flags
[ OK ] OSDMapTest.get_osd_crush_node_flags (1 ms)
[ RUN ] OSDMapTest.parse_osd_id_list
Expected option value to be integer, got 'foo'
invalid osd id 'foo'
expected numerical value, got: -12
invalid osd id '-12'
[ OK ] OSDMapTest.parse_osd_id_list (0 ms)
[ RUN ] OSDMapTest.CleanPGUpmaps
pure virtual method called
terminate called without an active exception
*** Caught signal (Aborted) **
in thread 7fc6950aa700 thread_name:clean_upmap_tp
ceph version Development (no_version) octopus (dev)
1: /home/jenkins-build/build/workspace/ceph-pull-requests/build/bin/unittest_osdmap() [0x62fa00]
2: (()+0x11390) [0x7fc6a70ff390]
3: (gsignal()+0x38) [0x7fc69bbb2428]
4: (abort()+0x16a) [0x7fc69bbb402a]
5: (()+0x998ae) [0x7fc69c1f88ae]
6: (()+0xa54b6) [0x7fc69c2044b6]
7: (()+0xa5521) [0x7fc69c204521]
8: (()+0xa626f) [0x7fc69c20526f]
9: (ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_dequeue()+0x23) [0x5def41]
10: (ThreadPool::worker(ThreadPool::WorkThread*)+0x517) [0x7fc69de88e2b]
11: (ThreadPool::WorkThread::entry()+0x32) [0x7fc69de8cff0]
12: (Thread::entry_wrapper()+0x78) [0x7fc69de68bac]
13: (Thread::_entry_func(void*)+0x18) [0x7fc69de68b2a]
14: (()+0x76ba) [0x7fc6a70f56ba]
15: (clone()+0x6d) [0x7fc69bc8441d]
2019-11-19T13:31:17.884+0000 7fc6950aa700 -1 *** Caught signal (Aborted) **
in thread 7fc6950aa700 thread_name:clean_upmap_tp
Build is based on commit 05d685dd37b34f2a0, with some cephfs patches on top (nothing that should affect OSD tests).
Updated by Jeff Layton over 6 years ago
- Project changed from Ceph to RADOS
- Subject changed from crash workqueue code to OSDMapTest.CleanPGUpmaps failure
Updated by Kefu Chai over 6 years ago
not reproducible locally. i am testing 9b61479da4f89014b6d1857287102bbc9db13e6e
Updated by Laura Flores over 3 years ago
- Tags set to test-failure
Updated by Rongqi Sun over 1 year ago
Hi @MOHIT AGRAWAL, would you mind having a look at this too?
[ RUN ] OSDMapTest.CleanPGUpmaps
pure virtual method called
terminate called without an active exception
*** Caught signal (Aborted) **
in thread ffff9ace5d60 thread_name:clean_upmap_tp
ceph version Development (no_version) squid (dev)
1: /home/jenkins-build/build/workspace/ceph-pull-requests-arm64/build/bin/unittest_osdmap(+0x2e5c88) [0xaaaac36c5c88]
2: __kernel_rt_sigreturn()
3: /lib/aarch64-linux-gnu/libc.so.6(+0x7f200) [0xffffa184f200]
4: raise()
5: abort()
6: (__gnu_cxx::__verbose_terminate_handler()+0x124) [0xffffa1afb364]
7: /lib/aarch64-linux-gnu/libstdc++.so.6(+0xa8a0c) [0xffffa1af8a0c]
8: /lib/aarch64-linux-gnu/libstdc++.so.6(+0xa8a70) [0xffffa1af8a70]
9: __cxa_deleted_virtual()
10: (ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_dequeue()+0x20) [0xaaaac36136dc]
11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x5f8) [0xffffa3a63bfc]
12: (ThreadPool::WorkThread::entry()+0x24) [0xffffa3a693a8]
13: (Thread::entry_wrapper()+0xa0) [0xffffa3a3a31c]
14: (Thread::_entry_func(void*)+0x18) [0xffffa3a3a268]
15: /lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8) [0xffffa184d5c8]
16: /lib/aarch64-linux-gnu/libc.so.6(+0xe5edc) [0xffffa18b5edc]
2024-06-28T17:03:41.116-0400 ffff9ace5d60 -1 *** Caught signal (Aborted) **
in thread ffff9ace5d60 thread_name:clean_upmap_tp
ceph version Development (no_version) squid (dev)
1: /home/jenkins-build/build/workspace/ceph-pull-requests-arm64/build/bin/unittest_osdmap(+0x2e5c88) [0xaaaac36c5c88]
2: __kernel_rt_sigreturn()
3: /lib/aarch64-linux-gnu/libc.so.6(+0x7f200) [0xffffa184f200]
4: raise()
5: abort()
6: (__gnu_cxx::__verbose_terminate_handler()+0x124) [0xffffa1afb364]
7: /lib/aarch64-linux-gnu/libstdc++.so.6(+0xa8a0c) [0xffffa1af8a0c]
8: /lib/aarch64-linux-gnu/libstdc++.so.6(+0xa8a70) [0xffffa1af8a70]
9: __cxa_deleted_virtual()
10: (ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_dequeue()+0x20) [0xaaaac36136dc]
11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x5f8) [0xffffa3a63bfc]
12: (ThreadPool::WorkThread::entry()+0x24) [0xffffa3a693a8]
13: (Thread::entry_wrapper()+0xa0) [0xffffa3a3a31c]
14: (Thread::_entry_func(void*)+0x18) [0xffffa3a3a268]
15: /lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8) [0xffffa184d5c8]
16: /lib/aarch64-linux-gnu/libc.so.6(+0xe5edc) [0xffffa18b5edc]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- begin dump of recent events ---
-26> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command assert hook 0xaaaaded90070
-25> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command abort hook 0xaaaaded90070
-24> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command leak_some_memory hook 0xaaaaded90070
-23> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command perfcounters_dump hook 0xaaaaded90070
-22> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command 1 hook 0xaaaaded90070
-21> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command perf dump hook 0xaaaaded90070
-20> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command perfcounters_schema hook 0xaaaaded90070
-19> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command perf histogram dump hook 0xaaaaded90070
-18> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command 2 hook 0xaaaaded90070
-17> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command perf schema hook 0xaaaaded90070
-16> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command counter dump hook 0xaaaaded90070
-15> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command counter schema hook 0xaaaaded90070
-14> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command perf histogram schema hook 0xaaaaded90070
-13> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command perf reset hook 0xaaaaded90070
-12> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command config show hook 0xaaaaded90070
-11> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command config help hook 0xaaaaded90070
-10> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command config set hook 0xaaaaded90070
-9> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command config unset hook 0xaaaaded90070
-8> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command config get hook 0xaaaaded90070
-7> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command config diff hook 0xaaaaded90070
-6> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command config diff get hook 0xaaaaded90070
-5> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command injectargs hook 0xaaaaded90070
-4> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command log flush hook 0xaaaaded90070
-3> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command log dump hook 0xaaaaded90070
-2> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command log reopen hook 0xaaaaded90070
-1> 2024-06-28T17:03:38.712-0400 ffffa4ae5020 5 asok(0xaaaadee11660) register_command dump_mempools hook 0xaaaadeec0a88
0> 2024-06-28T17:03:41.116-0400 ffff9ace5d60 -1 *** Caught signal (Aborted) **
in thread ffff9ace5d60 thread_name:clean_upmap_tp
ceph version Development (no_version) squid (dev)
1: /home/jenkins-build/build/workspace/ceph-pull-requests-arm64/build/bin/unittest_osdmap(+0x2e5c88) [0xaaaac36c5c88]
2: __kernel_rt_sigreturn()
3: /lib/aarch64-linux-gnu/libc.so.6(+0x7f200) [0xffffa184f200]
4: raise()
5: abort()
6: (__gnu_cxx::__verbose_terminate_handler()+0x124) [0xffffa1afb364]
7: /lib/aarch64-linux-gnu/libstdc++.so.6(+0xa8a0c) [0xffffa1af8a0c]
8: /lib/aarch64-linux-gnu/libstdc++.so.6(+0xa8a70) [0xffffa1af8a70]
9: __cxa_deleted_virtual()
10: (ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_dequeue()+0x20) [0xaaaac36136dc]
11: (ThreadPool::worker(ThreadPool::WorkThread*)+0x5f8) [0xffffa3a63bfc]
12: (ThreadPool::WorkThread::entry()+0x24) [0xffffa3a693a8]
13: (Thread::entry_wrapper()+0xa0) [0xffffa3a3a31c]
14: (Thread::_entry_func(void*)+0x18) [0xffffa3a3a268]
15: /lib/aarch64-linux-gnu/libc.so.6(+0x7d5c8) [0xffffa184d5c8]
16: /lib/aarch64-linux-gnu/libc.so.6(+0xe5edc) [0xffffa18b5edc]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
Updated by Laura Flores over 1 year ago
- Assignee set to MOHIT AGRAWAL
@MOHIT AGRAWAL this looks similar to the other issue you were working on; maybe you have an idea.
Updated by MOHIT AGRAWAL over 1 year ago
The test case crashes while calling clean_pg_upmap. The workflow of the function is as follows:
1) Create a threadpool object
2) Start the thread pool
3) Create a ParallelPGMapper object (mapper), passing it a reference to the threadpool
4) Create a CleanUpmapJob object
5) Call mapper.queue() to insert the job
6) Call job.wait()
7) Stop the threadpool
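The steps above can be sketched with a toy pool. Everything here is hypothetical (SimplePool, run_clean_upmap_workflow, and a std::promise standing in for CleanUpmapJob) and is not the real ThreadPool/ParallelPGMapper API; it only illustrates the construct/start/queue/wait/stop sequence:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Toy stand-in for ThreadPool + ParallelPGMapper (hypothetical names).
class SimplePool {
  std::mutex m;
  std::condition_variable cv;
  std::queue<std::function<void()>> jobs;
  std::vector<std::thread> threads;
  bool stopping = false;

  void worker() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [this] { return stopping || !jobs.empty(); });
        if (jobs.empty()) return;  // stopping and nothing left to run
        job = std::move(jobs.front());
        jobs.pop();
      }
      job();
    }
  }

public:
  void start(int n) {
    for (int i = 0; i < n; ++i)
      threads.emplace_back([this] { worker(); });
  }
  void queue(std::function<void()> j) {
    {
      std::lock_guard<std::mutex> l(m);
      jobs.push(std::move(j));
    }
    cv.notify_one();
  }
  void stop() {
    {
      std::lock_guard<std::mutex> l(m);
      stopping = true;
    }
    cv.notify_all();
    for (auto& t : threads) t.join();
  }
};

// Walk through the steps above: construct, start, queue, wait, stop.
int run_clean_upmap_workflow() {
  SimplePool pool;                          // 1) create the pool object
  pool.start(4);                            // 2) start the thread pool
  std::promise<int> done;                   // 4) stands in for CleanUpmapJob
  pool.queue([&] { done.set_value(42); });  // 5) mapper.queue(job)
  int result = done.get_future().get();     // 6) job.wait()
  pool.stop();                              // 7) stop the thread pool
  return result;
}
```

Note that the crash below hinges on the ordering of steps 2 and 3, which this sketch performs in the order the test currently uses.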
Checking the process core shows the following; it does not indicate that any object is corrupted.
Thread 2 (Thread 0x7f7848256200 (LWP 24355)):
#0 0x00007f7847eab470 in __lll_lock_wait () from /lib64/libc.so.6
#1 0x00007f7847eb1e61 in pthread_mutex_lock@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2 0x00007f7849201913 in ceph::mutex_debug_detail::mutex_debug_impl<false>::lock_impl (this=this@entry=0x7ffdbce66660) at /nvme0/ceph/src/common/mutex_debug.h:122
#3 0x00007f7849201bf1 in ceph::mutex_debug_detail::mutex_debug_impl<false>::lock (this=0x7ffdbce66660, no_lockdep=no_lockdep@entry=false) at /nvme0/ceph/src/common/mutex_debug.h:188
#4 0x00007f78495e171c in ThreadPool::WorkQueue<ParallelPGMapper::Item>::queue (this=this@entry=0x7ffdbce66458, item=item@entry=0x565057f74dd0) at /nvme0/ceph/src/common/WorkQueue.h:251
#5 0x00007f78495e0dd8 in ParallelPGMapper::queue (this=this@entry=0x7ffdbce66400, job=job@entry=0x7ffdbce664a0, pgs_per_item=pgs_per_item@entry=256, input_pgs=std::vector of length 3, capacity 3 = {...}) at /nvme0/ceph/src/osd/OSDMapMapping.cc:189
#6 0x0000565057913f8f in OSDMapTest::clean_pg_upmaps (this=this@entry=0x565057f66740, cct=0x565057e1b080, om=..., pending_inc=...) at /nvme0/ceph/src/test/osd/TestOSDMap.cc:319
#7 0x00005650578e7453 in OSDMapTest_BUG_51842_Test::TestBody (this=0x565057f66740) at /nvme0/ceph/src/test/osd/TestOSDMap.cc:2295
#8 0x0000565057931cad in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x565057f66740, method=<optimized out>, location=location@entry=0x565057957748 "the test body") at /nvme0/ceph/src/googletest/googletest/src/gtest.cc:2605
#9 0x000056505793a080 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=object@entry=0x565057f66740, method=&virtual testing::Test::TestBody(), location=location@entry=0x565057957748 "the test body") at /nvme0/ceph/src/googletest/googletest/src/gtest.cc:2641
--Type <RET> for more, q to quit, c to continue without paging--
#10 0x000056505792af9a in testing::Test::Run (this=this@entry=0x565057f66740) at /nvme0/ceph/src/googletest/googletest/src/gtest.cc:2680
#11 0x000056505792b0a0 in testing::TestInfo::Run (this=0x565057f5cc80) at /nvme0/ceph/src/googletest/googletest/src/gtest.cc:2858
#12 0x000056505792b154 in testing::TestSuite::Run (this=0x565057dcf040) at /nvme0/ceph/src/googletest/googletest/src/gtest.cc:3012
#13 0x000056505792c80b in testing::internal::UnitTestImpl::RunAllTests (this=0x565057e142a0) at /nvme0/ceph/src/googletest/googletest/src/gtest.cc:5723
#14 0x0000565057931f59 in testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=object@entry=0x565057e142a0, method=<optimized out>, location=location@entry=0x565057958898 "auxiliary test code (environments or event listeners)") at /nvme0/ceph/src/googletest/googletest/src/gtest.cc:2605
#15 0x000056505793a5dd in testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> (object=0x565057e142a0, method=(bool (testing::internal::UnitTestImpl::*)(testing::internal::UnitTestImpl * const)) 0x56505792c4a0 <testing::internal::UnitTestImpl::RunAllTests()>, location=location@entry=0x565057958898 "auxiliary test code (environments or event listeners)") at /nvme0/ceph/src/googletest/googletest/src/gtest.cc:2641
#16 0x000056505792b26f in testing::UnitTest::Run (this=0x565057999c00 <testing::UnitTest::GetInstance()::instance>) at /nvme0/ceph/src/googletest/googletest/src/gtest.cc:5306
#17 0x00005650578ea2b8 in RUN_ALL_TESTS () at /nvme0/ceph/src/googletest/googletest/include/gtest/gtest.h:2486
#18 0x00005650578c3d84 in main (argc=<optimized out>, argv=0x7ffdbce68f68) at /nvme0/ceph/src/test/osd/TestOSDMap.cc:32
Thread 1 (Thread 0x7f783eb676c0 (LWP 24477)):
#0 0x00007f7847eb0884 in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007f7847e5fafe in raise () from /lib64/libc.so.6
#2 0x00005650579486ee in reraise_fatal (signum=signum@entry=6) at /nvme0/ceph/src/global/signal_handler.cc:88
#3 0x0000565057949a3c in handle_oneshot_fatal_signal (signum=6) at /nvme0/ceph/src/global/signal_handler.cc:367
#4 <signal handler called>
#5 0x00007f7847eb0884 in __pthread_kill_implementation () from /lib64/libc.so.6
#6 0x00007f7847e5fafe in raise () from /lib64/libc.so.6
#7 0x00007f7847e4887f in abort () from /lib64/libc.so.6
#8 0x00007f78480a4d39 in __gnu_cxx::__verbose_terminate_handler() [clone .cold] () from /lib64/libstdc++.so.6
#9 0x00007f78480b4f6c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
#10 0x00007f78480b4fd7 in std::terminate() () from /lib64/libstdc++.so.6
#11 0x00007f78480b5d15 in __cxa_pure_virtual () from /lib64/libstdc++.so.6
#12 0x00005650578e952e in ThreadPool::WorkQueue<ParallelPGMapper::Item>::_void_dequeue (this=<optimized out>) at /nvme0/ceph/src/common/WorkQueue.h:226
#13 0x00007f7849278c81 in ThreadPool::worker (this=0x7ffdbce665f0, wt=<optimized out>) at /nvme0/ceph/src/common/WorkQueue.cc:111
#14 0x00007f784927b835 in ThreadPool::WorkThread::entry (this=<optimized out>) at /nvme0/ceph/src/common/WorkQueue.h:401
#15 0x00007f7849267e51 in Thread::entry_wrapper (this=0x565057f74d50) at /nvme0/ceph/src/common/Thread.cc:87
#16 0x00007f7849267e69 in Thread::_entry_func (arg=<optimized out>) at /nvme0/ceph/src/common/Thread.cc:74
#17 0x00007f7847eae947 in start_thread () from /lib64/libc.so.6
#18 0x00007f7847f34860 in clone3 () from /lib64/libc.so.6
(gdb) f 13
#13 0x00007f7849278c81 in ThreadPool::worker (this=0x7ffdbce665f0, wt=<optimized out>) at /nvme0/ceph/src/common/WorkQueue.cc:111
111       void *item = wq->_void_dequeue();
(gdb) p wq
$1 = (ThreadPool::WorkQueue_ *) 0x7ffdbce66458
(gdb) p *wq
$2 = {_vptr.WorkQueue_ = 0x565057993b10 <vtable for ParallelPGMapper::WQ+16>, name = "ParallelPGMapper::WQ",
timeout_interval = std::atomic<std::chrono::duration<unsigned long, std::ratio<1, 1000000000> >> = { std::chrono::duration = { 60000000000ns } },
suicide_interval = std::atomic<std::chrono::duration<unsigned long, std::ratio<1, 1000000000> >> = { std::chrono::duration = { 0ns } }}
(gdb) p ((ParallelPGMapper*)0x7ffdbce66400)
$8 = (ParallelPGMapper *) 0x7ffdbce66400
(gdb) p ((ParallelPGMapper*)0x7ffdbce66400)->wq
$9 = {<ThreadPool::WorkQueue<ParallelPGMapper::Item>> = {<ThreadPool::WorkQueue_> = {_vptr.WorkQueue_ = 0x565057993b10 <vtable for ParallelPGMapper::WQ+16>,
name = "ParallelPGMapper::WQ",
timeout_interval = std::atomic<std::chrono::duration<unsigned long, std::ratio<1, 1000000000> >> = { std::chrono::duration = { 60000000000ns } },
suicide_interval = std::atomic<std::chrono::duration<unsigned long, std::ratio<1, 1000000000> >> = { std::chrono::duration = { 0ns } }},
pool = 0x7ffdbce665f0}, m = 0x7ffdbce66400}
(gdb) p &((ParallelPGMapper*)0x7ffdbce66400)->wq
$10 = (ParallelPGMapper::WQ *) 0x7ffdbce66458
(gdb) p *((ParallelPGMapper*)0x7ffdbce66400)->wq.pool
$13 = {<ceph::md_config_obs_impl<ceph::common::ConfigProxy>> = {_vptr.md_config_obs_impl = 0x7f7849a99928 <vtable for ThreadPool+16>}, cct = 0x565057e1b080,
  name = "BUG_40104::clean_upmap_tp", thread_name = "clean_upmap_tp", lockname = "BUG_40104::clean_upmap_tp::lock",
  lock = {<ceph::mutex_debug_detail::mutex_debugging_base> = {group = "BUG_40104::clean_upmap_tp::lock", id = -1, lockdep = true, backtrace = false,
      nlock = std::atomic<int> = { 1 }, locked_by = {_M_thread = 140154424948416}}, m = {__data = {__lock = 2, __count = 0, __owner = 24477, __nusers = 6,
        __kind = 2, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}},
      __size = "\002\000\000\000\000\000\000\000\235\000\000\006\000\000\000\002", '\000' <repeats 22 times>, __align = 2}}, _cond = {cond = {__data = {
        __wseq = {__value64 = 10, __value32 = {__low = 10, __high = 0}}, __g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {10,
          0}, __g_size = {0, 0}, __g1_orig_size = 0, __wrefs = 40, __g_signals = {0, 0}},
      __size = "\n", '\000' <repeats 15 times>, "\n", '\000' <repeats 19 times>, "(\000\000\000\000\000\000\000\000\000\000", __align = 10},
    waiter_mutex = 0x7ffdbce66660}, _stop = false, _pause = 0, _draining = 0, _wait_cond = {cond = {__data = {__wseq = {__value64 = 0, __value32 = {
          __low = 0, __high = 0}}, __g1_start = {__value64 = 0, __value32 = {__low = 0, __high = 0}}, __g_refs = {0, 0}, __g_size = {0, 0},
        __g1_orig_size = 0, __wrefs = 0, __g_signals = {0, 0}}, __size = '\000' <repeats 47 times>, __align = 0}, waiter_mutex = 0x0}, _num_threads = 8,
  _thread_num_option = "", _conf_keys = 0x565058024cc0, work_queues = std::vector of length 1, capacity 1 = {0x7ffdbce66458}, next_work_queue = 1,
  _threads = std::set with 8 elements = {[0] = 0x565057f663d0, [1] = 0x565057f74d50, [2] = 0x565057f83500, [3] = 0x565057f83550, [4] = 0x565057f835a0,
    [5] = 0x565057f835f0, [6] = 0x565057f83670, [7] = 0x5650580252b0}, _old_threads = empty std::__cxx11::list, processing = 0}
There are two threads: thread 2 is blocked in mapper.queue() because the lock is already held by thread 1 (the worker), and thread 1 crashes while calling the dequeue function.
The main issue here is that we start the thread pool before creating the mapper object. Once the thread pool is started, ThreadPool::worker() runs and immediately tries to fetch a workqueue object and dequeue a job from it. The workqueue object is registered during ParallelPGMapper construction: the WorkQueue base-class constructor calls pool->add_work_queue(). At that point the object is not yet fully constructed, so if a worker thread fetches the workqueue (wq) and calls _void_dequeue() on it, the call resolves to a still-pure virtual function and the process aborts with __cxa_pure_virtual.

There are multiple ways to avoid this, but I think the best option is to construct the ParallelPGMapper object before starting the thread pool, so that by the time a worker thread fetches the workqueue the object is fully constructed. With such a patch I ran the test case 100 times in a loop without any crash; without the patch, the test case crashed almost 15 times.
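The hazard and the fix can be sketched in a few lines. This is a minimal, self-contained model, not the real Ceph classes: Pool, WorkQueueBase, and CountingWQ are hypothetical stand-ins for ThreadPool, ThreadPool::WorkQueue_, and ParallelPGMapper::WQ. If the pool's workers were already running when the base-class constructor registered `this`, they could call the still-pure _void_dequeue(); constructing the queue fully before starting the pool, as below, closes that window:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct WorkQueueBase;  // forward declaration

// Toy pool: the worker repeatedly calls _void_dequeue() on every
// registered queue until told to stop.
struct Pool {
  std::vector<WorkQueueBase*> wqs;
  std::thread worker;
  std::atomic<bool> stop_flag{false};
  void add_work_queue(WorkQueueBase* wq) { wqs.push_back(wq); }
  void start();
  void stop() {
    stop_flag = true;
    worker.join();
  }
};

struct WorkQueueBase {
  explicit WorkQueueBase(Pool* p) {
    // If p's workers were already running here, they could call the
    // still-pure _void_dequeue() and abort with __cxa_pure_virtual.
    p->add_work_queue(this);
  }
  virtual ~WorkQueueBase() = default;
  virtual void* _void_dequeue() = 0;
};

void Pool::start() {
  worker = std::thread([this] {
    while (!stop_flag)
      for (auto* wq : wqs) wq->_void_dequeue();
  });
}

// Concrete queue: counts how often the worker dequeued from it.
struct CountingWQ : WorkQueueBase {
  std::atomic<int> calls{0};
  using WorkQueueBase::WorkQueueBase;
  void* _void_dequeue() override {
    ++calls;
    return nullptr;
  }
};

int run_safe_order() {
  Pool pool;
  CountingWQ wq(&pool);  // fully construct (and register) first...
  pool.start();          // ...then start the worker threads
  while (wq.calls == 0) std::this_thread::yield();
  pool.stop();
  return wq.calls.load();
}
```

The fix follows the same ordering: ensure the derived object's vtable is fully in place before any worker thread can observe the registered queue.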
Updated by MOHIT AGRAWAL over 1 year ago
- Status changed from New to Fix Under Review
Updated by Laura Flores over 1 year ago
QA testing ongoing here: https://tracker.ceph.com/issues/66955
Updated by Yuri Weinstein over 1 year ago
Updated by Radoslaw Zarzynski over 1 year ago
- Status changed from Fix Under Review to Pending Backport
- Backport set to quincy,reef,squid
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67235: reef: OSDMapTest.CleanPGUpmaps failure added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67236: quincy: OSDMapTest.CleanPGUpmaps failure added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67237: squid: OSDMapTest.CleanPGUpmaps failure added
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Upkeep Bot 9 months ago
- Merge Commit set to 582e882c439e9f7acadd4caf4996089cefd12e07
- Fixed In set to v19.3.0-3759-g582e882c439
- Upkeep Timestamp set to 2025-07-09T18:56:48+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-3759-g582e882c439 to v19.3.0-3759-g582e882c43
- Upkeep Timestamp changed from 2025-07-09T18:56:48+00:00 to 2025-07-14T18:12:48+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2384
- Upkeep Timestamp changed from 2025-07-14T18:12:48+00:00 to 2025-11-01T00:58:51+00:00
Updated by MOHIT AGRAWAL 3 months ago
- Status changed from Pending Backport to Closed
The pull request is merged.