Bug #70553


Client: ceph_ll_unlink() after ceph_ll_lookup() hangs ceph_unmount()

Added by Xavi Hernandez about 1 year ago. Updated 7 months ago.

Status: Pending Backport
Priority: High
Category: Correctness/Safety
Target version:
% Done: 0%
Source: other
Backport: tentacle,squid,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v20.3.0-2194-g416c794dad
Released In:
Upkeep Timestamp: 2025-08-11T03:18:35+00:00

Description

During some tests using the libcephfs proxy, we have seen an issue when ceph_unmount() is called: all operations appear to work fine until we try to unmount the volume, at which point the request blocks indefinitely.

We are using a recent build from the main branch (commit 844e80bf80).
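For reference, the call sequence named in the issue title can be sketched against the public libcephfs API roughly as follows. This is a minimal, untested illustration, not the actual reproducer from our test suite: the file name, mount root, and error handling are placeholders, and the exact point at which the inode reference is released may differ from the real workload.

```c
/* Hypothetical sketch of the titled sequence: ceph_ll_lookup(), then
 * ceph_ll_unlink(), then ceph_unmount(). Error handling abbreviated;
 * configuration is read from the default ceph.conf. */
#include <cephfs/libcephfs.h>

int main(void)
{
    struct ceph_mount_info *cmount;
    struct Inode *root, *child;
    struct ceph_statx stx;

    ceph_create(&cmount, NULL);
    ceph_conf_read_file(cmount, NULL);   /* default ceph.conf location */
    ceph_mount(cmount, "/");

    UserPerm *perms = ceph_mount_perms(cmount);
    ceph_ll_lookup_root(cmount, &root);

    /* Look up an entry, which takes a reference on its inode. */
    if (ceph_ll_lookup(cmount, root, "file", &child, &stx,
                       CEPH_STATX_INO, 0, perms) == 0) {
        /* Unlink it through the parent directory inode... */
        ceph_ll_unlink(cmount, root, "file", perms);
        /* ...and drop the reference obtained by the lookup. */
        ceph_ll_put(cmount, child);
    }

    /* In the scenario described here, this call never returns. */
    ceph_unmount(cmount);
    ceph_release(cmount);
    return 0;
}
```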

This is the stack trace at this point:

(gdb) i th
  Id   Target Id                                            Frame 
* 1    Thread 0x7fb4c280d540 (LWP 657460) "libcephfsd"      0x00007fb4c270f83f in accept () from /lib64/libc.so.6
  2    Thread 0x7fb4bf005640 (LWP 657461) "libcephfsd"      0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  3    Thread 0x7fb4be804640 (LWP 657676) "libcephfsd"      0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  4    Thread 0x7fb4a0ff9640 (LWP 657678) "log"             0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  5    Thread 0x7fb4bd57b640 (LWP 657679) "io_context_pool" 0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  6    Thread 0x7fb4b5ffb640 (LWP 657680) "io_context_pool" 0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  7    Thread 0x7fb4be003640 (LWP 657681) "msgr-worker-0"   0x00007fb4c270e6ae in epoll_wait () from /lib64/libc.so.6
  8    Thread 0x7fb4bcd7a640 (LWP 657682) "msgr-worker-1"   0x00007fb4c270e6ae in epoll_wait () from /lib64/libc.so.6
  9    Thread 0x7fb4b7fff640 (LWP 657683) "msgr-worker-2"   0x00007fb4c270e6ae in epoll_wait () from /lib64/libc.so.6
  10   Thread 0x7fb4b77fe640 (LWP 657687) "service"         0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  11   Thread 0x7fb4b6ffd640 (LWP 657688) "ceph_timer"      0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  12   Thread 0x7fb4b67fc640 (LWP 657689) "fn_anonymous"    0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  13   Thread 0x7fb4b57fa640 (LWP 657690) "safe_timer"      0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  14   Thread 0x7fb4b4ff9640 (LWP 657691) "fn_anonymous"    0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  15   Thread 0x7fb4a3fff640 (LWP 657692) "flusher"         0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  16   Thread 0x7fb4a37fe640 (LWP 657693) "ms_dispatch"     0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  17   Thread 0x7fb4a2ffd640 (LWP 657694) "ms_local"        0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  18   Thread 0x7fb4a27fc640 (LWP 657695) "safe_timer"      0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  19   Thread 0x7fb4a1ffb640 (LWP 657696) "libcephfsd"      0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
  20   Thread 0x7fb4a17fa640 (LWP 657697) "libcephfsd"      0x00007fb4c2703a1d in readv () from /lib64/libc.so.6
(gdb) t a a bt

Thread 20 (Thread 0x7fb4a17fa640 (LWP 657697) "libcephfsd"):
#0  0x00007fb4c2703a1d in readv () from /lib64/libc.so.6
#1  0x000000000040bdd6 in proxy_link_recv (sd=sd@entry=4, iov=0x7fb4a17f5ed0, iov@entry=0x7fb4a17f5fa0, count=count@entry=1) at proxy_link.c:729
#2  0x000000000040bfed in proxy_link_req_recv (sd=4, iov=iov@entry=0x7fb4a17f5fa0, count=count@entry=2) at proxy_link.c:787
#3  0x0000000000403f9a in serve_binary (client=0x1fcdcd0) at libcephfsd.c:1685
#4  serve_connection (worker=0x1fcdcd0) at libcephfsd.c:1745
#5  0x000000000040e550 in proxy_worker_start (arg=0x1fcdcd0) at proxy_manager.c:66
#6  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#7  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 19 (Thread 0x7fb4a1ffb640 (LWP 657696) "libcephfsd"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689a10 in pthread_cond_clockwait@GLIBC_2.30 () from /lib64/libc.so.6
#2  0x00007fb4c3135abb in std::__condvar::wait_until (__abs_time=..., __clock=1, __m=..., this=<optimized out>) at /usr/include/c++/11/bits/std_mutex.h:169
#3  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__lock=..., __lock=..., __atime=..., this=<optimized out>) at /usr/include/c++/11/condition_variable:201
#4  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=<optimized out>) at /usr/include/c++/11/condition_variable:111
#5  std::condition_variable::wait_for<long, std::ratio<1l, 1000000000l> > (__rtime=..., __rtime=..., __lock=..., this=<optimized out>) at /usr/include/c++/11/condition_variable:163
#6  operator() (__closure=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/client/Client.cc:7308
#7  std::__invoke_impl<void, Client::start_tick_thread()::<lambda()> > (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#8  std::__invoke<Client::start_tick_thread()::<lambda()> > (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#9  std::thread::_Invoker<std::tuple<Client::start_tick_thread()::<lambda()> > >::_M_invoke<0> (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:259
#10 std::thread::_Invoker<std::tuple<Client::start_tick_thread()::<lambda()> > >::operator() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:266
#11 std::thread::_State_impl<std::thread::_Invoker<std::tuple<Client::start_tick_thread()::<lambda()> > > >::_M_run(void) (this=0x7fb4b80d9c80) at /usr/include/c++/11/bits/std_thread.h:211
#12 0x00007fb4c14dbae4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#13 0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#14 0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 18 (Thread 0x7fb4a27fc640 (LWP 657695) "safe_timer"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689a10 in pthread_cond_clockwait@GLIBC_2.30 () from /lib64/libc.so.6
#2  0x00007fb4c1e95e23 in CommonSafeTimer<std::mutex>::timer_thread() () from /usr/lib64/ceph/libceph-common.so.2
#3  0x00007fb4c1e965f1 in CommonSafeTimerThread<std::mutex>::entry() () from /usr/lib64/ceph/libceph-common.so.2
#4  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#5  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 17 (Thread 0x7fb4a2ffd640 (LWP 657694) "ms_local"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689632 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007fb4c14d56c0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#3  0x00007fb4c1f9e9f9 in DispatchQueue::run_local_delivery() () from /usr/lib64/ceph/libceph-common.so.2
#4  0x00007fb4c203c251 in DispatchQueue::LocalDeliveryThread::entry() () from /usr/lib64/ceph/libceph-common.so.2
#5  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#6  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 16 (Thread 0x7fb4a37fe640 (LWP 657693) "ms_dispatch"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689632 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007fb4c14d56c0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#3  0x00007fb4c1f9ee72 in DispatchQueue::entry() () from /usr/lib64/ceph/libceph-common.so.2
#4  0x00007fb4c203c231 in DispatchQueue::DispatchThread::entry() () from /usr/lib64/ceph/libceph-common.so.2
#5  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#6  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 15 (Thread 0x7fb4a3fff640 (LWP 657692) "flusher"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689a10 in pthread_cond_clockwait@GLIBC_2.30 () from /lib64/libc.so.6
#2  0x00007fb4c3148c70 in std::__condvar::wait_until (__abs_time=..., __clock=1, __m=..., this=0x7fb4b8046c38) at /usr/include/c++/11/bits/std_mutex.h:169
#3  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__lock=..., __lock=..., __atime=..., this=0x7fb4b8046c38) at /usr/include/c++/11/condition_variable:201
#4  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=<optimized out>, this=<optimized out>, __lock=..., __atime=...) at /usr/include/c++/11/condition_variable:111
#5  std::condition_variable::wait_for<long, std::ratio<1l, 1l> > (__rtime=..., __rtime=..., __lock=..., this=0x7fb4b8046c38) at /usr/include/c++/11/condition_variable:163
#6  ObjectCacher::flusher_entry (this=0x7fb4b8046a40) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/osdc/ObjectCacher.cc:2013
#7  0x00007fb4c313a7b1 in ObjectCacher::FlusherThread::entry (this=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/osdc/ObjectCacher.h:441
#8  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#9  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 14 (Thread 0x7fb4b4ff9640 (LWP 657691) "fn_anonymous"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689632 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007fb4c14d56c0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#3  0x00007fb4c1e5bd4e in Finisher::finisher_thread_entry() () from /usr/lib64/ceph/libceph-common.so.2
#4  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#5  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 13 (Thread 0x7fb4b57fa640 (LWP 657690) "safe_timer"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689632 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007fb4c14d56c0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#3  0x00007fb4c1e95c8b in CommonSafeTimer<std::mutex>::timer_thread() () from /usr/lib64/ceph/libceph-common.so.2
#4  0x00007fb4c1e965f1 in CommonSafeTimerThread<std::mutex>::entry() () from /usr/lib64/ceph/libceph-common.so.2
#5  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#6  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 12 (Thread 0x7fb4b67fc640 (LWP 657689) "fn_anonymous"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689632 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007fb4c14d56c0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#3  0x00007fb4c1e5bd4e in Finisher::finisher_thread_entry() () from /usr/lib64/ceph/libceph-common.so.2
#4  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#5  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 11 (Thread 0x7fb4b6ffd640 (LWP 657688) "ceph_timer"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689a10 in pthread_cond_clockwait@GLIBC_2.30 () from /lib64/libc.so.6
#2  0x00007fb4c220fca3 in ceph::timer<ceph::coarse_mono_clock>::timer_thread() () from /usr/lib64/ceph/libceph-common.so.2
#3  0x00007fb4c14dbae4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#4  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#5  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 10 (Thread 0x7fb4b77fe640 (LWP 657687) "service"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689a10 in pthread_cond_clockwait@GLIBC_2.30 () from /lib64/libc.so.6
#2  0x00007fb4c1eb418e in ceph::common::CephContextServiceThread::entry() () from /usr/lib64/ceph/libceph-common.so.2
#3  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#4  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 9 (Thread 0x7fb4b7fff640 (LWP 657683) "msgr-worker-2"):
#0  0x00007fb4c270e6ae in epoll_wait () from /lib64/libc.so.6
#1  0x00007fb4c2087221 in EpollDriver::event_wait(std::vector<FiredFileEvent, std::allocator<FiredFileEvent> >&, timeval*) () from /usr/lib64/ceph/libceph-common.so.2
#2  0x00007fb4c20853e2 in EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) () from /usr/lib64/ceph/libceph-common.so.2
#3  0x00007fb4c2085e56 in std::_Function_handler<void (), NetworkStack::add_thread(Worker*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /usr/lib64/ceph/libceph-common.so.2
#4  0x00007fb4c14dbae4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#5  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#6  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 8 (Thread 0x7fb4bcd7a640 (LWP 657682) "msgr-worker-1"):
#0  0x00007fb4c270e6ae in epoll_wait () from /lib64/libc.so.6
#1  0x00007fb4c2087221 in EpollDriver::event_wait(std::vector<FiredFileEvent, std::allocator<FiredFileEvent> >&, timeval*) () from /usr/lib64/ceph/libceph-common.so.2
#2  0x00007fb4c20853e2 in EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) () from /usr/lib64/ceph/libceph-common.so.2
#3  0x00007fb4c2085e56 in std::_Function_handler<void (), NetworkStack::add_thread(Worker*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /usr/lib64/ceph/libceph-common.so.2
#4  0x00007fb4c14dbae4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#5  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#6  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 7 (Thread 0x7fb4be003640 (LWP 657681) "msgr-worker-0"):
#0  0x00007fb4c270e6ae in epoll_wait () from /lib64/libc.so.6
#1  0x00007fb4c2087221 in EpollDriver::event_wait(std::vector<FiredFileEvent, std::allocator<FiredFileEvent> >&, timeval*) () from /usr/lib64/ceph/libceph-common.so.2
#2  0x00007fb4c20853e2 in EventCenter::process_events(unsigned int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*) () from /usr/lib64/ceph/libceph-common.so.2
#3  0x00007fb4c2085e56 in std::_Function_handler<void (), NetworkStack::add_thread(Worker*)::{lambda()#1}>::_M_invoke(std::_Any_data const&) () from /usr/lib64/ceph/libceph-common.so.2
#4  0x00007fb4c14dbae4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#5  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#6  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 6 (Thread 0x7fb4b5ffb640 (LWP 657680) "io_context_pool"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689632 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007fb4c3163eb0 in boost::asio::detail::posix_event::wait<boost::asio::detail::conditionally_enabled_mutex::scoped_lock> (lock=<synthetic pointer>..., this=0x7fb4b806bde0) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/redhat-linux-build/boost/include/boost/asio/detail/posix_event.hpp:119
#3  boost::asio::detail::conditionally_enabled_event::wait (lock=<synthetic pointer>..., this=0x7fb4b806bdd8) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/redhat-linux-build/boost/include/boost/asio/detail/conditionally_enabled_event.hpp:97
#4  boost::asio::detail::scheduler::do_run_one (ec=..., this_thread=..., lock=<synthetic pointer>..., this=0x7fb4b806bd70) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/scheduler.ipp:502
#5  boost::asio::detail::scheduler::run(boost::system::error_code&) [clone .constprop.0] [clone .isra.0] (this=0x7fb4b806bd70, ec=...) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/scheduler.ipp:210
#6  0x00007fb4c3087a27 in boost::asio::io_context::run (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/redhat-linux-build/boost/include/boost/asio/impl/io_context.ipp:64
#7  ceph::async::io_context_pool::start(short)::{lambda()#1}::operator()() const (__closure=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/common/async/context_pool.h:69
#8  std::__invoke_impl<void, ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::__invoke_other, ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#9  std::__invoke<ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#10 std::invoke<ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/functional:97
#11 make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}::operator()<ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&) const (fun=..., __closure=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/common/Thread.h:79
#12 std::__invoke_impl<void, make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}, ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::__invoke_other, make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}&&, ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#13 std::__invoke<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}, ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&, ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#14 std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}, ceph::async::io_context_pool::start(short)::{lambda()#1}> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:259
#15 std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}, ceph::async::io_context_pool::start(short)::{lambda()#1}> >::operator()() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:266
#16 std::thread::_State_impl<std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}, ceph::async::io_context_pool::start(short)::{lambda()#1}> > >::_M_run() (this=0x7fb4b8183570) at /usr/include/c++/11/bits/std_thread.h:211
#17 0x00007fb4c14dbae4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#18 0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#19 0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 5 (Thread 0x7fb4bd57b640 (LWP 657679) "io_context_pool"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689632 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007fb4c3163eb0 in boost::asio::detail::posix_event::wait<boost::asio::detail::conditionally_enabled_mutex::scoped_lock> (lock=<synthetic pointer>..., this=0x7fb4b806bde0) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/redhat-linux-build/boost/include/boost/asio/detail/posix_event.hpp:119
#3  boost::asio::detail::conditionally_enabled_event::wait (lock=<synthetic pointer>..., this=0x7fb4b806bdd8) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/redhat-linux-build/boost/include/boost/asio/detail/conditionally_enabled_event.hpp:97
#4  boost::asio::detail::scheduler::do_run_one (ec=..., this_thread=..., lock=<synthetic pointer>..., this=0x7fb4b806bd70) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/scheduler.ipp:502
#5  boost::asio::detail::scheduler::run(boost::system::error_code&) [clone .constprop.0] [clone .isra.0] (this=0x7fb4b806bd70, ec=...) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/redhat-linux-build/boost/include/boost/asio/detail/impl/scheduler.ipp:210
#6  0x00007fb4c3087a27 in boost::asio::io_context::run (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/redhat-linux-build/boost/include/boost/asio/impl/io_context.ipp:64
#7  ceph::async::io_context_pool::start(short)::{lambda()#1}::operator()() const (__closure=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/common/async/context_pool.h:69
#8  std::__invoke_impl<void, ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::__invoke_other, ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#9  std::__invoke<ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#10 std::invoke<ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/functional:97
#11 make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}::operator()<ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&) const (fun=..., __closure=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/common/Thread.h:79
#12 std::__invoke_impl<void, make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}, ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::__invoke_other, make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}&&, ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__f=...) at /usr/include/c++/11/bits/invoke.h:61
#13 std::__invoke<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}, ceph::async::io_context_pool::start(short)::{lambda()#1}>(ceph::async::io_context_pool::start(short)::{lambda()#1}&&, ceph::async::io_context_pool::start(short)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/11/bits/invoke.h:96
#14 std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}, ceph::async::io_context_pool::start(short)::{lambda()#1}> >::_M_invoke<0ul, 1ul>(std::_Index_tuple<0ul, 1ul>) (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:259
#15 std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}, ceph::async::io_context_pool::start(short)::{lambda()#1}> >::operator()() (this=<optimized out>) at /usr/include/c++/11/bits/std_thread.h:266
#16 std::thread::_State_impl<std::thread::_Invoker<std::tuple<make_named_thread<ceph::async::io_context_pool::start(short)::{lambda()#1}>(std::basic_string_view<char, std::char_traits<char> >, ceph::async::io_context_pool::start(short)::{lambda()#1}&&)::{lambda(auto:1&&, (auto:2&&)...)#1}, ceph::async::io_context_pool::start(short)::{lambda()#1}> > >::_M_run() (this=0x7fb4b8182d90) at /usr/include/c++/11/bits/std_thread.h:211
#17 0x00007fb4c14dbae4 in execute_native_thread_routine () from /lib64/libstdc++.so.6
#18 0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#19 0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 4 (Thread 0x7fb4a0ff9640 (LWP 657678) "log"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689632 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007fb4c14d56c0 in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /lib64/libstdc++.so.6
#3  0x00007fb4c20b43e2 in ceph::logging::Log::entry() () from /usr/lib64/ceph/libceph-common.so.2
#4  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#5  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 3 (Thread 0x7fb4be804640 (LWP 657676) "libcephfsd"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689a10 in pthread_cond_clockwait@GLIBC_2.30 () from /lib64/libc.so.6
#2  0x00007fb4c30e0f45 in std::__condvar::wait_until (__abs_time=..., __clock=1, __m=..., this=0x7fb4b819d6a8) at /usr/include/c++/11/bits/std_mutex.h:169
#3  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__lock=..., __lock=..., __atime=..., this=0x7fb4b819d6a8) at /usr/include/c++/11/condition_variable:201
#4  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=0x7fb4b819d6a8) at /usr/include/c++/11/condition_variable:111
#5  std::condition_variable::wait_for<long, std::ratio<1l, 1000000000l> > (__rtime=..., __rtime=..., __lock=..., this=0x7fb4b819d6a8) at /usr/include/c++/11/condition_variable:163
#6  Client::_unmount (this=0x7fb4b819c590, abort=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/client/Client.cc:7134
#7  0x00007fb4c305ffb0 in Client::unmount (this=0x7fb4b819c590) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/client/Client.cc:7180
#8  ceph_mount_info::shutdown (this=0x7fb4b8182880) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/libcephfs.cc:222
#9  0x00007fb4c30604d3 in ceph_mount_info::unmount (this=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/libcephfs.cc:207
#10 ceph_unmount (cmount=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/libcephfs.cc:443
#11 0x000000000040c689 in proxy_instance_unmount (pinstance=pinstance@entry=0x7fb4b8154800) at proxy_mount.c:1094
#12 0x000000000040def7 in proxy_mount_unmount (mount=mount@entry=0x7fb4b8154800) at proxy_mount.c:1238
#13 0x0000000000404a73 in libcephfsd_unmount (client=0x1fd9620, req=<optimized out>, data=<optimized out>, data_size=<optimized out>) at libcephfsd.c:317
#14 0x0000000000403f52 in serve_binary (client=0x1fd9620) at libcephfsd.c:1695
#15 serve_connection (worker=0x1fd9620) at libcephfsd.c:1745
#16 0x000000000040e550 in proxy_worker_start (arg=0x1fd9620) at proxy_manager.c:66
#17 0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#18 0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 2 (Thread 0x7fb4bf005640 (LWP 657461) "libcephfsd"):
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007fb4c2689632 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x000000000040e6d1 in proxy_condition_wait (mutex=0x7ffd9d2847f0, condition=0x7ffd9d284818) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/libcephfs_proxy/proxy_helpers.h:318
#3  proxy_manager_start (arg=0x7ffd9d2847c0) at proxy_manager.c:103
#4  0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
#5  0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6

Thread 1 (Thread 0x7fb4c280d540 (LWP 657460) "libcephfsd"):
#0  0x00007fb4c270f83f in accept () from /lib64/libc.so.6
#1  0x000000000040bc0d in proxy_link_server (link=link@entry=0x7ffd9d284610, path=0x7ffd9d2865ea "/run/samba/libcephfsd.sock", start=start@entry=0x403c90 <accept_connection>, stop=stop@entry=0x403a50 <check_stop>) at proxy_link.c:663
#2  0x0000000000403ab2 in server_start (manager=<optimized out>) at libcephfsd.c:1826
#3  0x000000000040e87c in proxy_manager_run (manager=manager@entry=0x7ffd9d2847c0, start=start@entry=0x403a90 <server_start>) at proxy_manager.c:202
#4  0x00000000004038b8 in main (argc=1, argv=0x7ffd9d2849a8) at libcephfsd.c:1885

The problematic thread seems to be thread 3:
(gdb) t 3
[Switching to thread 3 (Thread 0x7fb4be804640 (LWP 657676))]
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
(gdb) bt -full
#0  0x00007fb4c26870da in __futex_abstimed_wait_common () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fb4c2689a10 in pthread_cond_clockwait@GLIBC_2.30 () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fb4c30e0f45 in std::__condvar::wait_until (__abs_time=..., __clock=1, __m=..., this=0x7fb4b819d6a8) at /usr/include/c++/11/bits/std_mutex.h:169
No locals.
#3  std::condition_variable::__wait_until_impl<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__lock=..., __lock=..., __atime=..., this=0x7fb4b819d6a8) at /usr/include/c++/11/condition_variable:201
        __s = <optimized out>
        __ns = <optimized out>
        __ts = {tv_sec = 76069, tv_nsec = 606940657}
        __s = <optimized out>
        __ns = <optimized out>
        __ts = <optimized out>
#4  std::condition_variable::wait_until<std::chrono::duration<long, std::ratio<1l, 1000000000l> > > (__atime=..., __lock=..., this=0x7fb4b819d6a8) at /usr/include/c++/11/condition_variable:111
No locals.
#5  std::condition_variable::wait_for<long, std::ratio<1l, 1000000000l> > (__rtime=..., __rtime=..., __lock=..., this=0x7fb4b819d6a8) at /usr/include/c++/11/condition_variable:163
No locals.
#6  Client::_unmount (this=0x7fb4b819c590, abort=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/client/Client.cc:7134
        r = <optimized out>
        mref_writer = {S = @0x7fb4b819c840, satisfied = true, first_writer = true, is_reader = false}
        lock = {_M_device = 0x7fb4b819c710, _M_owns = true}
        global_realm = <optimized out>
        assert_data_ctx = {assertion = 0x7fb4c320db81 "in", 
          file = 0x7fb4c31f3678 "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/20.0.0-516-g844e80bf/rpm/el9/BUILD/ceph-20.0.0-51"..., line = 7089, 
          function = 0x7fb4c320002f "void Client::_unmount(bool)"}
        assert_data_ctx = {assertion = 0x7fb4c320bb52 "lru.lru_get_size() == 0", 
          file = 0x7fb4c31f3678 "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/20.0.0-516-g844e80bf/rpm/el9/BUILD/ceph-20.0.0-51"..., line = 7139, 
          function = 0x7fb4c320002f "void Client::_unmount(bool)"}
        assert_data_ctx = {assertion = 0x7fb4c31f9974 "inode_map.empty()", 
          file = 0x7fb4c31f3678 "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/20.0.0-516-g844e80bf/rpm/el9/BUILD/ceph-20.0.0-51"..., line = 7140, 
          function = 0x7fb4c320002f "void Client::_unmount(bool)"}
        assert_data_ctx = {assertion = 0x7fb4c3200017 "global_realm->nref == 1", 
          file = 0x7fb4c31f3678 "/home/jenkins-build/build/workspace/ceph-dev-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos9/DIST/centos9/MACHINE_SIZE/gigantic/release/20.0.0-516-g844e80bf/rpm/el9/BUILD/ceph-20.0.0-51"..., line = 7157, 
          function = 0x7fb4c320002f "void Client::_unmount(bool)"}
        __PRETTY_FUNCTION__ = <optimized out>
#7  0x00007fb4c305ffb0 in Client::unmount (this=0x7fb4b819c590) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/client/Client.cc:7180
        __func__ = <optimized out>
#8  ceph_mount_info::shutdown (this=0x7fb4b8182880) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/libcephfs.cc:222
No locals.
#9  0x00007fb4c30604d3 in ceph_mount_info::unmount (this=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/libcephfs.cc:207
No locals.
#10 ceph_unmount (cmount=<optimized out>) at /usr/src/debug/ceph-20.0.0-516.g844e80bf.el9.x86_64/src/libcephfs.cc:443
No locals.
#11 0x000000000040c689 in proxy_instance_unmount (pinstance=pinstance@entry=0x7fb4b8154800) at proxy_mount.c:1094
        instance = 0x7fb4b815e600
        sibling = 0x0
        err = <optimized out>
#12 0x000000000040def7 in proxy_mount_unmount (mount=mount@entry=0x7fb4b8154800) at proxy_mount.c:1238
No locals.
#13 0x0000000000404a73 in libcephfsd_unmount (client=0x1fd9620, req=<optimized out>, data=<optimized out>, data_size=<optimized out>) at libcephfsd.c:317
        ans = {header = {header_len = 49209, flags = 64, result = 0, data_len = 3196059416}}
        ans_iov = {{iov_base = 0x7fb4be7fff28, iov_len = 12}}
        ans_count = 1
        mount = <optimized out>
        err = 0
#14 0x0000000000403f52 in serve_binary (client=0x1fd9620) at libcephfsd.c:1695
        req_iov = {{iov_base = 0x7fb4be7fffc8, iov_len = 8}, {iov_base = 0x7fb4b80482e0, iov_len = 65536}}
        buffer = 0x0
        err = <optimized out>
        req = {header = {header_len = 16, op = 12, data_len = 0}, version = {header = {header_len = 16, op = 12, data_len = 0}}, userperm_new = {header = {header_len = 16, op = 12, data_len = 0}, uid = 2960643303, gid = 4175335717, groups = 583669680}, 
          userperm_destroy = {header = {header_len = 16, op = 12, data_len = 0}, userperm = 17932930357296354535}, create = {header = {header_len = 16, op = 12, data_len = 0}, id = -11033}, release = {header = {header_len = 16, op = 12, data_len = 0}, 
            cmount = 17932930357296354535}, conf_read_file = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, path = 6064}, conf_get = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, 
            size = 583669680, option = 28924}, conf_set = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, option = 6064, value = 8906}, init = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535}, 
          select_filesystem = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, fs = 6064}, mount = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, root = 6064}, unmount = {header = {header_len = 16, 
              op = 12, data_len = 0}, cmount = 17932930357296354535}, ll_statfs = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, inode = 2084446430760146864}, ll_lookup = {header = {header_len = 16, op = 12, data_len = 0}, 
            cmount = 17932930357296354535, userperm = 2084446430760146864, parent = 2084446430760146864, want = 4095, flags = 0, name = 2}, ll_lookup_inode = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, ino = {
              val = 2084446430760146864}}, ll_lookup_root = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535}, ll_put = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, inode = 2084446430760146864}, 
          ll_walk = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, want = 583669680, flags = 485323004, path = 4095}, chdir = {header = {header_len = 16, op = 12, data_len = 0}, 
            cmount = 17932930357296354535, path = 6064}, getcwd = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535}, readdir = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, 
            dir = 2084446430760146864}, rewinddir = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, dir = 2084446430760146864}, ll_open = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, 
            userperm = 2084446430760146864, inode = 2084446430760146864, flags = 4095}, ll_create = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, parent = 2084446430760146864, mode = 4095, 
            oflags = 0, want = 2, flags = 2001, name = 0}, ll_mknod = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, parent = 2084446430760146864, mode = 4095, rdev = 8594229559298, want = 1593835520, 
            flags = 21895, name = 40240}, ll_close = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, fh = 2084446430760146864}, ll_rename = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, 
            userperm = 2084446430760146864, old_parent = 2084446430760146864, new_parent = 4095, old_name = 2, new_name = 0}, ll_lseek = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, fh = 2084446430760146864, 
            offset = 2084446430760146864, whence = 4095}, ll_read = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, fh = 2084446430760146864, offset = 2084446430760146864, len = 4095}, ll_write = {header = {header_len = 16, op = 12, 
              data_len = 0}, cmount = 17932930357296354535, fh = 2084446430760146864, offset = 2084446430760146864, len = 4095}, ll_link = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, 
            inode = 2084446430760146864, parent = 4095, name = 2}, ll_unlink = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, parent = 2084446430760146864, name = 4095}, ll_getattr = {header = {
              header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, inode = 2084446430760146864, want = 4095, flags = 0}, ll_setattr = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, 
            userperm = 2084446430760146864, inode = 2084446430760146864, mask = 4095}, ll_fallocate = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, fh = 2084446430760146864, offset = 2084446430760146864, length = 4095, mode = 2}, 
          ll_fsync = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, fh = 2084446430760146864, dataonly = 583669680}, ll_listxattr = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, 
            userperm = 2084446430760146864, inode = 2084446430760146864, size = 4095}, ll_getxattr = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, inode = 2084446430760146864, size = 4095, name = 2}, 
          ll_setxattr = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, inode = 2084446430760146864, size = 4095, flags = 2, name = 2001}, ll_removexattr = {header = {header_len = 16, op = 12, 
              data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, inode = 2084446430760146864, name = 4095}, ll_readlink = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, 
            inode = 2084446430760146864, size = 4095}, ll_symlink = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, parent = 2084446430760146864, want = 4095, flags = 0, name = 2, target = 0}, 
          ll_opendir = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, inode = 2084446430760146864}, ll_mkdir = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, 
            userperm = 2084446430760146864, parent = 2084446430760146864, mode = 4095, want = 0, flags = 2, name = 2001}, ll_rmdir = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, userperm = 2084446430760146864, 
            parent = 2084446430760146864, name = 4095}, ll_releasedir = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, dir = 2084446430760146864}, mount_perms = {header = {header_len = 16, op = 12, data_len = 0}, 
            cmount = 17932930357296354535}, ll_nonblocking_rw = {header = {header_len = 16, op = 12, data_len = 0}, cmount = 17932930357296354535, info = 2084446430760146864, fh = 2084446430760146864, off = 4095, size = 8594229559298, write = false, fsync = false, 
            syncdataonly = false}}
        size = 65536
        req = <optimized out>
        req_iov = <optimized out>
        buffer = <optimized out>
        size = <optimized out>
        err = <optimized out>
#15 serve_connection (worker=0x1fd9620) at libcephfsd.c:1745
        client = 0x1fd9620
        err = <optimized out>
#16 0x000000000040e550 in proxy_worker_start (arg=0x1fd9620) at proxy_manager.c:66
        worker = 0x1fd9620
#17 0x00007fb4c268a002 in start_thread () from /lib64/libc.so.6
No symbol table info available.
#18 0x00007fb4c270f070 in clone3 () from /lib64/libc.so.6
No symbol table info available.

Related issues: 4 (2 open, 2 closed)

Related to CephFS - Bug #64502: pacific/quincy/v18.2.0: client: ceph-fuse fails to unmount after upgrade to main (New, assigned to Mahesh Mohan)

Copied to CephFS - Backport #72503: reef: Client: ceph_ll_unlink() after ceph_ll_lookup() hangs ceph_unmount() (Rejected, Dhairya Parmar)

Copied to CephFS - Backport #72504: squid: Client: ceph_ll_unlink() after ceph_ll_lookup() hangs ceph_unmount() (QA Testing, Dhairya Parmar)

Copied to CephFS - Backport #72505: tentacle: Client: ceph_ll_unlink() after ceph_ll_lookup() hangs ceph_unmount() (Resolved, Jos Collin)
Actions #1

Updated by Xavi Hernandez 12 months ago

I've been able to reproduce this issue with a simple program:

#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <stdlib.h>
#include <cephfs/libcephfs.h>
#include <time.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>

#define TEST(_delta, _x) \
    ({ \
        struct timespec _start; \
        clock_gettime(CLOCK_MONOTONIC, &_start); \
        typeof(_x) _ret = _x; \
        clock_gettime(CLOCK_MONOTONIC, &_delta); \
        _delta.tv_nsec -= _start.tv_nsec; \
        if (_delta.tv_nsec < 0) { \
            _delta.tv_nsec += 1000000000; \
            _delta.tv_sec--; \
        } \
        _delta.tv_sec -= _start.tv_sec; \
        _ret; \
    })

#define TEST_RET(_x) \
    do { \
        struct timespec _delta; \
        int _ret = TEST(_delta, _x); \
        printf("[%3lu.%03lu] (%4d) " #_x "\n", _delta.tv_sec, _delta.tv_nsec / 1000000, _ret); \
        if (_ret < 0) { \
            exit(1); \
        } \
    } while (0)

#define TEST_PTR(_x) \
    ({ \
        struct timespec _delta; \
        void *_ptr = TEST(_delta, _x); \
        printf("[%3lu.%03lu] (%s) " #_x "\n", _delta.tv_sec, _delta.tv_nsec / 1000000, _ptr == NULL ? "NULL" : "    "); \
        if (_ptr == NULL) { \
            exit(1); \
        } \
        _ptr; \
    })

int main(int argc, char *argv[])
{
    struct ceph_statx stx;
    struct ceph_mount_info *mnt;
    UserPerm *perms;
    Inode *inode_root, *inode_file, *inode_tmp;
    Fh *handle;

    if (argc < 4) {
        fprintf(stderr, "Usage: %s <user> <config> <file>\n", argv[0]);
        return 1;
    }

    TEST_RET(ceph_create(&mnt, argv[1]));
    TEST_RET(ceph_conf_read_file(mnt, argv[2]));
    TEST_RET(ceph_mount(mnt, NULL));

    perms = TEST_PTR(ceph_userperm_new(0, 0, 0, NULL));

    TEST_RET(ceph_ll_lookup_root(mnt, &inode_root));

    TEST_RET(ceph_ll_create(mnt, inode_root, argv[3], 0777, O_CREAT | O_TRUNC | O_RDWR, &inode_file, &handle, &stx, CEPH_STATX_INO, 0, perms));
    TEST_RET(ceph_ll_close(mnt, handle));

    TEST_RET(ceph_ll_lookup(mnt, inode_file, ".", &inode_tmp, &stx, CEPH_STATX_INO, 0, perms));
    TEST_RET(ceph_ll_put(mnt, inode_tmp));

    TEST_RET(ceph_ll_unlink(mnt, inode_root, argv[3], perms));

    TEST_RET(ceph_ll_put(mnt, inode_file));
    TEST_RET(ceph_ll_put(mnt, inode_root));

    ceph_userperm_destroy(perms);

    printf("Before ceph_unmount()\n");
    TEST_RET(ceph_unmount(mnt));
    printf("After ceph_unmount()\n");

    TEST_RET(ceph_release(mnt));

    return 0;
}

When this program is run, the message after ceph_unmount() is never printed. Additionally, if you kill the process and run it again immediately, the program gets blocked for a long time (around 20 seconds) on ceph_ll_create(), and then proceeds normally until it gets blocked on ceph_unmount() again.

Removing the ceph_ll_unlink() or the previous ceph_ll_lookup()/ceph_ll_put() avoids the hang on unmount.

The reproducer is suspiciously similar to the one I found for another bug a while back: https://tracker.ceph.com/issues/63461

Actions #2

Updated by Patrick Donnelly 12 months ago

  • Related to Bug #64502: pacific/quincy/v18.2.0: client: ceph-fuse fails to unmount after upgrade to main added
Actions #3

Updated by Patrick Donnelly 12 months ago

From the title, wondering if this may be related to #64502

Actions #4

Updated by Xavi Hernandez 12 months ago

Patrick Donnelly wrote in #note-3:

From the title, wondering if this may be related to #64502

It could be related, but I can't tell for sure. Is there anything I can check that would help determine whether it's the same issue?

Besides that, I want to note that if I replace:

ceph_ll_lookup(mnt, inode_file, ".", &inode_tmp, &stx, CEPH_STATX_INO, 0, perms);

by:

ceph_ll_lookup(mnt, inode_root, argv[3], &inode_tmp, &stx, CEPH_STATX_INO, 0, perms);

It doesn't trigger the hang on unmount.

The hang is also prevented if I replace the ceph_ll_lookup and ceph_ll_put by ceph_ll_getattr, which is functionally equivalent in this case.

In fact we are using this special lookup in some places because we also need a new reference. It would be very useful to have a ceph_ll_get, or similar, in libcephfs for these cases.

Actions #5

Updated by Milind Changire 12 months ago

  • Assignee set to Dhairya Parmar
Actions #6

Updated by John Mulligan 12 months ago

  • Subject changed from Client hangs during ceph_unmont() to Client hangs during ceph_unmount()
Actions #7

Updated by Xavi Hernandez 12 months ago

I've seen that ceph_ll_lookup() also accepts an empty string to return the same inode but with an additional reference. Using an empty string also seems to avoid the bug.

ceph_ll_lookup(mnt, inode_file, "", &inode_tmp, &stx, CEPH_STATX_INO, 0, perms);
Actions #8

Updated by Dhairya Parmar 12 months ago

Xavi Hernandez wrote in #note-7:

I've seen that ceph_ll_lookup() also accepts an empty string to return the same inode but with an additional reference. Using an empty string also seems to avoid the bug.

[...]

There was a recent issue patched for ceph_ll_lookup (https://tracker.ceph.com/issues/69624), but the patch is not yet merged. I'm not sure it's the same issue; I'm taking a look.

Actions #9

Updated by Dhairya Parmar 12 months ago

Wrote a test case based on the reproducer shared by @Xavi Hernandez here: https://github.com/ceph/ceph/pull/62488. Easily reproducible in a vstart cluster.

Actions #10

Updated by Dhairya Parmar 12 months ago

Surprisingly, if you change the order of:

    TEST_RET(ceph_ll_lookup(mnt, inode_file, ".", &inode_tmp, &stx, CEPH_STATX_INO, 0, perms));
    TEST_RET(ceph_ll_put(mnt, inode_tmp));

    TEST_RET(ceph_ll_unlink(mnt, inode_root, argv[3], perms));

i.e.:
    TEST_RET(ceph_ll_unlink(mnt, inode_root, argv[3], perms));

    TEST_RET(ceph_ll_lookup(mnt, inode_file, ".", &inode_tmp, &stx, CEPH_STATX_INO, 0, perms));
    TEST_RET(ceph_ll_put(mnt, inode_tmp));

the unmount works fine 0_o

Actions #11

Updated by Dhairya Parmar 12 months ago

  • Subject changed from Client hangs during ceph_unmount() to Client: ceph_ll_unlink() after ceph_ll_lookup() hangs ceph_unmount()
Actions #12

Updated by Dhairya Parmar 12 months ago · Edited

I also tried syncing the fs and/or disabling the client cache (client_cache_size = 0) in the hope of force-flushing the metadata, but the unmount still stalled. So I guess the issue is something else. Time to dive into the logs now.

Actions #13

Updated by Venky Shankar 12 months ago

  • Category set to Correctness/Safety
  • Status changed from New to Triaged
  • Priority changed from Normal to High
  • Target version set to v20.0.0
  • Source set to other
  • Backport set to reef,squid
Actions #14

Updated by Anoop C S 12 months ago

I would like to share a different angle on the whole story. It might be that the behaviour is only seen with regular-file inodes, i.e., ll_lookup() on "." with a regular file inode as parent. Obviously, if we replace the parent inode in ceph_ll_lookup() with inode_root in the reproducer, it works.

IOW, semantically, how are we supposed to treat such a request: a lookup on "." with a regular file inode as parent? Should we disallow the operation?

Actions #15

Updated by Dhairya Parmar 12 months ago

Anoop C S wrote in #note-14:

I would like to share a different angle on the whole story. It might be that the behaviour is only seen with regular-file inodes, i.e., ll_lookup() on "." with a regular file inode as parent. Obviously, if we replace the parent inode in ceph_ll_lookup() with inode_root in the reproducer, it works.

IOW, semantically, how are we supposed to treat such a request: a lookup on "." with a regular file inode as parent? Should we disallow the operation?

You're right, technically it should check whether the parent is a directory inode. I think this should definitely be part of the patch, but first I want to verify why it lets me do so when I alter the lookup and unlink call order in the reproducer.

Actions #16

Updated by Dhairya Parmar 12 months ago · Edited

root inode is holding caps (caps=pAsLsXs(0=pAsLsXs) and nref=2):

2025-03-26T15:37:55.162+0530 7fadb8d41640  1 client.4293 dump_inode: DISCONNECTED inode 0x1 #0x1 ref 2 0x1.head(faked_ino=0 nref=2 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2025-03-26T15:37:39.967146+0530 mtime=2025-03-26T15:37:45.113229+0530 ctime=2025-03-26T15:37:45.136266+0530 change_attr=3 caps=pAsLsXs(0=pAsLsXs) COMPLETE has_dir_layout 0x7fad78006970)

umount is waiting for the caps to be released:
2025-03-26T15:37:55.162+0530 7fadb8d41640  2 client.4293 cache still has 1+1 items, waiting (for caps to release?)

As anoopcs mentioned in the previous comment, we're doing a path walk at a non-directory inode. Technically that's incorrect and not POSIX compliant. ceph_openat() at a non-directory inode returns ENOTDIR. When I checked the code paths for the lookup call, the argument is expected to be a directory inode throughout the flow; I think that's what is causing the issue. We need to add a check that returns ENOTDIR for a non-directory inode, just like ceph_openat() does.

Actions #17

Updated by Venky Shankar 12 months ago · Edited

Anoop C S wrote in #note-14:

I would like to share a different angle on the whole story. It might be that the behaviour is only seen with regular-file inodes, i.e., ll_lookup() on "." with a regular file inode as parent. Obviously, if we replace the parent inode in ceph_ll_lookup() with inode_root in the reproducer, it works.

IOW, semantically, how are we supposed to treat such a request: a lookup on "." with a regular file inode as parent? Should we disallow the operation?

Normally, all path components except the last one should be of directory inode type. I guess the recent path walk changes handle this special case and return the inode of the file itself. Technically, you are looking up "." (the current directory) under a parent inode which is a regular file. IMO, that should throw -ENOTDIR.

Actions #18

Updated by Anoop C S 12 months ago

Dhairya Parmar wrote in #note-16:

root inode is holding caps (caps=pAsLsXs(0=pAsLsXs) and nref=2):
[...]
umount is waiting for the caps to be released:
[...]

As anoopcs mentioned in the previous comment, we're doing a path walk at a non-directory inode. Technically that's incorrect and not POSIX compliant. ceph_openat() at a non-directory inode returns ENOTDIR. When I checked the code paths for the lookup call, the argument is expected to be a directory inode throughout the flow; I think that's what is causing the issue. We need to add a check that returns ENOTDIR for a non-directory inode, just like ceph_openat() does.

I would like to clarify that libcephfs doesn't return ENOTDIR in this case: whether it's ceph_ll_lookup() with a non-directory parent inode or ceph_openat() with a non-directory file descriptor, the call succeeds for ".". The kernel client, by contrast, returns the expected ENOTDIR.

Actions #19

Updated by Dhairya Parmar 12 months ago

Anoop C S wrote in #note-18:

Dhairya Parmar wrote in #note-16:

root inode is holding caps (caps=pAsLsXs(0=pAsLsXs) and nref=2):
[...]
umount is waiting for the caps to be released:
[...]

As anoopcs mentioned in the previous comment, we're doing a path walk at a non-directory inode. Technically that's incorrect and not POSIX compliant. ceph_openat() at a non-directory inode returns ENOTDIR. When I checked the code paths for the lookup call, the argument is expected to be a directory inode throughout the flow; I think that's what is causing the issue. We need to add a check that returns ENOTDIR for a non-directory inode, just like ceph_openat() does.

I would like to clarify that libcephfs doesn't return ENOTDIR in this case: whether it's ceph_ll_lookup() with a non-directory parent inode or ceph_openat() with a non-directory file descriptor, the call succeeds for ".". The kernel client, by contrast, returns the expected ENOTDIR.

This doesn't align with POSIX. I'll push a patch for this soon; it's an easy fix. But first I'm trying to figure out why there is a cap leak for ll_lookup + ll_unlink (in tandem exclusively: inverting the order or removing either call doesn't stall unmount).

Actions #20

Updated by Dhairya Parmar 12 months ago · Edited

JFYI folks, this behaviour can be seen with the high-level APIs too: ceph_openat() using a file fd at "." stalls unmount. Test case added to the PR linked to this tracker.

Actions #21

Updated by Dhairya Parmar 12 months ago

  • Pull request ID set to 62488
Actions #22

Updated by Xavi Hernandez 12 months ago

Dhairya Parmar wrote in #note-20:

JFYI folks, this behaviour can be seen with the high-level APIs too: ceph_openat() using a file fd at "." stalls unmount. Test case added to the PR linked to this tracker.

If you check the issue I referenced in the description (https://tracker.ceph.com/issues/63461), a very similar thing also happens with kernel mounts and regular POSIX requests.

Note that some time ago (before the recent changes to path walk), the only way to get a new reference to an existing inode was to pass "." to ceph_ll_lookup(). I'm not sure whether this was done on purpose for this use case, or it's just an unwanted side effect.

Now we can pass "" to get the same result (and it doesn't cause any hang), but is this the right way to do it? Wouldn't it be better to provide a ceph_ll_get() or similar for this use case?

Actions #23

Updated by Venky Shankar 12 months ago

@Dhairya Parmar You did mention in the cephfs standup that the underlying issue has been identified (a bug in path_walk). Could you please keep this tracker updated with the details?

Actions #24

Updated by Dhairya Parmar 11 months ago

Venky Shankar wrote in #note-23:

@Dhairya Parmar You did mention in the cephfs standup that the underlying issue has been identified (a bug in path_walk). Could you please keep this tracker updated with the details?

Added a detailed comment in the PR; linking it here: https://github.com/ceph/ceph/pull/62488#discussion_r2038226736

Actions #25

Updated by Patrick Donnelly 9 months ago

  • Target version deleted (v20.0.0)
Actions #26

Updated by Venky Shankar 7 months ago

  • Status changed from Triaged to Pending Backport
  • Target version set to v21.0.0
  • Backport changed from reef,squid to tentacle,squid,reef
Actions #27

Updated by Upkeep Bot 7 months ago

  • Merge Commit set to 416c794dad975666f03425537f494f325518ac81
  • Fixed In set to v20.3.0-2194-g416c794dad
  • Upkeep Timestamp set to 2025-08-11T03:18:35+00:00
Actions #28

Updated by Upkeep Bot 7 months ago

  • Copied to Backport #72503: reef: Client: ceph_ll_unlink() after ceph_ll_lookup() hangs ceph_unmount() added
Actions #29

Updated by Upkeep Bot 7 months ago

  • Copied to Backport #72504: squid: Client: ceph_ll_unlink() after ceph_ll_lookup() hangs ceph_unmount() added
Actions #30

Updated by Upkeep Bot 7 months ago

  • Copied to Backport #72505: tentacle: Client: ceph_ll_unlink() after ceph_ll_lookup() hangs ceph_unmount() added
Actions #31

Updated by Upkeep Bot 7 months ago

  • Tags (freeform) set to backport_processed
