Bug #65971
read operation hung in Client::get_caps (Same case as issue 65455)
Status: Closed
% Done: 100%
Description
This is the same case as issue 65455. I reproduced it on Ceph 17 ( ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) ); the issue still exists.
How to reproduce the issue
1. I used two NFS-Ganesha + libcephfs instances as NFS servers (server1, server2), both exporting the same CephFS subdirectory.
2. One NFS client (client1) mounts server1 via NFSv3, and another (client2) mounts server2 via NFSv3.
3. Write a file through client2:
dd if=/dev/zero of=testfile bs=1M count=10000 oflag=direct status=progress
4. After the write completes, read the file through client1; the read may hang:
dd if=testfile of=/dev/null bs=4k iflag=direct count=1000000 status=progress
5. Stack of the hung read thread on server1:
Thread 36 (Thread 0x7f8546ce5700 (LWP 83644)):
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x7f8546ce24d8) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7f85c5f4c0f0, cond=0x7f8546ce24b0) at pthread_cond_wait.c:502
#2 __pthread_cond_wait (cond=0x7f8546ce24b0, mutex=0x7f85c5f4c0f0) at pthread_cond_wait.c:655
#3 0x00007f85c67773bc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4 0x00007f85bc053d66 in Client::wait_on_list (this=0x7f85c5f4c000, ls=std::__cxx11::list<std::condition_variable*>) at ./src/client/Client.cc:3832
#5 0x00007f85bc0b131f in Client::get_caps (this=0x7f85c5f4c000, fh=0x7f85a7ce7900, need=2048, want=1024, phave=0x7f8546ce26fc, endoff=-1) at ./src/client/Client.cc:3342
#6 0x00007f85bc0c40fb in Client::_read (this=0x7f85c5f4c000, f=0x7f85a7ce7900, offset=49152, size=4096, bl=0x7f8546ce2830) at ./src/client/Client.cc:9296
#7 0x00007f85bc0c4a53 in Client::ll_read (this=0x7f85c5f4c000, fh=0x7f85a7ce7900, off=off@entry=49152, len=4096, bl=bl@entry=0x7f8546ce2830) at ./src/client/Client.cc:13487
#8 0x00007f85bc04ba7a in ceph_ll_read (cmount=<optimized out>, filehandle=<optimized out>, off=off@entry=49152, len=<optimized out>, buf=0x7f8557860000 "") at ./src/libcephfs.cc:1670
#9 0x00007f85bc16a47b in ceph_fsal_read2 (obj_hdl=0x7f85a8154cc0, bypass=<optimized out>, done_cb=0x7f85c7481460 <mdc_read_cb>, read_arg=0x7f85a3282e80, caller_arg=0x7f85a32343e0) at ./src/FSAL/FSAL_CEPH/handle.c:1890
#10 0x00007f85c7482a30 in mdcache_read2 (obj_hdl=0x7f85a8047d38, bypass=<optimized out>, done_cb=<optimized out>, read_arg=0x7f85a3282e80, caller_arg=<optimized out>) at ./src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_file.c:620
#11 0x00007f85c7467dca in nfs3_read (arg=<optimized out>, req=0x7f85a3256000, res=0x7f83277b5500) at ./src/Protocols/NFS/nfs3_read.c:367
#12 0x00007f85c73ad145 in nfs_rpc_process_request (reqdata=0x7f85a3256000, retry=<optimized out>) at ./src/MainNFSD/nfs_worker_thread.c:1518
#13 0x00007f85c732f5cb in svc_request (xprt=0x7f85a840ac00, xdrs=0x7f85a85418c0) at ./src/svc_rqst.c:1209
#14 0x00007f85c732c8b1 in svc_rqst_xprt_task_recv (wpe=<optimized out>) at ./src/svc_rqst.c:1183
#15 0x00007f85c732d298 in svc_rqst_epoll_loop (wpe=0x7f85c5f53600) at ./src/svc_rqst.c:1571
#16 0x00007f85c7338778 in work_pool_thread (arg=0x7f85a7c00960) at ./src/work_pool.c:183
#17 0x00007f85c6be4fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#18 0x00007f85c693e06f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
6. The caps held by the two servers:
"client_caps": [
{
"client_id": 36812824, -> server2
"pending": "pAsLsXsFrw",
"issued": "pAsLsXsFrw",
"wanted": "pAsxXsxFsxcrwb",
"last_sent": 18169114
},
{
"client_id": 39570896, -> server1
"pending": "pAsLsXsFr",
"issued": "pAsLsXsFr",
"wanted": "pFscr",
"last_sent": 5239872
}
],
Updated by tod chen almost 2 years ago
Case Analysis:
1. When client2 mounts server2 via NFSv3 and creates and writes a file, the file's inode keeps some unwanted caps in the cephfs client cache (issued: pAsxLsXsxFsxcrwb, wanted: pAsxXsxFsxcrwb), because the NFSv3 protocol has no close operation.
2. When client1 then reads the file through the other server (server1), server1 performs two operations, getattr and read, and tries to get caps (pAsLsXsFscr).
3. The MDS then tries to recall the Fcwb caps from server2 to satisfy server1's caps requirements.
4. After server1 completes its getattr and read requests, the MDS locker soon recalls the Fsc caps from server1 to satisfy server2's Fw caps requirement.
5. So while a series of getattr and read operations runs on server1, the MDS repeatedly grants and recalls Fsc caps to/from server1.
How the read hang occurs on server1:
1. In the above case, each getattr/read pair on server1 is serial. When server1 receives the pAsLsXsFscr caps granted by the MDS, it can perform an async read and a readahead on the file. In the _read_async function it calls get_cap_ref, which results in in->cap_refs[CEPH_CAP_FILE_CACHE]++.
2. After _read_async returns, the readahead may still be waiting for the OSD response, so in->cap_refs[CEPH_CAP_FILE_CACHE] is still non-zero.
3. If the MDS locker recalls the Fsc caps from server1 at this point, handle_cap_grant sees that in->cap_refs[CEPH_CAP_FILE_CACHE] is not 0, which means check_caps cannot be called to update the local caps and respond to the MDS.
4. If a new read request then arrives on server1, it waits in Client::get_caps:
int Client::get_caps(Fh *fh, int need, int want, int *phave, loff_t endoff)
{
  Inode *in = fh->inode.get();

  int r = check_pool_perm(in, need);
  if (r < 0)
    return r;

  while (1) {
    ...
    int implemented;
    int have = in->caps_issued(&implemented);

    bool waitfor_caps = false;
    bool waitfor_commit = false;
    ...
    if (!waitfor_caps && !waitfor_commit) {
      /* have: pAsLsXsFr, need: Fr */
      if ((have & need) == need) {
        int revoking = implemented & ~have;
        ldout(cct, 10) << "get_caps " << *in << " have " << ccap_string(have)
                       << " need " << ccap_string(need) << " want " << ccap_string(want)
                       << " revoking " << ccap_string(revoking)
                       << dendl;
        /* revoking: Fsc, want: Fc */
        if ((revoking & want) == 0) {
          *phave = need | (have & want);
          in->get_cap_ref(need);
          return 0;
        }
      }
      ldout(cct, 10) << "waiting for caps " << *in << " need " << ccap_string(need) << " want " << ccap_string(want) << dendl;
      waitfor_caps = true;
    }
    ...
    if (waitfor_caps)
      wait_on_list(in->waitfor_caps);
    else if (waitfor_commit)
      wait_on_list(in->waitfor_commit);
  }
}
How to fix this bug: when the client drops the CEPH_CAP_FILE_CACHE cap, send a signal to notify the waitfor_caps readers:
@@ -3240,6 +3240,9 @@ void Client::put_cap_ref(Inode *in, int cap)
}
if (drop)
check_caps(in, 0);
+ if (last & CEPH_CAP_FILE_CACHE) {
+ signal_cond_list(in->waitfor_caps); // wake up blocked readers
+ }
if (put_nref)
put_inode(in, put_nref);
}
Updated by Venky Shankar almost 2 years ago
- Category set to Correctness/Safety
- Status changed from New to Triaged
- Assignee set to Venky Shankar
- Target version set to v20.0.0
- Backport set to quincy,reef,squid
- Severity changed from 3 - minor to 2 - major
Updated by tod chen over 1 year ago
@Venky Shankar
hello,
How has this issue progressed? Do you need me to provide more information?
Updated by Venky Shankar over 1 year ago
tod chen wrote in #note-3:
@Venky Shankar
hello,
How has this issue progressed? Do you need me to provide more information?
Thanks for your patience. I'm swamped with other trackers - will get to this sometime this week.
Updated by Venky Shankar over 1 year ago
Thanks for the detailed reproducer in https://tracker.ceph.com/issues/65971#note-1 - it really helps. I've started looking into this.
Updated by Venky Shankar over 1 year ago
- Related to Bug #65455: read operation hung in Client::get_caps added
Updated by Venky Shankar over 1 year ago
So, this is a three-way (cap related) deadlock between an asynchronous read which got triggered when the client initially got an Fscr cap, then a cap revoke for Fsc by the MDS (which didn't release the cap, since Fc was in use due to a pending asynchronous read), and finally a read request that wants the Fc cap. Your suggested fix to signal the cap waiters in put_cap_ref() seems sensible. Would you mind sending the fix as a pull request, @hit1944 ?
Also, have you checked if the same issue exists with write requests?
Updated by tod chen over 1 year ago
Venky Shankar wrote in #note-7:
So, this is a three-way (cap related) deadlock between an asynchronous read which got triggered when the client initially got an Fscr cap, then a cap revoke for Fsc by the MDS (which didn't release the cap, since Fc was in use due to a pending asynchronous read), and finally a read request that wants the Fc cap. Your suggested fix to signal the cap waiters in put_cap_ref() seems sensible. Would you mind sending the fix as a pull request, @hit1944 ? Also, have you checked if the same issue exists with write requests?
OK, I will send the PR.
There is no such problem in the write flow, because put_cap_ref checks whether the last dropped cap reference contains Fwb, and if so, it signals the write waiters:
void Client::put_cap_ref(Inode *in, int cap)
{
  int last = in->put_cap_ref(cap);
  if (last) {
    int put_nref = 0;
    int drop = last & ~in->caps_issued();
    if (in->snapid == CEPH_NOSNAP) {
      if ((last & CEPH_CAP_FILE_WR) &&
          !in->cap_snaps.empty() &&
          in->cap_snaps.rbegin()->second.writing) {
        ldout(cct, 10) << func << " finishing pending cap_snap on " << *in << dendl;
        in->cap_snaps.rbegin()->second.writing = 0;
        finish_cap_snap(in, in->cap_snaps.rbegin()->second, get_caps_used(in));
        signal_cond_list(in->waitfor_caps); // wake up blocked sync writers
      }
      ...
}
Updated by Venky Shankar over 1 year ago
- Status changed from Triaged to Fix Under Review
- Source set to Community (user)
- Pull request ID set to 59027
Updated by Venky Shankar over 1 year ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68692: quincy: read operation hung in Client::get_caps(Same case as issue 65455) added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68693: reef: read operation hung in Client::get_caps(Same case as issue 65455) added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68694: squid: read operation hung in Client::get_caps(Same case as issue 65455) added
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Upkeep Bot 9 months ago
- Merge Commit set to 6910e4c010dda1b77134153c0faaa65278c416ae
- Fixed In set to v19.3.0-5736-g6910e4c010d
- Upkeep Timestamp set to 2025-07-08T22:38:34+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-5736-g6910e4c010d to v19.3.0-5736-g6910e4c010dd
- Upkeep Timestamp changed from 2025-07-08T22:38:34+00:00 to 2025-07-14T15:47:05+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-5736-g6910e4c010dd to v19.3.0-5736-g6910e4c010
- Upkeep Timestamp changed from 2025-07-14T15:47:05+00:00 to 2025-07-14T21:38:05+00:00
Updated by Konstantin Shalygin 8 months ago
- Status changed from Pending Backport to Resolved
- % Done changed from 0 to 100
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~1737
- Upkeep Timestamp changed from 2025-07-14T21:38:05+00:00 to 2025-11-01T01:32:33+00:00