Bug #71510
closed
client: crash with concurrent nonblocking fsync and write
0%
Description
The asynchronous fsync state machine can be halted (the request is put on a wait queue) if Fb caps are in use, i.e., the ref count for Fb caps is > 0. Let's call this stage1. Before this stage is reached, if the execution context has to wait for unsafe operations, the reference count of the request is incremented and the wait is queued on req->waitfor_safe. Let's call this stage0.
When the stage0 request is woken up, the execution context moves to stage1, where the reference is dropped. The wait in stage1 does not increment the reference count of the request. However, stage1 can be retried (if Fb caps are still in use), and on each retry the reference is dropped again. The request therefore loses references that were never taken, and the client crashes in Client::put_request() as shown in the backtrace below.
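The sequence above can be illustrated with a small toy model. This is only a sketch under stated assumptions and is not the Ceph code: ToyRequest, ToyFsyncState, final_refs and the guard_put flag are hypothetical names, and the guard shown is just one possible way to avoid the extra put, not necessarily what the merged fix does.

```cpp
#include <cassert>

// Hypothetical stand-in for a request: only its reference count is modelled.
struct ToyRequest {
  int refs = 1;                  // the reference held by the request's owner
  void get() { ++refs; }
  void put() { --refs; }
};

// Hypothetical stand-in for the nonblocking fsync state machine.
struct ToyFsyncState {
  ToyRequest& req;
  int& fb_refs;                  // models the Fb cap ref count
  bool guard_put;                // false: behaviour described in the report
  int progress = 0;
  bool waited_for_safe = false;  // set once in stage0, never cleared
  bool holds_ref = false;        // true only while the stage0 ref is held
  bool done = false;

  ToyFsyncState(ToyRequest& r, int& fb, bool guard)
    : req(r), fb_refs(fb), guard_put(guard) {}

  void advance() {
    switch (progress) {
    case 0:
      // stage0: wait for unsafe operations and pin the request.
      req.get();
      waited_for_safe = true;
      holds_ref = true;
      progress = 1;
      return;                    // parked on the modelled waitfor_safe queue
    case 1: {
      // stage1: drop the stage0 reference, then wait while Fb is in use.
      bool should_put = guard_put ? holds_ref : waited_for_safe;
      if (should_put) {
        req.put();
        holds_ref = false;
      }
      if (fb_refs > 0)
        return;                  // parked again without taking a reference
      done = true;
      return;
    }
    }
  }
};

// Runs stage0, then stage1 `retries` times with Fb busy, then once with Fb free.
int final_refs(bool guard_put, int retries) {
  ToyRequest req;
  int fb_refs = 1;
  ToyFsyncState st(req, fb_refs, guard_put);
  st.advance();                  // stage0
  for (int i = 0; i < retries; ++i)
    st.advance();                // stage1 retried while Fb caps are in use
  fb_refs = 0;
  st.advance();                  // stage1 completes
  assert(st.done);
  return req.refs;
}
```

In this model, a single stage1 retry without the guard leaves the request with zero references even though its owner still expects to hold one, while the guarded variant leaves the owner's reference intact.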
Client crash backtrace
0x00007f3115b2452c in __pthread_kill_implementation () from /lib64/libc.so.6
0x00007f3115ad7686 in raise () from /lib64/libc.so.6
0x00007f3115ac1833 in abort () from /lib64/libc.so.6
0x00007f3113375d0a in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/assert.cc:74
0x00007f3113375e6f in ceph::__ceph_assert_fail (ctx=...) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/assert.cc:79
0x00007f311237db1d in xlist<MetaRequest*>::item::~item (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/xlist.h:31
MetaRequest::~MetaRequest (this=<optimized out>, this=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/MetaRequest.cc:65
Client::put_request (this=0x564b491726c0, request=0x7f301c0165c0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:2140
0x00007f31123c88ad in Client::C_nonblocking_fsync_state::advance (this=0x7f307002e9f0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11905
0x00007f3112331ccd in Context::complete (this=0x7f3070009250, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f311246a964 in Client::signal_context_list(std::__cxx11::list<Context*, std::allocator<Context*> >&) [clone .constprop.0] (ls=std::__cxx11::list = {...}, this=<optimized out>)
at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:4257
0x00007f3112395f45 in Client::put_cap_ref (this=0x564b491726c0, in=0x7f306807be90, cap=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:3611
0x00007f31123331f3 in Client::C_Write_Finisher::finish_io (r=0, this=0x7f30240442d0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11381
Client::CWF_iofinish::finish (this=<optimized out>, r=0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.h:1481
0x00007f3112331ccd in Context::complete (this=0x7f302401afd0, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f31123c5242 in Client::C_Lock_Client_Finisher::finish (this=0x7f302403c9d0, r=0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/client/Client.cc:11372
0x00007f3112331ccd in Context::complete (this=0x7f302403c9d0, r=<optimized out>) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/include/Context.h:99
0x00007f31134374ad in Finisher::finisher_thread_entry (this=0x564b491730b0) at /usr/src/debug/ceph-19.2.0-124.el9cp.x86_64/src/common/Finisher.cc:72
0x00007f3115b227e2 in start_thread () from /lib64/libc.so.6
0x00007f3115ba7800 in clone3 () from /lib64/libc.so.6
0x0000000000000000 in ?? ()
Updated by Venky Shankar 10 months ago
See Client::C_nonblocking_fsync_state::advance(), case 0 and case 1.
Updated by Venky Shankar 10 months ago
- Status changed from New to Fix Under Review
- Pull request ID set to 63619
Updated by Venky Shankar 10 months ago
- Related to Bug #71515: qa: add test to validate fix for crash due to asynchronous write and fsync running concurrently added
Updated by Venky Shankar 9 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot 9 months ago
- Copied to Backport #71708: tentacle: client: crash with concurrent nonblocking fsync and write added
Updated by Upkeep Bot 9 months ago
- Copied to Backport #71709: squid: client: crash with concurrent nonblocking fsync and write added
Updated by Upkeep Bot 9 months ago
- Merge Commit set to f484edf976c350b2f4b42fe15e0498fb30cc449a
- Fixed In set to v20.3.0-968-gf484edf976c
- Upkeep Timestamp set to 2025-07-02T14:27:16+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.3.0-968-gf484edf976c to v20.3.0-968-gf484edf976c3
- Upkeep Timestamp changed from 2025-07-02T14:27:16+00:00 to 2025-07-14T15:20:03+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.3.0-968-gf484edf976c3 to v20.3.0-968-gf484edf976
- Upkeep Timestamp changed from 2025-07-14T15:20:03+00:00 to 2025-07-14T20:44:43+00:00
Updated by Upkeep Bot 6 months ago
- Status changed from Pending Backport to Resolved
- Upkeep Timestamp changed from 2025-07-14T20:44:43+00:00 to 2025-09-23T12:45:53+00:00