Bug #67491

Race condition when printing Inode in the ll_sync_inode function

Added by Chengen Du over 1 year ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Chengen Du
Category:
-
Target version:
v20.0.0
% Done:

0%

Source:
Community (dev)
Backport:
squid,reef,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Crash signature (v1):
Crash signature (v2):
Tags (freeform):
backport_processed
Fixed In:
v19.3.0-4519-gb3896f4849
Released In:
v20.2.0~2156
Upkeep Timestamp:
2025-11-01T01:32:25+00:00

Description

In the ll_sync_inode function, the entire Inode structure is printed without holding a lock, which may lead to the following core trace:

#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140705682900544) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140705682900544) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140705682900544, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffa92094476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffa9207a7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffa910783c3 in ceph::__ceph_assert_fail (assertion=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>) at ./src/common/assert.cc:75
#6  0x00007ffa91078525 in ceph::__ceph_assert_fail (ctx=...) at ./src/common/assert.cc:80
#7  0x00007ffa7049f602 in xlist<ObjectCacher::Object*>::size (this=0x7ffa20734638, this=0x7ffa20734638) at ./src/include/xlist.h:87
#8  operator<< (os=..., out=warning: RTTI symbol not found for class 'StackStringStream<4096ul>'
...) at ./src/osdc/ObjectCacher.h:760
#9  operator<< (out=warning: RTTI symbol not found for class 'StackStringStream<4096ul>'
..., in=...) at ./src/client/Inode.cc:80
#10 0x00007ffa7045545f in Client::ll_sync_inode (this=0x55958b8a5c60, in=in@entry=0x7ffa20734270, syncdataonly=syncdataonly@entry=false) at ./src/client/Client.cc:14717
#11 0x00007ffa703d0f75 in ceph_ll_sync_inode (cmount=cmount@entry=0x55958b0bd0d0, in=in@entry=0x7ffa20734270, syncdataonly=syncdataonly@entry=0) at ./src/libcephfs.cc:1865
#12 0x00007ffa9050ddc5 in fsal_ceph_ll_setattr (creds=<optimized out>, mask=<optimized out>, stx=0x7ff8983f25a0, i=<optimized out>, cmount=<optimized out>)
    at ./src/FSAL/FSAL_CEPH/statx_compat.h:209
#13 ceph_fsal_setattr2 (obj_hdl=0x7fecc8fefbe0, bypass=<optimized out>, state=<optimized out>, attrib_set=0x7ff8983f2830) at ./src/FSAL/FSAL_CEPH/handle.c:2410
#14 0x00007ffa92371da0 in mdcache_setattr2 (obj_hdl=0x7fecc9e98778, bypass=<optimized out>, state=0x7fef0d64c9b0, attrs=0x7ff8983f2830)
    at ../FSAL/Stackable_FSALs/FSAL_MDCACHE/./src/FSAL/Stackable_FSALs/FSAL_MDCACHE/mdcache_handle.c:1012
#15 0x00007ffa922b2bbc in fsal_setattr (obj=0x7fecc9e98778, bypass=<optimized out>, state=0x7fef0d64c9b0, attr=0x7ff8983f2830) at ./src/FSAL/fsal_helper.c:573
#16 0x00007ffa9234c7bd in nfs4_op_setattr (op=0x7fecad7ac510, data=0x7fecac314a10, resp=0x7fecad1be200) at ../Protocols/NFS/./src/Protocols/NFS/nfs4_op_setattr.c:212
#17 0x00007ffa9232e413 in process_one_op (data=data@entry=0x7fecac314a10, status=status@entry=0x7ff8983f2a2c) at ../Protocols/NFS/./src/Protocols/NFS/nfs4_Compound.c:920
#18 0x00007ffa9232f9e0 in nfs4_Compound (arg=<optimized out>, req=0x7fecad491620, res=0x7fecac054580) at ../Protocols/NFS/./src/Protocols/NFS/nfs4_Compound.c:1327
#19 0x00007ffa922cb0ff in nfs_rpc_process_request (reqdata=0x7fecad491620) at ./src/MainNFSD/nfs_worker_thread.c:1508
#20 0x00007ffa92029be7 in svc_request (xprt=0x7fed640504d0, xdrs=<optimized out>) at ./src/svc_rqst.c:1202
#21 0x00007ffa9202df9a in svc_rqst_xprt_task_recv (wpe=<optimized out>) at ./src/svc_rqst.c:1183
#22 0x00007ffa9203344d in svc_rqst_epoll_loop (wpe=0x559594308e60) at ./src/svc_rqst.c:1564
#23 0x00007ffa920389e1 in work_pool_thread (arg=0x7feeb802ea10) at ./src/work_pool.c:184
#24 0x00007ffa920e6b43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#25 0x00007ffa92178a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Upon further analysis of the call trace using GDB, both the _front and _back member variables in xlist<ObjectCacher::Object*> are set to zero, yet an assertion failure is still triggered.
(gdb) frame 7
#7  0x00007ffa7049f602 in xlist<ObjectCacher::Object*>::size (this=0x7ffa20734638, this=0x7ffa20734638) at ./src/include/xlist.h:87
87    ./src/include/xlist.h: No such file or directory.
(gdb) p *this
$1 = {_front = 0x0, _back = 0x0, _size = 0}
(gdb) frame 6
#6  0x00007ffa91078525 in ceph::__ceph_assert_fail (ctx=...) at ./src/common/assert.cc:80
80    ./src/common/assert.cc: No such file or directory.
(gdb) p ctx
$2 = (const ceph::assert_data &) @0x7ffa70587900: {assertion = 0x7ffa70530598 "(bool)_front == (bool)_size", file = 0x7ffa705305b4 "./src/include/xlist.h", line = 87, 
  function = 0x7ffa7053b410 "size_t xlist<T>::size() const [with T = ObjectCacher::Object*; size_t = long unsigned int]"}

This indicates a race condition: the fields were mutated concurrently, so they were inconsistent at the moment the assertion was evaluated but consistent again by the time GDB inspected the core. Printing the entire Inode structure is likely unnecessary here; printing only the inode number should be sufficient.


Related issues 3 (0 open, 3 closed)

Copied to Ceph - Backport #67739: quincy: Race condition when printing Inode in the ll_sync_inode function (Rejected, Ponnuvel P)
Copied to Ceph - Backport #67740: reef: Race condition when printing Inode in the ll_sync_inode function (Resolved, Ponnuvel P)
Copied to Ceph - Backport #67741: squid: Race condition when printing Inode in the ll_sync_inode function (Resolved, Ponnuvel P)
Actions #1

Updated by Chengen Du over 1 year ago

Chengen Du wrote:

In the ll_sync_inode function, the entire Inode structure is printed without holding a lock, which may lead to the following core trace:
[...]
Upon further analysis of the call trace using GDB, both the _front and _back member variables in xlist<ObjectCacher::Object*> are set to zero, yet an assertion failure is still triggered.
[...]
This indicates a race condition: the fields were mutated concurrently, so they were inconsistent at the moment the assertion was evaluated but consistent again by the time GDB inspected the core. Printing the entire Inode structure is likely unnecessary here; printing only the inode number should be sufficient.

I’m currently working on this issue.

Actions #2

Updated by Chengen Du over 1 year ago

Chengen Du wrote in #note-1:

[...] I’m currently working on this issue.

PR: https://github.com/ceph/ceph/pull/59162

Actions #3

Updated by Xiubo Li over 1 year ago

  • Status changed from New to Fix Under Review
  • Backport set to squid,reef,quincy
Actions #4

Updated by Xiubo Li over 1 year ago

  • Pull request ID set to 59162
Actions #5

Updated by Patrick Donnelly over 1 year ago

  • Assignee set to Chengen Du
  • Target version set to v20.0.0
  • Source set to Community (dev)
Actions #6

Updated by Patrick Donnelly over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
Actions #7

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #67739: quincy: Race condition when printing Inode in the ll_sync_inode function added
Actions #8

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #67740: reef: Race condition when printing Inode in the ll_sync_inode function added
Actions #9

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #67741: squid: Race condition when printing Inode in the ll_sync_inode function added
Actions #10

Updated by Upkeep Bot over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #11

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to b3896f484955daa9ede66906dac71b0d4db91c2f
  • Fixed In set to v19.3.0-4519-gb3896f48495
  • Upkeep Timestamp set to 2025-07-08T18:45:39+00:00
Actions #12

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-4519-gb3896f48495 to v19.3.0-4519-gb3896f484955
  • Upkeep Timestamp changed from 2025-07-08T18:45:39+00:00 to 2025-07-14T15:45:58+00:00
Actions #13

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-4519-gb3896f484955 to v19.3.0-4519-gb3896f4849
  • Upkeep Timestamp changed from 2025-07-14T15:45:58+00:00 to 2025-07-14T21:10:10+00:00
Actions #14

Updated by Ponnuvel P 5 months ago

  • Status changed from Pending Backport to Resolved
Actions #15

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2156
  • Upkeep Timestamp changed from 2025-07-14T21:10:10+00:00 to 2025-11-01T01:32:25+00:00