mds: only authpin on wrlock when not a locallock #58861
For example:
2024-07-22T21:48:18.372+0000 7f4751a3d700 7 mds.6.server dispatch_client_request client_request(client.4748:62187 create owner_uid=1000, owner_gid=1000 #0x1000000148d/file.mdtest.145.31073 2024-07-22T21:48:18.371416+0000 caller_uid=1000, caller_gid=1000{})
2024-07-22T21:48:18.372+0000 7f4751a3d700 7 mds.6.server open w/ O_CREAT on #0x1000000148d/file.mdtest.145.31073
2024-07-22T21:48:18.372+0000 7f4751a3d700 10 mds.6.server rdlock_path_xlock_dentry request(client.4748:62187 nref=2 cr=0x55649752bc00) #0x1000000148d/file.mdtest.145.31073
2024-07-22T21:48:18.372+0000 7f4751a3d700 7 mds.6.cache traverse: opening base ino 0x1000000148d snap head
2024-07-22T21:48:18.372+0000 7f4751a3d700 10 mds.6.locker try_rdlock_snap_layout request(client.4748:62187 nref=3 cr=0x55649752bc00) [inode 0x1000000148d [...2,head] /io500/io500/datafiles/2024.07.22-21.44.41/mdtest-easy/test-dir.0-0/mdtest_tree.205.0/ rep@0.1 fragtree_t(*^3) v101539 f(v20 m2024-07-22T21:48:14.075500+0000 30468=30468+0) n(v20 rc2024-07-22T21:48:14.075500+0000 30469=30468+1)/n(v0 rc2024-07-22T21:44:53.839831+0000 155=154+1) (idft lock) (isnap sync r=53) (inest mix w=101 dirty) (ipolicy sync r=53) (ifile mix w=52 dirty) (iquiesce lock w=52 last_client=4748) caps={4748=pAsLsXs/-@30914} | dirtyscattered=2 request=53 lock=5 importing=0 dirfrag=8 caps=1 waiter=0 export_pin=6 0x556494a71b80]
2024-07-22T21:48:18.372+0000 7f4751a3d700 12 mds.6.cache traverse: path seg depth 0 'file.mdtest.145.31073' snapid head
2024-07-22T21:48:18.372+0000 7f4751a3d700 20 mds.6.cache.dir(0x1000000148d.011*) lookup (file.mdtest.145.31073, 'head')
2024-07-22T21:48:18.372+0000 7f4751a3d700 20 mds.6.cache.dir(0x1000000148d.011*) hit -> (file.mdtest.145.31073,head)
2024-07-22T21:48:18.372+0000 7f4751a3d700 10 mds.6.locker acquire_locks request(client.4748:62187 nref=3 cr=0x55649752bc00)
2024-07-22T21:48:18.372+0000 7f4751a3d700 20 mds.6.locker lov = [LockOp(l=(ifile mix w=52 dirty),f=0x2),LockOp(l=(inest mix w=101 dirty),f=0x2),LockOp(l=(iauth sync),f=0x1),LockOp(l=(dn sync),f=0x4)]
2024-07-22T21:48:18.372+0000 7f4751a3d700 20 mds.6.locker auth_pin_nonblocking=0
2024-07-22T21:48:18.372+0000 7f4751a3d700 20 mds.6.locker must wrlock (ifile mix w=52 dirty) [inode 0x1000000148d [...2,head] /io500/io500/datafiles/2024.07.22-21.44.41/mdtest-easy/test-dir.0-0/mdtest_tree.205.0/ rep@0.1 fragtree_t(*^3) v101539 f(v20 m2024-07-22T21:48:14.075500+0000 30468=30468+0) n(v20 rc2024-07-22T21:48:14.075500+0000 30469=30468+1)/n(v0 rc2024-07-22T21:44:53.839831+0000 155=154+1) (idft lock) (isnap sync r=53) (inest mix w=101 dirty) (ipolicy sync r=53) (ifile mix w=52 dirty) (iquiesce lock w=52 last_client=4748) caps={4748=pAsLsXs/-@30914} | dirtyscattered=2 request=53 lock=5 importing=0 dirfrag=8 caps=1 waiter=0 export_pin=6 0x556494a71b80]
2024-07-22T21:48:18.372+0000 7f4751a3d700 20 mds.6.locker need shared quiesce lock for LockOp(l=(ifile mix w=52 dirty),f=0x2) on ifile of 0x556494a71b80
2024-07-22T21:48:18.372+0000 7f4751a3d700 20 mds.6.locker must wrlock (iquiesce lock w=52 last_client=4748) [inode 0x1000000148d [...2,head] /io500/io500/datafiles/2024.07.22-21.44.41/mdtest-easy/test-dir.0-0/mdtest_tree.205.0/ rep@0.1 fragtree_t(*^3) v101539 f(v20 m2024-07-22T21:48:14.075500+0000 30468=30468+0) n(v20 rc2024-07-22T21:48:14.075500+0000 30469=30468+1)/n(v0 rc2024-07-22T21:44:53.839831+0000 155=154+1) (idft lock) (isnap sync r=53) (inest mix w=101 dirty) (ipolicy sync r=53) (ifile mix w=52 dirty) (iquiesce lock w=52 last_client=4748) caps={4748=pAsLsXs/-@30914} | dirtyscattered=2 request=53 lock=5 importing=0 dirfrag=8 caps=1 waiter=0 export_pin=6 0x556494a71b80]
2024-07-22T21:48:18.372+0000 7f4751a3d700 15 mds.6.locker will also auth_pin [inode 0x1000000148d [...2,head] /io500/io500/datafiles/2024.07.22-21.44.41/mdtest-easy/test-dir.0-0/mdtest_tree.205.0/ rep@0.1 fragtree_t(*^3) v101539 f(v20 m2024-07-22T21:48:14.075500+0000 30468=30468+0) n(v20 rc2024-07-22T21:48:14.075500+0000 30469=30468+1)/n(v0 rc2024-07-22T21:44:53.839831+0000 155=154+1) (idft lock) (isnap sync r=53) (inest mix w=101 dirty) (ipolicy sync r=53) (ifile mix w=52 dirty) (iquiesce lock w=52 last_client=4748) caps={4748=pAsLsXs/-@30914} | dirtyscattered=2 request=53 lock=5 importing=0 dirfrag=8 caps=1 waiter=0 export_pin=6 0x556494a71b80] in case we need to request a scatter
Adding the wrlock on ifile also adds a wrlock on iquiesce. Consequently, examining the
wrlock on iquiesce adds an authpin for the inode.
The code for adding authpins on a wrlock should normally be a no-op because the
xlock which adds the versionlock (dn or inode) has already added the authpin
(and the MDS would be auth for the metadata too). In the case of the
quiescelock, we may not be auth, so an authpin can be very expensive and
unnecessary. We only require an authpin for an xlock on the quiescelock when
auth.
Fixes: https://tracker.ceph.com/issues/65851
Signed-off-by: Patrick Donnelly <pdonnell@redhat.com>
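The rule stated in the commit message can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the actual Ceph MDS locker code: `Lock`, `LockOp`, `needs_authpin`, and the `can_wrlock_now` field are hypothetical names invented for illustration. The flag values mirror the `f=0x1`, `f=0x2`, and `f=0x4` values in the `lov` log line, read here as rdlock, wrlock, and xlock. The behaviour for non-local locks is a simplified assumption.

```cpp
#include <string>

// Hypothetical, heavily simplified stand-ins; not the real Ceph MDS types.
enum LockFlag : unsigned { RDLOCK = 0x1, WRLOCK = 0x2, XLOCK = 0x4 };

struct Lock {
  std::string name;     // e.g. "ifile" or "iquiesce"
  bool is_locallock;    // true for a lock local to one MDS, such as the quiescelock
  bool object_is_auth;  // whether this MDS is authoritative for the locked object
  bool can_wrlock_now;  // assumption: the lock state already permits a wrlock
};

struct LockOp {
  const Lock* lock;
  unsigned flags;
};

// Decide whether acquiring `op` should also auth-pin the locked object.
bool needs_authpin(const LockOp& op) {
  const Lock& l = *op.lock;
  if (op.flags & XLOCK) {
    // An xlock on a local lock only needs the pin when we are auth.
    // Assumption: an xlock on any other lock always pins.
    return !l.is_locallock || l.object_is_auth;
  }
  if (op.flags & WRLOCK) {
    // The change described above: never pin for a wrlock on a local lock.
    if (l.is_locallock) return false;
    // Assumption: other wrlocks pin when we are auth, or when a scatter
    // may have to be requested because the wrlock is not yet possible.
    return l.object_is_auth || !l.can_wrlock_now;
  }
  return false;  // rdlocks are ignored in this sketch
}
```

In the logged request the inode is a replica, so a wrlock on a quiescelock modeled as `Lock{"iquiesce", true, false, false}` yields `needs_authpin(...) == false`, whereas dropping the `is_locallock` check would make the same call return `true` and trigger the expensive authpin.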
Good news and bad news. The good news is that in an isolated test of mdtest easy write, this restored performance to at least previous levels (i.e. we are definitely on the right track here):

The bad news is that when I tried to do a full io500 run, I/O completely stalled at some point. I saw this message in several mds logs:

and this in rank 0:

I'll start up a new cluster and try again just to see if the behavior repeats.
On a second run, I see

Edit: I was able to get through a full run this time. Now doing some comparisons with prior builds.
I think the I/O stall is probably unrelated, although worth looking into.
jenkins test api
jenkins test make check arm64
Some results (not directly comparable to earlier results due to a combination of HW failure and kernel upgrades):
Let's open a tracker for |
This PR is under test in https://tracker.ceph.com/issues/67214.
Ready for merge. cc @vshankar |