Bug #66152
Status: closed
mds/quiesce: holding remote authpins for the duration of the quiesce op may cause deadlocks
% Done: 0%
Source:
Backport: squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): quiesce
Labels (FS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In: v19.3.0-2435-g25e4ee2fa7
Released In: v20.2.0~2814
Upkeep Timestamp: 2025-11-01T01:11:31+00:00
Description
An example of the deadlock can be seen in this run: https://pulpito.ceph.com/pdonnell-2024-05-21_01:33:07-fs:workload-wip-lusov-quiesce-overdrive-export-distro-default-smithi/7717849/
quiesce request on mds.1:
"description": "internal op quiesce_inode:mds.1:50612 fp=#0x1000000383a",
...
"type_data": {
"flag_point": "quiesce complete for non-auth inode",
took an authpin on mds.0:
2024-05-21T03:24:08.269+0000 7f6319567700 7 mds.0.server dispatch_peer_request request(mds.1:50612 nref=2 peer_to mds.1 sr=0x56059607a700) peer_request(mds.1:50612.0 authpin)
2024-05-21T03:24:08.269+0000 7f6319567700 10 mds.0.server handle_peer_auth_pin request(mds.1:50612 nref=2 peer_to mds.1 sr=0x56059607a700)
2024-05-21T03:24:08.269+0000 7f6319567700 10 mds.0.server auth_pinning [inode 0x1000000383a [c,head] /volumes/qa/sv_1/d8fbe2d3-1fdc-4c3a-b7dc-a7b20043535f/client.0/tmp/blogbench-1.0/src/blogtest_in/blog-52/comment-9.xml.tmp auth{1=1} v1151 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={24833=0-4194304@b} caps={24833=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=24833 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 0x56059c154000]
2024-05-21T03:24:08.269+0000 7f6319567700 10 mds.0.cache.ino(0x1000000383a) auth_pin by 0x56059e100d80 on [inode 0x1000000383a [c,head] /volumes/qa/sv_1/d8fbe2d3-1fdc-4c3a-b7dc-a7b20043535f/client.0/tmp/blogbench-1.0/src/blogtest_in/blog-52/comment-9.xml.tmp auth{1=1} v1151 ap=1 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={24833=0-4194304@b} caps={24833=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=24833 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x56059c154000] now 1
then the rename request tried to authpin-freeze the same inode:
"description": "client_request(client.24833:456994 rename #0x20000001f84/comment-9.xml #0x20000001f84/comment-9.xml.tmp 2024-05-21T03:24:04.643446+0000 caller_uid=1000, caller_gid=1316{6,36,1000,1316,})",
2024-05-21T03:24:08.271+0000 7f6319567700 4 mds.0.server handle_peer_request client.24833:456994 from mds.1
2024-05-21T03:24:08.271+0000 7f6319567700 7 mds.0.cache request_start_peer request(client.24833:456994 nref=2 peer_to mds.1) by mds.1
2024-05-21T03:24:08.271+0000 7f6319567700 7 mds.0.server dispatch_peer_request request(client.24833:456994 nref=2 peer_to mds.1 sr=0x56059ff5f500) peer_request(client.24833:456994.0 authpin)
2024-05-21T03:24:08.271+0000 7f6319567700 10 mds.0.server handle_peer_auth_pin request(client.24833:456994 nref=2 peer_to mds.1 sr=0x56059ff5f500)
2024-05-21T03:24:08.271+0000 7f6319567700 20 mds.0.cache.dir(0x20000001f84.100*) lookup (comment-9.xml.tmp, 'head')
2024-05-21T03:24:08.271+0000 7f6319567700 20 mds.0.cache.dir(0x20000001f84.100*) hit -> (comment-9.xml.tmp,head)
2024-05-21T03:24:08.271+0000 7f6319567700 10 mds.0.server freezing auth pin on [inode 0x1000000383a [c,head] /volumes/qa/sv_1/d8fbe2d3-1fdc-4c3a-b7dc-a7b20043535f/client.0/tmp/blogbench-1.0/src/blogtest_in/blog-52/comment-9.xml.tmp auth{1=1} v1151 ap=1 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={24833=0-4194304@b} caps={24833=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=24833 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x56059c154000]
2024-05-21T03:24:08.271+0000 7f6319567700 10 mds.0.cache.ino(0x1000000383a) auth_pin by 0x56059e103180 on [inode 0x1000000383a [c,head] /volumes/qa/sv_1/d8fbe2d3-1fdc-4c3a-b7dc-a7b20043535f/client.0/tmp/blogbench-1.0/src/blogtest_in/blog-52/comment-9.xml.tmp auth{1=1} v1151 ap=2 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={24833=0-4194304@b} caps={24833=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=24833 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x56059c154000] now 2 <---
2024-05-21T03:24:08.271+0000 7f6319567700 15 mds.0.cache.dir(0x20000001f84.100*) adjust_nested_auth_pins 1 on [dir 0x20000001f84.100* /volumes/qa/sv_1/d8fbe2d3-1fdc-4c3a-b7dc-a7b20043535f/client.0/tmp/blogbench-1.0/src/blogtest_in/blog-52/ [2,head] auth{1=1,2=1} v=1152 cv=763/763 REP ap=0+9 state=1610874881|complete f(v5 m2024-05-21T03:24:04.636446+0000 26=26+0) n(v7 rc2024-05-21T03:24:04.636446+0000 b566673 26=26+0) hs=26+12,ss=31+0 dirty=69 | ptrwaiter=0 child=1 frozen=0 subtree=0 importing=0 importbound=0 sticky=0 replicated=1 dirty=1 authpin=0 0x560592cacd00] by 0x56059c154000 count now 0/9
2024-05-21T03:24:08.271+0000 7f6319567700 10 mds.0.cache.ino(0x1000000383a) freeze_inode - waiting for auth_pins to drop to 1
2024-05-21T03:24:08.271+0000 7f6319567700 10 mds.0.cache.ino(0x1000000383a) add_waiter tag 2 0x56059bda5d40 !ambig 1 !frozen 1 !freezing 0
This inode is now freezing, which blocks the quiesce. But the freeze cannot complete until the quiesce is over and that first authpin is released, so the two operations deadlock.
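The circular wait described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Ceph code: `ToyInode`, `wait_for_graph`, `has_cycle` and `scenario` are hypothetical names, and the rule that the quiesce cannot finish while the inode is freezing is taken from the description rather than from the MDS source. The op names are the request IDs from the logs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyInode:
    """Hypothetical stand-in for an MDS inode; not Ceph's CInode."""
    auth_pins: set = field(default_factory=set)  # ops holding an auth pin
    freezer: Optional[str] = None                # op that requested a freeze

    def auth_pin(self, op):
        self.auth_pins.add(op)

    def auth_unpin(self, op):
        self.auth_pins.discard(op)

    def start_freeze(self, op):
        # The freezing op keeps its own pin, so the freeze waits for the
        # pin count to drop to 1 (as in the freeze_inode log line).
        self.auth_pin(op)
        self.freezer = op

    def freeze_blockers(self):
        if self.freezer is None:
            return set()
        return self.auth_pins - {self.freezer}


def wait_for_graph(inode, quiesce_op):
    """Edges point from a waiting op to the ops it waits for."""
    graph = {}
    if inode.freezer is not None:
        graph[inode.freezer] = set(inode.freeze_blockers())
        # Assumption from the description: a freezing inode blocks the quiesce.
        graph.setdefault(quiesce_op, set()).add(inode.freezer)
    return graph


def has_cycle(graph):
    """Depth-first search for a cycle in the wait-for graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


def scenario(hold_pin_for_whole_quiesce):
    inode = ToyInode()
    quiesce, rename = "mds.1:50612", "client.24833:456994"
    inode.auth_pin(quiesce)            # first authpin, count is 1
    if not hold_pin_for_whole_quiesce:
        inode.auth_unpin(quiesce)      # hypothetical early release
    inode.start_freeze(rename)         # rename pins and starts freezing
    return has_cycle(wait_for_graph(inode, quiesce))


if __name__ == "__main__":
    print("pin held for whole quiesce, deadlock:", scenario(True))
    print("pin released early, deadlock:", scenario(False))
```

In this model the cycle exists only while the quiesce op still holds its pin when the rename starts freezing; releasing the pin earlier removes the edge from the rename to the quiesce op. Whether the actual fix works this way is not stated in this ticket.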
Updated by Leonid Usov almost 2 years ago
- Tracker changed from Enhancement to Bug
- Subject changed from test_quiesce: test_quiesce_drops_remote_authpins_on_failure needs fixing after rolling back one of the changed it depended on to mds/quiesce: holding remote authpins for the duration of the quiesce op may cause deadlocks
- Description updated (diff)
- Regression set to No
- Severity set to 3 - minor
Updated by Leonid Usov almost 2 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 57579
Updated by Patrick Donnelly almost 2 years ago
- Status changed from Fix Under Review to Pending Backport
- Target version set to v20.0.0
Updated by Upkeep Bot almost 2 years ago
- Copied to Backport #66258: squid: mds/quiesce: holding remote authpins for the duration of the quiesce op may cause deadlocks added
Updated by Leonid Usov almost 2 years ago
- Status changed from Pending Backport to Resolved
Updated by Upkeep Bot 9 months ago
- Merge Commit set to 25e4ee2fa7e8913bac09af9e43706ddeba1cd14a
- Fixed In set to v19.3.0-2435-g25e4ee2fa7e
- Upkeep Timestamp set to 2025-06-27T03:18:30+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-2435-g25e4ee2fa7e to v19.3.0-2435-g25e4ee2fa7
- Upkeep Timestamp changed from 2025-06-27T03:18:30+00:00 to 2025-07-14T16:44:57+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2814
- Upkeep Timestamp changed from 2025-07-14T16:44:57+00:00 to 2025-11-01T01:11:31+00:00