Bug #66152

closed

mds/quiesce: holding remote authpins for the duration of the quiesce op may cause deadlocks

Added by Leonid Usov almost 2 years ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
% Done: 0%
Source:
Backport: squid
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): quiesce
Labels (FS):
Pull request ID:
Tags (freeform):
Fixed In: v19.3.0-2435-g25e4ee2fa7
Released In: v20.2.0~2814
Upkeep Timestamp: 2025-11-01T01:11:31+00:00

Description

An example of this deadlock can be seen in this run: https://pulpito.ceph.com/pdonnell-2024-05-21_01:33:07-fs:workload-wip-lusov-quiesce-overdrive-export-distro-default-smithi/7717849/

The quiesce request on mds.1:

  "description": "internal op quiesce_inode:mds.1:50612 fp=#0x1000000383a",
...
  "type_data": {
    "flag_point": "quiesce complete for non-auth inode",

took an authpin on mds.0:

2024-05-21T03:24:08.269+0000 7f6319567700  7 mds.0.server dispatch_peer_request request(mds.1:50612 nref=2 peer_to mds.1 sr=0x56059607a700) peer_request(mds.1:50612.0 authpin)
2024-05-21T03:24:08.269+0000 7f6319567700 10 mds.0.server handle_peer_auth_pin request(mds.1:50612 nref=2 peer_to mds.1 sr=0x56059607a700)
2024-05-21T03:24:08.269+0000 7f6319567700 10 mds.0.server auth_pinning [inode 0x1000000383a [c,head] /volumes/qa/sv_1/d8fbe2d3-1fdc-4c3a-b7dc-a7b20043535f/client.0/tmp/blogbench-1.0/src/blogtest_in/blog-52/comment-9.xml.tmp auth{1=1} v1151 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={24833=0-4194304@b} caps={24833=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=24833 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 0x56059c154000]
2024-05-21T03:24:08.269+0000 7f6319567700 10 mds.0.cache.ino(0x1000000383a) auth_pin by 0x56059e100d80 on [inode 0x1000000383a [c,head] /volumes/qa/sv_1/d8fbe2d3-1fdc-4c3a-b7dc-a7b20043535f/client.0/tmp/blogbench-1.0/src/blogtest_in/blog-52/comment-9.xml.tmp auth{1=1} v1151 ap=1 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={24833=0-4194304@b} caps={24833=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=24833 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x56059c154000] now 1

Then the rename request tried to authpin-freeze the same inode:

  "description": "client_request(client.24833:456994 rename #0x20000001f84/comment-9.xml #0x20000001f84/comment-9.xml.tmp 2024-05-21T03:24:04.643446+0000 caller_uid=1000, caller_gid=1316{6,36,1000,1316,})",
2024-05-21T03:24:08.271+0000 7f6319567700  4 mds.0.server handle_peer_request client.24833:456994 from mds.1
2024-05-21T03:24:08.271+0000 7f6319567700  7 mds.0.cache request_start_peer request(client.24833:456994 nref=2 peer_to mds.1) by mds.1
2024-05-21T03:24:08.271+0000 7f6319567700  7 mds.0.server dispatch_peer_request request(client.24833:456994 nref=2 peer_to mds.1 sr=0x56059ff5f500) peer_request(client.24833:456994.0 authpin)
2024-05-21T03:24:08.271+0000 7f6319567700 10 mds.0.server handle_peer_auth_pin request(client.24833:456994 nref=2 peer_to mds.1 sr=0x56059ff5f500)
2024-05-21T03:24:08.271+0000 7f6319567700 20 mds.0.cache.dir(0x20000001f84.100*) lookup (comment-9.xml.tmp, 'head')
2024-05-21T03:24:08.271+0000 7f6319567700 20 mds.0.cache.dir(0x20000001f84.100*)   hit -> (comment-9.xml.tmp,head)
2024-05-21T03:24:08.271+0000 7f6319567700 10 mds.0.server  freezing auth pin on [inode 0x1000000383a [c,head] /volumes/qa/sv_1/d8fbe2d3-1fdc-4c3a-b7dc-a7b20043535f/client.0/tmp/blogbench-1.0/src/blogtest_in/blog-52/comment-9.xml.tmp auth{1=1} v1151 ap=1 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={24833=0-4194304@b} caps={24833=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=24833 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x56059c154000]
2024-05-21T03:24:08.271+0000 7f6319567700 10 mds.0.cache.ino(0x1000000383a) auth_pin by 0x56059e103180 on [inode 0x1000000383a [c,head] /volumes/qa/sv_1/d8fbe2d3-1fdc-4c3a-b7dc-a7b20043535f/client.0/tmp/blogbench-1.0/src/blogtest_in/blog-52/comment-9.xml.tmp auth{1=1} v1151 ap=2 DIRTYPARENT s=0 n(v0 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={24833=0-4194304@b} caps={24833=pAsxLsXsxFsxcrwb/pAsxXsxFxwb@2},l=24833 | request=0 caps=1 dirtyparent=1 replicated=1 dirty=1 authpin=1 0x56059c154000] now 2 <---
2024-05-21T03:24:08.271+0000 7f6319567700 15 mds.0.cache.dir(0x20000001f84.100*) adjust_nested_auth_pins 1 on [dir 0x20000001f84.100* /volumes/qa/sv_1/d8fbe2d3-1fdc-4c3a-b7dc-a7b20043535f/client.0/tmp/blogbench-1.0/src/blogtest_in/blog-52/ [2,head] auth{1=1,2=1} v=1152 cv=763/763 REP ap=0+9 state=1610874881|complete f(v5 m2024-05-21T03:24:04.636446+0000 26=26+0) n(v7 rc2024-05-21T03:24:04.636446+0000 b566673 26=26+0) hs=26+12,ss=31+0 dirty=69 | ptrwaiter=0 child=1 frozen=0 subtree=0 importing=0 importbound=0 sticky=0 replicated=1 dirty=1 authpin=0 0x560592cacd00] by 0x56059c154000 count now 0/9
2024-05-21T03:24:08.271+0000 7f6319567700 10 mds.0.cache.ino(0x1000000383a) freeze_inode - waiting for auth_pins to drop to 1
2024-05-21T03:24:08.271+0000 7f6319567700 10 mds.0.cache.ino(0x1000000383a) add_waiter tag 2 0x56059bda5d40 !ambig 1 !frozen 1 !freezing 0

This inode is now freezing and is blocking the quiesce, but it can't freeze until the quiesce is over and that first authpin is lifted. The result is a deadlock.
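
The circular wait in the sequence above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not the MDS implementation: `ToyInode`, `wait_for_edges`, and `has_cycle` are hypothetical names invented for illustration, the model assumes that a freeze completes once the pin count drops to an allowance of 1 (matching the "waiting for auth_pins to drop to 1" log line), and it assumes, as the report states, that a freezing inode blocks the quiesce.

```python
class ToyInode:
    """Toy stand-in for an inode's auth-pin state (hypothetical, not Ceph code)."""

    def __init__(self):
        self.pin_holders = []   # one entry per auth pin currently held
        self.freezer = None     # op that requested the freeze, if any
        self.allowance = 0      # pins the freezer itself is allowed to keep
        self.frozen = False

    @property
    def freezing(self):
        return self.freezer is not None and not self.frozen

    def auth_pin(self, op):
        # Assumption: no new pins are granted while freezing or frozen.
        if self.freezing or self.frozen:
            return False
        self.pin_holders.append(op)
        return True

    def auth_unpin(self, op):
        self.pin_holders.remove(op)
        self._try_finish_freeze()

    def freeze(self, op, allowance=1):
        # Returns True only if the freeze completes immediately.
        self.freezer = op
        self.allowance = allowance
        self._try_finish_freeze()
        return self.frozen

    def _try_finish_freeze(self):
        if self.freezer is not None and len(self.pin_holders) <= self.allowance:
            self.frozen = True


def wait_for_edges(inode, quiesce_op):
    """Build a wait-for graph: op -> set of ops it is waiting on."""
    edges = {}
    if inode.freezing:
        # The freezer waits for every other pin holder to unpin.
        edges[inode.freezer] = {h for h in inode.pin_holders if h != inode.freezer}
        # Assumption taken from the report: a freezing inode blocks the quiesce.
        edges.setdefault(quiesce_op, set()).add(inode.freezer)
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in the wait-for graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in edges.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(edges))


# Replay the order of events from the logs.
inode = ToyInode()
inode.auth_pin("quiesce")   # remote authpin held by the quiesce op ("now 1")
inode.auth_pin("rename")    # the rename's own pin ("now 2")
inode.freeze("rename")      # cannot complete while the quiesce pin is held
print("deadlocked:", has_cycle(wait_for_edges(inode, "quiesce")))
```

In this toy model, calling `inode.auth_unpin("quiesce")` lets the freeze complete and removes the cycle, which illustrates why holding the remote authpin for the whole quiesce op is the problematic part. It is not a description of the actual fix.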


Related issues 1 (0 open, 1 closed)

Copied to CephFS - Backport #66258: squid: mds/quiesce: holding remote authpins for the duration of the quiesce op may cause deadlocks (Resolved, Leonid Usov)

#1

Updated by Leonid Usov almost 2 years ago

  • Tracker changed from Enhancement to Bug
  • Subject changed from test_quiesce: test_quiesce_drops_remote_authpins_on_failure needs fixing after rolling back one of the changed it depended on to mds/quiesce: holding remote authpins for the duration of the quiesce op may cause deadlocks
  • Description updated (diff)
  • Regression set to No
  • Severity set to 3 - minor
#2

Updated by Leonid Usov almost 2 years ago

  • Description updated (diff)
#3

Updated by Leonid Usov almost 2 years ago

  • Description updated (diff)
#4

Updated by Leonid Usov almost 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 57579
#5

Updated by Leonid Usov almost 2 years ago

  • Backport set to squid
#6

Updated by Patrick Donnelly almost 2 years ago

  • Status changed from Fix Under Review to Pending Backport
  • Target version set to v20.0.0
#7

Updated by Upkeep Bot almost 2 years ago

  • Copied to Backport #66258: squid: mds/quiesce: holding remote authpins for the duration of the quiesce op may cause deadlocks added
#9

Updated by Leonid Usov almost 2 years ago

  • Status changed from Pending Backport to Resolved
#10

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to 25e4ee2fa7e8913bac09af9e43706ddeba1cd14a
  • Fixed In set to v19.3.0-2435-g25e4ee2fa7e
  • Upkeep Timestamp set to 2025-06-27T03:18:30+00:00
#11

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-2435-g25e4ee2fa7e to v19.3.0-2435-g25e4ee2fa7
  • Upkeep Timestamp changed from 2025-06-27T03:18:30+00:00 to 2025-07-14T16:44:57+00:00
#12

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2814
  • Upkeep Timestamp changed from 2025-07-14T16:44:57+00:00 to 2025-11-01T01:11:31+00:00