Bug #65977
closedQuiesce times out while the ops dump shows all existing quiesce ops as complete;
0%
Description
2024-05-13T06:55:03.254 WARNING:tasks.quiescer.fs.[cephfs]:Set '9ce534df' hasn't quiesced after 55.6 seconds (timeout: 90). Dumping ops with locks from all ranks.
2024-05-13T06:55:07.364 DEBUG:tasks.quiescer.fs.[cephfs]:Dumping ops on rank 4 (i) to a remote file /var/run/ceph/1717f1ac-10f0-11ef-bc98-c7b262605968/ops-9ce534df-1-mds.i.json
2024-05-13T06:55:12.292 DEBUG:tasks.quiescer.fs.[cephfs]:Dumping ops on rank 3 (h) to a remote file /var/run/ceph/1717f1ac-10f0-11ef-bc98-c7b262605968/ops-9ce534df-1-mds.h.json
2024-05-13T06:55:20.018 DEBUG:tasks.quiescer.fs.[cephfs]:Dumping ops on rank 2 (g) to a remote file /var/run/ceph/1717f1ac-10f0-11ef-bc98-c7b262605968/ops-9ce534df-1-mds.g.json
2024-05-13T06:55:24.991 DEBUG:tasks.quiescer.fs.[cephfs]:Dumping ops on rank 1 (a) to a remote file /var/run/ceph/1717f1ac-10f0-11ef-bc98-c7b262605968/ops-9ce534df-1-mds.a.json
2024-05-13T06:55:29.578 DEBUG:tasks.quiescer.fs.[cephfs]:Dumping ops on rank 0 (b) to a remote file /var/run/ceph/1717f1ac-10f0-11ef-bc98-c7b262605968/ops-9ce534df-1-mds.b.json
2024-05-13T06:55:42.118 DEBUG:tasks.quiescer.fs.[cephfs]:Outstanding op at rank 3 (h) for 9ce534df-1: 'client_request(client.25126:364918 unlink #0x100000079bc/gspca_se401.ko.xz 2024-05-13T06:54:18.557051+0000 caller_uid=0, caller_gid=0{0,})'
The only pending op is
{
"description": "client_request(client.25126:364918 unlink #0x100000079bc/gspca_se401.ko.xz 2024-05-13T06:54:18.557051+0000 caller_uid=0, caller_gid=0{0,})",
"initiated_at": "2024-05-13T06:54:18.558547+0000",
"age": 59.436941433,
"duration": 59.650470112,
"continuous": false,
"type_data": {
"result": null,
"flag_point": "failed to acquire quiesce lock",
"reqid": {
"entity": {
"type": "client",
"num": 25126
},
"tid": 364918
},
and all quiesce_inode ops in all other dumps are complete.
Potentially, the same issue: http://qa-proxy.ceph.com/teuthology/leonidus-2024-05-13_05:53:33-fs-wip-lusov-quiesce-distro-default-smithi/7704563/
Updated by Leonid Usov almost 2 years ago · Edited
This looks like a quiesce db issue, it's missing an ack though we clearly see them all coming
2024-05-13T06:54:19.158+0000 7fa2d1209700 10 quiesce.mds.0 <quiesce_dispatch> got ack q-map[v:(102:73) roots:1/0] from 24512 2024-05-13T06:54:19.998+0000 7fa2d1209700 10 quiesce.mds.0 <quiesce_dispatch> got ack q-map[v:(103:73) roots:1/0] from 24512 2024-05-13T06:54:19.998+0000 7fa2d1209700 10 quiesce.mds.0 <quiesce_dispatch> got ack q-map[v:(103:73) roots:1/0] from 24512 2024-05-13T06:54:20.229+0000 7fa2d1209700 10 quiesce.mds.0 <quiesce_dispatch> got ack q-map[v:(103:73) roots:1/0] from 24478 2024-05-13T06:54:21.010+0000 7fa2d1209700 10 quiesce.mds.0 <quiesce_dispatch> got ack q-map[v:(104:73) roots:1/0] from 24478 2024-05-13T06:54:21.010+0000 7fa2d1209700 10 quiesce.mds.0 <quiesce_dispatch> got ack q-map[v:(104:73) roots:1/0] from 24478 2024-05-13T06:54:21.011+0000 7fa2d1209700 10 quiesce.mds.0 <quiesce_dispatch> got ack q-map[v:(104:73) roots:1/0] from 24512 2024-05-13T06:54:21.011+0000 7fa2d1209700 10 quiesce.mds.0 <quiesce_dispatch> got ack q-map[v:(104:73) roots:1/0] from 24512 2024-05-13T06:54:21.878+0000 7fa2d1209700 10 quiesce.mds.0 <quiesce_dispatch> got ack q-map[v:(104:73) roots:1/0] from 24491 2024-05-13T06:54:22.556+0000 7fa2d1209700 10 quiesce.mds.0 <quiesce_dispatch> got ack q-map[v:(104:73) roots:1/0] from 24506
Unfortunately, the logging from the quiesce db is insufficient to understand what prevents it from registering this set as quiesced. Along with the fix, I'll add better logging to the quiesce db manager where it is looking at the peer reports
Updated by Leonid Usov almost 2 years ago · Edited
- Status changed from New to In Progress
This is a real quiesce db issue which I could reproduce in a unit test.
It happens when the peer receives a state update while on a newer epoch than the leader. The fix is coming soon
Updated by Leonid Usov almost 2 years ago
- Status changed from In Progress to Pending Backport
- Backport set to squid
Updated by Upkeep Bot almost 2 years ago
- Copied to Backport #66070: squid: Quiesce times out while the ops dump shows all existing quiesce ops as complete; added
Updated by Leonid Usov almost 2 years ago
- Status changed from Pending Backport to Resolved
Updated by Upkeep Bot 9 months ago
- Merge Commit set to cbb1da11892bf2915bbe3a1a03e2755cf33b9035
- Fixed In set to v19.3.0-2196-gcbb1da11892
- Upkeep Timestamp set to 2025-06-30T19:34:12+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-2196-gcbb1da11892 to v19.3.0-2196-gcbb1da1189
- Upkeep Timestamp changed from 2025-06-30T19:34:12+00:00 to 2025-07-14T16:45:00+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2890
- Upkeep Timestamp changed from 2025-07-14T16:45:00+00:00 to 2025-11-01T01:11:35+00:00