mds/quiesce: fix timeouts, a crash, and overdrive a tree export when possible#57579
3b748ea to
7956b5e
Compare
jenkins test make check arm64
batrick
left a comment
mds/quiesce: don't force a remote authpin for the quiesce lock was not part of the two runs I did for
https://pulpito.ceph.com/?branch=wip-lusov-quiesce-overdrive-export
I would prefer to keep that in another PR/ticket. Let's get the export fix merged quickly.
I'm not sure. I think the two changes work together: without the remote authpin, most cases no longer leave any possibility for a deadlock. It also reduces the inter-rank messaging, it's true to the current design of the quiesce, and it's required to fix the test that fails otherwise.

Your run shows one quiesce timeout due to exporting. I have looked at that one, and it appears to be the renaming issue, or rather a different kind of it. I'm re-running just the exporter with full replication x8 to stress-test this PR: https://pulpito.ceph.com/leonidus-2024-05-21_18:10:18-fs-wip-lusov-quiesce-overdrive-export-distro-default-smithi/
No quiesce errors. I do think that the absence of the authpin plays a role. To validate that, I'll rerun the same suite as above with the previous version of the branch, without the authpin change.
Scheduling 20 jobs of the workload that has seen the rename issue.
Without the AP change: https://pulpito.ceph.com/leonidus-2024-05-22_06:28:15-fs-wip-lusov-quiesce-overdrive-export-distro-default-smithi/
With the AP change: https://pulpito.ceph.com/leonidus-2024-05-22_06:28:39-fs-wip-lusov-quiesce-overdrive-export-distro-default-smithi/
Yes, I found that issue with the rename, and it's indeed due to the remote authpin from the quiesce request:
1. the quiesce request on mds.1 took an authpin on mds.0
2. then the rename request tried to authpin-freeze the same inode
3. this inode is now freezing and is blocking the quiesce, but it can't freeze until the quiesce is over and that first authpin is lifted

--> deadlock
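The cycle in the steps above can be checked with a small wait-for graph. This is a minimal sketch in Python, not MDS code: the node names and the `has_cycle` helper are hypothetical, and the edges only restate the comment (the quiesce waits for the freezing inode, the freeze waits for the remote authpin, and the authpin is released only when the quiesce ends).

```python
from typing import Dict, List


def has_cycle(waits_for: Dict[str, List[str]]) -> bool:
    """Return True if the wait-for graph contains a cycle, i.e. a deadlock."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in waits_for}

    def visit(node: str) -> bool:
        color[node] = GREY
        for nxt in waits_for.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge: the node transitively waits on itself
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(waits_for))


# Hypothetical model of the reported scenario (names are illustrative only).
WITH_REMOTE_AUTHPIN = {
    "quiesce": ["freeze"],          # quiesce on the auth is blocked by the freezing inode
    "freeze": ["remote_authpin"],   # the freeze needs every authpin to be dropped
    "remote_authpin": ["quiesce"],  # the pin is lifted only when the quiesce is over
}

# Assumed model of the same scenario once the quiesce holds no remote authpin.
WITHOUT_REMOTE_AUTHPIN = {
    "quiesce": ["freeze"],
    "freeze": [],
}

print(has_cycle(WITH_REMOTE_AUTHPIN), has_cycle(WITHOUT_REMOTE_AUTHPIN))
```

In this toy model the only change between the two graphs is the edge that the remote authpin contributes, which is what closes the loop.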
7956b5e to
0cfaf3a
Compare
@batrick can we please have a single PR for those two tickets? They will have to be backported everywhere in a batch... 🙏🏻 🙏🏻 |
0cfaf3a to
9dc8c01
Compare
jenkins test make check arm64
batrick
left a comment
Don't forget to QA -s fs:functional --filter quiesce.
mds/quiesce: allow quiescing a single file in dispatch_quiesce_path will break test_quiesce_path_regfile
mds: quiesce_path: don't block the asock thread and return an adequate rc -> mds: do not block the asok thread via quiesce_path and return an adequate rc
otherwise the change is 👍
mds: add `lifetime` cmd param to `lock path` 👍
qa/cephfs: quiesce: test that a quiesce op doesn't hold remote ap 👍
Other fixes are good. You've convinced me to combine the changes.
240841e to
6e074e1
Compare
8e0427a to
11ffcf2
Compare
|
Latest:
Stress exporting/renaming/fragmenting - pass (*half of the jobs failed because of the backtrace scrub, but there were no quiesce failures)
General with-quiesce: 2 timeouts. 7724309 and 772410 are caused by a problem with the QuiesceAgent ack reordering, which I reported as a new defect: https://tracker.ceph.com/issues/66219
… adequate rc Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
1. avoid taking a remote authpin for the quiesce lock
2. drop remote authpins that were taken because of other locks

We should not be forcing a mustpin when taking quiesce lock. This creates unnecessary overhead due to the distributed nature of the quiesce: all ranks will execute quiesce_inode, including the auth rank, which will authpin the inode.

Auth pinning on the auth rank is important to synchronize quiesce with operations that are managed by the auth, like fragmenting and exporting. If we let a remote quiesce process take a foreign authpin then it may block freezing on the auth, which will stall quiesce locally. This wouldn't be a problem if the quiesce that is blocked on the auth and the quiesce that's holding a remote authpin from the replica side were unrelated, but in our case it may be the same logical quiesce that effectively steps on its own toes. This creates an opportunity for a deadlock.

Fixes: https://tracker.ceph.com/issues/66152
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Just like with the fragmenting, we should abort an ongoing export if a quiesce is attempted for the directory. To minimize the stress for the system, we only allow the abort if the export hasn't yet managed to freeze the tree. If that is the case, then quiesce will have to wait for the export to finish.

Fixes: https://tracker.ceph.com/issues/66123
Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
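The rule in this commit message can be sketched as a simple decision on the export stage. This is an illustrative Python sketch under stated assumptions, not the MDS implementation: `ExportStage`, its members, and `quiesce_action` are hypothetical names, and the real export state machine has more states than shown here.

```python
from enum import IntEnum


class ExportStage(IntEnum):
    # Hypothetical, simplified stages in the order an export moves through them.
    LOCKING = 1
    DISCOVERING = 2
    FREEZING = 3
    FROZEN = 4     # the tree is frozen from here on
    EXPORTING = 5


def quiesce_action(stage: ExportStage) -> str:
    """Decide what a quiesce does when it meets an export in progress."""
    if stage < ExportStage.FROZEN:
        return "abort_export"    # the tree is not frozen yet: overdrive the export
    return "wait_for_export"     # the tree is already frozen: let the export finish


for stage in ExportStage:
    print(stage.name, "->", quiesce_action(stage))
```

Keeping the cutoff at the freeze means the abort only ever undoes the cheap early phases of an export.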
…rancy Fixes: https://tracker.ceph.com/issues/66208 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
In this scenario, the agent thread is able to run and generate an ack before the db_update call returns to the caller. Fixes: https://tracker.ceph.com/issues/66219 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Defer to the agent thread to perform all acking. This avoids race conditions between the updating thread and the acking thread. Fixes: https://tracker.ceph.com/issues/66219 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
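The idea of deferring all acking to one thread can be illustrated with a toy agent. This is a hedged sketch in Python, not the QuiesceAgent code: the `Agent` class, its `db_update` signature, and the `sent_acks` list are assumptions made for the example. Only the state names QUIESCING and QUIESCED come from the tracker description.

```python
import queue
import threading


class Agent:
    """Toy agent: callers only enqueue updates, one thread sends every ack."""

    def __init__(self):
        self._updates = queue.Queue()
        self.sent_acks = []  # stands in for acks sent to the manager
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def db_update(self, root):
        # The updating thread never acks by itself, so there is nothing
        # for the agent thread to race with.
        self._updates.put(root)

    def _run(self):
        while True:
            root = self._updates.get()
            if root is None:  # sentinel: stop the agent thread
                break
            self.sent_acks.append((root, "QUIESCING"))  # ack the new state first
            # ... the actual quiesce work would happen here ...
            self.sent_acks.append((root, "QUIESCED"))   # then ack the completion

    def stop(self):
        self._updates.put(None)
        self._thread.join()


agent = Agent()
agent.db_update("/volumes/a")
agent.stop()
print(agent.sent_acks)
```

Because both acks originate from the same thread, their relative order no longer depends on when `db_update` returns to its caller.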
11ffcf2 to
9a4c585
Compare
Fixes: https://tracker.ceph.com/issues/66225 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
jenkins test make check
jenkins test windows
A rerun with the fix of https://tracker.ceph.com/issues/66219 found by the tests in #57579 (comment):
Functional - pass
Export/replication kernel_untar_build stress 8/20 pass, no quiesce errors
A generic
A second generic
The analysis suggests that this is a client issue, see https://tracker.ceph.com/issues/66229.
In total, there were 980 successful quiesces over the three runs of this batch. The quiesce times are measured by the script, and as such are not reliable, but still, here's the distribution:
A re-run with the change for [quiesce] disable debug parameters on quiesce roots
Functional - pass
A generic
Both issues are due to messaging errors preventing the delivery of an ack:
7727755 - during the quiesce
7727756 - during the release
Stress test with replication and exporting


Fixes: https://tracker.ceph.com/issues/66152 - mds/quiesce: holding remote authpins for the duration of the quiesce op may cause deadlocks
Fixes: https://tracker.ceph.com/issues/66123 - Quiesce timeout due to exporting
Fixes: https://tracker.ceph.com/issues/66208 - Segfault when `quiesce_overdrive_fragmenting` synchronously calls `dispatch_fragment_dir` for an abort
Fixes: https://tracker.ceph.com/issues/66219 - [quiesce timeout] QuiesceAgent may send an async QUIESCED ack before the QuiesceManager does the sync QUIESCING ack, which causes the QUIESCED ack to be lost
Fixes: https://tracker.ceph.com/issues/66225 - [quiesce] disable debug parameters on quiesce roots