mds: prevent deadlocks between quiesce and fragmenting #57250
leonid-s-usov wants to merge 3 commits into main
9b6adf6 to
00c97f7
Compare
jenkins test api
00c97f7 to
a74db31
Compare
qa/tasks/cephfs/test_quiesce.py
Outdated
    self.mount_a.run_shell_payload("mkdir -p root/sub2")
    self.mount_a.write_file("root/sub2/file2", "file2")

    self.config_set('mds', 'mds_freeze_delay_ms', '15000') # fragments will spend at least 15 seconds freezing
There was a problem hiding this comment.
Commenting on this before I head to bed for the night: did the procedure I outlined in slack not work? Why? I like this config for debugging but I don't think it should be necessary for this test.
There was a problem hiding this comment.
Sorry, I'm not sure which procedure you're referring to, would you mind repeating?
In this test, the delay is crucial. It reproduces the situation where a directory stays freezing for a long time, and it does so in a predictable way. As a result, I can be sure that if quiesce completes faster than a directory would reach the frozen state, we have overcome the issue in the ticket.
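A minimal sketch of that timing argument, with hypothetical names throughout (`completes_within` and the stand-in operation are illustrations, not part of the Ceph QA framework): if the quiesce wait returns well inside the configured freeze delay, it cannot have been waiting for the directory to reach the frozen state.

```python
import time

# Mirrors mds_freeze_delay_ms = 15000 from the test under review.
FREEZE_DELAY_S = 15.0


def completes_within(operation, budget_s):
    """Run operation() and report whether it finished inside budget_s seconds.

    Hypothetical helper: returns (finished_in_time, elapsed_seconds).
    """
    start = time.monotonic()
    operation()
    elapsed = time.monotonic() - start
    return elapsed < budget_s, elapsed


if __name__ == "__main__":
    # Stand-in for self._wait_for_quiesce_complete(reqid); it only sleeps briefly.
    ok, elapsed = completes_within(lambda: time.sleep(0.1), FREEZE_DELAY_S)
    print(f"finished in {elapsed:.2f}s, inside freeze delay: {ok}")
```

In a real test the lambda would be replaced by the quiesce wait, and the check would be an assertion rather than a print.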
There was a problem hiding this comment.
Here is that test that works for me:
def test_quiesce_parent_frag(self):
    """
    """
    self._configure_subvolume()
    self.mount_a.run_shell_payload("for i in `seq 1 100`; do mkdir -p foo/$i; done")
    path_parent = self.mount_a.cephfs_mntpt + "/foo"
    path_q1 = path_parent + "/1"
    J = self.fs.rank_tell("quiesce", "path", path_q1)
    log.debug(f"{J}")
    reqid = self._reqid_tostr(J['op']['reqid'])
    self._wait_for_quiesce_complete(reqid)
    self.fs.rank_tell("dirfrag", "split", path_parent, "0/0", "1")
    path_q2 = path_parent + "/2"
    J = self.fs.rank_tell("quiesce", "path", path_q2)
    log.debug(f"{J}")
    reqid = self._reqid_tostr(J['op']['reqid'])
    self._wait_for_quiesce_complete(reqid) # will fail without fix
    # TODO: verify dir is fragmented
and it can go in the TestQuiesce class because it does not require multiple actives. I believe this is simpler, no?
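One possible way to address the "verify dir is fragmented" TODO is sketched below. It assumes the listing comes from the MDS `dirfrag ls` command and is a list of objects with `value`, `bits` and `str` fields; the helper names and the sample data are hypothetical and only illustrate the check.

```python
def is_fragmented(frags):
    """Return True if a `dirfrag ls`-style listing has more than one leaf frag.

    Assumption: an unfragmented directory is reported as a single frag "0/0".
    """
    return len(frags) > 1


def max_split_bits(frags):
    """Return the deepest split level seen in the listing (0 if unsplit or empty)."""
    return max((frag["bits"] for frag in frags), default=0)


if __name__ == "__main__":
    # Illustrative data only, not captured from a real cluster.
    before = [{"value": 0, "bits": 0, "str": "0/0"}]
    after = [
        {"value": 0, "bits": 1, "str": "0/1"},
        {"value": 8388608, "bits": 1, "str": "800000/1"},
    ]
    print(is_fragmented(before), is_fragmented(after), max_split_bits(after))
```

In the test itself, the listing would be fetched once the second quiesce completes and the result asserted.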
There was a problem hiding this comment.
I don't find it simpler, TBH, though it's similar. Additionally, my version predictably goes into an endless balancing tug of war with as few as two directories and two files, which I think is a stable approach.
Finally, my current test can be moved to TestQuiesce as is, too; it's not dependent on the pinning.
There was a problem hiding this comment.
I'm puzzled you don't find this simpler. It conceptually is three operations:
- quiesce one dir below a parent
- start fragmenting the parent (will reliably block because step 1 has auth pins for duration of quiesce)
- quiesce another dir below the parent
Your test relies on delaying freezing, configuration-driven fragmentation, and numerous quiesce ops.
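The three operations above can be illustrated with a toy model. This is a deliberately simplified sketch, not Ceph code: `ToyDir` and `second_quiesce_succeeds` are hypothetical, and the only behaviour assumed is that a freeze cannot finish while auth pins are held and that new auth pins are refused while a directory is freezing or frozen.

```python
class ToyDir:
    """Toy model of a directory's auth-pin and freeze state (not Ceph code)."""

    def __init__(self):
        self.auth_pins = 0
        self.freezing = False
        self.frozen = False

    def auth_pin(self):
        """Take an auth pin; refused while the directory is freezing or frozen."""
        if self.freezing or self.frozen:
            return False
        self.auth_pins += 1
        return True

    def start_freeze(self):
        """Begin freezing; the freeze completes only once no auth pins remain."""
        self.freezing = True
        if self.auth_pins == 0:
            self.freezing = False
            self.frozen = True

    def abort_freeze(self):
        """Give up on an in-progress freeze so that new auth pins are allowed."""
        self.freezing = False


def second_quiesce_succeeds(abort_fragmenting):
    """Replay the three steps and report whether the second quiesce can pin."""
    parent = ToyDir()
    assert parent.auth_pin()   # step 1: the first quiesce holds an auth pin
    parent.start_freeze()      # step 2: fragmenting starts and stays "freezing"
    if abort_fragmenting and parent.freezing:
        parent.abort_freeze()  # the fix modelled here: drop the pending freeze
    return parent.auth_pin()   # step 3: the second quiesce needs an auth pin


if __name__ == "__main__":
    print(second_quiesce_succeeds(abort_fragmenting=False))  # False
    print(second_quiesce_succeeds(abort_fragmenting=True))   # True
```

Without the abort, the freeze waits for the first quiesce's pin while the second quiesce waits for the freeze, which is the deadlock the ticket describes.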
There was a problem hiding this comment.
@batrick I have combined our approaches. Thanks for the trick with two quiesce ops under the same parent; that's the key. I used my infinite rebalancing approach and your double quiesce approach together in a single test with just two subdirectories, and it works well to reproduce the issue. I will push the new version shortly and let you review.
jenkins test api
jenkins test windows
This is under test with-quiesce (thrasher), x3
a74db31 to
7669374
Compare
batrick
left a comment
There was a problem hiding this comment.
otherwise the bypass_freezing changes look good to me.
7669374 to
89f0b13
Compare
want_bypass_freezing flag to MDR and set it for quiesce ops
Here's what the merge detector traces look like:
ecb5a06 to
1b6a7dd
Compare
jenkins test make check
7802755 to
e263233
Compare
jenkins test make check
This was tested together with #57332 in https://pulpito.ceph.com/leonidus-2024-05-13_05:53:33-fs-wip-lusov-quiesce-distro-default-smithi/. The results are promising: the only two quiesce timeouts are instances of a different issue, https://tracker.ceph.com/issues/65977. The EMEDIUMTYPE failures are timeouts from the teuthology command runner and are usually signs of unrelated issues.
Fixes: https://tracker.ceph.com/issues/65716 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
This reverts a9964a7 Signed-off-by: Leonid Usov <leonid.usov@ibm.com>
Quiesce requires revocation of capabilities, which does not work for freezing/frozen nodes. Since fragmenting is best effort, abort an ongoing fragmenting operation for the sake of a faster quiesce. Signed-off-by: Leonid Usov <leonid.usov@ibm.com> Fixes: https://tracker.ceph.com/issues/65716
e263233 to
164db2d
Compare
@batrick please approve for merge. See #57332 (comment)
batrick
left a comment
There was a problem hiding this comment.
Notes from our discussion:
- Please open a tracker to check that the fragmentation/merging is occurring between the quiesce calls. (I feel this one is tricky because you want to check that fragmentation starts after the first but before the second. That's one reason I think the test I originally provided is a good additional test to your changes to test_quiesce_dir_fragment.)
- Please update the commit message for "revert: mds: provide mechanism to authpin while freezing" to explain why this approach didn't work.
No code changes should be required.
I am closing this PR and will submit the three commits via #57332. These two PRs have a dependency, and it's easier to manage as a single stream of commits.
I'll apply the requested change in the next revision of #57332.