qa/cephfs: clean up evicted client in 4-compat_client.yaml #46988
rishabh-d-dave merged 3 commits into ceph:main
jenkins test api
@vshankar @lxbsz The fix and the issue have been verified by running the failing job again. http://pulpito.front.sepia.ceph.com/rishabh-2022-07-06_08:49:09-fs-wip-vshankar-testing-20220527-073645-distro-default-smithi/
    clients:
      client.0: False
      client.1: True
    # cleanup evicted client so there's no trouble later.
nit: add a comment that only client.0 is upgraded and client.1 is evicted by the mds due to missing feature compat set.
Tested PR #45036 and PR #46988. Ran fine - http://pulpito.front.sepia.ceph.com/rishabh-2022-07-06_11:37:52-fs-wip-vshankar-testing-20220527-073645-distro-default-smithi/. This PR is ready for QA now.
49e51ba to 9ee8f27
    - cat mntpt.txt
    - sudo umount -f $(cat mntpt.txt)
    - sudo rmdir $(cat mntpt.txt)
    - rm mntpt.txt
How about doing this in clients_evicted() in tasks/fs.py instead? Then it should be as simple as:
mount.umount_wait()
Could it lead to a problem for teuthology jobs that want to keep evicted clients around as they are for some time?
There are only two places using this; I don't see how it could lead to potential issues.
And from my understanding, fs.clients_evicted: is the place that needs the evicted clients to stay around as they are for some time. So unmounting them in it, or right after it, should be fine.
Okay, I'll make the change in that case.
You'd only need to make sure that the findmnt (from your changes) does not interleave anywhere, which then could lead to a test hang.
c25d3e1 to 41938cc
Tested the new change, it works fine - http://pulpito.front.sepia.ceph.com/rishabh-2022-07-07_17:52:35-fs-main-distro-default-smithi/
jenkins test make check
84a2841 to f3857ec
jenkins test api
jenkins test api
jenkins test api
@rishabh-d-dave ping?
Elaborating on the problem: The Python code hangs when a blocked client is being operated on even when The way around this is (which is currently on this PR) to set Noting the client socket in the class constructor and checking whether it is blocked in the class destructor should be a better way to deal with this issue. I'll try this out and post the result.
604c3ce to d8416c8
f51e1ec to 79bb6a7
Add a note explaining the reason behind the eviction of "client.1" during this test. Signed-off-by: Rishabh Dave <ridave@redhat.com>
Signed-off-by: Rishabh Dave <ridave@redhat.com>
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
79bb6a7 to 737f978
Tested this PR individually; testing was successful - http://pulpito.front.sepia.ceph.com/rishabh-2022-08-19_14:15:36-fs-wip-rishabh-client-evict-distro-default-smithi/. Adding it for QA run.
Before unmounting, check if the client has been evicted and, if so, run "umount -f -l" on the client's mount point and clean up the mount right after it. Attempting to unmount, clean up or otherwise operate on the mount point of an evicted client will hang the operation (and thereby our Python code too). A lazy forced unmount prevents such hangs in our Python code and also frees the mount point.
This commit also adds code to gather session info for kernel mounts after mounting succeeds. This is necessary since the network address of the session is needed to check whether it is blocked by the Ceph cluster.
Fixes: https://tracker.ceph.com/issues/56476
Signed-off-by: Rishabh Dave <ridave@redhat.com>
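The flow in this commit message can be sketched roughly as below. This is only an illustration, not the code from this PR: `is_blocklisted` and `cleanup_commands` are hypothetical names, the "<addr> <expiry>" line layout of the blocklist listing is an assumption, and the plain `umount` used for a non-evicted client is also an assumption. The sketch only builds command lines and does not run them.

```python
from typing import Iterable, List


def is_blocklisted(client_addr: str, blocklist_lines: Iterable[str]) -> bool:
    """Return True if client_addr is the first field of any blocklist line.

    Hypothetical helper: assumes each line looks like "<addr> <expiry>".
    """
    for line in blocklist_lines:
        fields = line.split()
        if fields and fields[0] == client_addr:
            return True
    return False


def cleanup_commands(mountpoint: str, evicted: bool) -> List[List[str]]:
    """Return the commands to unmount and then remove a client's mount point.

    Hypothetical helper for illustration; it is not part of the qa code.
    """
    if evicted:
        # Operating on an evicted client's mount point can hang, so detach
        # it with a forced (-f) and lazy (-l) unmount instead.
        umount = ["sudo", "umount", "-f", "-l", mountpoint]
    else:
        # Assumption: a regular unmount is enough for a healthy client.
        umount = ["sudo", "umount", mountpoint]
    # Remove the now-unused mount point directory right after unmounting.
    return [umount, ["sudo", "rmdir", mountpoint]]
```

A caller would first decide whether the client is evicted, for example with `is_blocklisted(addr, lines)` using the session address gathered at mount time, and then run the returned commands in order.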
737f978 to c279b47
In last Fixed now. |
jenkins test api
jenkins test make check arm64
jenkins test windows
QA was successful - https://tracker.ceph.com/projects/cephfs/wiki/Main#2022-Aug-26. The QA job that this PR fixes didn't get executed during the QA run, so I ran it myself. The job ran successfully - http://pulpito.front.sepia.ceph.com/rishabh-2022-08-26_12:11:39-fs-wip-rishabh-testing-2022Aug19-distro-default-smithi/detail. Waiting on CI now.
jenkins test api
jenkins test make check arm64
jenkins test windows
Requested changes were added.
Related PR - PR #45036
4-compat_client.yaml creates two clients and evicts one of them. The
evicted client is not cleaned up later; that is, it is never unmounted
and its mount point is never deleted. This doesn't cause a failure during
final teardown on the main branch, but with PR #45036 it leads to a
failure every time.
PR #45036 changes how the CephFS code in the "qa" directory checks whether
a CephFS has been unmounted. Instead of relying on the value of the
"is_mounted" attribute, it runs the "findmnt" command to check whether the
client was actually unmounted.
Operating on a CephFS mount point after the client has been evicted
causes the operation to hang. Thus, with PR #45036, the final teardown
of the teuthology job fails every time.
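To illustrate the difference between trusting a cached attribute and asking the system, here is a minimal sketch of a findmnt-based check. It is not the code from PR #45036: `is_mounted` and its `run` parameter are hypothetical, the exact findmnt invocation used there may differ, and the sketch does nothing to guard against the hang described above.

```python
import subprocess
from typing import Callable


def is_mounted(mountpoint: str,
               run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run) -> bool:
    """Return True if findmnt reports a filesystem mounted at mountpoint.

    Hypothetical helper; `run` is injectable so the check can be exercised
    without a real mount.
    """
    proc = run(["findmnt", "--mountpoint", mountpoint],
               stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # findmnt exits with 0 when it finds a matching mount and with a
    # non-zero status when nothing is mounted at the given path.
    return proc.returncode == 0
```

Unlike a flag that is set when the test code mounts or unmounts, this reflects what the kernel currently reports, which is why a mount left behind by an evicted client shows up at teardown.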
Fixes: https://tracker.ceph.com/issues/56476