reef: mon,cephfs: require confirmation flag to bring down unhealthy MDS#57837
Merged
When running the command "ceph mds fail" for an MDS that is unhealthy due to MDS_CACHE_OVERSIZED or MDS_TRIM, the user must pass the confirmation flag. Otherwise, the command fails and prints an appropriate error message. Restarting an MDS with such health warnings is not recommended, since it will recover slowly during the restart, which creates new problems. Fixes: https://tracker.ceph.com/issues/61866 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit eeda00e)
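The rule described in this commit can be modeled in a few lines. What follows is a minimal Python sketch of the decision only, not the monitor's actual C++ code; the function name `check_mds_fail` and the error text are hypothetical and chosen for illustration.

```python
# Health warnings that, per the commit message, make "ceph mds fail"
# require explicit confirmation.
BLOCKING_WARNINGS = frozenset({"MDS_TRIM", "MDS_CACHE_OVERSIZED"})


def check_mds_fail(health_warnings, confirmed):
    """Return (allowed, message) for a request to fail an MDS.

    Hypothetical model: the real check lives in the monitor, and its
    error text differs from the one built here.
    """
    blocking = sorted(BLOCKING_WARNINGS.intersection(health_warnings))
    if blocking and not confirmed:
        return False, (
            "MDS has health warning(s) " + ", ".join(blocking)
            + "; pass --yes-i-really-mean-it to proceed"
        )
    return True, ""
```

Other health warnings do not block the command in this model; only the two named in the commit message do.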
Update docs, since the command "ceph mds fail" now fails if the MDS has either health warning MDS_TRIM or MDS_CACHE_OVERSIZED and the confirmation flag is not passed. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit dea2220)
Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit f241a3c)
Since the command "ceph mds fail" now may require confirmation flag
("--yes-i-really-mean-it"), update this method to allow/disallow adding
this flag to the command arguments.
Signed-off-by: Rishabh Dave <ridave@redhat.com>
(cherry picked from commit 4f333e1)
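A helper that optionally appends the flag might look like the sketch below. The function name `build_mds_fail_args` and its `confirm` parameter are hypothetical; the actual QA method and its signature are not shown on this page.

```python
CONFIRM_FLAG = "--yes-i-really-mean-it"


def build_mds_fail_args(mds_name, confirm=True):
    """Build the argument list for "ceph mds fail".

    Hypothetical helper: `confirm` controls whether the confirmation
    flag is appended, so tests can exercise both paths.
    """
    args = ["ceph", "mds", "fail", mds_name]
    if confirm:
        args.append(CONFIRM_FLAG)
    return args
```

Keeping the flag behind a parameter lets a test omit it on purpose to check that the command is rejected.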
anthonyeleven
approved these changes
Jun 3, 2024
The confirmation flag must be passed when running the command "ceph fs fail" if an MDS for this FS has either of the two health warnings: MDS_TRIM or MDS_CACHE_OVERSIZED. Otherwise, the command fails and prints an appropriate error message. Restarting an MDS with these health warnings is not recommended, since it will recover slowly during the restart, which creates new problems. Fixes: https://tracker.ceph.com/issues/61866 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit b901616) Conflicts: - src/mon/FSCommands.cc - Lines surrounding the patch differ in reef compared to main: the reef code still accesses "mds_map" directly instead of through "get_mds_map()". - The return value of get_filesystem() is different in main.
Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit de18c5a)
Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 2481642)
Since the "ceph fs fail" command now requires the confirmation flag when the Ceph cluster has either health warning MDS_TRIM or MDS_CACHE_OVERSIZED, update the teardown in QA code. During teardown, the CephFS should be failed regardless of whether the Ceph cluster has health warnings. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit a1af1bf)
Add tests to verify that the confirmation flag is mandatory for running the commands "ceph mds fail" and "ceph fs fail" when the MDS has one of the two health warnings: MDS_CACHE_OVERSIZED or MDS_TRIM. Also, add MDS_CACHE_OVERSIZED and MDS_TRIM to the ignorelist for test_admin.py so that QA jobs know these are expected failures. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 214d614)
This issue was not caught in the original QA run because "ceph mds fail" returns 0 even when the MDS name it receives as an argument does not exist. This is done for the sake of idempotency; however, it caused this bug to go uncaught. Fixes: https://tracker.ceph.com/issues/65864 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit ab643f7)
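To see why a return-code check alone misses this, consider the toy model below. It is a sketch under stated assumptions: `mds_fail_rc` only imitates the idempotent behavior described above, and `fail_existing_mds` is a hypothetical guard, not the fix that was merged.

```python
def mds_fail_rc(known_mds, name):
    """Toy model of the idempotent command: always returns 0,
    whether or not `name` refers to an existing MDS."""
    known_mds.discard(name)
    return 0


def fail_existing_mds(known_mds, name):
    """Hypothetical guard for a test: refuse to proceed when the
    name is unknown, so a wrong name cannot pass silently."""
    if name not in known_mds:
        raise ValueError("no such MDS: " + name)
    return mds_fail_rc(known_mds, name)
```

In this model a test that passes a wrong name still sees 0 from `mds_fail_rc`, while `fail_existing_mds` raises and exposes the mistake.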
After running TestFSFail, CephFSTestCase.tearDown() fails attempting to unmount CephFS. Set joinable on FS and wait for the MDS to be up before exiting the test. This will ensure that unmounting is successful in teardown. Fixes: https://tracker.ceph.com/issues/65841 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit faa30e0)
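The set-joinable-then-wait step can be sketched as follows. This is an illustration only: `set_joinable_args` and `wait_until` are hypothetical helpers, and the predicate that reports whether the MDS is up is left to the caller; the QA framework's own methods are not shown on this page.

```python
import time


def set_joinable_args(fs_name):
    """Argument list for marking a file system joinable again."""
    return ["ceph", "fs", "set", fs_name, "joinable", "true"]


def wait_until(predicate, timeout=60.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll `predicate` until it returns True or `timeout` seconds pass.

    Returns True on success and False on timeout. `sleep` and `clock`
    are injectable so the loop can be tested without real delays.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test would run the command built by `set_joinable_args`, then call `wait_until` with a predicate that checks the MDS state before letting teardown unmount the file system.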
a469f86 to 4b158b5
@leonid-s-usov make check passed.
ping
joscollin
approved these changes
Jun 17, 2024
Tested in https://tracker.ceph.com/issues/66468
@rishabh-d-dave This PR caused the qa test failure, it seems you didn't backport the dependency commit:
Backports 3 related trackers together
backport tracker: https://tracker.ceph.com/issues/65927
backport tracker: https://tracker.ceph.com/issues/66198
backport tracker: https://tracker.ceph.com/issues/66409
this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh
updated using ceph-backport.sh version 16.0.0.6848