quincy: mon,cephfs: require confirmation flag to bring down unhealthy MDS by rishabh-d-dave · Pull Request #57841 · ceph/ceph

rishabh-d-dave · 2024-06-03T13:53:05Z

BACKPORTS 3 related trackes together

backport tracker: https://tracker.ceph.com/issues/65928
- backport of mon,cephfs: require confirmation flag to bring down unhealthy MDS #56066
- parent tracker: https://tracker.ceph.com/issues/61866
backport tracker https://tracker.ceph.com/issues/66199
- parent tracker https://tracker.ceph.com/issues/65864
- PR qa/cephfs: pass MDS name, not FS name, to "ceph mds fail" cmd #57493
backport tracker https://tracker.ceph.com/issues/66410
- parent tracker https://tracker.ceph.com/issues/65841
- PR qa/cephfs: set joinable on FS before exiting tests in TestFSFail #57333

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/master/src/script/ceph-backport.sh

When running the command "ceph mds fail" for an MDS that is unhealthy due to, MDS_CACHE_OVERSIZED or MDS_TRIM, user must pass confirmation flag. Else, the command will fail and print an appropriate error message. Restarting an MDS with such health warnings is not recommended since it will have a slow reocvery during restart which will create new problems. Fixes: https://tracker.ceph.com/issues/61866 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit eeda00e)

Update docs since command "ceph mds fail" will now fail if MDS has either health warning MDS_TRIM or MDS_CACHE_OVERSIZED and if confirmation flag is not passed. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit dea2220)

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit f241a3c)

Since the command "ceph mds fail" now may require confirmation flag ("--yes-i-really-mean-it"), update this method to allow/disallow adding this flag to the command arguments. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 4f333e1)

Confirmation flag must be passed when running the command "ceph fs fail" when the MDS for this FS has either of the two health warnings: MDS_TRIM or MDS_CACHE_OVERSIZED. Else, the command will fail and print an appropriate error message. Restarting an MDS with these health warnings is not recommened since it will have a slow recovery during restart which will create new problems. Fixes: https://tracker.ceph.com/issues/61866 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit b901616) Conflicts: - src/mon/FSCommands.cc Lines around the patch are different in quincy compared to main branch. "get_mds_map()" is not available in quincy branch, unlike main.

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit de18c5a)

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 2481642)

Since "ceph fs fail" command now requires the confirmation flag when Ceph cluster has either health warning MDS_TRIM or MDS_CACHE_OVERSIZE, update tear down in QA code. During the teardown, the CephFS should be failed, regardless of whether or not Ceph cluster has health warnings, since it is teardown. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit a1af1bf)

Add tests to verify that the confirmation flag is mandatory for running commands "ceph mds fail" and "ceph fs fail" when MDS has one of the two health warnings: MDS_CACHE_OVERSIZE or MDS_TRIM. Also, add MDS_CACHE_OVERSIZE and MDS_TRIM to ignorelist for test_admin.py so that QA jobs knows this an expected failure. Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 214d614)

This issue was not caught in original QA run because "ceph mds fail" returns 0 even though MDS name received by it in argument is non-existent. This is done for the sake of idempotency, however it caused this bug to go uncaught. Fixea: https://tracker.ceph.com/issues/65864 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit ab643f7)

After running TestFSFail, CephFSTestCase.tearDown() fails attempting to unmount CephFS. Set joinable on FS and wait for the MDS to be up before exiting the test. This will ensure that unmounting is successful in teardown. Fixes: https://tracker.ceph.com/issues/65841 Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit faa30e0)

joscollin · 2024-06-21T06:08:42Z

jenkins test windows

joscollin · 2024-06-21T06:24:35Z

This PR is under test in https://tracker.ceph.com/issues/66597.

rishabh-d-dave

Found related failure - https://pulpito.ceph.com/xiubli-2024-08-01_12:01:33-fs-wip-xiubli-testing-20240801.015240-quincy-distro-default-smithi/7830262

rzarzynski

The changes in src/mon/MonCommands.h looks good from the core's standpoint but leaving the approval to the @ceph/cephfs.

vshankar · 2024-08-20T04:03:27Z

Found related failure - https://pulpito.ceph.com/xiubli-2024-08-01_12:01:33-fs-wip-xiubli-testing-20240801.015240-quincy-distro-default-smithi/7830262

I recall that there is a tracker for this, yes? Please link it here @rishabh-d-dave

github-actions · 2024-11-26T13:08:36Z

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

vshankar

https://tracker.ceph.com/projects/cephfs/wiki/Quincy#wip-jcollin-testing-20240812053224-quincy

rishabh-d-dave added 4 commits June 3, 2024 19:19

PendingReleaseNotes: note need of confirmation for "ceph mds fail"

eae9e23

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit f241a3c)

rishabh-d-dave requested review from a team as code owners June 3, 2024 13:53

rishabh-d-dave added this to the quincy milestone Jun 3, 2024

rishabh-d-dave added the core label Jun 3, 2024

github-actions bot added cephfs Ceph File System documentation mon tests labels Jun 3, 2024

anthonyeleven approved these changes Jun 3, 2024

View reviewed changes

rishabh-d-dave added 4 commits June 12, 2024 16:16

doc/cephfs: mention need of confirmation for "ceph fs fail"

7d1800c

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit de18c5a)

PendingReleaseNotes: note need of confirmation for "ceph fs fail"

b57856f

Signed-off-by: Rishabh Dave <ridave@redhat.com> (cherry picked from commit 2481642)

rishabh-d-dave force-pushed the wip-65928-quincy branch from 76baeb0 to f8e910b Compare June 12, 2024 10:47

rishabh-d-dave added 3 commits June 12, 2024 18:13

rishabh-d-dave force-pushed the wip-65928-quincy branch from f8e910b to 973ff56 Compare June 12, 2024 12:43

joscollin requested a review from a team June 20, 2024 07:49

joscollin added the wip-jcollin-testing-quincy2 label Jun 21, 2024

lxbsz added wip-lxb-testing-quincy2 and removed wip-jcollin-testing-quincy2 labels Jul 30, 2024

rishabh-d-dave commented Aug 7, 2024

View reviewed changes

ljflores requested a review from vshankar August 19, 2024 16:13

ljflores requested review from batrick and gregsfortytwo August 19, 2024 16:13

ljflores added the needs-review label Aug 19, 2024

rzarzynski reviewed Aug 19, 2024

View reviewed changes

github-actions bot added the stale label Nov 26, 2024

vshankar removed the stale label Nov 26, 2024

vshankar approved these changes Nov 28, 2024

View reviewed changes

vshankar merged commit 2d9a4d6 into ceph:quincy Nov 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quincy: mon,cephfs: require confirmation flag to bring down unhealthy MDS#57841

quincy: mon,cephfs: require confirmation flag to bring down unhealthy MDS#57841
vshankar merged 11 commits intoceph:quincyfrom
rishabh-d-dave:wip-65928-quincy

rishabh-d-dave commented Jun 3, 2024 •

edited

Loading

Uh oh!

joscollin commented Jun 21, 2024

Uh oh!

joscollin commented Jun 21, 2024

Uh oh!

rishabh-d-dave left a comment

Uh oh!

rzarzynski left a comment

Uh oh!

vshankar commented Aug 20, 2024

Uh oh!

github-actions bot commented Nov 26, 2024

Uh oh!

vshankar left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

Conversation

rishabh-d-dave commented Jun 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joscollin commented Jun 21, 2024

Uh oh!

joscollin commented Jun 21, 2024

Uh oh!

rishabh-d-dave left a comment

Choose a reason for hiding this comment

Uh oh!

rzarzynski left a comment

Choose a reason for hiding this comment

Uh oh!

vshankar commented Aug 20, 2024

Uh oh!

github-actions bot commented Nov 26, 2024

Uh oh!

vshankar left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants

rishabh-d-dave commented Jun 3, 2024 •

edited

Loading