mon: fix fs set down to adjust max_mds only when cluster is not down #58582

Merged
batrick merged 2 commits into ceph:main from littlebees:fix-set-down-false
Aug 6, 2024

Conversation

@littlebees

@littlebees littlebees commented Jul 15, 2024

Applying fs set down false on an up cluster will set the cluster's max_mds to 1, regardless of the cluster's current max_mds.

fs set down false should only change max_mds when the cluster was set to down. Otherwise, the cluster should remain unchanged.
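To illustrate the intended behavior, here is a simplified, hypothetical C++ sketch of the two MDSMap fields involved. This is not the actual FSCommands.cc code; `MDSMapModel` and `set_down` are invented names for illustration only:

```cpp
// Hypothetical, simplified model of the two MDSMap fields involved.
// Not the real Ceph code: MDSMapModel and set_down are invented names.
struct MDSMapModel {
  int max_mds = 1;      // number of active ranks the fs may run
  int old_max_mds = 0;  // saved by "down true"; 0 on a fresh fs
};

// "fs set <fs> down <flag>" with the guard this PR proposes:
// max_mds is only restored when the fs was actually taken down.
void set_down(MDSMapModel& m, bool is_down) {
  if (is_down) {
    if (m.max_mds > 0) {          // not already down
      m.old_max_mds = m.max_mds;  // preserve the current value
      m.max_mds = 0;              // stop all ranks
    }
  } else if (m.max_mds == 0) {    // only restore if the fs is down
    m.max_mds = m.old_max_mds ? m.old_max_mds : 1;
  }                               // already up: leave max_mds alone
}
```

With this guard, `down false` on an up file system is a no-op, while a real `down true` / `down false` cycle still restores the saved max_mds.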

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@littlebees littlebees requested a review from a team as a code owner July 15, 2024 02:35
@github-actions github-actions bot added cephfs Ceph File System core mon labels Jul 15, 2024
@littlebees littlebees force-pushed the fix-set-down-false branch 2 times, most recently from 91e42fd to 57709f6 Compare July 16, 2024 01:37
@vshankar vshankar requested a review from a team July 23, 2024 12:51

ss << fsp->get_mds_map().get_fs_name();

if (!is_down && fsp->get_mds_map().get_old_max_mds() == 0) {
Member


Suggested change
if (!is_down && fsp->get_mds_map().get_old_max_mds() == 0) {
if (!is_down && fsp->get_mds_map().get_max_mds() > 0) {

would work, no?

Contributor


+1

@dparmar18
Contributor

This looks to be intentional, cephfs-admin-docs say:

This will also restore the previous value of max_mds. MDS daemons are brought down in a way such that journals are flushed to the metadata pool and all client I/O is stopped.

If this behaviour is to be changed then the docs would also need to be updated. 4e9ffff brought in this documentation.

@dparmar18
Contributor

dparmar18 commented Jul 24, 2024

So, when a CephFS cluster is taken down using ceph fs set <fs_name> down true, it does this:

ceph/src/mon/FSCommands.cc

Lines 604 to 608 in 5969c11

if (is_down) {
  if (fs.get_mds_map().get_max_mds() > 0) {
    fs.get_mds_map().set_old_max_mds();
    fs.get_mds_map().set_max_mds(0);
  } /* else already down! */

It first records the old max_mds val using:

void set_old_max_mds() { old_max_mds = max_mds; }

and then max_mds is set to 0. Which means first the max_mds is preserved and then reset to 0. Looks correct.

So, now when ceph fs set <fs_name> down false is called:

ceph/src/mon/FSCommands.cc

Lines 609 to 612 in 5969c11

} else {
  mds_rank_t oldmax = fs.get_mds_map().get_old_max_mds();
  fs.get_mds_map().set_max_mds(oldmax ? oldmax : 1);
}

The old max_mds val is fetched; if it is 0, i.e. the old max_mds was 0 (huh?), it sets the new max_mds to 1. Maybe the max_mds was always 1, OR some cmd was run between ceph fs set <fs_name> down true and ceph fs set <fs_name> down false which reset the max_mds to 1?

It looks to me that this functionality is correct, it doesn't always set the max_mds to 1 but follows what the last recorded value is and sets it.

EDIT: Please discard this; the issue here is when ceph fs set <fs> down false is executed without a prior ceph fs set a down true. That makes fs.get_mds_map().set_max_mds(oldmax ? oldmax : 1); set max_mds to 1, since a fresh cluster that never ran ceph fs set a down true will have mds_rank_t old_max_mds = 0;
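The fresh-cluster case described above can be demonstrated with a small model of the pre-fix logic. This is a hypothetical sketch whose body mirrors the FSCommands.cc lines quoted earlier; `MDSMapSketch` and `set_down_prefix` are invented names:

```cpp
// Hypothetical sketch of the pre-fix logic from src/mon/FSCommands.cc;
// MDSMapSketch and set_down_prefix are invented names for illustration.
struct MDSMapSketch {
  int max_mds = 1;
  int old_max_mds = 0;  // never set on a cluster that has not run "down true"
};

void set_down_prefix(MDSMapSketch& m, bool is_down) {
  if (is_down) {
    if (m.max_mds > 0) {
      m.old_max_mds = m.max_mds;
      m.max_mds = 0;
    } /* else already down! */
  } else {
    // Unconditional restore: on a fresh cluster old_max_mds is still 0,
    // so "down false" clobbers whatever max_mds was with 1.
    int oldmax = m.old_max_mds;
    m.max_mds = oldmax ? oldmax : 1;
  }
}
```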

@dparmar18
Contributor

dparmar18 commented Jul 24, 2024

So, when a CephFS cluster is taken down using ceph fs set <fs_name> down true, it does this:
...
...
...
It looks to me that this functionality is correct, it doesn't always set the max_mds to 1 but follows what the last recorded value is and sets it.

@batrick any thoughts?

Contributor

@rishabh-d-dave rishabh-d-dave left a comment


I couldn't reproduce the bug this PR is trying to fix; it doesn't exist. ceph fs set a down true successfully brought down the 2 active MDSs the FS had, and ceph fs set a down false successfully brought the FS up with the same number of active MDSs.

EDIT
@littlebees Was this bug seen in one of the release branches?

EDIT 2
The bug is reproducible on a new cluster when ceph fs set a down false is run without running ceph fs set a down true in the past.

@rishabh-d-dave
Contributor

It looks to me that this functionality is correct, it doesn't always set the max_mds to 1 but follows what the last recorded value is and sets it.

Ugh! Only seeing Dhairya's comment now, since the page was refreshed while I was posting my review.

Agreed +1.

@dparmar18 @batrick

@vshankar
Contributor

This looks to be intentional, cephfs-admin-docs say:

This will also restore the previous value of max_mds. MDS daemons are brought down in a way such that journals are flushed to the metadata pool and all client I/O is stopped.

If this behaviour is to be changed then the docs would also need to be updated. 4e9ffff brought in this documentation.

I think the bug that this change is fixing is that running fs set <> down false when the file system is online causes max_mds to get reset to one (1).

@dparmar18
Contributor

dparmar18 commented Jul 25, 2024

This looks to be intentional, cephfs-admin-docs say:

This will also restore the previous value of max_mds. MDS daemons are brought down in a way such that journals are flushed to the metadata pool and all client I/O is stopped.

If this behaviour is to be changed then the docs would also need to be updated. 4e9ffff brought in this documentation.

I think the bug that this change is fixing is that running fs set <> down false when the file system is online causes max_mds to get reset to one (1).

The code never resets max_mds to one unless the last recorded max_mds is 0; check #58582 (comment)

@dparmar18
Contributor

This looks to be intentional, cephfs-admin-docs say:

This will also restore the previous value of max_mds. MDS daemons are brought down in a way such that journals are flushed to the metadata pool and all client I/O is stopped.

If this behaviour is to be changed then the docs would also need to be updated. 4e9ffff brought in this documentation.

I think the bug that this change is fixing is that running fs set <> down false when the file system is online causes max_mds to get reset to one (1).

My bad, I understand this now. You're right.

@dparmar18
Contributor

dparmar18 commented Jul 25, 2024

This can be easily reproduced; better to have a test case included as well.

$ ./bin/ceph fs set a max_mds 2
$ ./bin/ceph fs get a | grep "max_mds"
max_mds	2
$ ./bin/ceph fs set a down false
a marked up, max_mds = 1
$ ./bin/ceph fs get a | grep "max_mds"
max_mds	1

@rishabh-d-dave rishabh-d-dave self-requested a review July 25, 2024 12:53
@littlebees littlebees force-pushed the fix-set-down-false branch from 57709f6 to 860153b Compare July 25, 2024 12:56
Contributor

@dparmar18 dparmar18 left a comment


This can be easily reproduced; it'd be great to have a test case.

@littlebees littlebees force-pushed the fix-set-down-false branch from 860153b to 257d6a5 Compare July 25, 2024 13:52
@github-actions github-actions bot added the tests label Jul 25, 2024
@littlebees littlebees requested a review from batrick July 25, 2024 14:02
@rishabh-d-dave
Contributor

jenkins test api

@rishabh-d-dave
Contributor

jenkins test make check

Member

@batrick batrick left a comment


Sorry, I disagree with the error message change. We shouldn't suggest the operator bring the fs down just so that this command will succeed. That doesn't make sense.

Applying `fs set down false` on an up cluster will set the cluster's max_mds to 1, regardless of the cluster's current max_mds.
`fs set down false` should only change max_mds when the cluster was set to down. Otherwise, the cluster should remain unchanged.

Fixes: https://tracker.ceph.com/issues/66960
Signed-off-by: chungfengz <chungfengz@synology.com>
Fixes: https://tracker.ceph.com/issues/66960
Signed-off-by: chungfengz <chungfengz@synology.com>
@littlebees littlebees force-pushed the fix-set-down-false branch from 257d6a5 to f8becaa Compare July 26, 2024 14:41
@batrick
Member

batrick commented Jul 26, 2024

This PR is under test in https://tracker.ceph.com/issues/67214.
