mds/FSMap: fix join_fscid being incorrectly reset for active MDS during filesystem removal #65640
Merged
rishabh-d-dave merged 1 commit into ceph:main on Oct 3, 2025
Conversation
mds/FSMap: fix join_fscid being incorrectly reset for active MDS during filesystem removal
Fix a bug where active MDS daemons in the remaining filesystems incorrectly
have their join_fscid cleared to FS_CLUSTER_ID_NONE when any other
filesystem is removed.
The issue was caused by variable-name shadowing in erase_filesystem(),
where the loop variable 'fscid' shadowed the function parameter 'fscid':
inside the loop, if (info.join_fscid == fscid) compared against the
loop variable (a remaining FS ID) instead of the parameter (the removed FS ID).
Renamed the loop variable to 'remaining_fscid' to eliminate the shadowing
and ensure the comparison uses the correct filesystem ID.
Reproducer:
../src/vstart.sh --new -x --localhost --bluestore
FS=b
./bin/ceph osd pool create cephfs.${FS}.meta 64 64 replicated
./bin/ceph osd pool create cephfs.${FS}.data 64 64 replicated
./bin/ceph fs new ${FS} cephfs.${FS}.meta cephfs.${FS}.data
./bin/ceph config set mds.a mds_join_fs a
./bin/ceph config set mds.b mds_join_fs a
./bin/ceph fs fail ${FS}
./bin/ceph fs rm ${FS} --yes-i-really-mean-it
Then, from ./bin/ceph fs dump,
we can see that join_fscid is reset for all active MDS daemons in filesystem 'a'.
Since there are standby MDS daemons with join_fscid=1,
MDSMonitor thinks they have better affinity and triggers a switchover.
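For context, a rough sketch of why the switchover happens, reusing the simplified types from the sketch above (standby_has_better_affinity is a hypothetical helper; the real preference logic lives in MDSMonitor and is more involved):

// Hypothetical illustration of the affinity comparison.
bool standby_has_better_affinity(const MDSInfo& active,
                                 const MDSInfo& standby,
                                 fs_cluster_id_t fscid)
{
  // A daemon pinned to this filesystem (join_fscid == fscid) is preferred.
  bool active_pinned  = (active.join_fscid  == fscid);
  bool standby_pinned = (standby.join_fscid == fscid);
  // After the bug, the active daemon's join_fscid has been cleared, so a
  // standby still pinned to the filesystem looks strictly better and is promoted.
  return standby_pinned && !active_pinned;
}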
Fixes: https://tracker.ceph.com/issues/73183
Signed-off-by: ethanwu <ethanwu@synology.com>
Contributor
This PR is under test in https://tracker.ceph.com/issues/73311.
rishabh-d-dave
approved these changes
Oct 3, 2025
Contributor
QA run was successful - https://tracker.ceph.com/projects/cephfs/wiki/QA_main_2025#wip-rishabh-testing-20250929182116
This is an automated message by src/script/redmine-upkeep.py. I have resolved the following tracker ticket due to the merge of this PR: No backports are pending for the ticket. If this is incorrect, please update the tracker. Update Log: https://github.com/ceph/ceph/actions/runs/18220800786