
squid: mds: fix rank 0 marked damaged if stopping fails after Elid flush. #65823

Open
joscollin wants to merge 1 commit into ceph:squid from joscollin:wip-73301-squid

Conversation

@joscollin
Member

backport tracker: https://tracker.ceph.com/issues/73301


backport of #65483
parent tracker: https://tracker.ceph.com/issues/72983

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

… log trimmed

steps to reproduce
 ../src/vstart.sh --debug --new -x --localhost --bluestore
 ./bin/ceph tell mds.<rank 0> config set mds_kill_shutdown_at 10
 ./bin/ceph fs set <fs name> down true

Wait a few seconds. The take-over MDS then logs the following,
and rank 0 is marked damaged:
2025-09-11T16:47:24.591+0800 785dabeaa6c0 -1 log_channel(cluster) log [ERR] : No subtrees found for root MDS rank!
2025-09-11T16:47:24.591+0800 785dabeaa6c0 5 mds.beacon.b set_want_state: up:rejoin -> down:damaged

During shutdown_pass, after the ELid event is submitted and the mdlog is
trimmed, the MDS log contains only the ELid event, which does nothing at replay.
As a result, no subtree is found after replay.

Fix this by checking whether MDLog contains only one event.
If so, skip the subtree check for rank 0, and allow it to request
STATE_STOPPED just like the other ranks.

Fixes: https://tracker.ceph.com/issues/72983
Signed-off-by: ethanwu <ethanwu@synology.com>
(cherry picked from commit adb448b)
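
The check described in the commit message can be illustrated with a small, self-contained sketch. This is not the code from the patch: `ReplayResult`, `Outcome`, and `decide_after_replay` are hypothetical names made up for illustration, and the model assumes that a rank with no subtrees after replay either requests the stopped state or is marked damaged, as the description above suggests.

```cpp
#include <cstddef>

// Hypothetical, simplified model of the post-replay decision.
// None of these names exist in the Ceph source tree.
enum class Outcome { Continue, RequestStopped, MarkDamaged };

struct ReplayResult {
  int rank;                    // MDS rank that finished replay
  std::size_t num_subtrees;    // subtrees found after replay
  std::size_t num_log_events;  // events left in the journal (assumed countable)
};

constexpr Outcome decide_after_replay(const ReplayResult& r) {
  return
      // Subtrees were found: nothing unusual, carry on.
      r.num_subtrees > 0 ? Outcome::Continue
      // Non-zero ranks with no subtrees may request the stopped state.
      : r.rank != 0 ? Outcome::RequestStopped
      // Rank 0 whose journal holds a single event (assumed to be the
      // lone ELid left after trimming): skip the subtree check.
      : r.num_log_events == 1 ? Outcome::RequestStopped
      // Rank 0 with no subtrees and a non-trivial journal: damaged.
      : Outcome::MarkDamaged;
}

// Compile-time usage examples.
static_assert(decide_after_replay(ReplayResult{0, 0, 1}) ==
                  Outcome::RequestStopped,
              "rank 0 with a single-event journal may stop");
static_assert(decide_after_replay(ReplayResult{0, 0, 5}) ==
                  Outcome::MarkDamaged,
              "rank 0 with no subtrees and more events is damaged");
```

In this model, the behaviour before the fix corresponds to dropping the `num_log_events == 1` branch, so rank 0 with an empty subtree map would always end up in `MarkDamaged`.
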
@joscollin joscollin added this to the squid milestone Oct 8, 2025
@joscollin joscollin added the cephfs Ceph File System label Oct 8, 2025
@joscollin
Member Author

jenkins test make check

@joscollin
Member Author

This PR is under test in https://tracker.ceph.com/issues/73560.

@batrick batrick modified the milestones: squid, v19.2.4 Mar 18, 2026