squid: mds: fix bug and error handling of mds STATE_STARTING#58394
Merged
lxbsz merged 3 commits into ceph:squid on Jul 22, 2024
jenkins test make check
…TARTING
Just like STATE_CREATING, the mds could fail or be stopped anywhere in the
STATE_STARTING state, so make sure the subsequent take-over mds will start from
STATE_STARTING. Otherwise, we'll end up with an empty journal (no ESubtreeMap).
The subsequent take-over mds will fail with no subtrees found and the rank will
be marked damaged.
Quick way to reproduce this:
./bin/ceph fs set a down true # take down all ranks in filesystem a
# wait for the fs to stop all ranks
./bin/ceph fs set a down true; pidof ceph-mds | xargs kill # quickly kill all mds soon after they enter the starting state
./bin/ceph-mds -i a -c ./ceph.conf # start all mds
Then we'll find that the mds rank is reported damaged, with the following log:
-1 log_channel(cluster) log [ERR] : No subtrees found for root MDS rank!
5 mds.beacon.a set_want_state: up:rejoin -> down:damaged
Fixes: https://tracker.ceph.com/issues/65094
Signed-off-by: ethanwu <ethanwu@synology.com>
(cherry picked from commit 767494d)
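The damaged-rank symptom described in this commit message can be checked for mechanically. The following is a minimal sketch, assuming the mds log is a plain-text file. The function name `mds_rank_damaged` is hypothetical, and the two patterns are copied from the log lines quoted in the commit message.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): exits 0 if the given mds log shows
# the damaged-rank symptom quoted in the commit message, non-zero otherwise.
mds_rank_damaged() {
    log_file="$1"
    # Both patterns come from the log excerpt in the commit message.
    grep -q 'No subtrees found for root MDS rank!' "$log_file" &&
        grep -q 'set_want_state: .* -> down:damaged' "$log_file"
}
```

A possible use after the reproduction steps is `mds_rank_damaged /path/to/mds.log && echo "rank marked damaged"`, where the path is a placeholder for wherever the mds log is written in your environment.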
If we don't flush the mds log before requesting STATE_ACTIVE and the mds happens
to stop before the log reaches the journal, the take-over mds will have no
SubtreeMap to replay and will fail later at the non-empty subtree check.
Fixes: https://tracker.ceph.com/issues/65094
Signed-off-by: ethanwu <ethanwu@synology.com>
(cherry picked from commit ee5472e)
…starting
The root ino belongs to the subtree of the root rank, and should be inserted when
creating the subtree map log. This is missing when the mds runs at STATE_STARTING,
however. When doing replay, all inodes under this subtree will be trimmed by
trim_non_auth_subtree, causing replay failure.
Quick way to reproduce this:
After creating the filesystem, mount it and create some directories.
mkdir -p ${cephfs_root}/dir1/dir11/foo
mkdir -p ${cephfs_root}/dir1/dir11/bar
unmount cephfs
./bin/ceph fs set a down true
./bin/ceph fs set a down false
./bin/cephfs-journal-tool --rank=a:0 event get json --path output # Can see that ESubtreeMap only contains 0x100 but no 0x1
mount cephfs
rmdir ${cephfs_root}/dir1/dir11/foo
rmdir ${cephfs_root}/dir1/dir11/bar
unmount cephfs
Trigger mds rank 0 failover; rank 0 then fails during replay and is marked damaged.
Checking the mds log shows the following related messages:
-49> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.cache trim_non_auth_subtree(0x560372b2df80) [dir 0x1 / [2,head] auth v=12 cv=0/0 dir_auth=-2 state=1073741824 f(v0 m2024-03-24T18:03:30.350260+0800 1=0+1) n(v3 rc2024-03-24T18:03:30.401819+0800 4=0+4) hs=1+0,ss=0+0 | child=1 subtree=1 0x560372b2df80]
-27> 2024-03-24T18:06:19.461+0800 7f1542cbf700 14 mds.0.cache remove_inode [inode 0x10000000000 [...2,head] #10000000000/ auth v10 f(v0 m2024-03-24T18:03:30.378677+0800 1=0+1) n(v1 rc2024-03-24T18:03:30.401819+0800 4=0+4) (iversion lock) 0x560372c52100]
-21> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.log _replay 4216759~3161 / 4226491 2024-03-24T18:05:16.515314+0800: EUpdate unlink_local [metablob 0x10000000000, 4 dirs]
-20> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.journal EUpdate::replay
-19> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.journal EMetaBlob.replay 4 dirlumps by unknown.0
-18> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.journal EMetaBlob.replay don't have renamed ino 0x10000000003
-17> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.journal EMetaBlob.replay found null dentry in dir 0x10000000001
-16> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.journal EMetaBlob.replay dir 0x10000000000
-15> 2024-03-24T18:06:19.461+0800 7f1542cbf700 0 mds.0.journal EMetaBlob.replay missing dir ino 0x10000000000
-14> 2024-03-24T18:06:19.461+0800 7f1542cbf700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
-13> 2024-03-24T18:06:19.461+0800 7f1542cbf700 5 mds.beacon.c set_want_state: up:replay -> down:damaged
The fix follows how the mdsdir inode is handled when the MDS enters STARTING.
Fixes: https://tracker.ceph.com/issues/65094
Signed-off-by: ethanwu <ethanwu@synology.com>
(cherry picked from commit 463c3b7)
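When triaging a replay failure like the one logged in the commit message above, the inode that replay could not find is usually the first detail to extract. The following is a minimal sketch, assuming the log lines have the same shape as the quoted excerpt. The function name `missing_dir_inos` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): print the inode number from every
# "EMetaBlob.replay missing dir ino" line in an mds log, one per line.
missing_dir_inos() {
    sed -n 's/.*EMetaBlob\.replay missing dir ino \(0x[0-9a-fA-F]*\).*/\1/p' "$1"
}
```

Run against a log containing the excerpt above, this would report the directory inode named in the "missing dir ino" line, which can then be compared with the inodes listed in the journal's ESubtreeMap events.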
8bd95dd to eee6fb0
jenkins test windows
This PR is under test in https://tracker.ceph.com/issues/66906.
lxbsz approved these changes on Jul 22, 2024
None of the failures are related to this backport PR.
For more detail, please see section 2024-07-17 in https://tracker.ceph.com/projects/cephfs/wiki/Squid.
Fixes: https://tracker.ceph.com/issues/66774