
squid: mds: fix bug and error handling of mds STATE_STARTING #58394

Merged

lxbsz merged 3 commits into ceph:squid from joscollin:wip-66774-squid on Jul 22, 2024

Conversation

@joscollin (Member) commented Jul 2, 2024

Fixes: https://tracker.ceph.com/issues/66774

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)

@github-actions github-actions bot added the cephfs Ceph File System label Jul 2, 2024
@github-actions github-actions bot added this to the squid milestone Jul 2, 2024
@joscollin (Member, Author) commented:

jenkins test make check

…TARTING

Just like STATE_CREATING, the MDS could fail or be stopped anywhere in the
STATE_STARTING state, so make sure the subsequent take-over MDS starts from
STATE_STARTING. Otherwise we end up with an empty journal (no ESubtreeMap):
the take-over MDS then fails with "no subtrees found" and the rank is marked
damaged.

Quick way to reproduce this:
  ./bin/ceph fs set a down true  # take down all ranks in filesystem a
  # wait for the fs to stop all ranks
  ./bin/ceph fs set a down false; pidof ceph-mds | xargs kill
  # quickly kill all mds daemons soon after they enter the starting state
  ./bin/ceph-mds -i a -c ./ceph.conf
  # start all mds daemons; the mds rank is then reported damaged with the following log

-1 log_channel(cluster) log [ERR] : No subtrees found for root MDS rank!
 5 mds.beacon.a set_want_state: up:rejoin -> down:damaged

Fixes: https://tracker.ceph.com/issues/65094
Signed-off-by: ethanwu <ethanwu@synology.com>
(cherry picked from commit 767494d)
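The take-over logic this commit describes can be modeled in a few lines. This is a toy sketch, not Ceph code; `takeover_state` and the lowercase state names are hypothetical illustrations of the decision described above:

```python
# Toy model of the take-over decision described above (not Ceph code;
# takeover_state and the state names are hypothetical illustrations).

EMPTY_JOURNAL = "empty"  # the dying MDS never logged an ESubtreeMap

def takeover_state(prev_state, journal):
    """Pick the state a replacement MDS should resume from."""
    if prev_state in ("creating", "starting") and journal == EMPTY_JOURNAL:
        # Going straight to replay on an empty journal ends with
        # "No subtrees found" and the rank marked damaged; instead,
        # redo creating/starting from scratch.
        return prev_state
    return "replay"  # normal take-over path

print(takeover_state("starting", EMPTY_JOURNAL))  # -> starting
```

The point of the fix is the first branch: a rank that died mid-STARTING with nothing journaled must be restarted from STATE_STARTING rather than handed to replay.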
If we don't flush the mds log before requesting STATE_ACTIVE, and the mds
happens to stop before the log reaches the journal, the take-over mds will
have no ESubtreeMap to replay and will fail later at the non-empty subtree
check.

Fixes: https://tracker.ceph.com/issues/65094
Signed-off-by: ethanwu <ethanwu@synology.com>
(cherry picked from commit ee5472e)
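The ordering problem this commit fixes can be sketched as a tiny simulation. This is not the actual MDS code; `journal_after_crash` and the in-memory `buffered` list are hypothetical stand-ins for unflushed mdlog entries:

```python
# Toy simulation of the race described above (not Ceph code).
# "buffered" stands for in-memory mdlog entries not yet written out.

def journal_after_crash(flush_before_active):
    buffered = ["ESubtreeMap"]  # subtree map exists only in memory
    journal = []
    if flush_before_active:
        journal += buffered  # the fix: flush the mdlog first
        buffered = []
    # The MDS requests up:active and then stops; whatever is still
    # buffered never reaches the on-disk journal.
    return journal

print(journal_after_crash(False))  # -> [] : nothing for the take-over MDS to replay
print(journal_after_crash(True))   # -> ['ESubtreeMap']
```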
…starting

The root ino belongs to the subtree of the root rank and should be inserted
when creating the subtree map log. However, this is missing when the mds runs
at STATE_STARTING. During replay, all inodes under this subtree are trimmed by
trim_non_auth_subtree, causing replay to fail.

Quick way to reproduce this:
  After creating the filesystem, mount it and create some directories:
  mkdir -p ${cephfs_root}/dir1/dir11/foo
  mkdir -p ${cephfs_root}/dir1/dir11/bar
  unmount cephfs
  ./bin/ceph fs set a down true
  ./bin/ceph fs set a down false
  ./bin/cephfs-journal-tool --rank=a:0 event get json --path output  # the ESubtreeMap contains only 0x100, not 0x1
  mount cephfs
  rmdir ${cephfs_root}/dir1/dir11/foo
  rmdir ${cephfs_root}/dir1/dir11/bar
  unmount cephfs
  Trigger an mds rank 0 failover; rank 0 then fails during replay and is marked damaged.

  Checking the mds log shows the following related messages:
  -49> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.cache trim_non_auth_subtree(0x560372b2df80) [dir 0x1 / [2,head] auth v=12 cv=0/0 dir_auth=-2 state=1073741824 f(v0 m2024-03-24T18:03:30.350260+0800 1=0+1) n(v3 rc2024-03-24T18:03:30.401819+0800 4=0+4) hs=1+0,ss=0+0 | child=1 subtree=1 0x560372b2df80]

  -27> 2024-03-24T18:06:19.461+0800 7f1542cbf700 14 mds.0.cache remove_inode [inode 0x10000000000 [...2,head] #10000000000/ auth v10 f(v0 m2024-03-24T18:03:30.378677+0800 1=0+1) n(v1 rc2024-03-24T18:03:30.401819+0800 4=0+4) (iversion lock) 0x560372c52100]
  -21> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.log _replay 4216759~3161 / 4226491 2024-03-24T18:05:16.515314+0800: EUpdate unlink_local [metablob 0x10000000000, 4 dirs]
  -20> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.journal EUpdate::replay
  -19> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.journal EMetaBlob.replay 4 dirlumps by unknown.0
  -18> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.journal EMetaBlob.replay don't have renamed ino 0x10000000003
  -17> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.journal EMetaBlob.replay found null dentry in dir 0x10000000001
  -16> 2024-03-24T18:06:19.461+0800 7f1542cbf700 10 mds.0.journal EMetaBlob.replay dir 0x10000000000
  -15> 2024-03-24T18:06:19.461+0800 7f1542cbf700  0 mds.0.journal EMetaBlob.replay missing dir ino  0x10000000000
  -14> 2024-03-24T18:06:19.461+0800 7f1542cbf700 -1 log_channel(cluster) log [ERR] : failure replaying journal (EMetaBlob)
  -13> 2024-03-24T18:06:19.461+0800 7f1542cbf700  5 mds.beacon.c set_want_state: up:replay -> down:damaged

The fix follows how the mdsdir inode is handled when the MDS enters STARTING.

Fixes: https://tracker.ceph.com/issues/65094
Signed-off-by: ethanwu <ethanwu@synology.com>
(cherry picked from commit 463c3b7)
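The symptom flagged in the cephfs-journal-tool step above (an ESubtreeMap listing 0x100 but not 0x1) could be checked with a small script over the `event get json` output. The JSON field names below (`type`, `subtrees`, `dirfrag`) are assumptions for illustration, not the tool's documented schema:

```python
import json

ROOT_INO = 0x1      # root inode, missing from the subtree map in the bug
MDSDIR_INO = 0x100  # mdsdir inode for rank 0

def subtreemap_inos(events):
    """Collect dirfrag inos from SUBTREEMAP events (assumed field names)."""
    inos = set()
    for ev in events:
        if ev.get("type") == "SUBTREEMAP":
            for sub in ev.get("subtrees", []):
                # a dirfrag like "0x100.0" is "<ino>.<frag>"
                inos.add(int(sub["dirfrag"].split(".")[0], 16))
    return inos

# Simulated output reproducing the buggy map: 0x100 present, 0x1 absent.
events = json.loads('[{"type": "SUBTREEMAP", "subtrees": [{"dirfrag": "0x100.0"}]}]')
inos = subtreemap_inos(events)
print(MDSDIR_INO in inos, ROOT_INO in inos)  # -> True False
```

After the fix, the root rank's subtree map should contain both inos, so the same check would print `True True`.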
@joscollin (Member, Author) commented:

jenkins test windows

@joscollin (Member, Author) commented:

This PR is under test in https://tracker.ceph.com/issues/66906.

@lxbsz (Member) left a comment:

None of the failures are related to this backport PR:

For more detail, see the section 2024-07-17 in https://tracker.ceph.com/projects/cephfs/wiki/Squid.

@lxbsz lxbsz merged commit 820d144 into ceph:squid Jul 22, 2024
@joscollin joscollin deleted the wip-66774-squid branch July 23, 2024 03:31