mds: for an irreparable hard link scrub is not able to identify the damage#61173
Conversation
b960918 was force-pushed to f8348ed
|
jenkins test make check |
|
Can you please explain (in the commit message) how you're cleaning up these remote dentries in scrub? How can you validate the link validity if it's linked to an inode where the auth is a remote server? |
@gregsfortytwo Commit message is updated. Mostly I followed the code already written in the MDS, understanding it from a high level (not much experience with CephFS). Feel free to share any insights you find. |
|
@gregsfortytwo Are you reviewing it? |
So, the issue if I understand it is that once the primary inode is gone (corrupted, gone, etc..), scrubbing the secondary inode isn't detecting the damage, right? If that's the case, there are two options: a) scrub the tree where the primary inode resides or b) when scrub encounters a remote link, try to open the link which would detect the faulty inode and mark the inode as damaged. What this change implements, is option (b), right? |
Yes, but if the repair flag is on, it will just clean up the dentry and clear the damage (I'm not absolutely confident about this part of my code; I just followed the server's code for removing a remote link). Otherwise, every time the user stats or hits that inode, it will create a damage message. |
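To make option (b) concrete, here is a toy sketch (plain Python, not Ceph code; all names are illustrative) of the behavior under discussion: when scrub meets a remote dentry, it tries to resolve the linked inode, and if the primary is gone it records the dentry in a damage list instead of silently passing.

```python
# Toy model of scrub handling a remote dentry whose primary inode is gone.
# The inode table, dentry map, and damage-entry shape are all hypothetical.
inodes = {0x10000000001: {"path": "/dir_a/file"}}     # surviving primary inodes, by ino
remote_dentries = {"/dir_b/hardlink": 0x10000000002}  # remote link -> a missing ino
damage_list = []

def scrub_remote_dentry(path, ino):
    if ino not in inodes:  # analogous to open_remote_dentry() failing
        damage_list.append({"damage_type": "dentry", "path": path, "ino": ino})
        return None
    return inodes[ino]

scrub_remote_dentry("/dir_b/hardlink", remote_dentries["/dir_b/hardlink"])
print(damage_list)
```

With the repair flag, the discussion above is about whether scrub should additionally drop the dangling dentry, or only record the damage as this sketch does.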
[cc @mchangir who has worked on cephfs scrub lately] From the commit message of this change:
I'm not 100% sure what's being done here. From my understanding, the scrub code can (try to) look up the remote inode for any damage and add it to the damage table. To repair the damage, the scrub command should be forwarded to the inode's auth MDS. |
As for this dentry, it is not pointing to any inode, more specifically |
That's because it's a A remote inode will have
This is the part which is not 100% clear to me. Why do we want to remove the omap entry? Is that how the repair should be done? You are losing the linkage altogether by doing that.
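The concern above can be illustrated with a toy model (plain Python dict, not Ceph's on-disk dirfrag format; the names are hypothetical): the omap entry mapping the dentry name to an inode number is the only record of the linkage, so a "repair" that deletes it destroys the link permanently.

```python
# Toy dirfrag stored as omap-like mapping: dentry name -> (ino, link type).
dirfrag_omap = {"hardlink_head": (0x10000000002, "remote")}

def repair_by_removal(omap, name):
    # A "repair" that simply drops the entry; the linkage is gone afterwards.
    return {k: v for k, v in omap.items() if k != name}

repaired = repair_by_removal(dirfrag_omap, "hardlink_head")
print("hardlink_head" in repaired)  # False: the link cannot be recovered
```

This is why removing the omap entry is a destructive operation rather than a repair in the usual sense.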
Yes, I did. The steps detail that once a scrub is done, the damage list is not populated. If the scrub is done on a directory where the secondary link is present and the primary link is in a different directory, then, as you pointed out, scrub does not do anything. But if scrub is run on root, then the damage list should be populated. So, the question is: if scrub is run on a secondary hardlink directory with the repair flag, what should be done? Should it repair the damaged (primary) inode which is in another directory? Scrub should at least identify the damage IMO. cc @mchangir |
@vshankar thanks for explaining. Even if we scrub the primary inode, it will not recover (in the test case I deleted the object; let me know if I am missing something). Then how should the user get out of this loop? Because every time the user stats that hardlink (which they assume exists), it will create damage. Let's say we identify the damage; what should be the way to get rid of it? That's why I was talking about removing the dentry from the dentry list of its parent directory (I need to study a bit). If there is any manual way of getting out of this damage, can you share? |
There are damages that scrub can fix and there are some which it cannot. When you remove the inode itself, the only way to recover is to get back the inode by scanning the data pool by reading backtraces and creating the dentry linkages. This is performed using
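The recovery path described above can be sketched abstractly (this is an illustration, not the actual tool's implementation; the backtrace layout and function names here are hypothetical): each file's data objects carry a backtrace listing its ancestor (dentry name, parent ino) pairs, and scanning those backtraces lets the metadata linkages be rebuilt.

```python
# Hypothetical backtrace store: ino -> ancestry, listed leaf-first.
backtraces = {
    0x10000000003: [("file.txt", 0x10000000000), ("dir_a", 0x1)],
}

def rebuild_linkages(backtraces):
    # Recreate parent-dir -> {dentry name: ino} mappings from backtraces.
    linkages = {}
    for ino, ancestry in backtraces.items():
        dname, parent = ancestry[0]  # the immediate parent entry
        linkages.setdefault(parent, {})[dname] = ino
    return linkages

print(rebuild_linkages(backtraces))
```

The real workflow operates on RADOS objects in the data pool, but the principle is the same: the backtrace is the source of truth used to reconstruct lost dentries.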
Right. So, it's either via scrub (w/ repair flag), or, if the damage isn't fixable by scrub, then recover the metadata using the repair tools. |
So in your opinion, we shouldn't clean up/repair it this way, right? Then I will remove the repair part and keep the identification part. |
Also do you think |
That does a path-walk on the MDS, and if the inode is missing, then all bets are off... |
Speaking on behalf of the affected user. We know where the original inode was, and that directory has been deleted, presumably by a script. So we know that the hardlink is truly dangling. I personally don't care about the exact way that we should follow to clear the damage. I.e., so that "cat log.run.1" says "No such file or directory", not "I/O error". It can be rm. It can be some other tool or procedure. But this procedure should exist. If this helps, I would advise @sajibreadd to split this MR into two:
|
|
@vshankar Should it repair the damaged (primary) inode which is in another directory (if possible, with the repair flag)? Should we do that, or keep just the identification part in this PR? |
In my opinion, scrub should not attempt to repair things from secondary access paths. If anything, only the primary path should be dumped. But if the inode itself is missing, then we won't have the backtrace to the original path either. So, Venky's suggestion to |
So, I will just keep the identification part, which means that after scrub the user will get these broken links in the damage list. |
|
@vshankar @mchangir @kotreshhr In general I have a question. You can see there is a function; if pool == -1, then why are we only searching in the default pool, and not the other pools associated with the fs or the pool set via the xattr? For example, if the ceph.layout.pool attribute is explicitly set to a different pool for that specific file, then the default pool (e.g. |
The default data pool will always have the 0th object which has the file backtrace and layout even if custom data pool layout is set. |
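The test hunks later in this thread build RADOS object names the same way; a small sketch of the naming convention referenced above (the helper name is illustrative): the 0th object of a file is "<ino in hex>.00000000", and per the comment above it always lives in the default data pool even when a custom layout directs file data elsewhere.

```python
def zeroth_object_name(ino):
    # The 0th object, which carries the file's backtrace and layout,
    # is named by the inode number in lowercase hex plus ".00000000".
    return "{ino:x}.00000000".format(ino=ino)

print(zeroth_object_name(0x10000000001))  # -> "10000000001.00000000"
```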
|
@sajibreadd ping? |
|
This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved |
|
@vshankar I added the test following the reproducer. Regarding notes in |
|
jenkins test api |
|
@vshankar any thoughts? |
OK. Let's start by breaking the change into different commits: one for the fix itself, one for the test addition, and another for the doc changes ( @mchangir PTAL when this is ready. |
|
This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days. |
qa/tasks/cephfs/test_scrub_checks.py (outdated)

```python
def __str__(self):
    return "Admin socket: {command} failed with rc={rc} json output={json}, because '{es}'".format(
        command=self.command, rc=self.rc, json=self.json, es=self.errstring)
```

Unrelated change. Please revert this hunk.
doc/cephfs/scrub.rst (outdated)

```rst
If scrub is able to repair the damage, the corresponding entry is automatically
removed from the damage table.

Note: Scrub with repair flag can idetify the hard link damage but can not repair it.
```

nit: a scrub invoked with the repair option can identify a damaged hard link but not repair it.
qa/tasks/cephfs/test_scrub_checks.py (outdated)

```python
rados_obj_file = "{ino:x}.00000000".format(ino=file_ino)
rados_obj_dir = "{ino:x}.00000000".format(ino=dir_ino)
self.fs.flush()
time.sleep(20)
```
qa/tasks/cephfs/test_scrub_checks.py (outdated)

```python
self.fs.rados(["rm", rados_obj_file], pool=self.fs.get_data_pool_name())
self.fs.rados(["rm", rados_obj_dir], pool=self.fs.get_metadata_pool_name())
self.fs.fail()
time.sleep(10)
```

Why sleep? If you need to verify that the MDSs are no longer active, then there are constructs in class Filesystem that have helpers for this purpose.
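The pattern the review is asking for, polling a condition with a timeout instead of a fixed sleep, can be sketched generically (this is an illustration, not the actual Filesystem helper; the function name and usage are assumptions):

```python
import time

def wait_until(predicate, timeout=30, interval=1):
    # Poll predicate() until it returns True or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    raise RuntimeError("condition not met within {}s".format(timeout))

# Usage sketch (hypothetical condition): wait for the MDSs to go inactive
# rather than guessing a sleep duration.
# wait_until(lambda: not fs.are_daemons_healthy(), timeout=60)
print(wait_until(lambda: True))
```

A fixed sleep either wastes time or races; polling with a deadline fails loudly when the expected state never arrives.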
qa/tasks/cephfs/test_scrub_checks.py (outdated)

```python
test_dir_path = "test_dir"
self.mount_a.run_shell(["mkdir", test_dir_path])

file_dir_path = test_dir_path + "/file_dir"
```
qa/tasks/cephfs/test_scrub_checks.py (outdated)

```python
file_dir_path = test_dir_path + "/file_dir"
self.mount_a.run_shell(["mkdir", file_dir_path])
file_path = file_dir_path + "/test_file.txt"
```
qa/tasks/cephfs/test_scrub_checks.py (outdated)

```python
self.fs.rados(["rm", rados_obj_dir], pool=self.fs.get_metadata_pool_name())
self.fs.fail()
time.sleep(10)
self.fs.set_joinable()
```

All this does is start up the MDS with an empty cache. Why not fail over the MDS in that case?
src/mds/ScrubStack.cc (outdated)

```cpp
dout(4) << "scrub: remote dentry points to damaged ino " << *dn << dendl;
return;
}
mdcache->open_remote_dentry(dn, true, nullptr);
```

I revisited this today after a long delay. Sorry for that!
Doesn't passing nullptr to open_remote_dentry() result in a fault, since _open_remote_dentry_finish() does fin->complete(...)?
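The hazard the reviewer raises is the general completion-callback pattern: if the finish path invokes the callback unconditionally, a null callback crashes. A toy illustration (plain Python; the function names mirror the discussion but are not the actual Ceph code):

```python
def open_remote_dentry_finish(result, fin):
    # Guarded version: only complete the continuation if one was supplied.
    # Without this check, fin(result) on a None callback would fault.
    if fin is not None:
        fin(result)

open_remote_dentry_finish(0, None)   # safe: the guard skips the call
open_remote_dentry_finish(0, print)  # prints 0
```

Either the caller must always pass a real continuation, or the finish path must tolerate a null one; the review is asking which contract open_remote_dentry() actually has.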
doc/cephfs/scrub.rst (outdated)

```rst
If scrub is able to repair the damage, the corresponding entry is automatically
removed from the damage table.

Note: scrub invoked with ``repair`` option can identify an damaged hard link but not repair it.
```

Note: A scrub invoked with the
@vshankar Initially I added a nullptr check, but now I just replaced it with |
anthonyeleven left a comment:
Docs LGTM, I'm not qualified to approve the code. |
|
@vshankar a friendly ping |
Sure 👍 |
vshankar left a comment:
LGTM. Will run this through tests.
|
This PR is under test in https://tracker.ceph.com/issues/73082. |
For a remote link, try to open the dentry (this part of the code is copied from the MDS path traversal), which internally pushes the dir_frag into the damage list if applicable.

Fixes: https://tracker.ceph.com/issues/69345
Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>
…crub

Also update the ignorelist with the expected cluster warning.

Fixes: https://tracker.ceph.com/issues/69345
Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>
Fixes: https://tracker.ceph.com/issues/69345
Signed-off-by: Md Mahamudur Rahaman Sajib <mahamudur.sajib@croit.io>
|
Added the expected cluster warning [object missing on disk] to the ignorelist and cleaned up the commit messages. |
|
jenkins test windows |
|
This is nearly good to go 👍 |
Fixes: https://tracker.ceph.com/issues/69345
Signed-off-by: Md Mahamudur Rahaman Sajib mahamudur.sajib@croit.io