tools/cephfs/DataScan: test and handle multiple duplicate injected primary links to a directory #61562
Conversation
This PR is under test in https://tracker.ceph.com/issues/69704.
gregsfortytwo left a comment
Looks broadly good and the data scan work seems fine, but I'm concerned about the forward scrub damage handling.
dout(10) << "newest is: " << newest << dendl;
...
for (auto& q : p.second) {
  // in the middle of dir fragmentation?
A bit unrelated to this change: how does this check for an in-progress fragmentation without considering frag_t?
it doesn't. Perhaps it should be
if (newest == q)
just below this line.
Nice. That's what my fix did here: https://github.com/ceph/ceph/pull/54971/files#diff-41a41f8397ae011bd1ab0c92e4ff99b498ee6b525961e9ec5360ed5cfd55fd6cR1284
But I like this way 👍
FWIW I added your signed-off-by to that commit.
<< q.dirfrag() << "/" << q.name << dendl;
{
  /* we've removed the injected linkage: don't fix it later */
  auto range = injected_inos.equal_range(p.first);
So, after we have gone through this, there remains only one dirfrag for the directory (once to_remove gets processed). What happens to the backtraces in the data pool for files that might be outdated? My guess is this is taken care of in commit 7487845 (which I still have to check).
scrub will repair the backtraces
OK, yeah. The backtraces remain outdated, possibly for a very long time (until a scrub is run).
jenkins test make check
jenkins test make check arm64
* refs/pull/61562/head:
  - mds: skip scrubbing damaged dirfrag
  - tools/cephfs/DataScan: test equality of link including frag
  - tools/cephfs/DataScan: skip linkages that have been removed
  - tools/cephfs/DataScan: do not error out when failing to read a dentry
  - tools/cephfs/DataScan: create all ancestors during scan_inodes
  - qa: add data scan tests for ancestry rebuild
This PR is under test in https://tracker.ceph.com/issues/69722.
4136be3 to b482c3f
* refs/pull/61562/head:
  - mds: skip scrubbing damaged dirfrag
  - tools/cephfs/DataScan: test equality of link including frag
  - tools/cephfs/DataScan: skip linkages that have been removed
  - tools/cephfs/DataScan: do not error out when failing to read a dentry
  - tools/cephfs/DataScan: create all ancestors during scan_inodes
  - tools/cephfs/DataScan: cleanup debug prints
  - qa: remove old MovedDir test
  - qa: add data scan tests for ancestry rebuild
  - qa: make the directory non-empty to force migration
  - qa: avoid unnecessary mds restart
The MDS are already sitting in standby because the fs is damaged.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
rank 0 will not migrate the directory until it is non-empty. And creating a file and promptly deleting it is not reliable; the vstart_runner.py test will just loop forever.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
6548c56 to d0e6ba8
When one PG is lost in the metadata pool, it will seem as if arbitrary dirfrags have been lost (and possibly other things, ignored for now). The cephfs-data-scan tool does not presently try to reconstruct those missing parents, so these tests will fail. Importantly, also test the case where multiple ancestry paths may exist due to out-of-date backtraces on regular files.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
This is now superseded by the test from the previous commit. The MovedDir workload assumes that an ancestry with missing dirfrags will not be recreated.
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
So it doesn't look like:
2025-01-29T18:47:36.592 INFO:teuthology.orchestra.run.smithi043.stderr:2025-01-29T18:47:36.583+0000 7fc310d1e840 20 datascan.find_or_create_dirfrag: Dirfrag already exists: 0x0x10000000000 *
or
2025-01-29T18:47:36.589 INFO:teuthology.orchestra.run.smithi043.stderr:2025-01-29T18:47:36.578+0000 7fc310d1e840 10 datascan.find_or_create_dirfrag: Created dirfrag: 0x0x1
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
When arbitrary PGs are lost which consequently lose random dirfrag objects, we may need to recover the full ancestry when the immediate parent exists. So, always recover the ancestry and fixup the potential duplicate linkages to a directory during scan_links.
Fixes: https://tracker.ceph.com/issues/69692
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Give a chance for the scan tool to proceed with other inodes/dentries.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Also: injected_inos may need to store multiple injected primary links found in
the metadata pool. This is especially true now that we may have multiple
primary links to a directory due to prior commit
tools/cephfs/DataScan: create all ancestors during scan_inodes
Fixes: https://tracker.ceph.com/issues/63301
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
To address the possibility of two primary links existing in different fragments.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
This only happens when the omap fetch fails or the fnode is corrupt. The MDS can't presently repair that damage. Without this change, the MDS enters an infinite loop of repair:
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 MDSContext::complete: 12C_RetryScrub
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs: state=RUNNING
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs entering with 0 in progress and 1 in the stack
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dirfrag [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 ap=1+0 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=1 scrubqueue=1 0x55b1a50fa880]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.den(0x10000000000 dir_xx) scrubbing [dentry #0x1/dir_x/dir_xx [2,head] auth (dversion lock) pv=0 v=8 ino=0x10000000001 state=1073741824 0x55b1a50eaf00] next_seq = 2
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.snaprealm(0x1 seq 1 0x55b1a50da240) get_snaps (seq 1 cached_seq 1)
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack _enqueue with {[inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) 0x55b1a4fac680]}, top=0
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.ino(0x10000000001) scrub_initialize with scrub_version 6
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.ino(0x10000000001) uninline_initialize with scrub_version 6
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack enqueue [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) 0x55b1a4fac680] to bottom of ScrubStack
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) get_num_head_items() = 1; fnode.fragstat.nfiles=0 fnode.fragstat.nsubdirs=1
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) total of child dentries: n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2)
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) my rstats: n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2)
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000000) check_rstats complete on 0x55b1a50fa880
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) scrub_finished
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000000) auth_unpin by 0x55b1a4f7b600 on [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=0 scrubqueue=1 0x55b1a50fa880] count now 0
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dirfrag done
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs dirfrag, done
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack dequeue [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=0 scrubqueue=1 0x55b1a50fa880] from ScrubStack
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs examining [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) can_auth_pin: auth!
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dir_inode [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack scrub_dir_inode recursive mode, frags [*]
2025-01-28T19:25:46.153+0000 7f9626cc5640 15 mds.0.cache.ino(0x10000000001) maybe_export_pin update=0 [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000001) can_auth_pin: auth!
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack scrub_dir_inode barebones [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000001) fetch_keys 0 keys on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000001) auth_pin by 0x55b1a50fb180 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180] count now 1
2025-01-28T19:25:46.153+0000 7f9626cc5640 1 -- [v2:172.21.10.4:6867/526112796,v1:172.21.10.4:6872/526112796] --> [v2:172.21.10.4:6802/3852331191,v1:172.21.10.4:6803/3852331191] -- osd_op(unknown.0.340:50 42.7 42:e2e07930:::10000000001.00000000:head [omap-get-header,omap-get-vals-by-keys in=4b,getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e564) -- 0x55b1a50d8c00 con 0x55b1a50d9000
2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.bal hit_dir 3 pop is 1, frag * size 0 [pop IRD:[C 0.00e+00] IWR:[C 0.00e+00] RDR:[C 0.00e+00] FET:[C 1.00e+00] STR:[C 0.00e+00] *LOAD:2.0]
2025-01-28T19:25:46.153+0000 7f962ecd5640 1 -- [v2:172.21.10.4:6867/526112796,v1:172.21.10.4:6872/526112796] <== osd.0 v2:172.21.10.4:6802/3852331191 3 ==== osd_op_reply(50 10000000001.00000000 [omap-get-header,omap-get-vals-by-keys,getxattr] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) ==== 248+0+0 (crc 0 0 0) 0x55b1a4444280 con 0x55b1a50d9000
2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSContext::complete: 21C_IO_Dir_OMAP_Fetched
2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) _fetched header 0 bytes 0 keys for [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f96254c2640 0 mds.0.cache.dir(0x10000000001) _fetched missing object for [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f96254c2640 -1 log_channel(cluster) log [ERR] : dir 0x10000000001 object missing on disk; some files may be lost (/dir_x/dir_xx)
2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) go_bad *
2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) auth_unpin by 0x55b1a50fb180 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180] count now 0
2025-01-28T19:25:46.153+0000 7f96254c2640 11 mds.0.cache.dir(0x10000000001) finish_waiting mask 2 result -5 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSContext::complete: 12C_RetryScrub
Note that this partially reverts 5b56098. That commit incorrectly marked a
dirfrag as repaired when it may not even exist in the metadata pool.
Fixes: 5b56098
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
mchangir left a comment
nit:
for the commit message of qa: make the directory non-empty to force migration, I think it would be better to rephrase
rank 0 will not migrate the directory until it is non-empty ...
to
rank 0 will not migrate the directory as long as it is empty ...
Maybe it's just my English comprehension skills, but the double negation makes my head hurt.
Commit messages are generally phrased as an imperative (i.e. …
I was about to delete my comment but then I noticed your response.
Scrub does not fix damaged dirfrags for any type of damage we currently mark dirfrags damaged for (corrupt fnode / missing dirfrag object). In any case, this scenario is covered in cephfs_data_scan with correct checks for damage / handling.
Fixes: 7f0cf0b
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
jenkins test windows
This PR is under test in https://tracker.ceph.com/issues/69828.