
tools/cephfs/DataScan: test and handle multiple duplicate injected primary links to a directory#61562

Merged

batrick merged 11 commits into ceph:main from batrick:i69692 on Feb 6, 2025
Conversation


@batrick batrick commented Jan 29, 2025

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
Available Jenkins commands:
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows
  • jenkins test rook e2e


batrick commented Jan 29, 2025

This PR is under test in https://tracker.ceph.com/issues/69704.

gregsfortytwo (Member) left a comment:


Looks broadly good and the data scan work seems fine but I'm concerned about the forward scrub damage handling.

vshankar (Contributor) left a comment:


Mostly looks good. Still have to go through commit 7487845.

dout(10) << "newest is: " << newest << dendl;

for (auto& q : p.second) {
// in the middle of dir fragmentation?
Contributor:

A bit unrelated to this change: how does this check for an in-progress fragmentation without considering frag_t?

Member Author:

It doesn't. Perhaps it should be

if (newest == q)

just below this line.

Member Author:

This is now addressed.

Member Author:

FWIW I added your signed-off-by to that commit.

<< q.dirfrag() << "/" << q.name << dendl;
{
/* we've removed the injected linkage: don't fix it later */
auto range = injected_inos.equal_range(p.first);
Contributor:

So, after we have gone through this, there remains only one dirfrag for the directory (once to_remove gets processed). What happens to the backtraces in the data pool for files that might be outdated? My guess is this is taken care of in commit 7487845 (which I still have to check).

Member Author:

scrub will repair the backtraces

Contributor:

OK, yeah. The backtraces remain outdated, possibly for a very long time (until scrub is run).

mchangir (Contributor) commented:

jenkins test make check

mchangir (Contributor) commented:

jenkins test make check arm64

batrick added a commit to batrick/ceph that referenced this pull request Jan 29, 2025
* refs/pull/61562/head:
	mds: skip scrubbing damaged dirfrag
	tools/cephfs/DataScan: test equality of link including frag
	tools/cephfs/DataScan: skip linkages that have been removed
	tools/cephfs/DataScan: do not error out when failing to read a dentry
	tools/cephfs/DataScan: create all ancestors during scan_inodes
	qa: add data scan tests for ancestry rebuild

batrick commented Jan 29, 2025

This PR is under test in https://tracker.ceph.com/issues/69722.

batrick force-pushed the i69692 branch 2 times, most recently from 4136be3 to b482c3f on January 30, 2025 at 02:07
batrick added a commit to batrick/ceph that referenced this pull request Jan 30, 2025
* refs/pull/61562/head:
	mds: skip scrubbing damaged dirfrag
	tools/cephfs/DataScan: test equality of link including frag
	tools/cephfs/DataScan: skip linkages that have been removed
	tools/cephfs/DataScan: do not error out when failing to read a dentry
	tools/cephfs/DataScan: create all ancestors during scan_inodes
	tools/cephfs/DataScan: cleanup debug prints
	qa: remove old MovedDir test
	qa: add data scan tests for ancestry rebuild
	qa: make the directory non-empty to force migration
	qa: avoid unnecessary mds restart
The MDS are already sitting in standby because the fs is damaged.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
rank 0 will not migrate the directory until it is non-empty. And creating a
file and promptly deleting is not reliable; the vstart_runner.py test will just
loop forever.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
batrick force-pushed the i69692 branch 2 times, most recently from 6548c56 to d0e6ba8 on January 30, 2025 at 03:32
When one PG is lost in the metadata pool, it will seem as if arbitrary dirfrags
have been lost (and possibly other things, ignored for now). The
cephfs-data-scan tool does not presently try to reconstruct those missing
parents so these tests will fail.

Importantly, also test the case where multiple ancestry paths may exist due to
out-of-date backtraces on regular files.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
This is now superseded by test from the previous commit. The MovedDir workload
assumes that an ancestry with missing dirfrags will not be recreated.

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
So it doesn't look like:

    2025-01-29T18:47:36.592 INFO:teuthology.orchestra.run.smithi043.stderr:2025-01-29T18:47:36.583+0000 7fc310d1e840 20 datascan.find_or_create_dirfrag: Dirfrag already exists: 0x0x10000000000 *

or

    2025-01-29T18:47:36.589 INFO:teuthology.orchestra.run.smithi043.stderr:2025-01-29T18:47:36.578+0000 7fc310d1e840 10 datascan.find_or_create_dirfrag: Created dirfrag: 0x0x1

Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
batrick and others added 5 commits January 29, 2025 23:19
When arbitrary PGs are lost which consequently lose random dirfrag objects, we
may need to recover the full ancestry when the immediate parent exists. So,
always recover the ancestry and fixup the potential duplicate linkages to a
directory during scan_links.

Fixes: https://tracker.ceph.com/issues/69692
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Give a chance for the scan tool to proceed with other inodes/dentries.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
Also: injected_inos may need to store multiple injected primary links found in
the metadata pool. This is especially true now that we may have multiple
primary links to a directory due to the prior commit

    tools/cephfs/DataScan: create all ancestors during scan_inodes

Fixes: https://tracker.ceph.com/issues/63301
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
To address the possibility of two primary links existing in different
fragments.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
This only happens when the omap fetch fails or the fnode is corrupt. The MDS
can't presently repair that damage. Without this change, the MDS enters an
infinite loop of repair:

    2025-01-28T19:25:46.153+0000 7f9626cc5640 10 MDSContext::complete: 12C_RetryScrub
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs: state=RUNNING
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs entering with 0 in progress and 1 in the stack
    2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dirfrag [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 ap=1+0 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=1 scrubqueue=1 0x55b1a50fa880]
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.den(0x10000000000 dir_xx) scrubbing [dentry #0x1/dir_x/dir_xx [2,head] auth (dversion lock) pv=0 v=8 ino=0x10000000001 state=1073741824 0x55b1a50eaf00] next_seq = 2
    2025-01-28T19:25:46.153+0000 7f9626cc5640 10  mds.0.cache.snaprealm(0x1 seq 1 0x55b1a50da240) get_snaps  (seq 1 cached_seq 1)
    2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack _enqueue with {[inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) 0x55b1a4fac680]}, top=0
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.ino(0x10000000001) scrub_initialize with scrub_version 6
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.ino(0x10000000001) uninline_initialize with scrub_version 6
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack enqueue [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) 0x55b1a4fac680] to bottom of ScrubStack
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) get_num_head_items() = 1; fnode.fragstat.nfiles=0 fnode.fragstat.nsubdirs=1
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) total of child dentries: n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2)
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) my rstats:              n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2)
    2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000000) check_rstats complete on 0x55b1a50fa880
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) scrub_finished
    2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000000) auth_unpin by 0x55b1a4f7b600 on [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=0 scrubqueue=1 0x55b1a50fa880] count now 0
    2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dirfrag done
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs dirfrag, done
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack dequeue [dir 0x10000000000 /dir_x/ [2,head] auth v=8 cv=7/7 state=1610612737|complete f(v0 m2025-01-28T19:25:31.191802+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) hs=1+0,ss=0+0 | child=1 dirty=1 waiter=0 authpin=0 scrubqueue=1 0x55b1a50fa880] from ScrubStack
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack kick_off_scrubs examining [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000000) can_auth_pin: auth!
    2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.scrubstack scrub_dir_inode [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack scrub_dir_inode recursive mode, frags [*]
    2025-01-28T19:25:46.153+0000 7f9626cc5640 15 mds.0.cache.ino(0x10000000001) maybe_export_pin update=0 [inode 0x10000000001 [...2,head] /dir_x/dir_xx/ auth v6 f(v0 m2025-01-28T19:25:31.193448+0000 1=0+1) n(v0 rc2025-01-28T19:25:31.306508+0000 b1 3=1+2) | scrubqueue=1 0x55b1a4fac680]
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.cache.dir(0x10000000001) can_auth_pin: auth!
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.scrubstack scrub_dir_inode barebones [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
    2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000001) fetch_keys 0 keys on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
    2025-01-28T19:25:46.153+0000 7f9626cc5640 10 mds.0.cache.dir(0x10000000001) auth_pin by 0x55b1a50fb180 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180] count now 1
    2025-01-28T19:25:46.153+0000 7f9626cc5640  1 -- [v2:172.21.10.4:6867/526112796,v1:172.21.10.4:6872/526112796] --> [v2:172.21.10.4:6802/3852331191,v1:172.21.10.4:6803/3852331191] -- osd_op(unknown.0.340:50 42.7 42:e2e07930:::10000000001.00000000:head [omap-get-header,omap-get-vals-by-keys in=4b,getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e564) -- 0x55b1a50d8c00 con 0x55b1a50d9000
    2025-01-28T19:25:46.153+0000 7f9626cc5640 20 mds.0.bal hit_dir 3 pop is 1, frag * size 0 [pop IRD:[C 0.00e+00] IWR:[C 0.00e+00] RDR:[C 0.00e+00] FET:[C 1.00e+00] STR:[C 0.00e+00] *LOAD:2.0]
    2025-01-28T19:25:46.153+0000 7f962ecd5640  1 -- [v2:172.21.10.4:6867/526112796,v1:172.21.10.4:6872/526112796] <== osd.0 v2:172.21.10.4:6802/3852331191 3 ==== osd_op_reply(50 10000000001.00000000 [omap-get-header,omap-get-vals-by-keys,getxattr] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) ==== 248+0+0 (crc 0 0 0) 0x55b1a4444280 con 0x55b1a50d9000
    2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSIOContextBase::complete: 21C_IO_Dir_OMAP_Fetched
    2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSContext::complete: 21C_IO_Dir_OMAP_Fetched
    2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) _fetched header 0 bytes 0 keys for [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180]
    2025-01-28T19:25:46.153+0000 7f96254c2640  0 mds.0.cache.dir(0x10000000001) _fetched missing object for [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 ap=1+0 state=1073741824 f() n() hs=0+0,ss=0+0 | authpin=1 0x55b1a50fb180]
    2025-01-28T19:25:46.153+0000 7f96254c2640 -1 log_channel(cluster) log [ERR] : dir 0x10000000001 object missing on disk; some files may be lost (/dir_x/dir_xx)
    2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) go_bad *
    2025-01-28T19:25:46.153+0000 7f96254c2640 10 mds.0.cache.dir(0x10000000001) auth_unpin by 0x55b1a50fb180 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180] count now 0
    2025-01-28T19:25:46.153+0000 7f96254c2640 11 mds.0.cache.dir(0x10000000001) finish_waiting mask 2 result -5 on [dir 0x10000000001 /dir_x/dir_xx/ [2,head] auth v=0 cv=0/0 state=1073741824 f() n() hs=0+0,ss=0+0 0x55b1a50fb180]
    2025-01-28T19:25:46.153+0000 7f96254c2640 10 MDSContext::complete: 12C_RetryScrub

Note that this partially reverts 5b56098. That commit incorrectly marked a
dirfrag as repaired when it may not even exist in the metadata pool.

Fixes: 5b56098
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>
@gregsfortytwo gregsfortytwo dismissed their stale review January 30, 2025 13:54

Questions answered

mchangir (Contributor) left a comment:

nit: for the commit message of "qa: make the directory non-empty to force migration", I think it would be better to rephrase

    rank 0 will not migrate the directory until it is non-empty ...

to

    rank 0 will not migrate the directory as long as it is empty ...

Maybe it's just my English comprehension skills, but the double negation makes my head hurt.


batrick commented Jan 31, 2025

nit: for the commit message of qa: make the directory non-empty to force migration, I think it would be better to rephrase rank 0 will not migrate the directory until it is non-empty ... to rank 0 will not migrate the directory as long as it is empty ...

Maybe it's just my English comprehension skills, but the double negation makes my head hurt.

Commit messages are generally phrased as an imperative (i.e. <verb> <object>). So your suggestions would be unsuitable. But in any case, I don't see a double negation?

mchangir (Contributor) commented:

nit: for the commit message of qa: make the directory non-empty to force migration, I think it would be better to rephrase rank 0 will not migrate the directory until it is non-empty ... to rank 0 will not migrate the directory as long as it is empty ...
Maybe it's just my English comprehension skills, but the double negation makes my head hurt.

Commit messages are generally phrased as an imperative (i.e. <verb> <object>). So your suggestions would be unsuitable. But in any case, I don't see a double negation?

I was about to delete my comment but then I noticed your response.
The commit title is good enough too.

Scrub does not fix damaged dirfrags for any of the damage types we currently
mark dirfrags damaged for (corrupt fnode / missing dirfrag object).

In any case, this scenario is covered in cephfs_data_scan with correct checks
for damage / handling.

Fixes: 7f0cf0b
Signed-off-by: Patrick Donnelly <pdonnell@ibm.com>

batrick commented Feb 5, 2025

jenkins test windows


batrick commented Feb 5, 2025

This PR is under test in https://tracker.ceph.com/issues/69828.


batrick commented Feb 6, 2025
