Project

General

Profile

Actions

Bug #54546

open

mds: crash due to corrupt inode and omap entry

Added by Venky Shankar about 4 years ago. Updated 8 months ago.

Status:
In Progress
Priority:
Normal
Category:
Correctness/Safety
Target version:
-
% Done:

100%

Source:
Community (dev)
Backport:
squid,reef
Regression:
No
Severity:
2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
MDS
Labels (FS):
crash
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

A corrupted on-disk inode causes the MDS to crash with an assert. The backtrace looks something like:

 1: (()+0x4058c0) [0x5588549cd8c0]
 2: (()+0x12890) [0x7fdf788d3890]
 3: (gsignal()+0xc7) [0x7fdf779c6e97]
 4: (abort()+0x141) [0x7fdf779c8801]
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x250) [0x7fdf78fbd530]
 6: (()+0x26d5a7) [0x7fdf78fbd5a7]
 7: (Server::_unlink_local(boost::intrusive_ptr<MDRequestImpl>&, CDentry*, CDentry*)+0x15f8) [0x5588547482e8]
 8: (Server::handle_client_unlink(boost::intrusive_ptr<MDRequestImpl>&)+0x961) [0x558854748cd1]
 9: (Server::handle_client_request(MClientRequest*)+0x49b) [0x55885476305b]
 10: (Server::dispatch(Message*)+0x2db) [0x558854766d1b]
 11: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x5588546da1e4]
 12: (MDSRank::_dispatch(Message*, bool)+0x89b) [0x5588546e7a1b]
 13: (MDSRank::retry_dispatch(Message*)+0x12) [0x5588546e8012]
 14: (MDSInternalContextBase::complete(int)+0x67) [0x55885494df87]
 15: (MDSRank::_advance_queues()+0xf1) [0x5588546e69c1]
 16: (MDSRank::ProgressThread::entry()+0x43) [0x5588546e7043]
 17: (()+0x76db) [0x7fdf788c86db]
 18: (clone()+0x3f) [0x7fdf77aa988f]

This is hard to reproduce and was seen on a couple of occasions with PostgreSQL doing I/O on CephFS. AFAICT, this happend even with latest pacific/master. The failed assert this in Server::_unlink_local():

  if (straydn) {
    ceph_assert(in->first <= straydn->first);
    in->first = straydn->first;
  }

`in->first` is not a sane value. From one setup the inode was something like (not the nonsensically large value):

[inode 0x1000014e91a [...1000010a4a0,head" 

Surprisingly, `1000010a4a0` is the inode of another (ancestor) directory:

debug 2022-02-23 11:37:33.107 7f59a55ca700 15 mds.0.cache  chose lock states on [inode 0x1000010a4a0 [...d2,head] /path/to/ancestor auth v7874399 snaprealm=0x55b5cc463900 f(v0 m2021-06-25 07:24:25.703543 1=0+1) n(v252768 rc2022-02-23 11:36:29.112471 b226649760 2906=2877+29) (inest lock dirty) (iversion lock) | dirtyscattered=1 dirfrag=1 dirty=1 0x55b5c9c3b100]

This is same as tracker https://tracker.ceph.com/issues/38452 where in note-14 the omap dump also confirms the same:

00000000  fe ff ff ff ff ff ff ff  49 0f 06 a3 01 00 00 24  |........I......$|
00000010  68 03 00 00 01 00 00 00  00 00 00 16 8d 7c 5c f6  |h............|\.|
00000020  62 bc 0d 00 80 00 00 00  00 00 00 00 00 00 00 01  |b...............|

Although in this case the value is 0xfffffffffffffffe.


Subtasks 1 (0 open1 closed)

Feature #55470: qa: postgresql test suite workunitResolvedPatrick Donnelly

Actions

Related issues 3 (2 open1 closed)

Related to CephFS - Feature #55414: mds:asok interface to cleanup permanently damaged inodes New

Actions
Related to CephFS - Feature #56140: cephfs: tooling to identify inode (metadata) corruptionResolvedPatrick Donnelly

Actions
Related to CephFS - Bug #38452: mds: assert crash loop while unlinking fileIn ProgressPatrick Donnelly

Actions
Actions #1

Updated by Venky Shankar almost 4 years ago

  • Related to Feature #55414: mds:asok interface to cleanup permanently damaged inodes added
Actions #2

Updated by Venky Shankar almost 4 years ago

Saw this in another cluster. The corruption is seen in the EMetaBlob journal event. The inode+dentry fetch from the journal (fullbit) has corrupted `dnfirst' field. This narrows down the scope of the problem to code paths that are journaling operations.

Actions #3

Updated by Venky Shankar almost 4 years ago

  • Assignee changed from Venky Shankar to Patrick Donnelly

Patrick, assigning this to you since you are making progress on this.

Actions #4

Updated by Patrick Donnelly over 3 years ago

  • Related to Feature #56140: cephfs: tooling to identify inode (metadata) corruption added
Actions #5

Updated by Dhairya Parmar over 2 years ago

  • Related to Bug #38452: mds: assert crash loop while unlinking file added
Actions #6

Updated by Patrick Donnelly almost 2 years ago

  • Status changed from New to In Progress
  • Target version set to v20.0.0
  • Backport changed from quincy, pacific to squid,reef,quincy
  • Labels (FS) crash added
Actions #7

Updated by Patrick Donnelly almost 2 years ago

Corruption trace below, observed during replay. File names have been obfuscated.

# The last event where inode 0x10003bb2d9f finished with a valid [first,last]
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.log _replay 1141413939372635~56342 / 1141414175965184 2024-05-23T04:25:39.930850+0000: EUpdate setxattr [metablob 0x5000002c5c0, 2 dirs]
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay 2 dirlumps by unknown.0
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x5000002c5c0
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.cache.dir(0x5000002c5c0) _mark_dirty (already dirty) [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096108 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80] version 881096108
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay      clean nestinfo on [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096108 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80]
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay      clean fragstat on [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096108 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80]
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096108 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80]
2024-05-23T05:58:37.622+0000 7fe8f354f700 20 mds.0.cache.dir(0x5000002c5c0) lookup_exact_snap (head, 'c')
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/volumes/group/volume/uuid/a/b/c [2,head] auth (dversion lock) v=881096107 ino=0x5000002ca75 state=1610612736 | inodepin=1 dirty=1 0x55c2d0df5180]
2024-05-23T05:58:37.622+0000 7fe8f354f700 15 mds.0.cache.ino(0x5000002ca75) maybe_export_pin update=0 [inode 0x5000002ca75 [...2,head] /volumes/group/volume/uuid/a/b/c/ auth v881096107 f(v1440560 m2024-05-23T04:25:39.917887+0000 6=6+0) n(v438633 rc2024-05-23T04:25:39.935887+0000 b4479483356 7=6+1)/n(v438633 rc2024-05-23T04:23:51.363154+0000 b4479483356 6=5+1) old_inodes=27 (iversion lock) | dirfrag=1 dirty=1 0x55c2ccfbe100]
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x5000002ca75 [...3556,head] /volumes/group/volume/uuid/a/b/c/ auth v881096107 f(v1440560 m2024-05-23T04:25:39.917887+0000 6=6+0) n(v438633 rc2024-05-23T04:25:39.935887+0000 b4479483356 7=6+1)/n(v438633 rc2024-05-23T04:23:51.363154+0000 b4479483356 6=5+1) old_inodes=27 (iversion lock) | dirfrag=1 dirty=1 0x55c2ccfbe100]
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x5000002ca75
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x5000002ca75 /volumes/group/volume/uuid/a/b/c/ [2,head] auth v=46254609 cv=0/0 state=1610612736 f(v1440560 m2024-05-23T04:25:39.917887+0000 6=6+0) n(v438633 rc2024-05-23T04:25:39.935887+0000 b4479483356 6=6+0) hs=3+0,ss=0+0 dirty=3 | child=1 dirty=1 0x55c2cfe98d80]
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x5000002ca75 /volumes/group/volume/uuid/a/b/c/ [2,head] auth v=46254611 cv=0/0 state=1610612736 f(v1440560 m2024-05-23T04:25:39.917887+0000 6=6+0) n(v438633 rc2024-05-23T04:25:39.987888+0000 b6020469579 6=6+0) hs=3+0,ss=0+0 dirty=3 | child=1 dirty=1 0x55c2cfe98d80]
2024-05-23T05:58:37.622+0000 7fe8f354f700 20 mds.0.cache.dir(0x5000002ca75) lookup_exact_snap (head, 'interesting')
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [3557,head] had [dentry #0x1/volumes/group/volume/uuid/a/b/c/interesting [3557,head] auth (dversion lock) v=46254608 ino=0x10003bb2d9f state=1610612736 | inodepin=1 dirty=1 0x55c2cbef1900]
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [3557,head] had [inode 0x10003bb2d9f [3557,head] /volumes/group/volume/uuid/a/b/c/interesting auth v46254608 DIRTYPARENT s=0 n(v0 rc2024-05-23T04:25:39.935887+0000 1=1+0) (iversion lock) cr={1264134=0-4194304@3556} | dirtyparent=1 dirty=1 0x55c2cf3a8c00]
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay request client.1264134:468991474 trim_to 468991450

2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.log _replay 1141413939428997~54285 / 1141414175965184 2024-05-23T04:25:39.952052+0000: EOpen [metablob 0x5000002c5c0, 1 dirs], 1 open files
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EOpen.replay
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay 1 dirlumps by unknown.0
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x5000002c5c0
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096108 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80]
2024-05-23T05:58:37.622+0000 7fe8f354f700 20 mds.0.cache.dir(0x5000002c5c0) lookup_exact_snap (head, 'irrelevant')
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/volumes/group/volume/uuid/a/b/irrelevant [2,head] auth (dversion lock) v=881095400 ino=0x300000007ec state=1610612736 | inodepin=1 dirty=1 0x55c2cffb2500]
2024-05-23T05:58:37.622+0000 7fe8f354f700 15 mds.0.cache.ino(0x300000007ec) maybe_export_pin update=0 [inode 0x300000007ec [...2,head] /volumes/group/volume/uuid/a/b/irrelevant/ auth v881095400 f(v0 m2024-05-23T04:23:54.469203+0000 1=1+0) n(v0 rc2024-05-23T04:23:54.469203+0000 b47 2=1+1) old_inodes=28 (iversion lock) | dirfrag=1 dirty=1 0x55c2cc093600]
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x300000007ec [...3557,head] /volumes/group/volume/uuid/a/b/irrelevant/ auth v881095400 f(v0 m2024-05-23T04:23:54.469203+0000 1=1+0) n(v0 rc2024-05-23T04:23:54.469203+0000 b47 2=1+1) old_inodes=27 (iversion lock) | dirfrag=1 dirty=1 0x55c2cc093600]

# probably irrelevant:
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.log _replay 1141413939483302~872 / 1141414175965184 2024-05-23T04:25:39.952060+0000: ESubtreeMap 2 subtrees , 0 ambiguous [metablob 0x1, 2 dirs]
2024-05-23T05:58:37.622+0000 7fe8f354f700 10 mds.0.journal ESubtreeMap.replay -- i already have import map; verifying

# included because snaprealm(0x10000000be0...) is interesting because it's one less than the corruption value, 10000000be1
2024-05-23T05:58:37.624+0000 7fe8f354f700 10 mds.0.log _replay 1141413939484194~11339156 / 1141414175965184 2024-05-23T04:25:39.952214+0000: EUpdate openc [metablob 0x1, 8 dirs]
2024-05-23T05:58:37.625+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay 8 dirlumps by client.1264134
2024-05-23T05:58:37.625+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x1
2024-05-23T05:58:37.625+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x1 / [2,head] auth v=46202548 cv=0/0 dir_auth=0 state=1610612736 f(v0 m2023-02-14T15:27:55.071711+0000 1=0+1) n(v2900339 rc2024-05-23T04:24:00.454513+0000 b1727349780880191 rs27 4446327=3294816+1151511) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 0x55c2ca8bed00]
2024-05-23T05:58:37.644+0000 7fe8f354f700 20 mds.0.cache.dir(0x1) lookup_exact_snap (head, 'volumes')
2024-05-23T05:58:37.644+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/volumes [2,head] auth (dversion lock) v=46202547 ino=0x10000000000 state=1610612736 | inodepin=1 dirty=1 0x55c2ca8bb680]
2024-05-23T05:58:37.644+0000 7fe8f354f700 15 mds.0.cache.ino(0x10000000000) maybe_export_pin update=0 [inode 0x10000000000 [...2,head] /volumes/ auth v46202547 f(v0 m2023-10-16T13:20:57.971728+0000 13=10+3) n(v3571309 rc2024-05-23T04:24:00.454513+0000 b1727349780880191 rs27 4446327=3294816+1151511) old_inodes=13196 (iversion lock) | dirfrag=1 dirty=1 0x55c2cb5fb600]
2024-05-23T05:58:37.645+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x10000000000 [...3557,head] /volumes/ auth v46202547 f(v0 m2023-10-16T13:20:57.971728+0000 13=10+3) n(v3571309 rc2024-05-23T04:24:00.454513+0000 b1727349780880191 rs27 4446327=3294816+1151511) old_inodes=13196 (iversion lock) | dirfrag=1 dirty=1 0x55c2cb5fb600]
2024-05-23T05:58:37.645+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x10000000000
2024-05-23T05:58:37.645+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x10000000000 /volumes/ [2,head] auth v=50328480 cv=0/0 state=1610612736 f(v0 m2023-10-16T13:20:57.971728+0000 13=10+3) n(v3571309 rc2024-05-23T04:24:00.454513+0000 b1727349780880191 rs27 4446326=3294816+1151510) hs=1+0,ss=0+0 dirty=1 | child=1 dirty=1 0x55c2ca8bf600]
2024-05-23T05:58:37.661+0000 7fe8f354f700 20 mds.0.cache.dir(0x10000000000) lookup_exact_snap (head, 'group')
2024-05-23T05:58:37.661+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/volumes/group [2,head] auth (dversion lock) v=50328479 ino=0x10000000002 state=1610612736 | inodepin=1 dirty=1 0x55c2ca8bbb80]
2024-05-23T05:58:37.661+0000 7fe8f354f700 15 mds.0.cache.ino(0x10000000002) maybe_export_pin update=0 [inode 0x10000000002 [...2,head] /volumes/group/ auth v50328479 f(v0 m2023-10-16T13:18:45.125124+0000 6=0+6) n(v3309657 rc2024-05-23T04:24:00.454513+0000 b1727349780879021 rs27 4446309=3294804+1151505) old_inodes=10980 (iversion lock) | dirfrag=1 dirty=1 0x55c2cb5fbb80]
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x10000000002 [...3557,head] /volumes/group/ auth v50328479 f(v0 m2023-10-16T13:18:45.125124+0000 6=0+6) n(v3309657 rc2024-05-23T04:24:00.454513+0000 b1727349780879021 rs27 4446309=3294804+1151505) old_inodes=10980 (iversion lock) | dirfrag=1 dirty=1 0x55c2cb5fbb80]
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x10000000002
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x10000000002 /volumes/group/ [2,head] auth v=51457666 cv=0/0 state=1610612736 f(v0 m2023-10-16T13:18:45.125124+0000 6=0+6) n(v3309657 rc2024-05-23T04:24:00.454513+0000 b1727349780879021 rs27 4446308=3294804+1151504) hs=1+0,ss=0+0 dirty=1 | child=1 dirty=1 0x55c2ca8bfa80]
2024-05-23T05:58:37.662+0000 7fe8f354f700 20 mds.0.cache.dir(0x10000000002) lookup_exact_snap (head, 'a')
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/volumes/group/volume [2,head] auth (dversion lock) v=51457665 ino=0x10000000be0 state=1610612736 | inodepin=1 dirty=1 0x55c2d00ce280]
2024-05-23T05:58:37.662+0000 7fe8f354f700 15 mds.0.cache.ino(0x10000000be0) maybe_export_pin update=0 [inode 0x10000000be0 [...2,head] /volumes/group/volume/ auth v51457665 snaprealm=0x55c2cb6f6600 f(v0 m2023-05-03T12:18:52.639517+0000 2=1+1) n(v3570449 rc2024-05-23T04:24:00.454513+0000 b1643694677098774 rs27 3286550=2282753+1003797) old_inodes=27 (iversion lock) | dirfrag=1 dirty=1 0x55c2cb5fc100]
2024-05-23T05:58:37.662+0000 7fe8f354f700 20 mds.0.cache.ino(0x10000000be0) decode_snap_blob snaprealm(0x10000000be0 seq 3556 lc 3555 cr 3555 cps 2 snaps={34ed=snap(34ed 0x10000000be0 'daily-snapshot-20240521-0430' 2024-05-21T02:30:02.567663+0000),351f=snap(351f 0x10000000be0 'daily-snapshot-20240522-0430' 2024-05-22T02:30:02.705272+0000),3525=snap(3525 0x10000000be0 'hourly-snapshot-20240522-0700' 2024-05-22T05:00:02.262384+0000),3527=snap(3527 0x10000000be0 'hourly-snapshot-20240522-0800' 2024-05-22T06:00:02.032422+0000),3529=snap(3529 0x10000000be0 'hourly-snapshot-20240522-0900' 2024-05-22T07:00:02.335645+0000),352b=snap(352b 0x10000000be0 'hourly-snapshot-20240522-1000' 2024-05-22T08:00:02.373499+0000),352d=snap(352d 0x10000000be0 'hourly-snapshot-20240522-1100' 2024-05-22T09:00:01.707163+0000),352f=snap(352f 0x10000000be0 'hourly-snapshot-20240522-1200' 2024-05-22T10:00:01.842024+0000),3531=snap(3531 0x10000000be0 'hourly-snapshot-20240522-1300' 2024-05-22T11:00:02.216586+0000),3533=snap(3533 0x10000000be0 'hourly-snapshot-20240522-1400' 2024-05-22T12:00:02.298254+0000),3535=snap(3535 0x10000000be0 'hourly-snapshot-20240522-1500' 2024-05-22T13:00:02.418773+0000),3537=snap(3537 0x10000000be0 'hourly-snapshot-20240522-1600' 2024-05-22T14:00:07.238498+0000),3539=snap(3539 0x10000000be0 'hourly-snapshot-20240522-1700' 2024-05-22T15:00:02.392383+0000),353b=snap(353b 0x10000000be0 'hourly-snapshot-20240522-1800' 2024-05-22T16:00:01.861486+0000),353d=snap(353d 0x10000000be0 'hourly-snapshot-20240522-1900' 2024-05-22T17:00:01.916335+0000),353f=snap(353f 0x10000000be0 'hourly-snapshot-20240522-2000' 2024-05-22T18:00:02.159247+0000),3541=snap(3541 0x10000000be0 'hourly-snapshot-20240522-2100' 2024-05-22T19:00:02.114733+0000),3543=snap(3543 0x10000000be0 'hourly-snapshot-20240522-2200' 2024-05-22T20:00:02.743249+0000),3545=snap(3545 0x10000000be0 'hourly-snapshot-20240522-2300' 2024-05-22T21:00:01.745556+0000),3547=snap(3547 0x10000000be0 'hourly-snapshot-20240523-0000' 2024-05-22T22:00:02.877812+0000),3549=snap(3549 0x10000000be0 'hourly-snapshot-20240523-0100' 2024-05-22T23:00:02.007068+0000),354b=snap(354b 0x10000000be0 'hourly-snapshot-20240523-0200' 2024-05-23T00:00:02.463806+0000),354d=snap(354d 0x10000000be0 'hourly-snapshot-20240523-0300' 2024-05-23T01:00:02.691226+0000),354f=snap(354f 0x10000000be0 'hourly-snapshot-20240523-0400' 2024-05-23T02:00:01.670537+0000),3551=snap(3551 0x10000000be0 'daily-snapshot-20240523-0430' 2024-05-23T02:30:02.555903+0000),3553=snap(3553 0x10000000be0 'hourly-snapshot-20240523-0500' 2024-05-23T03:00:02.391532+0000),3555=snap(3555 0x10000000be0 'hourly-snapshot-20240523-0600' 2024-05-23T04:00:02.798591+0000)} past_parent_snaps= 0x55c2cb6f6600)
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x10000000be0 [...3556,head] /volumes/group/volume/ auth v51457665 snaprealm=0x55c2cb6f6600 f(v0 m2023-05-03T12:18:52.639517+0000 2=1+1) n(v3570449 rc2024-05-23T04:24:00.454513+0000 b1643694677098774 rs27 3286550=2282753+1003797) old_inodes=27 (iversion lock) | dirfrag=1 dirty=1 0x55c2cb5fc100]
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x10000000be0
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x10000000be0 /volumes/group/volume/ [2,head] auth v=52863985 cv=0/0 state=1610612736 f(v0 m2023-05-03T12:18:52.639517+0000 2=1+1) n(v3570449 rc2024-05-23T04:24:00.454513+0000 b1643694677098774 3286549=2282753+1003796) hs=1+0,ss=0+0 dirty=1 | child=1 dirty=1 0x55c2d00d8000]
2024-05-23T05:58:37.662+0000 7fe8f354f700 20 mds.0.cache.dir(0x10000000be0) lookup_exact_snap (head, 'uuid')
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/volumes/group/volume/uuid [2,head] auth (dversion lock) v=52863984 ino=0x10000000be1 state=1610612736 | inodepin=1 dirty=1 0x55c2d00ce780]
2024-05-23T05:58:37.662+0000 7fe8f354f700 15 mds.0.cache.ino(0x10000000be1) maybe_export_pin update=0 [inode 0x10000000be1 [...2,head] /volumes/group/volume/uuid/ auth v52863984 snaprealm=0x55c2cb6f6800 f(v0 m2023-05-19T07:48:19.242750+0000 2=0+2) n(v3975651 rc2024-05-23T04:24:01.869531+0000 b1643694677098651 3286549=2282753+1003796)/n(v3975650 rc2024-05-23T04:24:00.454513+0000 b1643694677098651 3286548=2282752+1003796) old_inodes=27 (inest sync dirty) (iversion lock) | dirtyscattered=1 dirfrag=1 dirty=1 0x55c2cb5fc680]
2024-05-23T05:58:37.662+0000 7fe8f354f700 20 mds.0.cache.ino(0x10000000be1) decode_snap_blob snaprealm(0x10000000be1 seq 1 lc 0 cr 1 cps 2 snaps={} past_parent_snaps= 0x55c2cb6f6800)
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x10000000be1 [...3556,head] /volumes/group/volume/uuid/ auth v52863984 snaprealm=0x55c2cb6f6800 f(v0 m2023-05-19T07:48:19.242750+0000 2=0+2) n(v3975651 rc2024-05-23T04:24:01.869531+0000 b1643694677098651 3286549=2282753+1003796)/n(v3975650 rc2024-05-23T04:24:00.454513+0000 b1643694677098651 3286548=2282752+1003796) old_inodes=27 (inest sync dirty) (iversion lock) | dirtyscattered=1 dirfrag=1 dirty=1 0x55c2cb5fc680]
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x10000000be1
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x10000000be1 /volumes/group/volume/uuid/ [2,head] auth v=59363721 cv=0/0 state=1610612736 f(v0 m2023-05-19T07:48:19.242750+0000 2=0+2) n(v3975651 rc2024-05-23T04:24:01.869531+0000 b1643694677098651 3286548=2282753+1003795) hs=2+0,ss=0+0 dirty=2 | child=1 dirty=1 0x55c2d00d8480]
2024-05-23T05:58:37.662+0000 7fe8f354f700 20 mds.0.cache.dir(0x10000000be1) lookup_exact_snap (head, 'a')
2024-05-23T05:58:37.662+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/volumes/group/volume/uuid/a [2,head] auth (dversion lock) v=59363720 ino=0x20000000444 state=1610612736 | inodepin=1 dirty=1 0x55c2d0df5680]
2024-05-23T05:58:37.662+0000 7fe8f354f700 15 mds.0.cache.ino(0x20000000444) maybe_export_pin update=0 [inode 0x20000000444 [...2,head] /volumes/group/volume/uuid/a/ auth v59363720 snaprealm=0x55c2cb6f6a00 f(v17 m2024-02-23T10:30:15.832409+0000 8=0+8) n(v4546035 rc2024-05-23T04:24:01.869531+0000 b1643687403850517 3278875=2275634+1003241)/n(v4546034 rc2024-05-23T04:24:01.869531+0000 b1643687403850517 3278875=2275634+1003241) old_inodes=27 (inest sync dirty) (iversion lock) | dirtyscattered=1 dirfrag=1 dirty=1 0x55c2cb5fcc00]
2024-05-23T05:58:37.662+0000 7fe8f354f700 20 mds.0.cache.ino(0x20000000444) decode_snap_blob snaprealm(0x20000000444 seq 1 lc 0 cr 1 cps 2 snaps={} past_parent_snaps= 0x55c2cb6f6a00)
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x20000000444 [...3556,head] /volumes/group/volume/uuid/a/ auth v59363720 snaprealm=0x55c2cb6f6a00 f(v17 m2024-02-23T10:30:15.832409+0000 8=0+8) n(v4546035 rc2024-05-23T04:24:01.869531+0000 b1643687403850517 3278875=2275634+1003241)/n(v4546034 rc2024-05-23T04:24:01.869531+0000 b1643687403850517 3278875=2275634+1003241) old_inodes=27 (inest sync dirty) (iversion lock) | dirtyscattered=1 dirfrag=1 dirty=1 0x55c2cb5fcc00]
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x20000000444
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x20000000444 /volumes/group/volume/uuid/a/ [2,head] auth v=96656134 cv=0/0 state=1610612736 f(v17 m2024-02-23T10:30:15.832409+0000 8=0+8) n(v4546035 rc2024-05-23T04:24:01.869531+0000 b1643687403850517 3278874=2275634+1003240) hs=3+0,ss=0+0 dirty=3 | child=1 dirty=1 0x55c2d00d8900]
2024-05-23T05:58:37.663+0000 7fe8f354f700 20 mds.0.cache.dir(0x20000000444) lookup_exact_snap (head, 'b')
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/volumes/group/volume/uuid/a/b [2,head] auth (dversion lock) v=96656133 ino=0x5000002c5c0 state=1610612736 | inodepin=1 dirty=1 0x55c2d0df5b80]
2024-05-23T05:58:37.663+0000 7fe8f354f700 15 mds.0.cache.ino(0x5000002c5c0) maybe_export_pin update=0 [inode 0x5000002c5c0 [...2,head] /volumes/group/volume/uuid/a/b/ auth v96656133 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 71=50+21)/n(v4000502 rc2024-05-23T04:24:01.869531+0000 b51936876471 69=48+21) old_inodes=28 (inest sync dirty) (iversion lock) | dirtyscattered=1 dirfrag=1 dirty=1 0x55c2cb5fd180]
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x5000002c5c0 [...3557,head] /volumes/group/volume/uuid/a/b/ auth v96656133 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 71=50+21)/n(v4000502 rc2024-05-23T04:24:01.869531+0000 b51936876471 69=48+21) old_inodes=28 (inest sync dirty) (iversion lock) | dirtyscattered=1 dirfrag=1 dirty=1 0x55c2cb5fd180]
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x5000002c5c0
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.cache.dir(0x5000002c5c0) _mark_dirty (already dirty) [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096110 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80] version 881096110
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay      clean nestinfo on [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096110 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80]
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay      clean fragstat on [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096110 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80]
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096110 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80]
2024-05-23T05:58:37.663+0000 7fe8f354f700 20 mds.0.cache.dir(0x5000002c5c0) lookup_exact_snap (head, 'irrelevant')
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/volumes/group/volume/uuid/a/b/irrelevant [2,head] auth (dversion lock) v=881096109 ino=0x300000007ec state=1610612736 | inodepin=1 dirty=1 0x55c2cffb2500]
2024-05-23T05:58:37.663+0000 7fe8f354f700 15 mds.0.cache.ino(0x300000007ec) maybe_export_pin update=0 [inode 0x300000007ec [...2,head] /volumes/group/volume/uuid/a/b/irrelevant/ auth v881096109 f(v0 m2024-05-23T04:25:39.956888+0000 2=2+0) n(v0 rc2024-05-23T04:25:39.956888+0000 b47 3=2+1)/n(v0 rc2024-05-23T04:23:54.469203+0000 b47 2=1+1) old_inodes=27 (iversion lock) | dirfrag=1 dirty=1 0x55c2cc093600]
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x300000007ec [...3557,head] /volumes/group/volume/uuid/a/b/irrelevant/ auth v881096109 f(v0 m2024-05-23T04:25:39.956888+0000 2=2+0) n(v0 rc2024-05-23T04:25:39.956888+0000 b47 3=2+1)/n(v0 rc2024-05-23T04:23:54.469203+0000 b47 2=1+1) old_inodes=27 (iversion lock) | dirfrag=1 dirty=1 0x55c2cc093600]
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x300000007ec
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x300000007ec /volumes/group/volume/uuid/a/b/irrelevant/ [2,head] auth v=43439960 cv=0/0 state=1073741824 f(v0 m2024-05-23T04:25:39.956888+0000 2=2+0) n(v0 rc2024-05-23T04:25:39.956888+0000 b47 2=2+0) hs=0+1,ss=0+0 dirty=1 | child=1 0x55c2cfe99b00]
2024-05-23T05:58:37.663+0000 7fe8f354f700 20 mds.0.cache.dir(0x300000007ec) lookup_exact_snap (head, 'irrelevant_file')
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [3557,head] had [dentry #0x1/volumes/group/volume/uuid/a/b/irrelevant/irrelevant_file [3557,head] auth NULL (dversion lock) v=43439959 ino=(nil) state=1610612800|bottomlru | inodepin=0 dirty=1 0x55c2d075bb80]
2024-05-23T05:58:37.663+0000 7fe8f354f700 12 mds.0.cache.dir(0x300000007ec) link_primary_inode [dentry #0x1/volumes/group/volume/uuid/a/b/irrelevant/irrelevant_file [3557,head] auth NULL (dversion lock) v=43439959 ino=(nil) state=1610612800|bottomlru | inodepin=0 dirty=1 0x55c2d075bb80] [inode 0x10003bb2da0 [3557,head] #10003bb2da0 auth v43439959 s=0 n(v0 1=1+0) (iversion lock) cr={1264134=0-4194304@3556} 0x55c2ccf6d700]
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay added [inode 0x10003bb2da0 [3557,head] /volumes/group/volume/uuid/a/b/irrelevant/irrelevant_file auth v43439959 s=0 n(v0 1=1+0) (iversion lock) cr={1264134=0-4194304@3556} 0x55c2ccf6d700]
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.cache.ino(0x10003bb2da0) mark_dirty_parent
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay noting opened inode [inode 0x10003bb2da0 [3557,head] /volumes/group/volume/uuid/a/b/irrelevant/irrelevant_file auth v43439959 DIRTYPARENT s=0 n(v0 1=1+0) (iversion lock) cr={1264134=0-4194304@3556} | dirtyparent=1 dirty=1 0x55c2ccf6d700]
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay inotable tablev 125156 <= table 125156
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay sessionmap v 115480684 - 1 == table 115480683 prealloc [] used 0x10003bb2da0
2024-05-23T05:58:37.663+0000 7fe8f354f700 20 mds.0.journal  (session prealloc [0x10000000018~0x1e2,0x10000000203~0x6,0x1000000020a~0x3,0x10000000273~0x4,0x10000041d70~0x2,0x100001abfcc~0x2,0x10003bb2d9e~0x1,0x10003bb2da0~0x28,0x10003bb31b2~0x1f5])
2024-05-23T05:58:37.663+0000 7fe8f354f700 20 mds.0.sessionmap replay_dirty_session s=0x55c2cb5e4000 name=client.1264134 v=115480683
2024-05-23T05:58:37.663+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay request client.1264134:468991483 trim_to 468991450

...

# event which introduced corruption, [10000000be1,head] loaded from replay of a cap update
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.log _replay 1141413951046570~56344 / 1141414175965184 2024-05-23T04:25:39.982580+0000: EUpdate cap update [metablob 0x5000002c5c0, 2 dirs]
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay 2 dirlumps by unknown.0
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x5000002c5c0
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.cache.dir(0x5000002c5c0) _mark_dirty (already dirty) [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096120 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80] version 881096120
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay      clean nestinfo on [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096120 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80]
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay      clean fragstat on [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096120 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80]
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x5000002c5c0 /volumes/group/volume/uuid/a/b/ [2,head] auth v=881096120 cv=0/0 state=1610612736 f(v19 m2023-12-18T13:46:55.280043+0000 20=0+20) n(v4000503 rc2024-05-23T04:25:39.823747+0000 b51936876471 70=50+20) hs=20+0,ss=0+0 dirty=20 | child=1 dirty=1 0x55c2d00d8d80]
2024-05-23T05:58:37.664+0000 7fe8f354f700 20 mds.0.cache.dir(0x5000002c5c0) lookup_exact_snap (head, 'c')
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [dentry #0x1/volumes/group/volume/uuid/a/b/c [2,head] auth (dversion lock) v=881096119 ino=0x5000002ca75 state=1610612736 | inodepin=1 dirty=1 0x55c2d0df5180]
2024-05-23T05:58:37.664+0000 7fe8f354f700 15 mds.0.cache.ino(0x5000002ca75) maybe_export_pin update=0 [inode 0x5000002ca75 [...2,head] /volumes/group/volume/uuid/a/b/c/ auth v881096119 f(v1440560 m2024-05-23T04:25:39.917887+0000 6=6+0) n(v438633 rc2024-05-23T04:25:39.987888+0000 b6020469579 7=6+1)/n(v438633 rc2024-05-23T04:23:51.363154+0000 b4479483356 6=5+1) old_inodes=27 (iversion lock) | dirfrag=1 dirty=1 0x55c2ccfbe100]
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 0x5000002ca75 [...3556,head] /volumes/group/volume/uuid/a/b/c/ auth v881096119 f(v1440560 m2024-05-23T04:25:39.917887+0000 6=6+0) n(v438633 rc2024-05-23T04:25:39.987888+0000 b6020469579 7=6+1)/n(v438633 rc2024-05-23T04:23:51.363154+0000 b4479483356 6=5+1) old_inodes=27 (iversion lock) | dirfrag=1 dirty=1 0x55c2ccfbe100]
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay dir 0x5000002ca75
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay updated dir [dir 0x5000002ca75 /volumes/group/volume/uuid/a/b/c/ [2,head] auth v=46254611 cv=0/0 state=1610612736 f(v1440560 m2024-05-23T04:25:39.917887+0000 6=6+0) n(v438633 rc2024-05-23T04:25:39.987888+0000 b6020469579 6=6+0) hs=3+0,ss=0+0 dirty=3 | child=1 dirty=1 0x55c2cfe98d80]
2024-05-23T05:58:37.664+0000 7fe8f354f700 20 mds.0.cache.dir(0x5000002ca75) lookup_exact_snap (head, 'interesting')
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [10000000be1,head] had [dentry #0x1/volumes/group/volume/uuid/a/b/c/interesting [3557,head] auth (dversion lock) v=46254610 ino=0x10003bb2d9f state=1610612736 | inodepin=1 dirty=1 0x55c2cbef1900]
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay for [10000000be1,head] had [inode 0x10003bb2d9f [10000000be1,head] /volumes/group/volume/uuid/a/b/c/interesting auth v46254610 DIRTYPARENT s=1540986223 n(v0 rc2024-05-23T04:25:39.987888+0000 b1540986223 1=1+0) (iversion lock) cr={1264134=0-3082813440@3556} | dirtyparent=1 dirty=1 0x55c2cf3a8c00]
2024-05-23T05:58:37.664+0000 7fe8f354f700 10 mds.0.journal EMetaBlob.replay flush client.1264134:46042196 trim_to 46042196

The key observation here is that the damage is introduced during cap update. I've isolated the most likely cause to a corrupt "follows":

https://github.com/ceph/ceph/blob/75c41667cd6ff7c6bd44ca37baca19254346cc14/src/messages/MClientCaps.h#L104-L105

which is used with mild handling / error checking in:

https://github.com/ceph/ceph/blob/75c41667cd6ff7c6bd44ca37baca19254346cc14/src/mds/Locker.cc#L3323

which eventually calls

https://github.com/ceph/ceph/blob/75c41667cd6ff7c6bd44ca37baca19254346cc14/src/mds/Locker.cc#L3601

and then

https://github.com/ceph/ceph/blob/75c41667cd6ff7c6bd44ca37baca19254346cc14/src/mds/Locker.cc#L4168

and then eventually

https://github.com/ceph/ceph/blob/75c41667cd6ff7c6bd44ca37baca19254346cc14/src/mds/MDCache.cc#L1702-L1709

To verify this can cause the corruption, I've patched the client to always produce a bogus follows:

 src/client/Client.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/src/client/Client.cc b/src/client/Client.cc
index fffbd11f6007..6d9d270f204c 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -3803,8 +3803,10 @@ void Client::send_cap(Inode *in, MetaSession *session, Cap *cap,

   snapid_t follows = 0;

-  if (flush)
+  if (flush) {
     follows = in->snaprealm->get_snap_context().seq;
+    follows = snapid_t(CEPH_MAXSNAP);
+  }

   auto m = make_message<MClientCaps>(op,
                                   in->ino,

this causes the crash:

2024-06-01T00:42:22.278+0000 7f705d789640  1 -- [v2:127.0.0.1:6862/3009947561,v1:127.0.0.1:6863/3009947561] <== client.747304 127.0.0.1:0/3705246724 47 ==== client_caps(update ino 0x100000005e3 23 seq 5 tid 2 caps=pAsLsXsFsc dirty=Fw wanted=- follows fffffffffffffffd size 4096/4194304 ts 3/0 mtime 2024-06-01T00:42:16.753484+0000 ctime 2024-06-01T00:42:16.753484+0000 change_attr 5) ==== 260+0+0 (crc 0 0 0) 0x55cc1b5ca380 con 0x55cc1b478480
2024-06-01T00:42:22.278+0000 7f705d789640  7 mds.0.locker handle_client_caps on 0x100000005e3 tid 2 follows fffffffffffffffd op update flags 0x0
...
2024-06-01T00:42:22.278+0000 7f705d789640 20 mds.0.journal EMetaBlob::add_dir_context(0x55cc16519200) reached unambig auth subtree, don't need  at [dir 0x1 / [2,head] auth pv=40 v=38 cv=1/1 REP dir_auth=0 ap=1+1 state=1610874881|complete f(v0 m2024-06-01T00:30:00.965065+0000 1=0+1) n(v3 rc2024-06-01T00:42:16.746911+0000 b120 rs1 8=3+5) hs=1+0,ss=0+0 dirty=1 | child=1 subtree=1 dirty=1 authpin=1 0x55cc16519200]
2024-06-01T00:42:22.278+0000 7f705d789640 20 mds.0.journal EMetaBlob::add_dir_context final:
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows head on [dentry #0x1/volumes [2,head] auth (dversion lock) pv=39 v=37 ino=0x10000000000 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 clientlease=0 0x55cc1b349400]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows 2 < first on [inode 0x10000000000 [...3,head] /volumes/ auth v37 pv39 ap=1 DIRTYPARENT f(v0 m2024-06-01T00:30:00.991375+0000 2=1+1) n(v2 rc2024-06-01T00:42:16.746911+0000 b120 rs1 8=3+5) old_inodes=1 (inest lock w=1) (ifile excl) (iversion lock w=1) caps={747304=pAsLsXsFsx/-@5},l=747304 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55cc1b4b3180]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows head on [dentry #0x1/volumes/_nogroup [2,head] auth (dversion lock) pv=37 v=35 ino=0x10000000001 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 clientlease=0 0x55cc1b349900]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows 2 < first on [inode 0x10000000001 [...3,head] /volumes/_nogroup/ auth v35 pv37 ap=1 DIRTYPARENT f(v0 m2024-06-01T00:30:00.970164+0000 1=0+1) n(v2 rc2024-06-01T00:42:16.746911+0000 b120 rs1 6=2+4) old_inodes=1 (inest lock w=1) (ifile excl) (iversion lock w=1) caps={747304=pAsLsXsFsx/-@5},l=747304 | dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=1 dirtyrstat=0 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55cc1b2dc100]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows head on [dentry #0x1/volumes/_nogroup/1 [2,head] auth (dversion lock) pv=42 v=40 ino=0x10000000002 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55cc1b348000]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows 2 < first on [inode 0x10000000002 [...3,head] /volumes/_nogroup/1/ auth v40 pv42 ap=1 snaprealm=0x55cc1b33efc0 DIRTYPARENT f(v0 m2024-06-01T00:30:00.988878+0000 2=1+1) n(v0 rc2024-06-01T00:42:16.746911+0000 b120 rs1 5=2+3) old_inodes=1 (inest lock w=1) (ifile excl) (iversion lock w=1) caps={747304=pAsLsXsFsx/-@6},l=747304 | request=0 lock=2 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55cc1b46b080]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows head on [dentry #0x1/volumes/_nogroup/1/c3c307d2-ab8e-4712-a459-7eb74ee0f15a [2,head] auth (dversion lock) pv=34 v=32 ino=0x10000000003 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55cc1b348a00]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows 2 < first on [inode 0x10000000003 [...3,head] /volumes/_nogroup/1/c3c307d2-ab8e-4712-a459-7eb74ee0f15a/ auth v32 pv34 ap=1 DIRTYPARENT f(v0 m2024-06-01T00:30:29.359161+0000 1=0+1) n(v1 rc2024-06-01T00:42:16.746911+0000 3=1+2) old_inodes=1 (inest lock w=1 dirty) (ifile excl) (iversion lock w=1) caps={747304=pAsLsXsFsx/-@5},l=747304 | dirtyscattered=1 request=0 lock=2 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55cc1b47d080]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows head on [dentry #0x1/volumes/_nogroup/1/c3c307d2-ab8e-4712-a459-7eb74ee0f15a/deep [2,head] auth (dversion lock) pv=21 v=19 ino=0x100000005e2 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55cc1b588780]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows 2 < first on [inode 0x100000005e2 [...3,head] /volumes/_nogroup/1/c3c307d2-ab8e-4712-a459-7eb74ee0f15a/deep/ auth v19 pv21 ap=1 DIRTYPARENT f(v0 m2024-06-01T00:30:45.389723+0000 1=1+0) n(v0 rc2024-06-01T00:42:16.746911+0000 2=1+1) old_inodes=1 (inest lock w=1) (ifile excl) (iversion lock w=1) caps={747304=pAsLsXsFs/-@4},l=747304 | request=0 lock=2 dirfrag=1 caps=1 dirtyparent=1 dirty=1 waiter=0 authpin=1 0x55cc1b306100]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry follows fffffffffffffffd on [dentry #0x1/volumes/_nogroup/1/c3c307d2-ab8e-4712-a459-7eb74ee0f15a/deep/foo [3,head] auth (dversion lock) pv=18 v=16 ino=0x100000005e3 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55cc1b588f00]
2024-06-01T00:42:22.278+0000 7f705d789640 10  mds.0.cache.snaprealm(0x10000000002 seq 2 0x55cc1b33efc0) get_snaps 2 (seq 2 cached_seq 2)
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache journal_cow_dentry no snapshot follows fffffffffffffffd on [dentry #0x1/volumes/_nogroup/1/c3c307d2-ab8e-4712-a459-7eb74ee0f15a/deep/foo [head,head] auth (dversion lock) pv=18 v=16 ino=0x100000005e3 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55cc1b588f00]
2024-06-01T00:42:22.278+0000 7f705d789640 -1 mds.0.cache.den(0x100000005e2 foo) newly corrupt dentry to be committed: [dentry #0x1/volumes/_nogroup/1/c3c307d2-ab8e-4712-a459-7eb74ee0f15a/deep/foo [head,head] auth (dversion lock) pv=18 v=16 ino=0x100000005e3 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55cc1b588f00]
2024-06-01T00:42:22.278+0000 7f705d789640 10 mds.0.cache.dir(0x100000005e2) go_bad_dentry foo
2024-06-01T00:42:22.278+0000 7f705d789640 -1 log_channel(cluster) log [ERR] : MDS abort because newly corrupt dentry to be committed: [dentry #0x1/volumes/_nogroup/1/c3c307d2-ab8e-4712-a459-7eb74ee0f15a/deep/foo [head,head] auth (dversion lock) pv=18 v=16 ino=0x100000005e3 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55cc1b588f00]
2024-06-01T00:42:22.278+0000 7f705d789640  1 -- [v2:127.0.0.1:6862/3009947561,v1:127.0.0.1:6863/3009947561] --> [v2:127.0.0.1:40019/0,v1:127.0.0.1:40020/0] -- log(1 entries from seq 1 at 2024-06-01T00:42:22.280725+0000) -- 0x55cc1b35ee00 con 0x55cc1645b180
2024-06-01T00:42:22.282+0000 7f705d789640 -1 /home/pdonnell/ceph/src/mds/MDSRank.cc: In function 'void MDSRank::abort(std::string_view)' thread 7f705d789640 time 2024-06-01T00:42:22.280759+0000
/home/pdonnell/ceph/src/mds/MDSRank.cc: 952: ceph_abort_msg("abort() called")

(It's nice to see that, indeed, the MDS abort catches it!)

So now there are at least two things to do:

  • Patch the MDS to evict a client which submits a bogus "first", obviously don't process the cap update
  • Figure out how the client is producing these bogus values. CEPH_MAXSNAP (one less than CEPH_NOSNAP) causes the [head,head] issue we've seen sometimes in the past (and reproduced with the above patch). However, we usually see a parent directory inode. In this particular reproduction in a cluster, it was one greater [10000000be1,head] than the subvolume inode 0x10000000be0 (i.e. the directory marked with ceph.dir.subvolume). That would mean the client sent that subvol ino to the MDS for the "follows" field. It was incremented by one in the above code and used for "first". It's also worth noting we've usually seen this with kernel clients but apparently there has been a report of NFS-Ganesha on libcephfs also causing this crash, but that client may not actually be the cause. The origin of the bogus value may be from the MDS ultimately but the durable corruption is introduced here.
Actions #8

Updated by Konstantin Shalygin about 1 year ago

  • Backport changed from squid,reef,quincy to squid,reef
Actions #9

Updated by Patrick Donnelly 9 months ago

  • Target version deleted (v20.0.0)
Actions

Also available in: Atom PDF