Bug #65020


qa: "Scrub error on inode 0x1000000356c (/volumes/qa/sv_0/2f8f6bb4-3ea9-47a0-bd79-a0f50dc149d5/client.0/tmp/clients/client7/~dmtmp/PARADOX) see mds.b log and `damage ls` output for details" in cluster log

Added by Patrick Donnelly about 2 years ago. Updated about 1 month ago.

Status: Fix Under Review
Priority: Urgent
Category: fsck/damage handling
Target version:
% Done: 0%
Source: Q/A
Backport: tentacle,squid,reef
Regression: No
Severity: 2 - major
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS): MDS
Labels (FS): scrub
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:


Related issues 1 (0 open, 1 closed)

Has duplicate CephFS - Bug #66126: Scrub error on inode ... with quiescer (Duplicate, Leonid Usov)

Actions #1

Updated by Patrick Donnelly about 2 years ago

Maybe also related: https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612993

"2024-03-20T21:19:00.179790+0000 mds.c (mds.0) 1 : cluster [ERR] dir 0x10000000000 object missing on disk; some files may be lost (/dir)" in cluster log

Actions #2

Updated by Venky Shankar almost 2 years ago

  • Assignee set to Milind Changire
Actions #3

Updated by Venky Shankar almost 2 years ago

  • Status changed from New to Triaged
Actions #4

Updated by Milind Changire almost 2 years ago

Patrick Donnelly wrote in #note-1:

Maybe also related: https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612993

"2024-03-20T21:19:00.179790+0000 mds.c (mds.0) 1 : cluster [ERR] dir 0x10000000000 object missing on disk; some files may be lost (/dir)" in cluster log

This job doesn't have the log in the ignore list.

Actions #5

Updated by Venky Shankar almost 2 years ago

Isn't this same as: https://tracker.ceph.com/issues/48562 ?

Actions #6

Updated by Milind Changire almost 2 years ago

Venky Shankar wrote in #note-5:

Isn't this same as: https://tracker.ceph.com/issues/48562 ?

"object missing on disk" is not the issue here

Actions #7

Updated by Milind Changire almost 2 years ago

Patrick Donnelly wrote:

https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612910/

and many others.

More fallout after https://github.com/ceph/ceph/pull/55455 was merged.

This is the scrub error for the PARADOX item:

"raw_stats":{
    "checked":true,
    "passed":false,
    "read_ret_val":0,
    "ondisk_value.dirstat":"f(v0 m2024-03-20T19:09:04.770954+0000 25=25+0)",
    "ondisk_value.rstat":"n(v0 rc2024-03-20T19:09:04.804841+0000 b5230239 25=24+1)",
    "memory_value.dirstat":"f(v69 m2024-03-20T19:09:04.804841+0000 24=24+0)",
    "memory_value.rstat":"n(v7 rc2024-03-20T19:09:04.804841+0000 b5230239 25=24+1)",
    "error_str":"freshly-calculated rstats don't match existing ones" 
}

So the on-disk and in-memory dirstat (apart from the timestamp) seem to be off by one.
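
For anyone decoding the fragstat notation above: f(v<version> m<mtime> <total>=<files>+<subdirs>) is the dirstat, and n(...) is the recursive rstat. A minimal sketch of the kind of comparison the scrub is making here (illustrative field names and logic, not the actual Ceph implementation):

    // Illustrative sketch only, not the real Ceph types.
    #include <cstdint>

    struct dirstat_view {
      uint64_t version;   // v0 on disk vs v69 in memory
      int64_t  nfiles;    // 25 on disk vs 24 in memory
      int64_t  nsubdirs;  // 0 in both
      bool same_counts(const dirstat_view &o) const {
        // only the entry counts are compared here, for illustration
        return nfiles == o.nfiles && nsubdirs == o.nsubdirs;
      }
    };

    // ondisk f(v0 ... 25=25+0) vs memory f(v69 ... 24=24+0): same_counts() is false,
    // which is what "freshly-calculated rstats don't match existing ones" reports.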

Actions #8

Updated by Milind Changire almost 2 years ago

Milind Changire wrote in #note-6:

Venky Shankar wrote in #note-5:

Isn't this same as: https://tracker.ceph.com/issues/48562 ?

"object missing on disk" is not the issue here

Or at least the "object missing on disk" log doesn't show up in the cluster logs for the job listed in the description.

Actions #9

Updated by Venky Shankar almost 2 years ago

Milind Changire wrote in #note-8:

Milind Changire wrote in #note-6:

Venky Shankar wrote in #note-5:

Isn't this same as: https://tracker.ceph.com/issues/48562 ?

"object missing on disk" is not the issue here

Or at least the "object missing on disk" log doesn't show up in the cluster logs for the job listed in the description.

OK. So that was a separate issue, which was fixed by adding the string to the ignorelist. It got mentioned in https://tracker.ceph.com/issues/65020#note-1, so I wondered whether this was the same problem.

Actions #10

Updated by Venky Shankar almost 2 years ago

Actions #12

Updated by Milind Changire almost 2 years ago

As far as I checked, none of the parent inodes had the replicated flag set. Anybody else taking this up for RCA might want to double-check.

Actions #13

Updated by Kotresh Hiremath Ravishankar almost 2 years ago · Edited

I debugged the following run.

https://pulpito.ceph.com/vshankar-2024-05-10_15:47:15-fs-wip-vshankar-testing-20240509.114851-debug-testing-default-smithi/7701381/

2024-05-10T22:03:26.765+0000 7fbd570ad700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=4 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest sync->mix g=1) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value":"(2)0x1000000d4bd:[<0x1000000c5e4/amdtee v1776>,<0x1000000c5db/firmware v6359>,<0x1000000c5da/multiple_rsync_payload.215536 v769>,<0x1000000c5d9/payload.1 v575>,<0x10000000005/tmp v1687>,<0x100000001fb/client.0 v1493>,<0x100000001fa/38c2707a-ec61-441f-9927-99af8ad21869 v1420>,<0x10000000001/sv_1 v1385>,<0x10000000000/qa v1353>,<0x1/volumes v1298>]//[]","memoryvalue":"(2)0x1000000d4bd:[<0x1000000c5e4/amdtee v1776>,<0x1000000c5db/firmware v6359>,<0x1000000c5da/multiple_rsync_payload.215536 v804>,<0x1000000c5d9/payload.1 v657>,<0x10000000005/tmp v1741>,<0x100000001fb/client.0 v1544>,<0x100000001fa/38c2707a-ec61-441f-9927-99af8ad21869 v1466>,<0x10000000001/sv_1 v1433>,<0x10000000000/qa v1400>,<0x1/volumes v1346>]//[]","error_str":""},"raw_stats":{"checked":true,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0)","ondisk_value.rstat":"n(v0 rc2024-05-10T21:57:37.873685+0000 b5966 3=2+1)","memory_value.dirstat":"f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0)","memory_value.rstat":"n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1)","error_str":"freshly-calculated rstats don't match existing ones"},"return_code":0}


As quoted in other comments, the dirstat differs between the in-memory and on-disk values for the directory inode 0x1000000d4bd:

"ondisk_value.dirstat":"f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0)",
"ondisk_value.rstat":"n(v0 rc2024-05-10T21:57:37.873685+0000 b5966 3=2+1)",
"memory_value.dirstat":"f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0)",
"memory_value.rstat":"n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1)",

I tried to trace the timeline of changes that happened to inode 0x1000000d4bd on mds.0:

0. Inode created at "2024-05-10T21:56:46.561+0000"

2024-05-10T21:56:46.561+0000 7fbd570ad700  4 mds.0.server handle_client_request client_request(client.25051:139109 mkdir owner_uid=0, owner_gid=0 #0x1000000c5e4/amdtee 2024-05-10T21:56:46.560624+0000 caller_uid=0, caller_gid=0{0,})
2024-05-10T21:56:46.561+0000 7fbd570ad700  7 mds.0.server dispatch_client_request client_request(client.25051:139109 mkdir owner_uid=0, owner_gid=0 #0x1000000c5e4/amdtee 2024-05-10T21:56:46.560624+0000 caller_uid=0, caller_gid=0{0,})
2024-05-10T21:56:46.561+0000 7fbd570ad700 10 mds.0.server prepare_new_inode used_prealloc 0x1000000d4bd (1491 left)
2024-05-10T21:56:46.561+0000 7fbd570ad700 10 mds.0.server prepare_new_inode [inode 0x1000000d4bd [...2,head] #1000000d4bd/ auth v1 f() n() 0x562579b59700]

1. export started "2024-05-10T21:56:51.274+0000" (exported to mds.1)

split_ad:2024-05-10T21:56:51.274+0000 7fbd558aa700  4 mds.0.mig export_dir Starting export to mds.1 [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth v=5 cv=0/0 state=1610874881|complete f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | child=1 dirty=1 authpin=0 0x56257cc70880]
split_ad:2024-05-10T21:56:51.274+0000 7fbd558aa700  7 mds.0.mig dispatch_export_dir request(mds.0:111214 nref=3) [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth v=5 cv=0/0 ap=1+0 state=1610875905|complete|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | request=1 child=1 dirty=1 authpin=1 0x56257cc70880]
split_ad:2024-05-10T21:56:51.291+0000 7fbd570ad700  7 mds.0.mig dispatch_export_dir request(mds.0:111214 nref=3) [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth v=5 cv=0/0 ap=1+0 state=1610875905|complete|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | request=1 child=1 dirty=1 authpin=1 0x56257cc70880]
split_ad:2024-05-10T21:56:51.318+0000 7fbd570ad700  7 mds.0.mig handle_export_discover_ack from mds.1 on [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth v=5 cv=0/0 ap=2+0 state=1610875909|complete|freezingtree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=1 request=1 child=1 dirty=1 waiter=1 authpin=1 0x56257cc70880]
split_ad:2024-05-10T21:56:51.318+0000 7fbd570ad700  7 mds.0.mig export_frozen [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth v=5 cv=0/0 dir_auth=0,0 state=1610875907|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=1 request=0 child=1 frozen=1 subtree=1 dirty=1 waiter=0 authpin=0 0x56257cc70880]
split_ad:2024-05-10T21:56:51.339+0000 7fbd570ad700  7 mds.0.mig handle_export_prep_ack [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{1=1} v=5 cv=0/0 dir_auth=0,0 state=1610875907|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=1 request=0 child=1 frozen=1 subtree=1 replicated=1 dirty=1 waiter=0 authpin=0 0x56257cc70880]
split_ad:2024-05-10T21:56:51.383+0000 7fbd570ad700  7 mds.0.mig export_sessions_flushed [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{1=1} v=5 cv=0/0 dir_auth=0,0 state=1610875907|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=1 request=0 child=1 frozen=1 subtree=1 replicated=1 dirty=1 waiter=0 authpin=0 0x56257cc70880]
split_ad:2024-05-10T21:56:51.383+0000 7fbd570ad700  7 mds.0.mig export_go [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{1=1} v=5 cv=0/0 dir_auth=0,0 state=1610875907|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=1 request=0 child=1 frozen=1 subtree=1 replicated=1 dirty=1 waiter=0 authpin=0 0x56257cc70880] to 1
split_ad:2024-05-10T21:56:51.426+0000 7fbd508a0700  7 mds.0.mig export_go_synced [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{1=1} v=5 cv=0/0 dir_auth=0,0 state=1610875907|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=1 request=0 child=1 frozen=1 subtree=1 replicated=1 dirty=1 waiter=0 authpin=0 0x56257cc70880] to 1
split_ad:2024-05-10T21:56:51.426+0000 7fbd508a0700  7 mds.0.mig encode_export_dir [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{1=1} v=5 cv=0/0 dir_auth=0,1 state=1610875907|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=1 request=0 child=1 frozen=1 subtree=1 replicated=1 dirty=1 waiter=0 authpin=0 0x56257cc70880] 1 head items
split_ad:2024-05-10T21:56:51.440+0000 7fbd508a0700 10 mds.0.mig show_exporting  exporting to 1: (7) exporting 0x1000000d4bd [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{1=1} v=5 cv=0/0 dir_auth=0,1 state=1610875907|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=0 request=0 child=1 frozen=1 subtree=1 replicated=1 dirty=1 waiter=0 authpin=0 tempexporting=1 0x56257cc70880]
split_ad:2024-05-10T21:56:51.442+0000 7fbd508a0700 10 mds.0.mig show_exporting  exporting to 1: (7) exporting 0x1000000d4bd [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{1=1} v=5 cv=0/0 dir_auth=0,1 state=1610875907|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=0 request=0 child=1 frozen=1 subtree=1 replicated=1 dirty=1 waiter=0 authpin=0 tempexporting=1 0x56257cc70880]
split_ad:2024-05-10T21:56:51.449+0000 7fbd508a0700 10 mds.0.mig show_exporting  exporting to 1: (7) exporting 0x1000000d4bd [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{1=1} v=5 cv=0/0 dir_auth=0,1 state=1610875907|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=0 request=0 child=1 frozen=1 subtree=1 replicated=1 dirty=1 waiter=0 authpin=0 tempexporting=1 0x56257cc70880]

2. on-disk scrub at "2024-05-10T21:57:22.986+0000"

2024-05-10T21:57:22.986+0000 7fbd570ad700 10 mds.0.cache.ino(0x1000000d4bd) scrub starting validate_disk_state on [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v430 ap=2 DIRTYPARENT RANDEPHEMERALPIN f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 2=1+1) (inest mix scatter_wanted) (ipolicy sync r=1) (ifile mix) (iquiesce lock x=1 by request(mds.0:114732 nref=3)) caps={25051=pAsLsXs/pAsLsXsFsx@4} | ptrwaiter=0 request=1 lock=2 dirfrag=1 caps=1 dirtyparent=1 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700]
split_ad:2024-05-10T21:57:22.986+0000 7fbd570ad700 10 mds.0.cache.ino(0x1000000d4bd) auth_pin by 0x56257f247200 on [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v430 ap=3 DIRTYPARENT RANDEPHEMERALPIN f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 2=1+1) (inest mix scatter_wanted) (ipolicy sync r=1) (ifile mix) (iquiesce lock x=1 by request(mds.0:114732 nref=3)) caps={25051=pAsLsXs/pAsLsXsFsx@4} | ptrwaiter=0 request=1 lock=2 dirfrag=1 caps=1 dirtyparent=1 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700] now 3
split_ad:2024-05-10T21:57:22.986+0000 7fbd570ad700 20 mds.0.scrubstack dequeue [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v430 ap=3 DIRTYPARENT RANDEPHEMERALPIN f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 2=1+1) (inest mix scatter_wanted) (ipolicy sync r=1) (ifile mix) (iquiesce lock x=1 by request(mds.0:114732 nref=3)) caps={25051=pAsLsXs/pAsLsXsFsx@4} | ptrwaiter=0 request=1 lock=2 dirfrag=1 caps=1 dirtyparent=1 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700] from ScrubStack
split_ad:2024-05-10T21:57:22.987+0000 7fbd508a0700 20 mds.0.cache.ino(0x1000000d4bd) ondisk_read_retval: 0
split_ad:2024-05-10T21:57:22.987+0000 7fbd508a0700 20 mds.0.cache.ino(0x1000000d4bd) decode failed; forcing backtrace as passed since inode is dirty parent
split_ad:2024-05-10T21:57:22.987+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) scrub: inotable ino = 0x1000000d4bd
split_ad:2024-05-10T21:57:22.987+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) scrub: inotable free says 0

3. rmdir at "2024-05-10T22:10:36.550+0000 "

2024-05-10T22:10:36.550+0000 7fbd570ad700  7 mds.0.server dispatch_client_request client_request(client.25051:332186 rmdir #0x1000000c5e4/amdtee 2024-05-10T22:10:36.529620+0000 caller_uid=0, caller_gid=0{0,})
2024-05-10T22:10:36.550+0000 7fbd570ad700  7 mds.0.server handle_client_rmdir on [dentry #0x1/volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee [2,head] auth{1=1} (dn xlock x=1 by request(client.25051:332186 nref=7 cr=0x562573726e00)) (dversion lock w=1 last_client=25051) v=4020 ap=2 ino=0x1000000d4bd state=1610612736 | request=1 lock=2 fragmenting=0 inodepin=1 replicated=1 dirty=1 waiter=0 authpin=1 clientlease=0 0x5625753a3680]

So at the time of scrub, the inode was no longer the auth.

Venky,
could it be a similar issue to the backtrace issue you fixed?

Thanks and Regards,
Kotresh H R

Actions #14

Updated by Leonid Usov almost 2 years ago

  • Related to Bug #66126: Scrub error on inode ... with quiescer added
Actions #15

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

  • Assignee changed from Milind Changire to Kotresh Hiremath Ravishankar
Actions #16

Updated by Leonid Usov almost 2 years ago

  • Related to deleted (Bug #66126: Scrub error on inode ... with quiescer)
Actions #17

Updated by Leonid Usov almost 2 years ago

  • Has duplicate Bug #66126: Scrub error on inode ... with quiescer added
Actions #18

Updated by Venky Shankar almost 2 years ago

Kotresh Hiremath Ravishankar wrote in #note-13:

I debugged the following run.

https://pulpito.ceph.com/vshankar-2024-05-10_15:47:15-fs-wip-vshankar-testing-20240509.114851-debug-testing-default-smithi/7701381/

[...]

As quoted in other comments, the dirstat differs between the in-memory and on-disk values for the directory inode 0x1000000d4bd:

[...]

I tried to trace the timeline of changes that happened to inode 0x1000000d4bd on mds.0:

0. Inode created at "2024-05-10T21:56:46.561+0000"
[...]

1. export started "2024-05-10T21:56:51.274+0000" (exported to mds.1)
[...]

2. on-disk scrub at "2024-05-10T21:57:22.986+0000"
[...]

3. rmdir at "2024-05-10T22:10:36.550+0000 "
[...]

So at the time of scrub, the inode was no longer the auth.

Venky,
could it be a similar issue to the backtrace issue you fixed?

As discussed in standup: the divergent-backtrace issue was related to the ancestors of the scrubbed inode being replicas. For this tracker, I think the scrub error is related to the rstat mismatch of the inode under scrub, which I think has to be auth on this MDS.

Actions #19

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

Some more info from the mds log on the particular inode.

1. The previous scrub of the same directory inode is treated as passed because the inode is marked dirty (see the sketch after these log lines)

...
2024-05-10T22:02:26.500+0000 7fbd570ad700 20 mds.0.scrubstack scrub_dir_inode_final [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=1 randepin 0x562579b59700]
2024-05-10T22:02:26.500+0000 7fbd570ad700 10 mds.0.cache.ino(0x1000000d4bd) scrub starting validate_disk_state on [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=1 randepin 0x562579b59700]
...
...
2024-05-10T22:02:26.500+0000 7fbd570ad700 20 mds.0.cache.ino(0x1000000d4bd) validating a dirty CInode; results will be inconclusive
2024-05-10T22:02:26.500+0000 7fbd570ad700  1 -- [v2:172.21.15.59:6838/627887381,v1:172.21.15.59:6839/627887381] --> [v2:172.21.15.70:6816/2021849901,v1:172.21.15.70:6817/2021849901] -- osd_op(unknown.0.14:358137 2.1d 2:bc94edf9:::1000000d4bd.00000000:head [getxattr parent in=6b] snapc 0=[] ondisk+read+known_if_redirected+full_force+supports_pool_eio e86) -- 0x5625758f3400 con 0x562573294d80
...
2024-05-10T22:02:26.500+0000 7fbd570ad700 10 mds.0.scrubstack scrub_dir_inode done
2024-05-10T22:02:26.500+0000 7fbd570ad700 20 mds.0.scrubstack kick_off_scrubs dir inode, done
2024-05-10T22:02:26.500+0000 7fbd570ad700 20 mds.0.scrubstack dequeue [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=1 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700] from ScrubStack
        ..
        ..
2024-05-10T22:02:26.501+0000 7fbd508a0700 10 MDSContext::complete: 19MDSIOContextWrapper
2024-05-10T22:02:26.501+0000 7fbd508a0700 20 mds.0.cache.ino(0x1000000d4bd) ondisk_read_retval: 0
2024-05-10T22:02:26.501+0000 7fbd598b2700  1 -- [v2:172.21.15.59:6838/627887381,v1:172.21.15.59:6839/627887381] <== osd.9 v2:172.21.15.38:6826/1871599190 17416 ==== osd_op_reply(358135 1000000d51b.00000000 [getxattr] v0'0 uv0 ondisk = -2 ((2) No such file or directory)) ==== 164+0+0 (crc 0 0 0) 0x56257de9cf00 con 0x562573295200
2024-05-10T22:02:26.501+0000 7fbd508a0700 20 mds.0.cache.ino(0x1000000d4bd) decode failed; forcing backtrace as passed since inode is dirty parent
2024-05-10T22:02:26.501+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) scrub: inotable ino = 0x1000000d4bd
2024-05-10T22:02:26.501+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) scrub: inotable free says 0
2024-05-10T22:02:26.501+0000 7fbd508a0700  7 mds.0.cache request_start_internal request(mds.0:154686 nref=2) op 5383
        ..
        ..
        ..
        ..
2024-05-10T22:02:26.502+0000 7fbd570ad700 20 mds.0.cache.ino(0x1000000d4bd) raw stats most likely wont match since inode is dirty; please rerun scrub when system is stable; assuming passed for now;
2024-05-10T22:02:26.502+0000 7fbd570ad700 10 MDSContext::complete: 16C_InodeValidated
2024-05-10T22:02:26.502+0000 7fbd570ad700 10 mds.0.scrubstack _validate_inode_done scrub passed on inode [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest sync->mix g=1) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]
2024-05-10T22:02:26.502+0000 7fbd570ad700 20 mds.0.cache.ino(0x1000000d4bd) scrub_finished
2024-05-10T22:02:26.502+0000 7fbd570ad700 20 mds.0.scrubstack kick_off_scrubs: state=RUNNING
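
The "assuming passed for now" lines above are why this earlier pass did not flag anything: while the inode is still dirty its on-disk stats are allowed to lag, so the raw-stats check is short-circuited. A rough sketch of that decision, paraphrasing the log messages above rather than quoting the actual implementation:

    #include <string>

    struct raw_stats_result {
      bool checked = true;
      bool passed  = false;
      std::string error_str;
    };

    // Paraphrase of the behaviour visible in the log above (illustrative only).
    raw_stats_result check_raw_stats(bool inode_is_dirty, bool stats_match) {
      raw_stats_result r;
      if (inode_is_dirty) {
        // "raw stats most likely wont match since inode is dirty; ... assuming passed for now"
        r.passed = true;
        return r;
      }
      r.passed = stats_match;
      if (!stats_match)
        r.error_str = "freshly-calculated rstats don't match existing ones";
      return r;
    }

    // 22:02:26 scrub: inode dirty            -> assumed passed (inconclusive)
    // 22:03:26 scrub: inode clean + mismatch -> scrub error reported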

2. Logs preceding the scrub failure event: the inode backtrace is stored to the backend and clear_dirty_parent is called

2024-05-10T22:02:51.415+0000 7fbd558aa700 10 mds.0.cache | |________ 1     rep [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] rep@1.1 dir_auth=1 state=0 f(v0 m2024-05-10T21:57:37.873685+0000 2=2+0)/f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v1 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0) hs=0+0,ss=0+0 | ptrwaiter=0 request=0 child=0 frozen=0 subtree=1 replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x56257cc70880]
        ..
        ..
2024-05-10T22:02:51.902+0000 7fbd528a4700 15 mds.0.journal try_to_expire waiting for storing backtrace on [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 randepin 0x562579b59700]
2024-05-10T22:02:51.902+0000 7fbd528a4700 10 mds.0.cache.ino(0x1000000d4bd) _store_backtrace on [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 randepin 0x562579b59700]
2024-05-10T22:02:51.902+0000 7fbd528a4700 10 mds.0.cache.ino(0x1000000d4bd) auth_pin by 0x562579b59700 on [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=1 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700] now 1
2024-05-10T22:02:51.902+0000 7fbd528a4700 15 mds.0.cache.dir(0x1000000c5e4.001*) adjust_nested_auth_pins 1 on [dir 0x1000000c5e4.001* /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/ [2,head] auth{1=1} v=3316 cv=443/443 ap=0+6 state=1610612737|complete f(v0 m2024-05-10T21:57:34.192753+0000 47=27+20) n(v11 rc2024-05-10T21:57:55.845357+0000 b101120559 434=398+36) hs=47+36,ss=0+0 dirty=78 | child=1 frozen=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x56257c82c900] by 0x562579b59700 count now 0/6
2024-05-10T22:02:51.902+0000 7fbd528a4700 20 mds.0.cache.ino(0x1000000d4bd) _store_backtrace: no dirtypool or no old pools
        ..
        ..
2024-05-10T22:02:51.907+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) _commit_ops
        ..
        ..
2024-05-10T22:02:51.938+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) _stored_backtrace v 194
2024-05-10T22:02:51.938+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) auth_unpin by 0x562579b59700 on [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 randepin 0x562579b59700] now 0
2024-05-10T22:02:51.938+0000 7fbd508a0700 15 mds.0.cache.dir(0x1000000c5e4.001*) adjust_nested_auth_pins -1 on [dir 0x1000000c5e4.001* /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/ [2,head] auth{1=1} v=3316 cv=443/443 ap=0+5 state=1610612737|complete f(v0 m2024-05-10T21:57:34.192753+0000 47=27+20) n(v11 rc2024-05-10T21:57:55.845357+0000 b101120559 434=398+36) hs=47+36,ss=0+0 dirty=78 | child=1 frozen=0 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 0x56257c82c900] by 0x562579b59700 count now 0/5
2024-05-10T22:02:51.938+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) clear_dirty_parent
     ...
     ..

3. Quiesce on the directory inode is complete and mark_clean on the inode is called.

2024-05-10T22:03:01.844+0000 7fbd570ad700 10 mds.0.cache dispatch_quiesce_inode: scheduling op to quiesce [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=1 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]
        ..
        ..
2024-05-10T22:03:02.160+0000 7fbd570ad700 20 mds.0.cache dispatch_quiesce_inode request(mds.0:157604 nref=2) quiescing [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=1 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]
        ..
        ..
        ..
2024-05-10T22:03:04.206+0000 7fbd570ad700 10 MDSContext::complete: 18C_MDS_RetryRequest
2024-05-10T22:03:04.206+0000 7fbd570ad700 20 mds.0.cache dispatch_quiesce_inode request(mds.0:157604 nref=3) quiescing [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (ifile lock) (iquiesce lock x=1 by request(mds.0:157604 nref=4)) | ptrwaiter=0 dirtyscattered=0 request=0 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]
        ..
        ..
        ..
2024-05-10T22:03:04.207+0000 7fbd570ad700 10 mds.0.cache dispatch_quiesce_inode request(mds.0:157604 nref=3) quiesce complete of [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=4)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]
        ..
        ..

2024-05-10T22:03:04.207+0000 7fbd570ad700 20 mds.0.locker issue_caps: [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]
2024-05-10T22:03:04.207+0000 7fbd570ad700  1 -- [v2:172.21.15.59:6838/627887381,v1:172.21.15.59:6839/627887381] <== mds.2 v2:172.21.15.59:6836/1839585124 11739 ==== peer_request(mds.2:9055.0 authpin) ==== 177+0+0 (crc 0 0 0) 0x56258042e000 con 0x562573297180
        ..
        ..
        ..
2024-05-10T22:03:05.561+0000 7fbd528a4700 10 mds.0.cache.dir(0x1000000c5e4.001*)  set amdtee [dentry #0x1/volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee [2,head] auth{1=1} (dversion lock) v=1776 ino=0x1000000d4bd state=1610612736 | request=0 lock=0 fragmenting=0 inodepin=1 replicated=1 dirty=1 authpin=0 0x5625753a3680]
2024-05-10T22:03:05.561+0000 7fbd528a4700 14 mds.0.cache.dir(0x1000000c5e4.001*)  dn 'amdtee' inode [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]
2024-05-10T22:03:05.561+0000 7fbd528a4700 10 mds.0.cache.ino(0x1000000d4bd) purge_stale_snap_data
        ..
        ..
        ..
2024-05-10T22:03:05.686+0000 7fbd508a0700 15 mds.0.cache.dir(0x1000000c5e4.001*)  dir 3316 >= inode 1776 now clean [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]
2024-05-10T22:03:05.686+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) mark_clean [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]
2024-05-10T22:03:05.686+0000 7fbd508a0700 15 mds.0.cache.dir(0x1000000c5e4.001*)  dir 3316 >= dn 1776 now clean [dentry #0x1/volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee [2,head] auth{1=1} (dversion lock) v=1776 ino=0x1000000d4bd state=1610612736 | request=0 lock=0 fragmenting=0 inodepin=1 replicated=1 dirty=1 authpin=0 0x5625753a3680]
2024-05-10T22:03:05.686+0000 7fbd508a0700 10 mds.0.cache.den(0x1000000c5e4.001* amdtee) mark_clean [dentry #0x1/volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee [2,head] auth{1=1} (dversion lock) v=1776 ino=0x1000000d4bd state=1610612736 | request=0 lock=0 fragmenting=0 inodepin=1 replicated=1 dirty=1 authpin=0 0x5625753a3680]
...
...

4. Scrub on the inode is done and it fails. Note that the current MDS instance (mds.0) is not the CDir auth but it is the inode auth; the CDir auth is mds.1. So the mds_scrub (queue_dir) message for this directory inode is sent to mds.1 and the ack is received back.

...
2024-05-10T22:03:26.558+0000 7fbd558aa700 10 mds.0.cache | |________ 1     rep [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] rep@1.1 dir_auth=1 state=0 f(v0 m2024-05-10T21:57:37.873685+0000 2=2+0)/f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v1 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0) hs=0+0,ss=0+0 | ptrwaiter=0 request=0 child=0 frozen=0 subtree=1 replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x56257cc70880]
        ..
        ..
        ..
2024-05-10T22:03:26.635+0000 7fbd570ad700 20 mds.0.cache.den(0x1000000c5e4.001* amdtee) scrubbing [dentry #0x1/volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee [2,head] auth{1=1} (dversion lock) v=1776 ino=0x1000000d4bd state=1073741824 | request=0 lock=0 fragmenting=0 inodepin=1 replicated=1 dirty=0 authpin=0 0x5625753a3680] next_seq = 2
2024-05-10T22:03:26.635+0000 7fbd570ad700 10  mds.0.cache.snaprealm(0x100000001fa seq 1 0x5625732e4900) get_snaps  (seq 1 cached_seq 1)
2024-05-10T22:03:26.635+0000 7fbd570ad700 10 mds.0.scrubstack _enqueue with {[inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]}, top=0
2024-05-10T22:03:26.635+0000 7fbd570ad700 20 mds.0.cache.ino(0x1000000d4bd) scrub_initialize with scrub_version 1776
2024-05-10T22:03:26.635+0000 7fbd570ad700 20 mds.0.scrubstack enqueue [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700] to bottom of ScrubStack
        ..
        ..
        ..
2024-05-10T22:03:26.763+0000 7fbd508a0700 20 mds.0.scrubstack kick_off_scrubs examining [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700]
2024-05-10T22:03:26.763+0000 7fbd508a0700 10 mds.0.scrubstack scrub_dir_inode [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700]
2024-05-10T22:03:26.763+0000 7fbd508a0700 20 mds.0.scrubstack scrub_dir_inode recursive mode, frags [*]
2024-05-10T22:03:26.763+0000 7fbd508a0700 20 mds.0.scrubstack scrub_dir_inode forward fragset_t(*) to mds.1
2024-05-10T22:03:26.763+0000 7fbd508a0700  1 -- [v2:172.21.15.59:6838/627887381,v1:172.21.15.59:6839/627887381] send_to--> mds [v2:172.21.15.70:6838/4124190419,v1:172.21.15.70:6839/4124190419] -- mds_scrub(queue_dir 0x1000000d4bd fragset_t(*) 6f86be06-95d4-48c4-a5c2-b28f6a9e2bf6 force recursive) -- ?+0 0x56257371de00
2024-05-10T22:03:26.763+0000 7fbd508a0700  1 -- [v2:172.21.15.59:6838/627887381,v1:172.21.15.59:6839/627887381] --> [v2:172.21.15.70:6838/4124190419,v1:172.21.15.70:6839/4124190419] -- mds_scrub(queue_dir 0x1000000d4bd fragset_t(*) 6f86be06-95d4-48c4-a5c2-b28f6a9e2bf6 force recursive) -- 0x56257371de00 con 0x562573296d00
        ..
        ..
        ..
2024-05-10T22:03:26.764+0000 7fbd570ad700  1 -- [v2:172.21.15.59:6838/627887381,v1:172.21.15.59:6839/627887381] <== mds.1 v2:172.21.15.70:6838/4124190419 11711 ==== mds_scrub(queue_dir_ack 0x1000000d4bd fragset_t(*) 6f86be06-95d4-48c4-a5c2-b28f6a9e2bf6) ==== 72+0+0 (crc 0 0 0) 0x56257371de00 con 0x562573296d00
        ..
        ..
        ..
2024-05-10T22:03:26.764+0000 7fbd570ad700 10 mds.0.scrubstack handle_scrub mds_scrub(queue_dir_ack 0x1000000d4bd fragset_t(*) 6f86be06-95d4-48c4-a5c2-b28f6a9e2bf6) from mds.1
2024-05-10T22:03:26.764+0000 7fbd570ad700 20 mds.0.scrubstack kick_off_scrubs: state=RUNNING
2024-05-10T22:03:26.764+0000 7fbd570ad700 20 mds.0.scrubstack kick_off_scrubs entering with 4 in progress and 2843 in the stack
2024-05-10T22:03:26.764+0000 7fbd570ad700 20 mds.0.scrubstack kick_off_scrubs examining [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700]
2024-05-10T22:03:26.764+0000 7fbd570ad700 10 mds.0.scrubstack scrub_dir_inode [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700]
2024-05-10T22:03:26.764+0000 7fbd570ad700 20 mds.0.scrubstack scrub_dir_inode recursive mode, frags [*]
2024-05-10T22:03:26.764+0000 7fbd570ad700 20 mds.0.scrubstack scrub_dir_inode_final [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700]
2024-05-10T22:03:26.764+0000 7fbd598b2700  1 -- [v2:172.21.15.59:6838/627887381,v1:172.21.15.59:6839/627887381] <== osd.1 v2:172.21.15.59:6800/2532270400 33266 ==== osd_op_reply(402416 1000000d9b6.00000000 [getxattr (418) out=418b] v0'0 uv1791 ondisk = 0) ==== 164+0+418 (crc 0 0 0) 0x56257f3e8f00 con 0x5625734c2d80
2024-05-10T22:03:26.764+0000 7fbd570ad700 10 mds.0.cache.ino(0x1000000d4bd) scrub starting validate_disk_state on [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700]
         ..
         ..
2024-05-10T22:03:26.764+0000 7fbd570ad700 10 mds.0.scrubstack scrub_dir_inode done
2024-05-10T22:03:26.764+0000 7fbd570ad700 20 mds.0.scrubstack kick_off_scrubs dir inode, done
2024-05-10T22:03:26.764+0000 7fbd570ad700 20 mds.0.scrubstack dequeue [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=3 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=1 randepin 0x562579b59700] from ScrubStack
        ..
        ..
        ..
2024-05-10T22:03:26.765+0000 7fbd508a0700 20 mds.0.cache.ino(0x1000000d4bd) ondisk_read_retval: 0
2024-05-10T22:03:26.765+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) decoded 402 bytes of backtrace successfully
2024-05-10T22:03:26.765+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) scrub: inotable ino = 0x1000000d4bd
2024-05-10T22:03:26.765+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) scrub: inotable free says 0
2024-05-10T22:03:26.765+0000 7fbd508a0700  7 mds.0.cache request_start_internal request(mds.0:190875 nref=2) op 5383
        ..
        ..
        ..
2024-05-10T22:03:26.765+0000 7fbd570ad700 10 MDSContext::complete: 25MDSInternalContextWrapper
2024-05-10T22:03:26.765+0000 7fbd570ad700 10 MDSContext::complete: 16C_InodeValidated
2024-05-10T22:03:26.765+0000 7fbd570ad700  0 log_channel(cluster) log [WRN] : Scrub error on inode 0x1000000d4bd (/volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee) see mds.b log and `damage ls` output for details
2024-05-10T22:03:26.765+0000 7fbd5a8b4700  1 -- [v2:172.21.15.59:6838/627887381,v1:172.21.15.59:6839/627887381] <== osd.2 v2:172.21.15.70:6800/184472092 23999 ==== osd_op_reply(402424 1000000d4b5.00000000 [getxattr (417) out=417b] v0'0 uv769 ondisk = 0) ==== 164+0+417 (crc 0 0 0) 0x5625843ec000 con 0x562573296400
2024-05-10T22:03:26.765+0000 7fbd570ad700 -1 mds.0.scrubstack _validate_inode_done scrub error on inode [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=4 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest sync->mix g=1) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=3)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=0 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value":"(2)0x1000000d4bd:[<0x1000000c5e4/amdtee v1776>,<0x1000000c5db/firmware v6359>,<0x1000000c5da/multiple_rsync_payload.215536 v769>,<0x1000000c5d9/payload.1 v575>,<0x10000000005/tmp v1687>,<0x100000001fb/client.0 v1493>,<0x100000001fa/38c2707a-ec61-441f-9927-99af8ad21869 v1420>,<0x10000000001/sv_1 v1385>,<0x10000000000/qa v1353>,<0x1/volumes v1298>]//[]","memoryvalue":"(2)0x1000000d4bd:[<0x1000000c5e4/amdtee v1776>,<0x1000000c5db/firmware v6359>,<0x1000000c5da/multiple_rsync_payload.215536 v804>,<0x1000000c5d9/payload.1 v657>,<0x10000000005/tmp v1741>,<0x100000001fb/client.0 v1544>,<0x100000001fa/38c2707a-ec61-441f-9927-99af8ad21869 v1466>,<0x10000000001/sv_1 v1433>,<0x10000000000/qa v1400>,<0x1/volumes v1346>]//[]","error_str":""},"raw_stats":{"checked":true,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0)","ondisk_value.rstat":"n(v0 rc2024-05-10T21:57:37.873685+0000 b5966 3=2+1)","memory_value.dirstat":"f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0)","memory_value.rstat":"n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1)","error_str":"freshly-calculated rstats don't match existing ones"},"return_code":0}
2024-05-10T22:03:26.765+0000 7fbd570ad700 20 mds.0.cache.ino(0x1000000d4bd) scrub_finished
2024-05-10T22:03:26.765+0000 7fbd570ad700 20 mds.0.scrubstack kick_off_scrubs: state=RUNNING
2024-05-10T22:03:26.765+0000 7fbd570ad700 20 mds.0.scrubstack kick_off_scrubs entering with 4 in progress and 2838 in the stack
2024-05-10T22:03:26.765+0000 7fbd570ad700 10 mds.0.scrubstack scrub_dirfrag [dir 0x1000000d4b5 /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/5.14.0-437.el9.x86_64/ [2,head] auth v=654 cv=654/654 ap=1+7 state=1073741825|complete f(v0 m2024-05-10T21:57:34.269751+0000 7=6+1) n(v1 rc2024-05-10T21:57:34.851741+0000 b40748 153=152+1) hs=7+6,ss=0+0 | child=1 dirty=0 authpin=1 scrubqueue=1 0x56257e15ba80]
..
..

The scrub has failed because the rstat information does not match. Since the inode is not dirty and has been marked clean, shouldn't they be expected to match?

raw_stats":
{"checked":true,"passed":false,"read_ret_val":0,
"ondisk_value.dirstat":"f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0)",
"memory_value.dirstat":"f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0)",

"ondisk_value.rstat":"n(v0 rc2024-05-10T21:57:37.873685+0000 b5966 3=2+1)",
"memory_value.rstat":"n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1)",

"error_str":"freshly-calculated rstats don't match existing ones"},"return_code":0}

I am yet to connect the dots properly. Is the quiesce on this particular inode interfering here in some way?
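
To make the expectation concrete (a sketch only, with the values copied from the scrub report above): once the inode has been marked clean, the persisted dirstat should agree with the in-memory one, but here the on-disk copy is still the older pre-export v0 value.

    #include <cstdint>
    #include <iostream>

    // Values copied from the scrub report above for inode 0x1000000d4bd (sketch only).
    struct dirstat_view { uint64_t version; int64_t files; int64_t subdirs; };

    int main() {
      dirstat_view ondisk = {0, 1, 0};  // f(v0 ... 1=1+0): the pre-export value
      dirstat_view memory = {1, 2, 0};  // f(v1 ... 2=2+0): the newer in-memory value

      // For an inode that has been marked clean, scrub expects these to agree:
      if (ondisk.files != memory.files || ondisk.subdirs != memory.subdirs)
        std::cout << "dirstat mismatch: ondisk " << ondisk.files << "+" << ondisk.subdirs
                  << " vs memory " << memory.files << "+" << memory.subdirs << "\n";
      return 0;
    }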

Actions #20

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

Sequence of rstat information at different points:

n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) - waiting for storing backtrace - same is stored backtrace
n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) - scheduling op to quiesce
n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) - quiescing
n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) - quiesce complete
n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) - rstat at the mark_clean of the inode at the beginning
n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) - scrub enqueue
n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) - scrub_dir_inode
n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) - dequeue

n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) - memory
n(v0 rc2024-05-10T21:57:37.873685+0000 b5966 3=2+1) - ondisk

Does this mean the quiesce didn't allow the in-memory inode to be flushed to disk? I am still digging into the logs.

Actions #21

Updated by Leonid Usov almost 2 years ago

There are multiple repeated quiesces of this inode (and all others, too, since we're quiescing the root), not just the one around the time of the scrub error.

I think we should focus on the fact that the inode is marked clean but the on-disk data does not match the in-memory data.
On disk we have v0, and in memory we have v1.

The last successful quiesce with v0 was

2024-05-10T21:56:53.164+0000 7fbd570ad700 10 mds.0.cache dispatch_quiesce_inode request(mds.0:114732 nref=3) quiesce complete of [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v430 ap=2 DIRTYPARENT RANDEPHEMERALPIN f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 2=1+1) (inest mix scatter_wanted) (ipolicy sync r=1) (ifile mix) (iquiesce lock x=1 by request(mds.0:114732 nref=4)) caps={25051=pAsLsXs/pAsLsXsFsx@4} | ptrwaiter=0 request=1 lock=2 dirfrag=1 caps=1 dirtyparent=1 replicated=1 dirty=0 waiter=0 authpin=1 randepin 0x562579b59700]

and then the next quiesce began with v1 already

2024-05-10T21:59:40.503+0000 7fbd508a0700 10 mds.0.cache dispatch_quiesce_inode: scheduling op to quiesce [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=1 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) caps={25051=pAsLsXsFs/pAsLsXsFsx@12} mcw={1=AsLsXsFs} | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=1 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]

As Kotresh mentioned above, there's a record of clearing the DIRTYPARENT as a result of commit, and the next quiesce is scheduled after that event

2024-05-10T22:02:51.902+0000 7fbd528a4700 20 mds.0.cache.ino(0x1000000d4bd) _store_backtrace: no dirtypool or no old pools
2024-05-10T22:02:51.907+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) _commit_ops
2024-05-10T22:02:51.938+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) _stored_backtrace v 194
2024-05-10T22:02:51.938+0000 7fbd508a0700 10 mds.0.cache.ino(0x1000000d4bd) auth_unpin by 0x562579b59700 on [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 randepin 0x562579b59700] now 0
 ...
2024-05-10T22:03:01.844+0000 7fbd570ad700 10 mds.0.cache dispatch_quiesce_inode: scheduling op to quiesce [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=1 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700

At the point of committing the inode, no quiesce was in progress.

I don't see evidence that quiesce has an impact on this condition. We might have a race where we incorrectly mark an inode as clean before it's written to disk, or there may be a problem with the scrub not reading the up-to-date state from disk.
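
For reference, the "now clean" transition quoted in #note-19 comes from the parent dirfrag's commit path. A simplified sketch of the version comparison implied by the "dir 3316 >= inode 1776 now clean" message (this is my reading of the log, not the literal Ceph code):

    #include <cstdint>

    // Simplified reading of "dir 3316 >= inode 1776 now clean" (assumption, not the
    // literal Ceph code): when the parent dirfrag's committed version has caught up
    // with the child's version, the child dentry/inode is marked clean on this MDS.
    void maybe_mark_clean(uint64_t parent_committed_version,  // 3316 in the log
                          uint64_t child_version,             // 1776 in the log
                          bool &child_dirty) {
      if (parent_committed_version >= child_version)
        child_dirty = false;   // the "mark_clean" lines in the log
    }

    // The question raised above: after this, scrub treats the inode as clean and
    // expects disk and memory to agree, yet the stats it reads back from disk are
    // still the older v0 values.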

Actions #22

Updated by Leonid Usov almost 2 years ago

@Kotresh Hiremath Ravishankar to clarify, you can tell whether an inode is currently quiesced by looking at the xlocked state of the quiescelock:

No quiesce in progress:

[inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 DIRTYPARENT RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) | ptrwaiter=0 dirtyscattered=0 request=0 lock=0 dirfrag=1 caps=0 dirtyparent=1 replicated=1 dirty=1 waiter=0 authpin=0 scrubqueue=0 randepin 0x562579b59700] now 0

Quiesce in progress:

[inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ auth{1=1} v1776 ap=2 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) (inest mix) (ipolicy sync r=1) (iquiesce lock x=1 by request(mds.0:157604 nref=4)) | ptrwaiter=0 dirtyscattered=0 request=1 lock=2 dirfrag=1 caps=0 dirtyparent=0 replicated=1 dirty=1 waiter=0 authpin=1 scrubqueue=0 randepin 0x562579b59700]

Note:

(iquiesce lock x=1 by request(mds.0:157604 nref=4))
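
For illustration only, a rough helper along those lines that operates on the dump text itself (this is not the MDS API; the real check inspects the xlocked state of the CInode's quiescelock):

#include <string>

// Illustration only: a quiesce is in progress for an inode when its
// quiescelock is xlocked, which shows up in the debug dump as
// "(iquiesce lock x=1 by request(...))". When no quiesce is running,
// the lock is in its default state and is not printed at all.
bool quiesce_in_progress(const std::string& inode_dump) {
  return inode_dump.find("(iquiesce lock x=") != std::string::npos;
}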

Actions #23

Updated by Leonid Usov almost 2 years ago

It looks like the dir was exported at version v0; that's probably when it was last updated on disk.

2024-05-10T21:56:51.489+0000 7fbd508a0700 10 mds.0.mig finish_export_dir [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{1=1} v=5 cv=0/0 dir_auth=1,0 state=1610875907|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=0 request=0 child=1 frozen=1 subtree=1 replicated=1 dirty=1 waiter=0 authpin=0 tempexporting=1 0x56257cc70880]
2024-05-10T21:56:51.489+0000 7fbd508a0700 10 mds.0.cache.dir(0x1000000d4bd) mark_clean [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] rep@1,0.1 dir_auth=1,0 state=537134083|complete|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 dirty=1 | ptrwaiter=0 request=0 child=1 frozen=1 subtree=1 replicated=0 dirty=1 waiter=0 authpin=0 tempexporting=1 0x56257cc70880] version 5
...
2024-05-10T21:56:51.489+0000 7fbd508a0700  7 mds.0.cache adjust_subtree_auth 1,0 -> 1,-2 on [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] rep@1,0.1 dir_auth=1,0 state=1026|frozentree|exporting f(v0 m2024-05-10T21:56:50.818546+0000 1=1+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0) hs=1+0,ss=0+0 | ptrwaiter=0 request=0 child=1 frozen=1 subtree=1 replicated=0 dirty=0 waiter=0 authpin=0 tempexporting=0 0x56257cc70880]
Actions #24

Updated by Leonid Usov almost 2 years ago

Moving to mds.1 (mds.i.log)

2024-05-10T21:57:45.462+0000 7feef1a81700 10 mds.1.cache.ino(0x1000000d4bd) decode_lock_inest taking inode rstat n(v1 rc2024-05-10T21:57:40.474638+0000 b5966 3=2+1) for [inode 0x1000000d4bd [...2,head] /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ rep@0.1 v1207 RANDEPHEMERALPIN f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v0 rc2024-05-10T21:56:50.856545+0000 b42 2=1+1) (inest mix) caps={25051=pAsLsXsFs/pAsLsXsFsx@7} | dirtyscattered=0 request=0 lock=0 importing=0 dirfrag=1 caps=1 waiter=0 randepin 0x5560e26d4b00]
...
2024-05-10T21:57:45.462+0000 7feef1a81700 10 mds.1.cache.ino(0x1000000d4bd) finish_scatter_update try to assimilate dirty rstat on [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{0=1} pv=14 v=14 cv=0/0 dir_auth=1 state=1611399169|complete|auxsubtree f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0)->f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v0 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0)/n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0)->n(v1 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0) hs=2+1,ss=0+0 dirty=3 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=0 scrubqueue=0 0x5560e2624000]
...
2024-05-10T21:57:45.462+0000 7feef1a81700 10 mds.1.cache.ino(0x1000000d4bd) finish_scatter_update finish assimilating dirty rstat on [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{0=1} pv=14 v=14 cv=0/0 dir_auth=1 ap=1+0 state=1611431937|complete|auxsubtree|assimrstat f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0)->f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v0 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0)/n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0)->n(v1 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0) hs=2+1,ss=0+0 dirty=3 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 scrubqueue=0 0x5560e2624000]
2024-05-10T21:57:45.462+0000 7feef1a81700 10 mds.1.cache.dir(0x1000000d4bd) assimilate_dirty_rstat_inodes_finish
2024-05-10T21:57:45.462+0000 7feef1a81700 10 mds.1.cache.dir(0x1000000d4bd) pre_dirty 15

2024-05-10T21:57:46.287+0000 7feeeb274700 10 mds.1.cache.dir(0x1000000d4bd) _mark_dirty (already dirty) [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{0=1} v=15 cv=0/0 dir_auth=1 ap=1+0 state=1611399169|complete|auxsubtree f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0) hs=2+1,ss=0+0 dirty=3 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 scrubqueue=0 0x5560e2624000] version 15
...
2024-05-10T21:58:24.203+0000 7feef1a81700 20 mds.1.cache.den(0x1000000d4bd .773bd96f-b83f-4d52-b12dc529b13d8543.bin.xz.PGQeRk) scrubbing [dentry #0x1/volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/.773bd96f-b83f-4d52-b12dc529b13d8543.bin.xz.PGQeRk [2,head] auth NULL (dversion lock) v=9 ino=(nil) state=1610612736 | request=0 lock=0 inodepin=0 dirty=1 authpin=0 clientlease=0 0x5560e2d7e280] next_seq = 2
2024-05-10T21:58:24.203+0000 7feef1a81700 20 mds.1.cache.dir(0x1000000d4bd) get_num_head_items() = 2; fnode.fragstat.nfiles=2 fnode.fragstat.nsubdirs=0
2024-05-10T21:58:24.203+0000 7feef1a81700 20 mds.1.cache.dir(0x1000000d4bd) total of child dentries: n(v0 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0)
2024-05-10T21:58:24.203+0000 7feef1a81700 20 mds.1.cache.dir(0x1000000d4bd) my rstats:              n(v1 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0)
2024-05-10T21:58:24.203+0000 7feef1a81700 10 mds.1.cache.dir(0x1000000d4bd) check_rstats complete on 0x5560e2624000
2024-05-10T21:58:24.203+0000 7feef1a81700 20 mds.1.cache.dir(0x1000000d4bd) scrub_finished

I'm not able to find the logs of committing this dir on either of the MDSes

void CDir::_commit(version_t want, int op_prio)
{
  dout(10) << "_commit want " << want << " on " << *this << dendl;
Actions #25

Updated by Leonid Usov almost 2 years ago

On mds.1 (i):

Here we updated the dir to v1 and marked it as dirty

2024-05-10T21:57:45.462+0000 7feef1a81700 10 mds.1.cache.ino(0x1000000d4bd) finish_scatter_update try to assimilate dirty rstat on [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{0=1} pv=14 v=14 cv=0/0 dir_auth=1 state=1611399169|complete|auxsubtree f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0)->f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v0 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0)/n(v0 rc2024-05-10T21:56:50.856545+0000 b42 1=1+0)->n(v1 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0) hs=2+1,ss=0+0 dirty=3 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=0 scrubqueue=0 0x5560e2624000]

A scrub request arrives from mds.0; the dir is still dirty=1

2024-05-10T21:58:24.202+0000 7feef1a81700 10 mds.1.scrubstack handle_scrub mds_scrub(queue_dir 0x1000000d4bd fragset_t(*) 5f929d5d-a7f7-44a6-95b6-7ba1348603d1 force recursive) from mds.0
2024-05-10T21:58:24.202+0000 7feef1a81700 10 mds.1.scrubstack _enqueue with {[dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{0=1} v=15 cv=0/0 dir_auth=1 state=1611137025|complete|auxsubtree f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0) n(v1 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0) hs=2+1,ss=0+0 dirty=3 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=0 scrubqueue=0 0x5560e2624000]}, top=1

Long after, still dirty!

2024-05-10T22:10:36.542+0000 7feef1a81700 10 mds.1.cache.dir(0x1000000d4bd) auth_pin by 0x5560e65c9b00 on [dir 0x1000000d4bd /volumes/qa/sv_1/38c2707a-ec61-441f-9927-99af8ad21869/client.0/tmp/payload.1/multiple_rsync_payload.215536/firmware/amdtee/ [2,head] auth{0=1} pv=19 v=19 cv=0/0 dir_auth=1 ap=1+0 state=1611137025|complete|auxsubtree f(v1 m2024-05-10T22:10:36.527620+0000)/f(v1 m2024-05-10T21:57:37.873685+0000 2=2+0)->f(v2 m2024-05-10T22:10:36.527620+0000) n(v1 rc2024-05-10T22:10:36.527620+0000)/n(v1 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0)->n(v1 rc2024-05-10T22:10:36.527620+0000)/n(v1 rc2024-05-10T21:57:37.873685+0000 b5966 2=2+0) hs=0+3,ss=0+0 dirty=3 | ptrwaiter=0 child=1 frozen=0 subtree=1 importing=0 replicated=1 dirty=1 authpin=1 scrubqueue=0 0x5560e2624000] count now 1
Actions #26

Updated by Leonid Usov almost 2 years ago · Edited

I scanned through the flows of the remote dir frag scrub, and it doesn't look like the process considers the dirtiness of a given fragment. This must have always been an issue, unless we broke some mechanism that would prevent the parent dir inode from being marked clean before all its fragments, including replicas, were clean.
Alternatively, some other changes, potentially including quiesce, have increased the probability that remote fragments are still dirty when the parent dir inode is clean and scrubbed.
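
A minimal sketch of the kind of guard this implies, with simplified stand-in types (not the actual CInode/CDir API):

#include <vector>

// Simplified stand-in for a directory fragment; the real object is CDir.
struct DirFrag {
  bool dirty;   // fragment has changes not yet committed to disk
};

// Sketch of the guard: while any fragment is still dirty, the on-disk
// dirstat/rstat is known to lag the in-memory state, so a raw-stats
// comparison would be meaningless and the scrub of this inode should be
// skipped or re-queued rather than reported as damage.
bool safe_to_validate_raw_stats(const std::vector<DirFrag>& frags) {
  for (const auto& f : frags) {
    if (f.dirty)
      return false;
  }
  return true;
}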

Actions #27

Updated by Venky Shankar almost 2 years ago

Leonid Usov wrote in #note-26:

I scanned through the flows of remote dir frag scrub, and it doesn't look like the process is considering the dirtiness of a given fragment. This must have always been an issue unless we broke some mechanism that would not allow the parent dir inode to be marked as clean before all its fragments, including replicas, were clean.
Alternatively, some other changes, including, potentially, quiesce, have increased the probability that remote fragments would still be dirty when the parent dir inode is clean and scrubbed

Right. It's not the quiesce op itself that's causing this breakage. The first reported test run [0] does not even have quiesce in it. So, either something broke this, or it was already broken and https://github.com/ceph/ceph/pull/55455 bubbled up the scrub error, causing the test to fail.

[0]: https://pulpito.ceph.com/pdonnell-2024-03-20_18:16:52-fs-wip-batrick-testing-20240320.145742-distro-default-smithi/7612910/

Actions #29

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

  • Pull request ID set to 57953
Actions #30

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

  • Status changed from Triaged to Fix Under Review
Actions #31

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

Patrick,

All the instances of the qa failure recorded in this tracker have either the dynamic balancer or ephemeral random pinning enabled.

fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate 3-modules} clusters/1a11s-mds-1c-client-3node conf/{client mds mgr mon osd} mount/fuse objectstore-ec/bluestore-bitmap omap_limit/10000 overrides/{cephsqlite-timeout frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts pg_health session_timeout} ranks/multi/{balancer/automatic export-check n/3 replication/default} standby-replay tasks/{0-subvolume/{with-namespace-isolated} 1-check-counter 2-scrub/yes 3-snaps/no 4-flush/yes 5-workunit/postgres}}

fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate 3-modules} clusters/1a11s-mds-1c-client-3node conf/{client mds mgr mon osd} mount/fuse objectstore-ec/bluestore-bitmap omap_limit/10000 overrides/{cephsqlite-timeout frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts pg_health session_timeout} ranks/multi/{balancer/automatic export-check n/5 replication/always} standby-replay tasks/{0-subvolume/{with-quota} 1-check-counter 2-scrub/yes 3-snaps/no 4-flush/yes 5-workunit/fs/norstats}}

fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate} clusters/1a11s-mds-1c-client-3node conf/{client mds mgr mon osd} mount/fuse objectstore-ec/bluestore-bitmap omap_limit/10 overrides/{cephsqlite-timeout frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts pg_health session_timeout} ranks/multi/{balancer/random export-check n/5 replication/always} standby-replay tasks/{0-subvolume/{with-namespace-isolated} 1-check-counter 2-scrub/yes 3-snaps/no 4-flush/yes 5-workunit/suites/dbench}}

fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate 3-modules} clusters/1a11s-mds-1c-client-3node conf/{client mds mgr mon osd} mount/kclient/{base/{mount-syntax/{v2} mount overrides/{distro/testing/k-testing ms-die-on-skipped}} ms_mode/secure wsync/yes} objectstore-ec/bluestore-bitmap omap_limit/10 overrides/{cephsqlite-timeout frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts pg_health session_timeout} ranks/multi/{balancer/random export-check n/3 replication/default} standby-replay tasks/{0-subvolume/{with-namespace-isolated} 1-check-counter 2-scrub/yes 3-snaps/no 4-flush/no 5-quiesce/with-quiesce 6-workunit/fs/misc}}

fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate 3-modules} clusters/1a11s-mds-1c-client-3node conf/{client mds mgr mon osd} mount/kclient/{base/{mount-syntax/{v1} mount overrides/{distro/stock/{centos_9.stream k-stock} ms-die-on-skipped}} ms_mode/legacy wsync/no} objectstore-ec/bluestore-ec-root omap_limit/10000 overrides/{cephsqlite-timeout frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts pg_health session_timeout} ranks/multi/{balancer/automatic export-check n/5 replication/always} standby-replay tasks/{0-subvolume/{with-quota} 1-check-counter 2-scrub/yes 3-snaps/no 4-flush/yes 5-quiesce/no 6-workunit/postgres}}

fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate} clusters/1a11s-mds-1c-client-3node conf/{client mds mgr mon osd} mount/fuse objectstore-ec/bluestore-bitmap omap_limit/10 overrides/{cephsqlite-timeout frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts pg_health session_timeout} ranks/multi/{balancer/random export-check n/5 replication/always} standby-replay tasks/{0-subvolume/{with-namespace-isolated} 1-check-counter 2-scrub/yes 3-snaps/no 4-flush/yes 5-workunit/suites/dbench}}

fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate} clusters/1a11s-mds-1c-client-3node conf/{client mds mgr mon osd} mount/fuse objectstore-ec/bluestore-bitmap omap_limit/10 overrides/{cephsqlite-timeout frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts pg_health session_timeout} ranks/multi/{balancer/random export-check n/5 replication/always} standby-replay tasks/{0-subvolume/{with-namespace-isolated} 1-check-counter 2-scrub/yes 3-snaps/no 4-flush/yes 5-workunit/suites/dbench}}

kotresh:multi$ cat balancer/automatic.yaml 
tasks:
- exec:
    mon.a:
      - ceph fs set cephfs balance_automate true

kotresh:multi$ cat balancer/random.yaml 
overrides:
  ceph:
    conf:
      mds:
        mds_export_ephemeral_random_max: 0.10
tasks:
- exec:
    mon.a:
      - ceph fs set cephfs balance_automate false
      - ceph fs subvolumegroup pin cephfs qa random 0.10

So all the tests definitely involved subtree migrations.

Actions #32

Updated by Patrick Donnelly almost 2 years ago

Kotresh Hiremath Ravishankar wrote in #note-31:

Patrick,

All the instances of the qa failure recorded in this tracker have either the dynamic balancer or ephemeral random pinning enabled.

[...]

[...]

So all the tests definitely involved subtree migrations.

+1

Actions #33

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

I see that the test failed even with the fix that ignores the scrub run if the directory dirfrag is dirty. Please check the comment https://github.com/ceph/ceph/pull/57953#issuecomment-2166182429 in the PR.

Actions #34

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

Further analysis of the failure mentioned in https://github.com/ceph/ceph/pull/57953#issuecomment-2166182429

The setattr request modified the mtime on the directory inode (the one being scrubbed); as a result,
the r/ctime is updated on the directory inode. This rctime is not propagated to its dirfrags, which is logically
correct. Both the inode and the dirfrags are written to disk, marked clean, and are no longer dirty.
The scrub on this directory inode fails because it compares the rstat of the directory inode with the dirfrags'
accounted rstats, which were not updated by that setattr. Hence the scrub is treated as a failure on this directory inode.

Actions #35

Updated by Venky Shankar almost 2 years ago

Kotresh Hiremath Ravishankar wrote in #note-34:

Further analysis of the failure mentioned in https://github.com/ceph/ceph/pull/57953#issuecomment-2166182429

The setattr request has modified the mtime on the directory inode (on which scrub is happening), as a result
the r/ctime is updated on the directory inode. This rctime is not updated on its dirfrags which is logically
correct. Both inode and dirfrags are written to disk and marked clean and are no longer dirty.
The scrub on this directory inode is failing as it's comparing the rstat of the directory inode with the dirfrags
accounted rstats which is not updated with that setattr. Hence the scrub is treated as failure on this directory inode.

What do you think if scrub compares the on-disk inode nestinfo with the maximum of this inode's CDir nestinfo and the parent CDir nestinfo?

Actions #36

Updated by Kotresh Hiremath Ravishankar almost 2 years ago

Venky Shankar wrote in #note-35:

Kotresh Hiremath Ravishankar wrote in #note-34:

Further analysis of the failure mentioned in https://github.com/ceph/ceph/pull/57953#issuecomment-2166182429

The setattr request has modified the mtime on the directory inode (on which scrub is happening), as a result
the r/ctime is updated on the directory inode. This rctime is not updated on its dirfrags which is logically
correct. Both inode and dirfrags are written to disk and marked clean and are no longer dirty.
The scrub on this directory inode is failing as it's comparing the rstat of the directory inode with the dirfrags
accounted rstats which is not updated with that setattr. Hence the scrub is treated as failure on this directory inode.

What do you think if scrub compares on-disk inode nesting with maximum of this inodes CDir nestinfo and parent CDir nestinfo?

As discussed, this won't work because the parent CDir nestinfo also changes when the other siblings are modified.

So, to fix the rstat mismatch caused by a setattr on the directory inode (on which scrub is initiated), we will consider relaxing the rstat scrub comparison from == to >=,
i.e., the check passes if the directory inode's rstat >= the sum of the on-disk rstats of all its dirfrags.
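
A minimal sketch of that relaxed check, using a simplified stand-in for the nest info (field names and types are assumptions, not the actual MDS structures):

#include <cstdint>

// Simplified stand-in for the recursive stats (nest_info_t) of an inode
// or a dirfrag; rctime is reduced to seconds for illustration.
struct RStat {
  double  rctime;
  int64_t rbytes;
  int64_t rfiles;
  int64_t rsubdirs;
};

// The counters must still match exactly, but the inode's rctime is allowed
// to be newer than the rctime summed from the on-disk dirfrags (>= instead
// of ==), so a setattr that only bumped the r/ctime on the directory inode
// no longer trips the scrub.
bool rstats_consistent(const RStat& inode_rstat, const RStat& dirfrag_sum) {
  return inode_rstat.rbytes   == dirfrag_sum.rbytes &&
         inode_rstat.rfiles   == dirfrag_sum.rfiles &&
         inode_rstat.rsubdirs == dirfrag_sum.rsubdirs &&
         inode_rstat.rctime   >= dirfrag_sum.rctime;
}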

Actions #37

Updated by Kotresh Hiremath Ravishankar over 1 year ago

Continuing the analysis - QA Failure - https://github.com/ceph/ceph/pull/57953#issuecomment-2166182429

Looking at the scrub failure, both the rstats and the dirstats mismatch. The rstat mismatch is explained in https://tracker.ceph.com/issues/65020#note-34.
Let's investigate the dirstat mismatch.

"raw_stats": {
"checked":true,
"passed":false,
"read_ret_val":0,
"ondisk_value.dirstat":"f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0)","ondisk_value.rstat":"n(v0 rc2024-06-13T08:55:10.383452+0000 b347 9=8+1)",
"memory_value.dirstat":"f(v1 m2024-06-13T08:55:10.381452+0000 8=8+0)","memory_value.rstat":"n(v1 rc2024-06-13T08:55:16.501331+0000 b347 9=8+1)",
"error_str":"freshly-calculated rstats don't match existing ones"},"return_code":0
}

Note:
mds.2 - Auth of directory inode 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/
mds.0 - Auth of CDir 0x10000012f02
The directory was migrated from mds.0 to mds.2 - this changed the auth of the CInode but not the CDir

Let's investigate why the dirstat is not matching.

Tracking back the in-memory dirstat time - 2024-06-13T08:55:10.381452.
This is the creation time of the symlink #0x10000012f02/fc8d63306f48ee70aa4cf8e920188bcd89d3f2 under the directory in question.
As a result, the CDir's fragstat is updated with the symlink creation time. The same time is also updated on the CInode on the auth MDS.
Please check the logs below.

mds.0 (CDir auth)

...
...
2024-06-13T08:55:10.382+0000 7f2aa23d8640  1 -- [v2:172.21.15.123:6838/2791669125,v1:172.21.15.123:6839/2791669125] <== client.24868 172.21.15.113:0/4193977489 398405 ==== client_request(client.24868:218745 symlink owner_uid=0, owner_gid=0 #0x10000012f02/fc8d63306f48ee70aa4cf8e920188bcd89d3f2 ../../../../usr/lib64/security/pam_group.so 2024-06-13T08:55:10.381452+0000 caller_uid=0, caller_gid=0{0,}) ==== 319+0+65 (crc 0 0 0) 0x558840c91500 con 0x5588366b8c00
2024-06-13T08:55:10.382+0000 7f2aa23d8640  4 mds.0.server handle_client_request client_request(client.24868:218745 symlink owner_uid=0, owner_gid=0 #0x10000012f02/fc8d63306f48ee70aa4cf8e920188bcd89d3f2 ../../../../usr/lib64/security/pam_group.so 2024-06-13T08:55:10.381452+0000 caller_uid=0, caller_gid=0{0,})
2024-06-13T08:55:10.382+0000 7f2aa23d8640 20 mds.0.14 get_session have 0x558836440f00 client.24868 172.21.15.113:0/4193977489 state open
2024-06-13T08:55:10.382+0000 7f2aa23d8640 15 mds.0.server  oldest_client_tid=218736
...
...
...
2024-06-13T08:55:10.383+0000 7f2aa23d8640 10 mds.0.cache.dir(0x10000012f02) pre_dirty 31
2024-06-13T08:55:10.383+0000 7f2aa23d8640 10 mds.0.cache predirty_journal_parents bumping fragstat change_attr to 8 on 0x55884cd10d00
2024-06-13T08:55:10.383+0000 7f2aa23d8640 10 mds.0.cache predirty_journal_parents updating mtime on [dir 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ [2,head] auth{2=1} pv=31 v=17 cv=0/0 dir_auth=0 ap=7+8 state=1611399169|complete|auxsubtree f(v0 m2024-06-13T08:55:10.327453+0000 4=4+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0)->f(v0 m2024-06-13T08:55:10.381452+0000 7=7+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0) n(v0 rc2024-06-13T08:55:10.357453+0000 b199 4=4+0)/n(v0 rc2024-06-13T08:55:05.421551+0000 b161 3=3+0)->n(v0 rc2024-06-13T08:55:10.380452+0000 b304 7=7+0)/n(v0 rc2024-06-13T08:55:05.421551+0000 b161 3=3+0) hs=4+4,ss=0+0 dirty=4 | child=1 subtree=1 exportbound=0 replicated=1 dirty=1 authpin=1 0x55884cd10d00]
2024-06-13T08:55:10.383+0000 7f2aa23d8640 10 mds.0.cache predirty_journal_parents updating size on [dir 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ [2,head] auth{2=1} pv=31 v=17 cv=0/0 dir_auth=0 ap=7+8 state=1611399169|complete|auxsubtree f(v0 m2024-06-13T08:55:10.327453+0000 4=4+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0)->f(v0 m2024-06-13T08:55:10.381452+0000 7=7+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0) n(v0 rc2024-06-13T08:55:10.357453+0000 b199 4=4+0)/n(v0 rc2024-06-13T08:55:05.421551+0000 b161 3=3+0)->n(v0 rc2024-06-13T08:55:10.381452+0000 b304 7=7+0)/n(v0 rc2024-06-13T08:55:05.421551+0000 b161 3=3+0) hs=4+4,ss=0+0 dirty=4 | child=1 subtree=1 exportbound=0 replicated=1 dirty=1 authpin=1 0x55884cd10d00]
2024-06-13T08:55:10.383+0000 7f2aa23d8640 10 mds.0.cache projected_rstat_inode_to_frag first 2a linkunlink 1 [inode 0x10000013106 [2a,head] /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/fc8d63306f48ee70aa4cf8e920188bcd89d3f2 auth symlink='../../../../usr/lib64/security/pam_group.so' v30 s=43 n(v0 b43 1=1+0) 0x5588538af700]
2024-06-13T08:55:10.383+0000 7f2aa23d8640 20 mds.0.cache     frag head is [2,head]
2024-06-13T08:55:10.383+0000 7f2aa23d8640 20 mds.0.cache  inode update is [2a,head]
...
...
2024-06-13T08:55:15.186+0000 7f2a9bbcb640 10 mds.0.cache.dir(0x10000012f02) _committed v 33 on [dir 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ [2,head] auth{2=1} v=33 cv=33/0 dir_auth=0 ap=1+0 state=1611399201|complete|auxsubtree|committing f(v0 m2024-06-13T08:55:10.381452+0000 8=8+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0) n(v0 rc2024-06-13T08:55:10.383452+0000 b347 8=8+0)/n(v0 rc2024-06-13T08:55:05.421551+0000 b161 3=3+0) hs=8+0,ss=0+0 dirty=8 | child=1 subtree=1 exportbound=0 replicated=1 dirty=1 authpin=1 0x55884cd10d00]
...
...

mds.2 (CInode auth)

...
...
2024-06-13T08:55:12.966+0000 7f4b55282640 10 mds.2.cache.ino(0x10000012f02) finish_scatter_gather_update 128 on [inode 0x10000012f02 [...2a,head] /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ auth{0=1} v610 pv948 ap=2 DIRTYPARENT RANDEPHEMERALPIN f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0)->f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0) n(v0 rc2024-06-13T08:55:05.421551+0000 b161 4=3+1)->n(v0 rc2024-06-13T08:55:05.421551+0000 b161 4=3+1) (inest mix->lock(2) g=0) (ifile mix->lock(2) w=1 dirty) caps={24868=pAsLsXs/pAsLsXsFsx@2} | dirtyscattered=1 lock=1 importingcaps=0 dirfrag=1 caps=1 stickydirs=0 dirtyparent=1 dirwaiter=0 replicated=1 dirty=1 authpin=1 discoverbase=0 randepin 0x5564d8188100]
2024-06-13T08:55:12.966+0000 7f4b55282640 20 mds.2.cache.ino(0x10000012f02)   orig dirstat f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0)
2024-06-13T08:55:12.966+0000 7f4b55282640 20 mds.2.cache.ino(0x10000012f02) * [dir 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ [2,head] rep@0.1 dir_auth=0 state=0 f(v0 m2024-06-13T08:55:10.381452+0000 8=8+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0) n(v0 rc2024-06-13T08:55:05.421551+0000 b161 3=3+0) hs=0+0,ss=0+0 | subtree=1 importbound=0 sticky=0 0x5564d789e900]
2024-06-13T08:55:12.966+0000 7f4b55282640 20 mds.2.cache.ino(0x10000012f02) *           fragstat f(v0 m2024-06-13T08:55:10.381452+0000 8=8+0)
2024-06-13T08:55:12.966+0000 7f4b55282640 20 mds.2.cache.ino(0x10000012f02) * accounted_fragstat f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0)
2024-06-13T08:55:12.966+0000 7f4b55282640 20 mds.2.cache.ino(0x10000012f02)  final dirstat f(v1 m2024-06-13T08:55:10.381452+0000 8=8+0)
2024-06-13T08:55:12.966+0000 7f4b55282640 10 mds.2.cache predirty_journal_parents linkunlink=0 primary_dn follows head [inode 0x10000012f02 [...2a,head] /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ auth{0=1} v610 pv948 ap=2 DIRTYPARENT RANDEPHEMERALPIN f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0)->f(v1 m2024-06-13T08:55:10.381452+0000 8=8+0) n(v0 rc2024-06-13T08:55:05.421551+0000 b161 4=3+1)->n(v0 rc2024-06-13T08:55:05.421551+0000 b161 4=3+1) (inest mix->lock(2) g=0) (ifile mix->lock(2) w=1 flushing) caps={24868=pAsLsXs/pAsLsXsFsx@2} | dirtyscattered=1 lock=1 importingcaps=0 dirfrag=1 caps=1 stickydirs=0 dirtyparent=1 dirwaiter=0 replicated=1 dirty=1 authpin=1 discoverbase=0 randepin 0x5564d8188100]
...
...

If we look at the inode and the CDir just before the scrub error, we see that both are marked clean and carry the updated symlink creation time, meaning they were flushed to disk.
But we can also see that this time is not yet reflected in the CDir's accounted_fragstat.

mds.0

...
...
2024-06-13T08:55:15.186+0000 7f2a9bbcb640 10 mds.0.cache.dir(0x10000012f02) _committed v 33 on [dir 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ [2,head] auth{2=1} v=33 cv=33/0 dir_auth=0 ap=1+0 state=1611399201|complete|auxsubtree|committing f(v0 m2024-06-13T08:55:10.381452+0000 8=8+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0) n(v0 rc2024-06-13T08:55:10.383452+0000 b347 8=8+0)/n(v0 rc2024-06-13T08:55:05.421551+0000 b161 3=3+0) hs=8+0,ss=0+0 dirty=8 | child=1 subtree=1 exportbound=0 replicated=1 dirty=1 authpin=1 0x55884cd10d00]
2024-06-13T08:55:15.186+0000 7f2a9bbcb640 10 mds.0.cache.dir(0x10000012f02) mark_clean [dir 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ [2,head] auth{2=1} v=33 cv=33/33 dir_auth=0 ap=1+0 state=1611399169|complete|auxsubtree f(v0 m2024-06-13T08:55:10.381452+0000 8=8+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0) n(v0 rc2024-06-13T08:55:10.383452+0000 b347 8=8+0)/n(v0 rc2024-06-13T08:55:05.421551+0000 b161 3=3+0) hs=8+0,ss=0+0 dirty=8 | child=1 subtree=1 exportbound=0 replicated=1 dirty=1 authpin=1 0x55884cd10d00] version 33
2024-06-13T08:55:15.187+0000 7f2a9bbcb640 10 mds.0.cache.dir(0x10000012f02) auth_unpin by 0x55884cd10d00 on [dir 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ [2,head] auth{2=1} v=33 cv=33/33 dir_auth=0 state=1074528257|complete|auxsubtree f(v0 m2024-06-13T08:55:10.381452+0000 8=8+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0) n(v0 rc2024-06-13T08:55:10.383452+0000 b347 8=8+0)/n(v0 rc2024-06-13T08:55:05.421551+0000 b161 3=3+0) hs=8+0,ss=0+0 | child=1 subtree=1 exportbound=0 replicated=1 dirty=0 authpin=0 0x55884cd10d00] count now 0
2024-06-13T08:55:15.776+0000 7f2aa0bd5640 10 mds.0.cache | | |____ 0    auth [dir 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ [2,head] auth{2=1} v=33 cv=33/33 dir_auth=0 state=1074528257|complete|auxsubtree f(v0 m2024-06-13T08:55:10.381452+0000 8=8+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0) n(v0 rc2024-06-13T08:55:10.383452+0000 b347 8=8+0)/n(v0 rc2024-06-13T08:55:05.421551+0000 b161 3=3+0) hs=8+0,ss=0+0 | child=1 subtree=1 exportbound=0 replicated=1 dirty=0 authpin=0 0x55884cd10d00]
...
...

mds.2

...
2024-06-13T09:02:36.900+0000 7f4b55282640 10 mds.2.cache.ino(0x10000012f02) encode_lock_inest * dir [dir 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ [2,head] rep@0.1 dir_auth=0 state=0 f(v0 m2024-06-13T08:55:10.381452+0000 8=8+0)/f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0) n(v1 rc2024-06-13T08:55:10.383452+0000 b347 8=8+0) hs=0+0,ss=0+0 | subtree=1 importbound=0 sticky=0 0x5564d789e900]

...
2024-06-13T09:02:36.901+0000 7f4b55282640 10 mds.2.cache.ino(0x10000012f02) auth_unpin by 0x5564dd965600 on [inode 0x10000012f02 [...2a,head] /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ auth{0=1} v952 ap=2 RANDEPHEMERALPIN f(v1 m2024-06-13T08:55:10.381452+0000 8=8+0) n(v1 rc2024-06-13T08:55:16.501331+0000 b347 9=8+1) (inest sync->mix g=0) mcw={0=AsLsXsFs} | dirtyscattered=0 request=0 lock=0 importingcaps=0 dirfrag=1 caps=0 stickydirs=0 dirtyparent=0 dirwaiter=0 replicated=1 dirty=0 waiter=0 authpin=1 discoverbase=0 scrubqueue=0 randepin 0x5564d8188100] now 2
...
...
2024-06-13T09:02:36.901+0000 7f4b55282640  0 log_channel(cluster) log [WRN] : Scrub error on inode 0x10000012f02 (/volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17) see mds.e log and `damage ls` output for details
2024-06-13T09:02:36.901+0000 7f4b55282640 -1 mds.2.scrubstack _validate_inode_done scrub error on inode [inode 0x10000012f02 [...2a,head] /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/ auth{0=1} v952 ap=2 RANDEPHEMERALPIN f(v1 m2024-06-13T08:55:10.381452+0000 8=8+0) n(v1 rc2024-06-13T08:55:16.501331+0000 b347 9=8+1) (inest sync->mix g=0) mcw={0=AsLsXsFs} | dirtyscattered=0 request=0 lock=0 importingcaps=0 dirfrag=1 caps=0 stickydirs=0 dirtyparent=0 dirwaiter=0 replicated=1 dirty=0 waiter=0 authpin=1 discoverbase=0 scrubqueue=0 randepin 0x5564d8188100]: {"performed_validation":true,"passed_validation":false,"backtrace":{"checked":true,"passed":true,"read_ret_val":0,"ondisk_value":"(2)0x10000012f02:[<0x10000012ec8/17 v952>,<0x10000012ec6/.build-id v6429>,<0x10000012ec5/multiple_rsync_payload.228227 v748>,<0x1000000c579/payload.2 v714>,<0x10000000005/tmp v1831>,<0x100000001fb/client.0 v1617>,<0x100000001fa/257f5e61-1d1a-4ee3-8dfe-78182c700535 v1572>,<0x10000000001/sv_1 v1540>,<0x10000000000/qa v1503>,<0x1/volumes v1450>]//[]","memoryvalue":"(2)0x10000012f02:[<0x10000012ec8/17 v952>,<0x10000012ec6/.build-id v6429>,<0x10000012ec5/multiple_rsync_payload.228227 v862>,<0x1000000c579/payload.2 v774>,<0x10000000005/tmp v1996>,<0x100000001fb/client.0 v1769>,<0x100000001fa/257f5e61-1d1a-4ee3-8dfe-78182c700535 v1729>,<0x10000000001/sv_1 v1692>,<0x10000000000/qa v1659>,<0x1/volumes v1608>]//[]","error_str":""},"raw_stats":{"checked":true,"passed":false,"read_ret_val":0,"ondisk_value.dirstat":"f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0)","ondisk_value.rstat":"n(v0 rc2024-06-13T08:55:10.383452+0000 b347 9=8+1)","memory_value.dirstat":"f(v1 m2024-06-13T08:55:10.381452+0000 8=8+0)","memory_value.rstat":"n(v1 rc2024-06-13T08:55:16.501331+0000 b347 9=8+1)","error_str":"freshly-calculated rstats don't match existing ones"},"return_code":0}
...
...

Need to investigate why the updated fragstat is not accounted even after the in-memory data structures are marked clean and flushed to disk.
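
For context, a rough sketch of the fragstat / accounted_fragstat relationship in question, with simplified stand-in types (the real fields live in the CDir fnode and are maintained by the scatter-gather machinery; names and semantics here are assumptions):

#include <cstdint>

// Simplified stand-ins for illustration only.
struct FragStat {
  double  mtime;      // latest modification time seen by this fragment
  int64_t nfiles;
  int64_t nsubdirs;
};

struct Frag {
  FragStat fragstat;            // current stats of the fragment
  FragStat accounted_fragstat;  // portion already folded into the inode's dirstat
};

// Sketch of the assimilation step: the inode's dirstat advances by the delta
// between fragstat and accounted_fragstat, after which the two are equal.
// Until this step runs, the fragment can already carry (and have committed)
// a newer mtime than what it has accounted to the inode - the kind of gap
// visible in the logs above.
void assimilate(FragStat& inode_dirstat, Frag& f) {
  inode_dirstat.nfiles   += f.fragstat.nfiles   - f.accounted_fragstat.nfiles;
  inode_dirstat.nsubdirs += f.fragstat.nsubdirs - f.accounted_fragstat.nsubdirs;
  if (f.fragstat.mtime > inode_dirstat.mtime)
    inode_dirstat.mtime = f.fragstat.mtime;
  f.accounted_fragstat = f.fragstat;
}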

Actions #55

Updated by Venky Shankar 9 months ago

  • Backport changed from squid,reef to tentacle,squid,reef
Actions #57

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to e1caea29cff73deb152de4f79914d36ead24c032
  • Fixed In set to v20.3.0-1792-ge1caea29cf
  • Upkeep Timestamp set to 2025-07-21T05:25:54+00:00
Actions #58

Updated by Venky Shankar 8 months ago

  • Merge Commit deleted (e1caea29cff73deb152de4f79914d36ead24c032)
  • Fixed In deleted (v20.3.0-1792-ge1caea29cf)
  • Upkeep Timestamp deleted (2025-07-21T05:25:54+00:00)

Kotresh Hiremath Ravishankar wrote in #note-37:

Continuing the analysis - QA Failure - https://github.com/ceph/ceph/pull/57953#issuecomment-2166182429

If we see the Scrub Failure, both rstats and dirstats are mismatching. rstats mismatch reason is described in https://tracker.ceph.com/issues/65020#note-34
Let's investigate the dirstat mismatch.

"raw_stats": {
"checked":true,
"passed":false,
"read_ret_val":0,
"ondisk_value.dirstat":"f(v0 m2024-06-13T08:55:04.955560+0000 3=3+0)","ondisk_value.rstat":"n(v0 rc2024-06-13T08:55:10.383452+0000 b347 9=8+1)",
"memory_value.dirstat":"f(v1 m2024-06-13T08:55:10.381452+0000 8=8+0)","memory_value.rstat":"n(v1 rc2024-06-13T08:55:16.501331+0000 b347 9=8+1)",
"error_str":"freshly-calculated rstats don't match existing ones"},"return_code":0
}

Note:
mds.2 - Auth of directory inode 0x10000012f02 /volumes/qa/sv_1/257f5e61-1d1a-4ee3-8dfe-78182c700535/client.0/tmp/payload.2/multiple_rsync_payload.228227/.build-id/17/
mds.0 - Auth of CDir 0x10000012f02
The directory is migrated from mds.0 to mds.2 - This changed the auth of CInode but not the CDir

Let's investigate why the dirstat is not matching.

Tracking back the in memory dirstat time - 2024-06-13T08:55:10.381452
This is the time of symlink creation of #0x10000012f02/fc8d63306f48ee70aa4cf8e920188bcd89d3f2 under the directory in question
As a result of this, the CDir's fragstat is updated with symlink creation time. The same time is updated on the CInode as well on the auth mds.
Please check the logs below.

mds.0 (CDir auth)
[...]

mds.1 (CInode auth)
[...]

If we look at the inode and CDir just before Scrub error, we see that both of them are marked clean and the symlink creation time is updated meaning flushed to disk.
But we can also see that this time is not accounted_fragstat in CDir yet.

mds.0
[...]

mds.1

[...]

Need to investigate, why the updated fragstat is not accounted even after the in memory data structures are marked clean and flushed to disk

https://github.com/ceph/ceph/pull/57953 is merged, but based on the above I think there are probably additional fixes needed for this tracker, yes? @Kotresh Hiremath Ravishankar

Actions #59

Updated by Kotresh Hiremath Ravishankar 8 months ago

Venky Shankar wrote in #note-58:

Kotresh Hiremath Ravishankar wrote in #note-37:

Continuing the analysis - QA Failure - https://github.com/ceph/ceph/pull/57953#issuecomment-2166182429

...
...
Need to investigate, why the updated fragstat is not accounted even after the in memory data structures are marked clean and flushed to disk

https://github.com/ceph/ceph/pull/57953 is merged, but I think based on the above there is probably additional fixes for this tracker, yes?. @Kotresh Hiremath Ravishankar

Yes, that's right. Should we raise a new tracker for that or keep this one? @Venky Shankar
