squid: mds: relax divergent backtrace scrub failures for replicated ancestor inodes #58501

Closed
joscollin wants to merge 1 commit into ceph:squid from joscollin:wip-66272-squid
@joscollin
Member

backport tracker: https://tracker.ceph.com/issues/66272


backport of #57354
parent tracker: https://tracker.ceph.com/issues/64730

this backport was staged using ceph-backport.sh version 16.0.0.6848
find the latest version at https://github.com/ceph/ceph/blob/main/src/script/ceph-backport.sh

… inodes

Scrub may verify the backtrace of an inode some of whose ancestors are
replicas. For example (from a custom debug build), some ancestors of an
inode with a divergent backtrace were replicas:

```
[inode 0x3000000502f [...122,head] /volumes/qa/sv_0/b98de6ea-ed40-40d0-8e1a-9433a337a387/client.0/tmp/payload.2/multiple_rsync_payload.190107/firmware/ rep@0.1 fragtree_t(*^3) v6663 f(v493 m2024-05-01T06:38:16.403080+0000 388=289+99) n(v139 rc2024-05-01T06:55:35.239345+0000 b467915716 4880=4534+346) old_inodes=24 (inest mix) (ifile mix) | lock=0 importing=0 dirfrag=1 0x55a85d244680]
```

In such cases, the backpointer version (inode_backpointer_t::version) of the
in-memory (cache) inode can fall behind the on-disk version, causing scrub to
treat the inode backtrace as divergent (memory version < on-disk version).

Sample:

```
"ondisk_value":"(2)0x30000005bba:

[<0x3000000502f/mwl8k v2126>,
<0x30000005026/firmware v6663>,
<0x30000005025/multiple_rsync_payload.190107 v3041>,
<0x10000005894/payload.2 v4873>,
<0x10000000005/tmp v6193>,
<0x10000000003/client.0 v5964>,
<0x10000000002/b98de6ea-ed40-40d0-8e1a-9433a337a387 v5817>,
<0x10000000001/sv_0 v5837>,
<0x10000000000/qa v6241>,
<0x1/volumes v4036>]

"memoryvalue":"(2)0x30000005bba:

[<0x3000000502f/mwl8k v2126>,
<0x30000005026/firmware v6663>,
<0x30000005025/multiple_rsync_payload.190107 v3041>,
<0x10000005894/payload.2 v4873>,
<0x10000000005/tmp v6081>,
<0x10000000003/client.0 v5942>,
<0x10000000002/b98de6ea-ed40-40d0-8e1a-9433a337a387 v5709>,
<0x10000000001/sv_0 v5819>,
<0x10000000000/qa v6121>,
<0x1/volumes v4022>]
```
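
The comparison in the sample above can be reproduced offline with a small script. This is only an illustrative sketch, not the MDS implementation: `Backpointer`, `parse_backtrace`, and `stale_ancestors` are hypothetical names, and the `<dirino/dname vN>` format is assumed from the sample output. It lists the ancestors whose in-memory version is lower than the on-disk version.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backpointer:
    """Hypothetical stand-in for one backtrace entry."""
    dirino: int   # inode number of the parent directory
    dname: str    # dentry name inside that directory
    version: int  # version recorded for this ancestor link


# Assumed entry format, taken from the sample: <0x10000000005/tmp v6193>
_ENTRY = re.compile(r"<(0x[0-9a-fA-F]+)/([^>]*?) v(\d+)>")


def parse_backtrace(text: str) -> List[Backpointer]:
    """Extract every <dirino/dname vN> entry from a scrub value string."""
    return [
        Backpointer(int(ino, 16), name, int(ver))
        for ino, name, ver in _ENTRY.findall(text)
    ]


def stale_ancestors(memory: List[Backpointer],
                    ondisk: List[Backpointer]) -> List[str]:
    """Names of ancestors whose in-memory version lags the on-disk one.

    Entries are compared position by position; an entry counts only if
    it refers to the same link (same dirino and dname) in both lists.
    """
    stale = []
    for mem, disk in zip(memory, ondisk):
        same_link = (mem.dirino, mem.dname) == (disk.dirino, disk.dname)
        if same_link and mem.version < disk.version:
            stale.append(mem.dname)
    return stale
```

Applied to the two values in the sample, the first four entries match and the lagging ancestors run from `tmp` up to `volumes`. Whether those ancestors are replicas has to be checked separately; the sketch does not model that.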

Fixes: http://tracker.ceph.com/issues/64730
Signed-off-by: Venky Shankar <vshankar@redhat.com>
(cherry picked from commit b98bb86)
@joscollin joscollin added this to the squid milestone Jul 10, 2024
@joscollin joscollin added the cephfs Ceph File System label Jul 10, 2024
@joscollin
Member Author

jenkins test api

@joscollin
Member Author

This PR is under test in https://tracker.ceph.com/issues/66906.

@lxbsz
Member

lxbsz commented Jul 22, 2024

None of the failures are related to this backport PR.

For more detail, please see section 2024-07-17 in https://tracker.ceph.com/projects/cephfs/wiki/Squid.

@joscollin @vshankar Should we continue to merge this PR? We are planning to revert it in another PR, #57962.

@joscollin
Member Author

Closing this backport, as we are reverting this change.

@joscollin joscollin closed this Jul 23, 2024
@joscollin joscollin deleted the wip-66272-squid branch July 23, 2024 04:05