Project

General

Profile

Actions

Bug #74984

open

qa: Test failure: test_cephfs_mirror_incremental_sync/test_cephfs_mirror_sync_with_purged_snapshot (snapdiff bug)

Added by Kotresh Hiremath Ravishankar about 1 month ago. Updated 19 days ago.

Status:
Triaged
Priority:
High
Assignee:
Category:
Correctness/Safety
Target version:
% Done:

0%

Source:
Development
Backport:
tentacle,squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
cephfs-mirror
Labels (FS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

With multi-threaded mirroring PR - https://github.com/ceph/ceph/pull/66572 the following test failed.
But I found the previous tracker for the same test failure - https://tracker.ceph.com/issues/68567

https://pulpito.ceph.com/khiremat-2026-02-17_07:06:43-fs-wip-khiremat-mulithread-mirror-66572-sb_fixed-debug-latest-distro-default-trial/51959/

2026-02-17T08:26:34.377 INFO:tasks.cephfs_test_runner:======================================================================
2026-02-17T08:26:34.377 INFO:tasks.cephfs_test_runner:FAIL: test_cephfs_mirror_incremental_sync (tasks.cephfs.test_mirroring.TestMirroring.test_cephfs_mirror_incremental_sync)
2026-02-17T08:26:34.377 INFO:tasks.cephfs_test_runner:Test incremental snapshot synchronization (based on mtime differences).
2026-02-17T08:26:34.377 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2026-02-17T08:26:34.377 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2026-02-17T08:26:34.377 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph_ebc89147fc1a1731b84ef7e7003e19f704ba15c0/qa/tasks/cephfs/test_mirroring.py", line 1224, in test_cephfs_mirror_incremental_sync
2026-02-17T08:26:34.377 INFO:tasks.cephfs_test_runner:    self.verify_snapshot(repo_path, 'snap_c')
2026-02-17T08:26:34.377 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph_ebc89147fc1a1731b84ef7e7003e19f704ba15c0/qa/tasks/cephfs/test_mirroring.py", line 355, in verify_snapshot
2026-02-17T08:26:34.378 INFO:tasks.cephfs_test_runner:    self.assertTrue(source_res == dest_res)
2026-02-17T08:26:34.378 INFO:tasks.cephfs_test_runner:AssertionError: False is not true
2026-02-17T08:26:34.378 INFO:tasks.cephfs_test_runner:
2026-02-17T08:26:34.378 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2026-02-17T08:26:34.378 INFO:tasks.cephfs_test_runner:Ran 10 tests in 1478.603s
2026-02-17T08:26:34.378 INFO:tasks.cephfs_test_runner:
2026-02-17T08:26:34.378 INFO:tasks.cephfs_test_runner:FAILED (failures=1)

Attached the teuthology log.


Files


Related issues 1 (0 open1 closed)

Related to CephFS - Bug #68567: cephfs/qa : Test failure: test_cephfs_mirror_incremental_sync (tasks.cephfs.test_mirroring.TestMirroring)ResolvedJos Collin

Actions
Actions #1

Updated by Kotresh Hiremath Ravishankar about 1 month ago

  • Related to Bug #68567: cephfs/qa : Test failure: test_cephfs_mirror_incremental_sync (tasks.cephfs.test_mirroring.TestMirroring) added
Actions #2

Updated by Kotresh Hiremath Ravishankar about 1 month ago

Analysis:
Let's find out what files have a checksum mismatch between snap_b and snap_c. Following is vimdiff output of the relevant checksums. The files prefixed with >>>>>>> are the files whose checksum is mismatched.

1. ceph_repo/ceph-qa-suite/.snap/snap_c/.git/logs/HEAD
2. ceph_repo/ceph-qa-suite/.snap/snap_c/.git/logs/refs/heads/giant

> (cd /home/ubuntu/cephtest/mnt.0 && exec find -L ceph_repo/ceph-qa-suite/.snap/snap_c -type f -exec md5sum '{}' +)          |  > (cd /home/ubuntu/cephtest/mnt.1 && exec find -L ceph_repo/ceph-qa-suite/.snap/snap_c -type f -exec md5sum '{}' +)
  0325ee9c9e6f77d5225fe18047f164c5  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-plugin=lrc-k=4-m=2-l=3.yaml     |  0325ee9c9e6f77d5225fe18047f164c5  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-plugin=lrc-k=4-m=2-l=3.yaml
  1a5c8baa549bb0c3829b19679595bdec  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-plugin=jerasure-k=3-m=1.yaml    |  1a5c8baa549bb0c3829b19679595bdec  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-plugin=jerasure-k=3-m=1.yaml
  8c56465ee9a6befedf0331238c6e99b2  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-plugin=isa-k=2-m=1.yaml         |  8c56465ee9a6befedf0331238c6e99b2  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-plugin=isa-k=2-m=1.yaml
  dd04e4b851be6a74494d883501fbcd92  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-plugin=jerasure-k=2-m=1.yaml    |  dd04e4b851be6a74494d883501fbcd92  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-plugin=jerasure-k=2-m=1.yaml
  63c228035e8eaadae6544472d4356d3e  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-default.yaml                    |  63c228035e8eaadae6544472d4356d3e  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-default.yaml
  d9e6c95a9fba03c3d0483ff35f5781ca  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-parallel.yaml                   |  d9e6c95a9fba03c3d0483ff35f5781ca  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-rados-parallel.yaml
+ +-- 32 lines: a9572c1de83c967b7568ffc2d30d9955  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-feature-plugins-v2.yaml-|+ +-- 32 lines: a9572c1de83c967b7568ffc2d30d9955  ceph_repo/ceph-qa-suite/.snap/snap_c/erasure-code/ec-feature-plugins-v2.yaml-
  2c642152299a94e05ea26eae11993b13  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/hooks/pre-push.sample                            |  2c642152299a94e05ea26eae11993b13  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/hooks/pre-push.sample
  054f9ffb8bfe04a599751cc757226dda  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/hooks/pre-applypatch.sample                      |  054f9ffb8bfe04a599751cc757226dda  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/hooks/pre-applypatch.sample
  c7ab00c7784efeadad3ae9b228d4b4db  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/hooks/push-to-checkout.sample                    |  c7ab00c7784efeadad3ae9b228d4b4db  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/hooks/push-to-checkout.sample
  2b7ea5cee3c49ff53d41e00785eb974c  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/hooks/post-update.sample                         |  2b7ea5cee3c49ff53d41e00785eb974c  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/hooks/post-update.sample
  ce562e08d8098926a3862fc6e7905199  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/hooks/applypatch-msg.sample                      |  ce562e08d8098926a3862fc6e7905199  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/hooks/applypatch-msg.sample
  e83c4980f9df005c2895481bcc21d7e9  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/FETCH_HEAD                                       |  e83c4980f9df005c2895481bcc21d7e9  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/FETCH_HEAD
>>>>>>>>> 3be557a48c3359d288a46cae7f4ad551  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/logs/HEAD                                        |  f3024b5d1a846ae105d7d4cacfdae513  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/logs/HEAD
>>>>>>>>> 3be557a48c3359d288a46cae7f4ad551  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/logs/refs/heads/giant                            |  f3024b5d1a846ae105d7d4cacfdae513  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/logs/refs/heads/giant
  bcab67411bcdcc060e91f92abac02e05  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/logs/refs/remotes/origin/HEAD                    |  bcab67411bcdcc060e91f92abac02e05  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/logs/refs/remotes/origin/HEAD
  af5764e80ecbb8ebd2c80ce52245a7cb  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/packed-refs                                      |  af5764e80ecbb8ebd2c80ce52245a7cb  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/packed-refs
  8567afb7f379a049879428930232de4f  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/config                                           |  8567afb7f379a049879428930232de4f  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/config
  036208b4a1ab4a235d75c181e685e5a3  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/info/exclude                                     |  036208b4a1ab4a235d75c181e685e5a3  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/info/exclude
  12b0ca1b66811d4ba7b1995888364b89  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/refs/heads/giant                                 |  12b0ca1b66811d4ba7b1995888364b89  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/refs/heads/giant
  73a00957034783b7b5c8294c54cd3e12  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/refs/remotes/origin/HEAD                         |  73a00957034783b7b5c8294c54cd3e12  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/refs/remotes/origin/HEAD
+ +--1469 lines: 955820e484e24c8b38508e33f716cdf0  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/ORIG_HEAD-------------------------|+ +--1469 lines: 955820e484e24c8b38508e33f716cdf0  ceph_repo/ceph-qa-suite/.snap/snap_c/.git/ORIG_HEAD-------------------------
  95bdf80fe0825fdf398bcf6213093fca  ceph_repo/ceph-qa-suite/.snap/snap_c/tasks/calamari/http_client.py                         |  95bdf80fe0825fdf398bcf6213093fca  ceph_repo/ceph-qa-suite/.snap/snap_c/tasks/calamari/http_client.py
  92e0cf9059081b44542098467733c00a  ceph_repo/ceph-qa-suite/.snap/snap_c/tasks/blktrace.py                                     |  92e0cf9059081b44542098467733c00a  ceph_repo/ceph-qa-suite/.snap/snap_c/tasks/blktrace.py
  5510abee35b0218b0434cf6bbfc00ca3  ceph_repo/ceph-qa-suite/.snap/snap_c/tasks/thrashosds.py                                   |  5510abee35b0218b0434cf6bbfc00ca3  ceph_repo/ceph-qa-suite/.snap/snap_c/tasks/thrashosds.py
  90bb4a99dd25e96bba48535d418b8bb9  ceph_repo/ceph-qa-suite/.snap/snap_c/tasks/radosgw_admin_rest.py                           |  90bb4a99dd25e96bba48535d418b8bb9  ceph_repo/ceph-qa-suite/.snap/snap_c/tasks/radosgw_admin_rest.py
  3d6437a5afcd6b97cc47c6528512d9d6  ceph_repo/ceph-qa-suite/.snap/snap_c/.gitignore                                            |  3d6437a5afcd6b97cc47c6528512d9d6  ceph_repo/ceph-qa-suite/.snap/snap_c/.gitignore
  b4308a8dd3a52ad4bc2cf282239fc740  ceph_repo/ceph-qa-suite/.snap/snap_c/debug/mds_client.yaml                                 |  b4308a8dd3a52ad4bc2cf282239fc740  ceph_repo/ceph-qa-suite/.snap/snap_c/debug/mds_client.yaml

Let's see if the file ceph_repo/ceph-qa-suite/.snap/snap_c/.git/logs/HEAD is queued for data sync by the crawler thread. The following confirms the file is not queued for the snapshot snap_c sync

grep "cephfs::mirror" ceph-client.mirror.22423.log | grep -E 'copy_to_remote|push_dataq_entry|wait_until_safe_to_snapshot' | grep -E '.git/logs/HEAD|wait_until_safe_to_snapshot' | less
------------------------------------
2026-02-17T08:24:38.343+0000 7f5a660c7640 10 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) push_dataq_entry: snapshot data replayer dataq pushed syncm=0x563dfdce3210 epath=./.git/logs/HEAD
2026-02-17T08:24:38.347+0000 7f5a648c4640 10 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) copy_to_remote: dir_root=/ceph_repo/ceph-qa-suite, epath=./.git/logs/HEAD, num_blocks=1
2026-02-17T08:24:38.351+0000 7f5a648c4640 10 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) copy_to_remote: dir_root=/ceph_repo/ceph-qa-suite, epath=./.git/logs/HEAD, block: [0~180]
2026-02-17T08:24:38.351+0000 7f5a648c4640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) copy_to_remote: truncating epath=./.git/logs/HEAD to 180 bytes
2026-02-17T08:24:38.351+0000 7f5a648c4640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) copy_to_remote: dir_root=/ceph_repo/ceph-qa-suite, epath=./.git/logs/HEAD error=0 synced
2026-02-17T08:24:44.327+0000 7f5a660c7640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) wait_until_safe_to_snapshot: Waiting for data sync to be done to take snapshot - dir_root=/ceph_repo/ceph-qa-suite current=snap_a,2 prev= syncm=0x563dfdce3210
2026-02-17T08:24:44.339+0000 7f5a660c7640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) wait_until_safe_to_snapshot: Woke up to take snapshot - dir_root=/ceph_repo/ceph-qa-suite current=snap_a,2 prev= syncm=0x563dfdce3210
2026-02-17T08:25:18.743+0000 7f5a660c7640 10 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) push_dataq_entry: snapshot data replayer dataq pushed syncm=0x563dfe0cf690 epath=./.git/logs/HEAD
2026-02-17T08:25:18.743+0000 7f5a650c5640 10 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) copy_to_remote: dir_root=/ceph_repo/ceph-qa-suite, epath=./.git/logs/HEAD, num_blocks=1
2026-02-17T08:25:18.743+0000 7f5a650c5640 10 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) copy_to_remote: dir_root=/ceph_repo/ceph-qa-suite, epath=./.git/logs/HEAD, block: [0~335]
2026-02-17T08:25:18.747+0000 7f5a650c5640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) copy_to_remote: truncating epath=./.git/logs/HEAD to 335 bytes
2026-02-17T08:25:18.747+0000 7f5a650c5640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) copy_to_remote: dir_root=/ceph_repo/ceph-qa-suite, epath=./.git/logs/HEAD error=0 synced
2026-02-17T08:25:21.287+0000 7f5a660c7640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) wait_until_safe_to_snapshot: Waiting for data sync to be done to take snapshot - dir_root=/ceph_repo/ceph-qa-suite current=snap_b,3 prev= snap_a,2 syncm=0x563dfe0cf690
2026-02-17T08:25:22.531+0000 7f5a660c7640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) wait_until_safe_to_snapshot: Woke up to take snapshot - dir_root=/ceph_repo/ceph-qa-suite current=snap_b,3 prev= snap_a,2 syncm=0x563dfe0cf690
2026-02-17T08:25:53.882+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) wait_until_safe_to_snapshot: Waiting for data sync to be done to take snapshot - dir_root=/ceph_repo/ceph-qa-suite current=snap_c,4 prev= snap_b,3 syncm=0x563e12c24010
2026-02-17T08:25:53.882+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) wait_until_safe_to_snapshot: Woke up to take snapshot - dir_root=/ceph_repo/ceph-qa-suite current=snap_c,4 prev= snap_b,3 syncm=0x563e12c24010
--------------------------------------

Let's check whether the Snapdiff readdir really returned the file ceph_repo/ceph-qa-suite/.snap/snap_c/.git/logs/HEAD. From below logs, it confirms the snapdiff is not returned that file. It has only returned . and ..

-------------
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_entry: entry=logs, snapid=4
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_entry: epath=./.git/logs
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) init_directory: epath=./.git/logs
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) init_directory: open_snapdiff for dir_root=/ceph_repo/ceph-qa-suite, path=./.git/logs, prev=snap_b, current=snap_c
2026-02-17T08:25:51.306+0000 7f5a640c3640  5 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) should_sync_entry: entry=./.git/FETCH_HEAD, r=-2
2026-02-17T08:25:51.306+0000 7f5a640c3640  5 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) run_datasync: syncm=0x563e12c24010 entry=./.git/FETCH_HEAD data_sync=1 attr_sync=1
2026-02-17T08:25:51.306+0000 7f5a640c3640 10 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) remote_file_op: dir_root=/ceph_repo/ceph-qa-suite, epath=./.git/FETCH_HEAD, need_data_sync=1, need_attr_sync=1
2026-02-17T08:25:51.306+0000 7f5a640c3640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_changed_blocks: dir_root=/ceph_repo/ceph-qa-suite, epath=./.git/FETCH_HEAD, sync_check=1
2026-02-17T08:25:51.306+0000 7f5a640c3640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_changed_blocks: epath=./.git/FETCH_HEAD
2026-02-17T08:25:51.306+0000 7f5a640c3640 10 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) copy_to_remote: dir_root=/ceph_repo/ceph-qa-suite, epath=./.git/FETCH_HEAD, num_blocks=1
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_entry: Added directory to stack =./.git/logs
2026-02-17T08:25:51.306+0000 7f5a668c8640 10 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) remote_mkdir: remote epath=./.git/logs
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) do_synchronize: r=0
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) do_synchronize: epath=./.git/logs
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_entry: sync stack size=3
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_entry: top of stack path=./.git/logs
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) next_entry: is snapdiff
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_entry: entry=., snapid=3
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) next_entry: is snapdiff
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_entry: entry=.., snapid=3
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) next_entry: is snapdiff
2026-02-17T08:25:51.306+0000 7f5a668c8640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_entry: entry=refs, snapid=4
------------

@Venky Shankar As I had mentioned over the chat, the test was failing on main without multithreading intermittently.
https://tracker.ceph.com/issues/68567 - #61068 (comment)
You did mention you fixed an issue with snapdiff while working on block diff, could you please check if this is the same or different issue?

Actions #3

Updated by Kotresh Hiremath Ravishankar about 1 month ago

Here is the analysis for second file `giant`. Between `snap_a` and `snap_b`, the snapdiff returned the file. But between `snap_b` and `snap_c` the file is not returned because of `mtime` not changes. The logs alone won't confirm it. Logs along with code reading confirms.

For snap_b sync, giant is successfully returned by snapdiff. MIRRO CLIENT LOGS below (SUCCESS CASE)

--------------------
T08:23:57.164054+0000 change_attr=1 caps=pAsLsXsFs parents=0x10000000223.3["heads" ref=2 ino=0x1000000022b.3 csg=0] 0x563e5264ca00) n = 1
2026-02-17T08:25:18.751+0000 7f5a660c7640 10 client.7438 send_request client_request(unknown.0:5483 readdir_snapdiff #0x10000000001//snap_a/.git/logs/refs/heads 2026-02-17T08:25:18.754966+0000 caller_uid=0, caller_gid=0{}) to mds.0
2026-02-17T08:25:18.751+0000 7f5a660c7640  1 -- 10.20.193.90:0/1293359666 --> [v2:10.20.193.90:6833/3272978257,v1:10.20.193.90:6834/3272978257] -- client_request(unknown.0:5483 readdir_snapdiff #0x10000000001//snap_a/.git/logs/refs/heads 2026-02-17T08:25:
18.754966+0000 caller_uid=0, caller_gid=0{}) -- 0x563e52612a80 con 0x563e55eacc00
2026-02-17T08:25:18.751+0000 7f5a660c7640 20 client.7438 awaiting reply|forward|kick on 0x7f5a660c2b90
2026-02-17T08:25:18.751+0000 7f5a738e2640  1 -- 10.20.193.90:0/1293359666 <== mds.0 v2:10.20.193.90:6833/3272978257 5493 ==== client_reply(???:5483 = 0 (0) Success safe) ==== 784+0+0 (crc 0 0 0) 0x563dfe066c00 con 0x563e55eacc00
2026-02-17T08:25:18.751+0000 7f5a738e2640 20 client.7438 handle_client_reply got a reply. Safe:1 tid 5483
2026-02-17T08:25:18.751+0000 7f5a738e2640 10 client.7438 insert_trace from 2026-02-17T08:25:18.754981+0000 mds.0 is_target=1 is_dentry=0
2026-02-17T08:25:18.751+0000 7f5a738e2640 10 client.7438  features 0xffffffffffffffff
2026-02-17T08:25:18.751+0000 7f5a738e2640 10 client.7438  hrm  is_target=1 is_dentry=0
2026-02-17T08:25:18.751+0000 7f5a738e2640 12 client.7438 add_update_inode had 0x1000000022b.3(faked_ino=0 nref=9 ll_ref=0 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2026-02-17T08:23:57.162395+0000 mtime=2026-02-17T08:23:57.164054+0000 ctime=202
6-02-17T08:23:57.164054+0000 change_attr=1 caps=pAsLsXs parents=0x10000000223.3["heads" ref=1 ino=0x1000000022b.3 csg=0] 0x563e5264ca00) caps pAsLsXsFs
2026-02-17T08:25:18.751+0000 7f5a738e2640 10 client.7438 update_inode_file_time 0x1000000022b.3(faked_ino=0 nref=9 ll_ref=0 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2026-02-17T08:23:57.162395+0000 mtime=2026-02-17T08:23:57.164054+0000 ctime=2
026-02-17T08:23:57.164054+0000 change_attr=1 caps=pAsLsXs parents=0x10000000223.3["heads" ref=1 ino=0x1000000022b.3 csg=0] 0x563e5264ca00) - ctime 2026-02-17T08:23:57.164054+0000 mtime 2026-02-17T08:23:57.164054+0000
2026-02-17T08:25:18.751+0000 7f5a738e2640 20 client.7438  dir hash is 2
2026-02-17T08:25:18.751+0000 7f5a738e2640 12 client.7438 add_update_inode client inode change_attr: 1 , mds inodestat change_attr:  1
2026-02-17T08:25:18.751+0000 7f5a738e2640 15 open_dir 0x563dff1cde00 on 0x563e5264ca00
2026-02-17T08:25:18.751+0000 7f5a738e2640 10 client.7438 insert_readdir_results 1 readdir items, end=1, hash_order=1, readdir_start , last_hash 0, next_offset 2
2026-02-17T08:25:18.751+0000 7f5a738e2640 10 client.7438 _readdir_drop_dirp_buffer 0x563dfe1e0e00
2026-02-17T08:25:18.751+0000 7f5a738e2640 15 client.7438 0: 'giant'
2026-02-17T08:25:18.751+0000 7f5a738e2640 10 client.7438 update_inode_file_time 0x1000000022c.3(faked_ino=0 nref=1 ll_ref=0 cap_refs={} open={} mode=100664 size=0/0 nlink=1 btime=2026-02-17T08:23:57.164054+0000 mtime=0.000000 ctime=0.000000 change_attr=0
caps=- objectset[0x1000000022c ts 0/0 objects 0 dirty_or_tx 0] 0x563e5264d800) - ctime 2026-02-17T08:25:15.747320+0000 mtime 2026-02-17T08:25:15.747320+0000
2026-02-17T08:25:18.751+0000 7f5a738e2640 10 client.7438 size 0 -> 335
2026-02-17T08:25:18.751+0000 7f5a738e2640 12 client.7438 add_update_inode client inode change_attr: 0 , mds inodestat change_attr:  2
2026-02-17T08:25:18.751+0000 7f5a738e2640 12 client.7438 add_update_inode adding 0x1000000022c.3(faked_ino=0 nref=1 ll_ref=0 cap_refs={} open={} mode=100664 size=335/0 nlink=1 btime=2026-02-17T08:23:57.164054+0000 mtime=2026-02-17T08:25:15.747320+0000 cti
me=2026-02-17T08:25:15.747320+0000 change_attr=2 caps=- objectset[0x1000000022c ts 0/0 objects 0 dirty_or_tx 0] 0x563e5264d800) caps pLs
2026-02-17T08:25:18.751+0000 7f5a738e2640 15 client.7438 link dir 0x1000000022b.3(faked_ino=0 nref=10 ll_ref=0 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2026-02-17T08:23:57.162395+0000 mtime=2026-02-17T08:23:57.164054+0000 ctime=2026-02-17T08:
23:57.164054+0000 change_attr=1 caps=pAsLsXsFs parents=0x10000000223.3["heads" ref=2 ino=0x1000000022b.3 csg=0] 0x563e5264ca00) 'giant' to inode 0x1000000022c.3(faked_ino=0 nref=1 ll_ref=0 cap_refs={} open={} mode=100664 size=335/0 nlink=1 btime=2026-02-1
7T08:23:57.164054+0000 mtime=2026-02-17T08:25:15.747320+0000 ctime=2026-02-17T08:25:15.747320+0000 change_attr=2 caps=pLs objectset[0x1000000022c ts 0/0 objects 0 dirty_or_tx 0] 0x563e5264d800) dn 0x1000000022b.3["giant" ref=1 ino=nil csg=-1] (new dn)
2026-02-17T08:25:18.751+0000 7f5a738e2640 20 client.7438 put_inode on 0x1000000022c.3(faked_ino=0 nref=3 ll_ref=0 cap_refs={} open={} mode=100664 size=335/0 nlink=1 btime=2026-02-17T08:23:57.164054+0000 mtime=2026-02-17T08:25:15.747320+0000 ctime=2026-02-
17T08:25:15.747320+0000 change_attr=2 caps=pLs objectset[0x1000000022c ts 0/0 objects 0 dirty_or_tx 0] parents=0x1000000022b.3["giant" ref=1 ino=0x1000000022c.3 csg=-1] 0x563e5264d800) n = 1
2026-02-17T08:25:18.751+0000 7f5a738e2640 20 client.7438 link  inode 0x563e5264d800 parents now 0x1000000022b.3["giant" ref=1 ino=0x1000000022c.3 csg=-1]
2026-02-17T08:25:18.751+0000 7f5a738e2640 15 client.7438 update_dentry_lease 0x1000000022b.3["giant" ref=1 ino=0x1000000022c.3 csg=-1] lease(mask 0 dur 0) from 2026-02-17T08:25:18.754981+0000
2026-02-17T08:25:18.751+0000 7f5a738e2640 15 client.7438 insert_readdir_results  ffbbdabc0000002: 'giant' -> 0x1000000022c
2026-02-17T08:25:18.751+0000 7f5a738e2640 20 client.7438 handle_client_reply signalling caller 0x7f5a660c2b90
2026-02-17T08:25:18.751+0000 7f5a738e2640 20 client.7438 handle_client_reply awaiting kickback on tid 5483 0x7f5a738de840
2026-02-17T08:25:18.751+0000 7f5a660c7640 20 client.7438 sendrecv kickback on tid 5483 0x7f5a738de840
2026-02-17T08:25:18.751+0000 7f5a660c7640 20 client.7438 lat 2049-12-20T22:30:42.378612+0000
2026-02-17T08:25:18.751+0000 7f5a660c7640 10 client.7438 _readdir_get_frag 0x563dfe1e0e00 got frag * size 1
2026-02-17T08:25:18.751+0000 7f5a660c7640 10 client.7438 _readdir_r_cb frag * buffer size 1 offset 2
2026-02-17T08:25:18.751+0000 7f5a660c7640 10 client.7438 fill_statx on 0x1000000022c snap/dev3 mode 0100664 mtime 2026-02-17T08:25:15.747320+0000 ctime 2026-02-17T08:25:15.747320+0000 change_attr 2
2026-02-17T08:25:18.751+0000 7f5a660c7640 20 client.7438 _unwrap_name: (giant, ) on 0x1000000022b.2(faked_ino=0 nref=9 ll_ref=0 cap_refs={} open={1=1} mode=40775 size=0/0 nlink=1 btime=2026-02-17T08:23:57.162395+0000 mtime=2026-02-17T08:23:57.164054+0000
ctime=2026-02-17T08:23:57.164054+0000 change_attr=1 caps=pAsLsXsFs COMPLETE parents=0x10000000223.2["heads" ref=2 ino=0x1000000022b.2 csg=0] 0x563e3d102700)
2026-02-17T08:25:18.751+0000 7f5a660c7640 10 client.7438 fill_dirent 'giant' -> 0x1000000022c type 8 w/ next_off ffbbdabc0000003
2026-02-17T08:25:18.751+0000 7f5a660c7640 15 client.7438 _readdir_r_cb de giant off ffbbdabc0000002 snap 3 = 1
2026-02-17T08:25:18.751+0000 7f5a660c7640 15 client.7438 readdir_snapdiff 1 giant 3
2026-02-17T08:25:18.751+0000 7f5a660c7640 10 client.7438 path_walk: cur=0x10000000001.3(faked_ino=0 nref=219 ll_ref=6 cap_refs={} open={0=6} mode=40775 size=0/0 nlink=1 btime=2026-02-17T08:23:53.765968+0000 mtime=2026-02-17T08:24:03.332008+0000 ctime=2026
-02-17T08:25:15.770711+0000 change_attr=16 caps=pAsLsXsFs parents=0x10000000001.snapdir["snap_b" ref=3 ino=0x10000000001.3 csg=0] 0x563e523ef100) path=
2026-02-17T08:25:18.751+0000 7f5a660c7640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_entry: entry=giant, snapid=3
2026-02-17T08:25:18.751+0000 7f5a660c7640 20 cephfs::mirror::PeerReplayer(85ef01e2-ddbc-48a0-b81f-73a105f4bf91) get_entry: epath=./.git/logs/refs/heads/giant
----------------------------

The corresponding mds logs for snap_b snapdiff for file giant. MDS RESPONSE (SUCCESS CASE)

------------------------
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10 mds.0.server handle_client_readdir_snapdiff frag * offset '' offset_hash 0 flags 1
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10 mds.0.server handle_client_readdir_snapdiff on [dir 0x1000000022b /ceph_repo/ceph-qa-suite/.git/logs/refs/heads/ [2,head] auth v=9 cv=0/0 state=1610874881|complete f(v0 m2026-02-17T08:23:57.164054+0000 1=1+0) n(v0 rc2026-02-17T08:25:15.747320+0000 b335 1=1+0) hs=1+0,ss=1+0
 dirty=2 | child=1 dirty=1 authpin=0 0x55f599dac400]
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10 mds.0.server handle_client_readdir_snapdiff snap 2 vs. snap 3
2026-02-17T08:25:18.751+0000 7f8fe76eb640 20 mds.0.server build_snap_diff (giant,2)->[dentry #0x1/ceph_repo/ceph-qa-suite/.git/logs/refs/heads/giant [2,2] auth (dversion lock) v=7 ino=0x1000000022c remote_ino=0x0 referent_inode_ptr=(nil) referent_ino=0x0 state=1610612736 | request=0 inodepin=1 dirty=1 0x55f59b50d180]
2026-02-17T08:25:18.751+0000 7f8fe76eb640 20 mds.0.server build_snap_diff (giant,head)->[dentry #0x1/ceph_repo/ceph-qa-suite/.git/logs/refs/heads/giant [3,head] auth (dversion lock) v=8 ino=0x1000000022c remote_ino=0x0 referent_inode_ptr=(nil) referent_ino=0x0 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 au
thpin=0 0x55f599b0f180]
2026-02-17T08:25:18.751+0000 7f8fe76eb640 20 mds.0.server build_snap_diff new file giant 3/head
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10 mds.0.server inc dn [dentry #0x1/ceph_repo/ceph-qa-suite/.git/logs/refs/heads/giant [3,head] auth (dversion lock) v=8 ino=0x1000000022c remote_ino=0x0 referent_inode_ptr=(nil) referent_ino=0x0 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55f599b0f180]
 as giant hash 0xbbdabc 3
2026-02-17T08:25:18.751+0000 7f8fe76eb640 20 mds.0.locker issue_client_lease no/null lease on [dentry #0x1/ceph_repo/ceph-qa-suite/.git/logs/refs/heads/giant [3,head] auth (dversion lock) v=8 ino=0x1000000022c remote_ino=0x0 referent_inode_ptr=(nil) referent_ino=0x0 state=1610612736 | request=0 lock=0 inodepin=1 dirt
y=1 authpin=0 0x55f599b0f180]
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10 mds.0.server inc inode [inode 0x1000000022c [3,head] /ceph_repo/ceph-qa-suite/.git/logs/refs/heads/giant auth v8 DIRTYPARENT s=335 n(v0 rc2026-02-17T08:25:15.747320+0000 b335 1=1+0) (iauth excl) (ifile excl) (ixattr excl) cr={7336=0-4194304@3} caps={7336=pAsxLsXsxFsxcrwb/p
AsxXsxFxwb@4},l=7336 | ptrwaiter=0 request=0 lock=0 caps=1 dirtyparent=1 dirty=1 authpin=0 0x55f599e4a000] snap 3
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10  mds.0.cache.snaprealm(0x10000000001 seq 3 0x55f599c24000) get_snaps 2,3 (seq 3 cached_seq 3)
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10  mds.0.cache.snaprealm(0x10000000001 seq 3 0x55f599c24000) get_snap_info snaps 2,3
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10  mds.0.cache.snaprealm(0x1 seq 1 0x55f598c758c0) get_snaps  (seq 1 cached_seq 1)
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10  mds.0.cache.snaprealm(0x1 seq 1 0x55f598c758c0) get_snap_info snaps
2026-02-17T08:25:18.751+0000 7f8fe76eb640 20 mds.0.cache.ino(0x1000000022c)  pfile 0 pauth 0 plink 0 pxattr 0 plocal 0 mtime 2026-02-17T08:25:15.747320+0000 ctime 2026-02-17T08:25:15.747320+0000 change_attr 2 valid=1
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10 mds.0.cache.ino(0x1000000022c) encode_inodestat caps pLs seq 0 mseq 0 xattrv 1
2026-02-17T08:25:18.751+0000 7f8fe76eb640 20 mds.0.142 get_session have 0x55f599bb2800 client.7438 10.20.193.90:0/1293359666 state open
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10 mds.0.server reply to client_request(client.7438:5483 readdir_snapdiff #0x10000000001//snap_a/.git/logs/refs/heads 2026-02-17T08:25:18.754966+0000 caller_uid=0, caller_gid=0{}) readdir num=1 bytes=405 start=1 end=1
2026-02-17T08:25:18.751+0000 7f8fe76eb640 20 mds.0.bal hit_dir 2 pop is 1.01631, frag * size 1 [pop IRD:[C 3.41e-02] IWR:[C 2.16e+00] RDR:[C 1.02e+00] FET:[C 0.00e+00] STR:[C 0.00e+00] *LOAD:5.4]
2026-02-17T08:25:18.751+0000 7f8fe76eb640 20 mds.0.bal hit_dir 2 pop 10.1794 spread in [dir 0x1000000022b /ceph_repo/ceph-qa-suite/.git/logs/refs/heads/ [2,head] auth v=9 cv=0/0 state=1610874881|complete f(v0 m2026-02-17T08:23:57.164054+0000 1=1+0) n(v0 rc2026-02-17T08:25:15.747320+0000 b335 1=1+0) hs=1+0,ss=1+0 dirt
y=2 | child=1 dirty=1 authpin=0 0x55f599dac400]
2026-02-17T08:25:18.751+0000 7f8fe76eb640  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.7438:5483 readdir_snapdiff #0x10000000001//snap_a/.git/logs/refs/heads 2026-02-17T08:25:18.754966+0000 caller_uid=0, caller_gid=0{})
2026-02-17T08:25:18.751+0000 7f8fe76eb640 10 mds.0.server apply_allocated_inos 0x0 / [] / 0x0
-----------------------------------------

Let's analyse the same for snap_c - giant file is not returned - MIRROR CLIENT LOGS (FAILURE CASE)

------------------------
2026-02-17T08:25:51.314+0000 7f5a668c8640 10 client.7438 send_request client_request(unknown.0:9584 readdir_snapdiff #0x10000000001//snap_b/.git/logs/refs/heads 2026-02-17T08:25:51.319212+0000 caller_uid=0, caller_gid=0{}) to mds.0
2026-02-17T08:25:51.314+0000 7f5a668c8640  1 -- 10.20.193.90:0/1293359666 --> [v2:10.20.193.90:6833/3272978257,v1:10.20.193.90:6834/3272978257] -- client_request(unknown.0:9584 readdir_snapdiff #0x10000000001//snap_b/.git/logs/refs/heads 2026-02-17T08:25:51.319212+0000 caller_uid=0, caller_gid=0{}) -- 0x563e52443880 con 0x563e55eacc00
2026-02-17T08:25:51.314+0000 7f5a668c8640 20 client.7438 awaiting reply|forward|kick on 0x7f5a668c3b90
2026-02-17T08:25:51.314+0000 7f5a738e2640 20 client.7438 put_inode on 0x10000000223.3(faked_ino=0 nref=14 ll_ref=0 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2026-02-17T08:23:57.110493+0000 mtime=2026-02-17T08:23:57.162395+0000 ctime=2026-02-17T08:23:57.162395+0000 change_attr=2 caps=pAsLsXsFs parents=0x10000000222.3["refs" ref=2 ino=0x10000000223.3 csg=0] 0x563e5264c300) n = 1
2026-02-17T08:25:51.314+0000 7f5a738e2640 20 client.7438 put_inode on 0x10000000223.3(faked_ino=0 nref=14 ll_ref=0 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2026-02-17T08:23:57.110493+0000 mtime=2026-02-17T08:23:57.162395+0000 ctime=2026-02-17T08:23:57.162395+0000 change_attr=2 caps=pAsLsXsFs parents=0x10000000222.3["refs" ref=2 ino=0x10000000223.3 csg=0] 0x563e5264c300) n = 1
2026-02-17T08:25:51.314+0000 7f5a738e2640  1 -- 10.20.193.90:0/1293359666 <== mds.0 v2:10.20.193.90:6833/3272978257 9597 ==== client_reply(???:9584 = 0 (0) Success safe) ==== 403+0+0 (crc 0 0 0) 0x563e3cef8200 con 0x563e55eacc00
2026-02-17T08:25:51.314+0000 7f5a738e2640 20 client.7438 handle_client_reply got a reply. Safe:1 tid 9584
2026-02-17T08:25:51.314+0000 7f5a738e2640 10 client.7438 insert_trace from 2026-02-17T08:25:51.319225+0000 mds.0 is_target=1 is_dentry=0
2026-02-17T08:25:51.314+0000 7f5a738e2640 10 client.7438  features 0xffffffffffffffff
2026-02-17T08:25:51.314+0000 7f5a738e2640 10 client.7438  hrm  is_target=1 is_dentry=0
2026-02-17T08:25:51.314+0000 7f5a738e2640 12 client.7438 add_update_inode had 0x1000000022b.4(faked_ino=0 nref=9 ll_ref=0 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2026-02-17T08:23:57.162395+0000 mtime=2026-02-17T08:23:57.164054+0000 ctime=2026-02-17T08:23:57.164054+0000 change_attr=1 caps=pAsLsXs parents=0x10000000223.4["heads" ref=1 ino=0x1000000022b.4 csg=0] 0x563e12c40700) caps pAsLsXsFs
2026-02-17T08:25:51.314+0000 7f5a738e2640 10 client.7438 update_inode_file_time 0x1000000022b.4(faked_ino=0 nref=9 ll_ref=0 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2026-02-17T08:23:57.162395+0000 mtime=2026-02-17T08:23:57.164054+0000 ctime=2026-02-17T08:23:57.164054+0000 change_attr=1 caps=pAsLsXs parents=0x10000000223.4["heads" ref=1 ino=0x1000000022b.4 csg=0] 0x563e12c40700) - ctime 2026-02-17T08:23:57.164054+0000 mtime 2026-02-17T08:23:57.164054+0000
2026-02-17T08:25:51.314+0000 7f5a738e2640 20 client.7438  dir hash is 2
2026-02-17T08:25:51.314+0000 7f5a738e2640 12 client.7438 add_update_inode client inode change_attr: 1 , mds inodestat change_attr:  1
2026-02-17T08:25:51.314+0000 7f5a738e2640 15 open_dir 0x563dff0aa0c0 on 0x563e12c40700
2026-02-17T08:25:51.314+0000 7f5a738e2640 10 client.7438 insert_readdir_results 0 readdir items, end=1, hash_order=1, readdir_start , last_hash 0, next_offset 2
2026-02-17T08:25:51.314+0000 7f5a738e2640 10 client.7438 _readdir_drop_dirp_buffer 0x563e12c26700
2026-02-17T08:25:51.314+0000 7f5a738e2640 15 client.7438 close_dir dir 0x563dff0aa0c0 on 0x563e12c40700
2026-02-17T08:25:51.314+0000 7f5a738e2640 20 client.7438 put_inode on 0x1000000022b.4(faked_ino=0 nref=10 ll_ref=0 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2026-02-17T08:23:57.162395+0000 mtime=2026-02-17T08:23:57.164054+0000 ctime=2026-02-17T08:23:57.164054+0000 change_attr=1 caps=pAsLsXsFs parents=0x10000000223.4["heads" ref=1 ino=0x1000000022b.4 csg=0] 0x563e12c40700) n = 1
2026-02-17T08:25:51.314+0000 7f5a738e2640 20 client.7438 handle_client_reply signalling caller 0x7f5a668c3b90
2026-02-17T08:25:51.314+0000 7f5a738e2640 20 client.7438 handle_client_reply awaiting kickback on tid 9584 0x7f5a738de840
2026-02-17T08:25:51.314+0000 7f5a668c8640 20 client.7438 sendrecv kickback on tid 9584 0x7f5a738de840
2026-02-17T08:25:51.314+0000 7f5a668c8640 20 client.7438 lat 2049-12-20T22:30:42.378632+0000
2026-02-17T08:25:51.314+0000 7f5a668c8640 10 client.7438 _readdir_get_frag 0x563e12c26700 got frag * size 0
2026-02-17T08:25:51.314+0000 7f5a668c8640 10 client.7438 _readdir_r_cb frag * buffer size 0 offset 2
2026-02-17T08:25:51.314+0000 7f5a668c8640 15 client.7438 readdir_snapdiff 0 .. 3
2026-02-17T08:25:51.314+0000 7f5a668c8640  3 client.7438 closedir(0x563e12c26540) = 0

------------------

snap_c - giant file mds response - MDS RESPONSE (FAILURE CASE)

----------------
2026-02-17T08:25:51.314+0000 7f8fe76eb640 10 mds.0.server handle_client_readdir_snapdiff frag * offset '' offset_hash 0 flags 1
2026-02-17T08:25:51.314+0000 7f8fe76eb640 10 mds.0.server handle_client_readdir_snapdiff on [dir 0x1000000022b /ceph_repo/ceph-qa-suite/.git/logs/refs/heads/ [2,head] auth pv=15 v=13 cv=0/0 ap=1+1 state=1610874881|complete f(v0 m2026-02-17T08:23:57.164054
+0000 1=1+0)->f(v0 m2026-02-17T08:23:57.164054+0000 1=1+0) n(v0 rc2026-02-17T08:25:15.747320+0000 b335 1=1+0)->n(v0 rc2026-02-17T08:25:50.901948+0000 b485 1=1+0) hs=1+0,ss=2+0 dirty=3 | child=1 dirty=1 authpin=1 0x55f599dac400]
2026-02-17T08:25:51.314+0000 7f8fe76eb640 10 mds.0.server handle_client_readdir_snapdiff snap 3 vs. snap 4
2026-02-17T08:25:51.314+0000 7f8fe76eb640 20 mds.0.server build_snap_diff (giant,2)->[dentry #0x1/ceph_repo/ceph-qa-suite/.git/logs/refs/heads/giant [2,2] auth (dversion lock) v=7 ino=0x1000000022c remote_ino=0x0 referent_inode_ptr=(nil) referent_ino=0x0 state=1610612736 | request=0 inodepin=1 dirty=1 0x55f59b50d180]
2026-02-17T08:25:51.314+0000 7f8fe76eb640 20 mds.0.server build_snap_diff not in range, skipping
2026-02-17T08:25:51.314+0000 7f8fe76eb640 20 mds.0.server build_snap_diff (giant,3)->[dentry #0x1/ceph_repo/ceph-qa-suite/.git/logs/refs/heads/giant [3,3] auth (dversion lock) v=11 ino=0x1000000022c remote_ino=0x0 referent_inode_ptr=(nil) referent_ino=0x0 state=1610612736 | request=0 inodepin=1 dirty=1 0x55f59a589180]
2026-02-17T08:25:51.314+0000 7f8fe76eb640 20 mds.0.server build_snap_diff (giant,head)->[dentry #0x1/ceph_repo/ceph-qa-suite/.git/logs/refs/heads/giant [4,head] auth (dversion lock) pv=14 v=12 ino=0x1000000022c remote_ino=0x0 referent_inode_ptr=(nil) referent_ino=0x0 state=1610612736 | request=0 lock=0 inodepin=1 dirty=1 authpin=0 0x55f599b0f180]
2026-02-17T08:25:51.314+0000 7f8fe76eb640 20 mds.0.142 get_session have 0x55f599bb2800 client.7438 10.20.193.90:0/1293359666 state open
2026-02-17T08:25:51.314+0000 7f8fe76eb640 10 mds.0.server reply to client_request(client.7438:9584 readdir_snapdiff #0x10000000001//snap_b/.git/logs/refs/heads 2026-02-17T08:25:51.319212+0000 caller_uid=0, caller_gid=0{}) readdir num=0 bytes=24 start=1 end=1
2026-02-17T08:25:51.314+0000 7f8fe76eb640 20 mds.0.bal hit_dir 2 pop is 0.0431002, frag * size 1 [pop IRD:[C 9.85e-02] IWR:[C 1.93e+00] RDR:[C 4.31e-02] FET:[C 0.00e+00] STR:[C 0.00e+00] *LOAD:4.0]
2026-02-17T08:25:51.314+0000 7f8fe76eb640 20 mds.0.bal hit_dir 2 pop 0.474103 spread in [dir 0x1000000022b /ceph_repo/ceph-qa-suite/.git/logs/refs/heads/ [2,head] auth pv=15 v=13 cv=0/0 ap=1+1 state=1610874881|complete f(v0 m2026-02-17T08:23:57.164054+0000 1=1+0)->f(v0 m2026-02-17T08:23:57.164054+0000 1=1+0) n(v0 rc2026-02-17T08:25:15.747320+0000 b335 1=1+0)->n(v0 rc2026-02-17T08:25:50.901948+0000 b485 1=1+0) hs=1+0,ss=2+0 dirty=3 | child=1 dirty=1 authpin=1 0x55f599dac400]
2026-02-17T08:25:51.314+0000 7f8fe76eb640  7 mds.0.server reply_client_request 0 ((0) Success) client_request(client.7438:9584 readdir_snapdiff #0x10000000001//snap_b/.git/logs/refs/heads 2026-02-17T08:25:51.319212+0000 caller_uid=0, caller_gid=0{})
2026-02-17T08:25:51.314+0000 7f8fe76eb640 10 mds.0.server apply_allocated_inos 0x0 / [] / 0x0
------------------------

Actions #4

Updated by Kotresh Hiremath Ravishankar 27 days ago

  • Subject changed from qa: Test failure: test_cephfs_mirror_incremental_sync (tasks.cephfs.test_mirroring.TestMirroring) to qa: Test failure: test_cephfs_mirror_incremental_sync/test_cephfs_mirror_sync_with_purged_snapshot (tasks.cephfs.test_mirroring.TestMirroring)
Actions #5

Updated by Kinjal Dhaduk 26 days ago

I am able to reproduce this issue with ceph 9.0 GA build.

Setup:
-------
1. Created fresh primary_fs and secondary_fs filesystems
2. Created subvolume subvol_1 under subvolgroup_1 on primary_fs
3. Mounted primary_fs at /mnt/primary and secondary_fs at /mnt/secondary
4. Enabled CephFS mirroring with bootstrap token peering
5. Added /volumes/subvolgroup_1/subvol_1 for mirroring
6. Enabled mirror daemon debug logging (debug_cephfs_mirror 20)

Steps Performed:
1. Cloned ceph-qa-suite repo (branch giant) onto primary mount:
git clone --branch giant http://github.com/ceph/ceph-qa-suite /mnt/primary/volumes/subvolgroup_1/subvol_1/ceph_repo/ceph-qa-suite
2. Created snap_a — waited for full sync to complete. Checksums matched.
3. Ran git reset --hard HEAD~7 — created snap_b — waited for incremental sync. Checksums matched.
4. Ran git pull — created snap_c — waited for incremental sync.

Result
-------

Checksum mismatch on snap_c for the same two files reported in the original issue:
< 102d3866b54d386ef781863f9573be9e .snap/snap_c/ceph_repo/ceph-qa-suite/.git/logs/HEAD

b3b247fe5260fa8d7fb22585b5d30ac5 .snap/snap_c/ceph_repo/ceph-qa-suite/.git/logs/HEAD

< 102d3866b54d386ef781863f9573be9e .snap/snap_c/ceph_repo/ceph-qa-suite/.git/logs/refs/heads/giant

b3b247fe5260fa8d7fb22585b5d30ac5 .snap/snap_c/ceph_repo/ceph-qa-suite/.git/logs/refs/heads/giant

Primary (correct — 3 reflog entries):
clone: from http://github.com/ceph/ceph-qa-suite
reset: moving to HEAD~7
pull: Fast-forward

Secondary (stale — missing pull entry):
clone: from http://github.com/ceph/ceph-qa-suite
reset: moving to HEAD~7

Actions #7

Updated by Venky Shankar 26 days ago

The reason for the missing entries from snapdiff is due to the fact that the snapshot writeback is still in-progress for some of the inodes and until that finishes and the MDS does a copy-onm-write of the snap inode (omap entry), the HEAD inode for the directory entry is used by the MDS for mtime/{first, last} comparison. Since the mtime hasn't been updated yet (snapshot write back still in-progress), some directory entries get excluded from the snapdiff call.

Actions #8

Updated by Venky Shankar 19 days ago

  • Category set to Correctness/Safety
  • Status changed from New to Triaged
  • Assignee set to Venky Shankar
  • Priority changed from Normal to High
  • Target version set to v21.0.0
  • Source set to Development
  • Backport set to tentacle,squid
Actions #9

Updated by Venky Shankar 19 days ago

  • Subject changed from qa: Test failure: test_cephfs_mirror_incremental_sync/test_cephfs_mirror_sync_with_purged_snapshot (tasks.cephfs.test_mirroring.TestMirroring) to qa: Test failure: test_cephfs_mirror_incremental_sync/test_cephfs_mirror_sync_with_purged_snapshot (snapdiff bug)
Actions

Also available in: Atom PDF