Bug #69671
closedTest failure: test_cephfs_mirror_sync_with_purged_snapshot (tasks.cephfs.test_mirroring.TestMirroring)
0%
Description
2025-01-16T08:40:08.008 INFO:tasks.cephfs_test_runner: 2025-01-16T08:40:08.008 INFO:tasks.cephfs_test_runner:====================================================================== 2025-01-16T08:40:08.008 INFO:tasks.cephfs_test_runner:FAIL: test_cephfs_mirror_sync_with_purged_snapshot (tasks.cephfs.test_mirroring.TestMirroring) 2025-01-16T08:40:08.008 INFO:tasks.cephfs_test_runner:Test snapshot synchronization in midst of snapshot deletes. 2025-01-16T08:40:08.008 INFO:tasks.cephfs_test_runner:---------------------------------------------------------------------- 2025-01-16T08:40:08.008 INFO:tasks.cephfs_test_runner:Traceback (most recent call last): 2025-01-16T08:40:08.008 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_07e0ae5452f4466a05fdca6842fb4a6be2c109c3/qa/tasks/cephfs/test_mirroring.py", line 1359, in test_cephfs_mirror_sync_with_purged_snapshot 2025-01-16T08:40:08.008 INFO:tasks.cephfs_test_runner: self.verify_snapshot(repo_path, 'snap_b') 2025-01-16T08:40:08.008 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_07e0ae5452f4466a05fdca6842fb4a6be2c109c3/qa/tasks/cephfs/test_mirroring.py", line 256, in verify_snapshot 2025-01-16T08:40:08.009 INFO:tasks.cephfs_test_runner: self.assertTrue(source_res == dest_res) 2025-01-16T08:40:08.009 INFO:tasks.cephfs_test_runner:AssertionError: False is not true 2025-01-16T08:40:08.009 INFO:tasks.cephfs_test_runner: 2025-01-16T08:40:08.009 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
Seems to be happening from start of 2025 - some PR that was merged perhaps?
vshankar-2025-01-05_17:41:45-fs-wip-vshankar-testing-20250105.140830-squid-debug-testing-default-smithi/8062363/teuthology.log:2025-01-10T11:44:02.078 INFO:tasks.cephfs_test_runner:Starting test: test_cephfs_mirror_sync_with_purged_snapshot (tasks.cephfs.test_mirroring.TestMirroring) vshankar-2025-01-05_17:41:45-fs-wip-vshankar-testing-20250105.140830-squid-debug-testing-default-smithi/8062363/teuthology.log:2025-01-10T11:44:02.078 INFO:tasks.cephfs_test_runner:test_cephfs_mirror_sync_with_purged_snapshot (tasks.cephfs.test_mirroring.TestMirroring) vshankar-2025-01-05_17:41:45-fs-wip-vshankar-testing-20250105.140830-squid-debug-testing-default-smithi/8062363/teuthology.log:2025-01-10T11:44:02.957 DEBUG:teuthology.orchestra.run.smithi156:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph log 'Starting test tasks.cephfs.test_mirroring.TestMirroring.test_cephfs_mirror_sync_with_purged_snapshot' vshankar-2025-01-05_17:41:45-fs-wip-vshankar-testing-20250105.140830-squid-debug-testing-default-smithi/8062363/teuthology.log:2025-01-10T12:00:05.776 DEBUG:teuthology.orchestra.run.smithi156:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph log 'Ended test tasks.cephfs.test_mirroring.TestMirroring.test_cephfs_mirror_sync_with_purged_snapshot' vshankar-2025-01-05_17:41:45-fs-wip-vshankar-testing-20250105.140830-squid-debug-testing-default-smithi/8062363/teuthology.log:2025-01-10T12:00:06.479 INFO:tasks.cephfs_test_runner:FAIL: test_cephfs_mirror_sync_with_purged_snapshot (tasks.cephfs.test_mirroring.TestMirroring)
Updated by Jos Collin about 1 year ago · Edited
- File Screenshot_2025-01-27_14-09-03.jpg added
- Status changed from New to In Progress
Updated by Jos Collin about 1 year ago
- Status changed from In Progress to Fix Under Review
Updated by Jos Collin about 1 year ago
- File deleted (
Screenshot_2025-01-27_14-09-03.jpg)
Updated by Jos Collin about 1 year ago
This is a regression caused by https://github.com/ceph/ceph/pull/60667/commits/7a747bc55381b840d76202c9897f7a0ab8d143b3.
cc @Igor Fedotov
https://github.com/ceph/ceph/pull/61539 fixes it.
Updated by Milind Changire about 1 year ago
Updated by Venky Shankar 12 months ago · Edited
From the mirror daemon logs for /a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, snap_b was synchronized using snapdiff sync mechanism and the sync finished successfully.
2025-01-16T08:34:09.955+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: syncing entry, path=tasks/util/test 2025-01-16T08:34:09.959+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: 2 entries in queue 2025-01-16T08:34:09.959+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: syncing entry, path=.git/logs/refs/heads 2025-01-16T08:34:09.963+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: 1 entries in queue 2025-01-16T08:34:09.963+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: syncing entry, path=.git/logs/refs/remotes 2025-01-16T08:34:09.967+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize current:10 prev:11 ret = 0
I think the checksum mismatch is not due to the previous (snap_a) being deleted and the mirror daemon incorrectly not switching to remote dir_root. Its likely that the issue was that th mirror daemon was considering changes from live/head version which https://github.com/ceph/ceph/pull/61539/commits/4c1968eed41870bdcf48e7daeacccafede295f16 fixes?
Updated by Jos Collin 12 months ago
Venky Shankar wrote in #note-7:
From the mirror daemon logs for
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, snap_b was synchronized using snapdiff sync mechanism and the sync finished successfully.
wip-vshankar-testing-20250115.164543-debug doesn't have SyncMechanism. Please check.
[...]
I think the checksum mismatch is not due to the previous (
snap_a) being deleted and the mirror daemon incorrectly not switching to remote dir_root. Its likely that the issue was that th mirror daemon was considering changes from live/head version which https://github.com/ceph/ceph/pull/61539/commits/4c1968eed41870bdcf48e7daeacccafede295f16 fixes?
In /a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, test_cephfs_mirror_sync_with_purged_snapshot failed because it didn't switch to the remote dir_root.
wip-vshankar-testing-20250115.164543-debug doesn't have the fix https://github.com/ceph/ceph/pull/61539/commits/1c6bd688dc6e3b9da0b33f3e1c3de76bbae97cc9 too.
Updated by Venky Shankar 12 months ago
Jos Collin wrote in #note-8:
Venky Shankar wrote in #note-7:
From the mirror daemon logs for
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, snap_b was synchronized using snapdiff sync mechanism and the sync finished successfully.wip-vshankar-testing-20250115.164543-debug doesn't have
SyncMechanism. Please check.[...]
I think the checksum mismatch is not due to the previous (
snap_a) being deleted and the mirror daemon incorrectly not switching to remote dir_root. Its likely that the issue was that th mirror daemon was considering changes from live/head version which https://github.com/ceph/ceph/pull/61539/commits/4c1968eed41870bdcf48e7daeacccafede295f16 fixes?In
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, test_cephfs_mirror_sync_with_purged_snapshot failed because it didn't switch to the remote dir_root.
wip-vshankar-testing-20250115.164543-debug doesn't have the fix https://github.com/ceph/ceph/pull/61539/commits/1c6bd688dc6e3b9da0b33f3e1c3de76bbae97cc9 too.
In this run too, I see the snapshot being synchronized to completion using incremental sync:
2025-01-16T08:34:09.943+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: syncing entry, path=suites/multimds/basic 2025-01-16T08:34:09.947+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: 5 entries in queue 2025-01-16T08:34:09.947+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: syncing entry, path=suites/multimds/verify 2025-01-16T08:34:09.951+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: 4 entries in queue 2025-01-16T08:34:09.951+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: syncing entry, path=suites/smoke/basic 2025-01-16T08:34:09.955+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: 3 entries in queue 2025-01-16T08:34:09.955+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: syncing entry, path=tasks/util/test 2025-01-16T08:34:09.959+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: 2 entries in queue 2025-01-16T08:34:09.959+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: syncing entry, path=.git/logs/refs/heads 2025-01-16T08:34:09.963+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: 1 entries in queue 2025-01-16T08:34:09.963+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize: syncing entry, path=.git/logs/refs/remotes 2025-01-16T08:34:09.967+0000 7f43e0d64640 20 cephfs::mirror::PeerReplayer(de5c659c-c58b-460b-9af4-9ac22b2101f8) do_synchronize current:10 prev:11 ret = 0
Could you point me to the section in the mirror daemon logs (or better, paste the relevant log entries here) which show that the mirror daemon incorrectly not switching to remote dir_root?
Updated by Jos Collin 12 months ago
Venky Shankar wrote in #note-9:
Jos Collin wrote in #note-8:
Venky Shankar wrote in #note-7:
From the mirror daemon logs for
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, snap_b was synchronized using snapdiff sync mechanism and the sync finished successfully.wip-vshankar-testing-20250115.164543-debug doesn't have
SyncMechanism. Please check.[...]
I think the checksum mismatch is not due to the previous (
snap_a) being deleted and the mirror daemon incorrectly not switching to remote dir_root. Its likely that the issue was that th mirror daemon was considering changes from live/head version which https://github.com/ceph/ceph/pull/61539/commits/4c1968eed41870bdcf48e7daeacccafede295f16 fixes?In
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, test_cephfs_mirror_sync_with_purged_snapshot failed because it didn't switch to the remote dir_root.
wip-vshankar-testing-20250115.164543-debug doesn't have the fix https://github.com/ceph/ceph/pull/61539/commits/1c6bd688dc6e3b9da0b33f3e1c3de76bbae97cc9 too.In this run too, I see the snapshot being synchronized to completion using incremental sync:
[...]
Could you point me to the section in the mirror daemon logs (or better, paste the relevant log entries here) which show that the mirror daemon incorrectly not switching to remote dir_root?
Please clarify which code are you testing in /a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701. The old code or the fix in https://github.com/ceph/ceph/pull/61539 or https://github.com/ceph/ceph/pull/62250. I saw none of the fixes, but only the old code in wip-vshankar-testing-20250115.164543-debug and it fails obviously.
Updated by Venky Shankar 12 months ago
Jos Collin wrote in #note-10:
Venky Shankar wrote in #note-9:
Jos Collin wrote in #note-8:
Venky Shankar wrote in #note-7:
From the mirror daemon logs for
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, snap_b was synchronized using snapdiff sync mechanism and the sync finished successfully.wip-vshankar-testing-20250115.164543-debug doesn't have
SyncMechanism. Please check.[...]
I think the checksum mismatch is not due to the previous (
snap_a) being deleted and the mirror daemon incorrectly not switching to remote dir_root. Its likely that the issue was that th mirror daemon was considering changes from live/head version which https://github.com/ceph/ceph/pull/61539/commits/4c1968eed41870bdcf48e7daeacccafede295f16 fixes?In
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, test_cephfs_mirror_sync_with_purged_snapshot failed because it didn't switch to the remote dir_root.
wip-vshankar-testing-20250115.164543-debug doesn't have the fix https://github.com/ceph/ceph/pull/61539/commits/1c6bd688dc6e3b9da0b33f3e1c3de76bbae97cc9 too.In this run too, I see the snapshot being synchronized to completion using incremental sync:
[...]
Could you point me to the section in the mirror daemon logs (or better, paste the relevant log entries here) which show that the mirror daemon incorrectly not switching to remote dir_root?
Please clarify which code are you testing in
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701. The old code or the fix in https://github.com/ceph/ceph/pull/61539 or https://github.com/ceph/ceph/pull/62250. I saw none of the fixes, but only the old code in wip-vshankar-testing-20250115.164543-debug and it fails obviously.
I'm looking at logs in /a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701 so whatever build ran that.
Updated by Jos Collin 12 months ago
Venky Shankar wrote in #note-11:
Jos Collin wrote in #note-10:
Venky Shankar wrote in #note-9:
Jos Collin wrote in #note-8:
Venky Shankar wrote in #note-7:
From the mirror daemon logs for
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, snap_b was synchronized using snapdiff sync mechanism and the sync finished successfully.wip-vshankar-testing-20250115.164543-debug doesn't have
SyncMechanism. Please check.[...]
I think the checksum mismatch is not due to the previous (
snap_a) being deleted and the mirror daemon incorrectly not switching to remote dir_root. Its likely that the issue was that th mirror daemon was considering changes from live/head version which https://github.com/ceph/ceph/pull/61539/commits/4c1968eed41870bdcf48e7daeacccafede295f16 fixes?In
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, test_cephfs_mirror_sync_with_purged_snapshot failed because it didn't switch to the remote dir_root.
wip-vshankar-testing-20250115.164543-debug doesn't have the fix https://github.com/ceph/ceph/pull/61539/commits/1c6bd688dc6e3b9da0b33f3e1c3de76bbae97cc9 too.In this run too, I see the snapshot being synchronized to completion using incremental sync:
[...]
Could you point me to the section in the mirror daemon logs (or better, paste the relevant log entries here) which show that the mirror daemon incorrectly not switching to remote dir_root?
First of all, there's no switching functionality in the do_synchronize code. So there's no else case log. By reading test_cephfs_mirror_sync_with_purged_snapshot and do_synchronize code we could understand that the switch never happened.
From the logs, this in particular, after the snap_a removal:
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 unlink dir 0x5595d0c18600 'snap_a' dn 0x5595e26cb420 inode 0
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 unlink removing 'snap_a' dn 0x5595e26cb420
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 trim_dentry unlinking dn ceph-qa-suite in dir 0x10000000000
2025-01-16T08:40:00.033+0000 7f43fdd9e640 10 client.10438 clearing (I_COMPLETE|I_DIR_ORDERED) on 0x10000000000.head(faked_ino=0 nref=5 ll_ref=0 cap_refs={1024=0} open={} mode=40755 size=0/0 nlink=1 btime=2025-01-16T08:24:34.819832+0000 mtime=2025-01-16T08:24:34.828410+0000 ctime=2025-01-16T08:24:34.828410+0000 change_attr=1 caps=pAsLsXsFs(0=pAsLsXsFs) COMPLETE \
parents=0x1.head["ceph_repo"] 0x5595c7ab9200)
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 unlink dir 0x5595c7ab9200 'ceph-qa-suite' dn 0x5595e2508000 inode 0x5595c7b40a00
2025-01-16T08:40:00.033+0000 7f43fdd9e640 20 client.10438 put_inode on 0x10000000001.head(faked_ino=0 nref=7 ll_ref=0 cap_refs={1024=0} open={0=0,1=0} mode=40775 size=0/0 nlink=1 btime=2025-01-16T08:24:34.828410+0000 mtime=2025-01-16T08:25:19.095541+0000 ctime=2025-01-16T08:34:10.757365+0000 change_attr=275 caps=pAsLsXsFs(0=pAsLsXsFs) 0x5595c7b40a00) n = 1
2025-01-16T08:40:00.033+0000 7f43fdd9e640 20 client.10438 unlink inode 0x5595c7b40a00 parents now
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 unlink removing 'ceph-qa-suite' dn 0x5595e2508000
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 close_dir dir 0x5595c734eea0 on 0x5595c7ab9200
2025-01-16T08:40:00.033+0000 7f43fdd9e640 20 client.10438 put_inode on 0x10000000000.head(faked_ino=0 nref=5 ll_ref=0 cap_refs={1024=0} open={} mode=40755 size=0/0 nlink=1 btime=2025-01-16T08:24:34.819832+0000 mtime=2025-01-16T08:24:34.828410+0000 ctime=2025-01-16T08:24:34.828410+0000 change_attr=1 caps=pAsLsXsFs(0=pAsLsXsFs) parents=0x1.head["ceph_repo"] 0x5595\
c7ab9200) n = 1
2025-01-16T08:40:00.033+0000 7f43fdd9e640 20 client.10438 put_inode on 0x10000000001.head(faked_ino=0 nref=7 ll_ref=0 cap_refs={1024=0} open={0=0,1=0} mode=40775 size=0/0 nlink=1 btime=2025-01-16T08:24:34.828410+0000 mtime=2025-01-16T08:25:19.095541+0000 ctime=2025-01-16T08:34:10.757365+0000 change_attr=275 caps=pAsLsXsFs(0=pAsLsXsFs) 0x5595c7b40a00) n = 1
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 trim_dentry unlinking dn ceph_repo in dir 0x1
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 unlink dir 0x5595de74e600 'ceph_repo' dn 0x5595e0587500 inode 0x5595c7ab9200
2025-01-16T08:40:00.033+0000 7f43fdd9e640 20 client.10438 put_inode on 0x10000000000.head(faked_ino=0 nref=6 ll_ref=0 cap_refs={1024=0} open={} mode=40755 size=0/0 nlink=1 btime=2025-01-16T08:24:34.819832+0000 mtime=2025-01-16T08:24:34.828410+0000 ctime=2025-01-16T08:24:34.828410+0000 change_attr=1 caps=pAsLsXsFs(0=pAsLsXsFs) 0x5595c7ab9200) n = 1
2025-01-16T08:40:00.033+0000 7f43fdd9e640 20 client.10438 unlink inode 0x5595c7ab9200 parents now
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 unlink removing 'ceph_repo' dn 0x5595e0587500
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 close_dir dir 0x5595c734f2c0 on 0x5595de74e600
2025-01-16T08:40:00.033+0000 7f43fdd9e640 20 client.10438 put_inode on 0x1.head(faked_ino=0 nref=8 ll_ref=0 cap_refs={1024=0} open={} mode=41777 size=0/0 nlink=1 btime=2025-01-16T08:23:41.081195+0000 mtime=2025-01-16T08:24:34.819832+0000 ctime=2025-01-16T08:24:34.819832+0000 change_attr=3 caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x5595de74e600) n = 1
2025-01-16T08:40:00.033+0000 7f43fdd9e640 20 client.10438 put_inode on 0x10000000000.head(faked_ino=0 nref=6 ll_ref=0 cap_refs={1024=0} open={} mode=40755 size=0/0 nlink=1 btime=2025-01-16T08:24:34.819832+0000 mtime=2025-01-16T08:24:34.828410+0000 ctime=2025-01-16T08:24:34.828410+0000 change_attr=1 caps=pAsLsXsFs(0=pAsLsXsFs) 0x5595c7ab9200) n = 1
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 trim_dentry unlinking dn snap_b in dir 0x10000000001
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 unlink dir 0x5595d0c18600 'snap_b' dn 0x5595c7df5c00 inode 0x5595e06e4600
2025-01-16T08:40:00.033+0000 7f43fdd9e640 20 client.10438 put_inode on 0x10000000001.3(faked_ino=0 nref=4 ll_ref=0 cap_refs={} open={} mode=40775 size=0/0 nlink=1 btime=2025-01-16T08:24:34.828410+0000 mtime=2025-01-16T08:25:19.095541+0000 ctime=2025-01-16T08:34:09.974748+0000 change_attr=19 caps=pAsLsXsFs 0x5595e06e4600) n = 1
2025-01-16T08:40:00.033+0000 7f43fdd9e640 20 client.10438 unlink inode 0x5595e06e4600 parents now
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 unlink removing 'snap_b' dn 0x5595c7df5c00
2025-01-16T08:40:00.033+0000 7f43fdd9e640 15 client.10438 close_dir dir 0x55960d2b8d20 on 0x5595d0c18600
This looks like ending the test (we don't remove snap_b otherwise), when it's expected to continue the sync using remote dir_root.
So it should be a partial snap_b on the remote and thus the checksum failed.
This also needs line by line comparison of snap_b containing items (local and remote) from teuthology.log, which is a huge list.
For the record, pasting only the checksum:
2025-01-16T08:39:34.238 DEBUG:tasks.cephfs.test_mirroring:source snapshot checksum snap_b f40bfe57ce7617363fa46f1b22f6088d 2025-01-16T08:39:53.476 DEBUG:tasks.cephfs.test_mirroring:destination snapshot checksum snap_b de66db2fdfd291e87cf6104add4effdc
Updated by Jos Collin 12 months ago
- Pull request ID changed from 61539 to 62250
The commit that fixes this issue in https://github.com/ceph/ceph/pull/61539 is moved to https://github.com/ceph/ceph/pull/62250.
Updated by Venky Shankar 11 months ago
Jos Collin wrote in #note-12:
Venky Shankar wrote in #note-11:
Jos Collin wrote in #note-10:
Venky Shankar wrote in #note-9:
Jos Collin wrote in #note-8:
Venky Shankar wrote in #note-7:
From the mirror daemon logs for
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, snap_b was synchronized using snapdiff sync mechanism and the sync finished successfully.wip-vshankar-testing-20250115.164543-debug doesn't have
SyncMechanism. Please check.[...]
I think the checksum mismatch is not due to the previous (
snap_a) being deleted and the mirror daemon incorrectly not switching to remote dir_root. Its likely that the issue was that th mirror daemon was considering changes from live/head version which https://github.com/ceph/ceph/pull/61539/commits/4c1968eed41870bdcf48e7daeacccafede295f16 fixes?In
/a/vshankar-2025-01-16_05:00:30-fs-wip-vshankar-testing-20250115.164543-debug-testing-default-smithi/8079701, test_cephfs_mirror_sync_with_purged_snapshot failed because it didn't switch to the remote dir_root.
wip-vshankar-testing-20250115.164543-debug doesn't have the fix https://github.com/ceph/ceph/pull/61539/commits/1c6bd688dc6e3b9da0b33f3e1c3de76bbae97cc9 too.In this run too, I see the snapshot being synchronized to completion using incremental sync:
[...]
Could you point me to the section in the mirror daemon logs (or better, paste the relevant log entries here) which show that the mirror daemon incorrectly not switching to remote dir_root?
First of all, there's no switching functionality in the do_synchronize code. So there's no else case log. By reading
test_cephfs_mirror_sync_with_purged_snapshotand do_synchronize code we could understand that the switch never happened.From the logs, this in particular, after the snap_a removal:
[...]This looks like ending the test (we don't remove snap_b otherwise), when it's expected to continue the sync using remote dir_root.
So it should be a partial snap_b on the remote and thus the checksum failed.
The above logs are client logs. What I asked was to share mirror daemon logs that show the switch not happening to remote dir_root and incorrectly synchronizing snapshots.
Updated by Venky Shankar 11 months ago
- Related to Bug #68567: cephfs/qa : Test failure: test_cephfs_mirror_incremental_sync (tasks.cephfs.test_mirroring.TestMirroring) added
Updated by Jos Collin 10 months ago
- Backport changed from reef,squid to reef,squid,tentacle
Updated by Jos Collin 10 months ago
- Status changed from Fix Under Review to Pending Backport
Updated by Upkeep Bot 10 months ago
- Copied to Backport #71296: reef: Test failure: test_cephfs_mirror_sync_with_purged_snapshot (tasks.cephfs.test_mirroring.TestMirroring) added
Updated by Upkeep Bot 10 months ago
- Copied to Backport #71297: tentacle: Test failure: test_cephfs_mirror_sync_with_purged_snapshot (tasks.cephfs.test_mirroring.TestMirroring) added
Updated by Upkeep Bot 10 months ago
- Copied to Backport #71298: squid: Test failure: test_cephfs_mirror_sync_with_purged_snapshot (tasks.cephfs.test_mirroring.TestMirroring) added
Updated by Upkeep Bot 8 months ago
- Merge Commit set to 53e32489fb72b7e12c93b7c413dee198cc1fbae5
- Fixed In set to v20.3.0-302-g53e32489fb7
- Upkeep Timestamp set to 2025-07-08T18:12:13+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.3.0-302-g53e32489fb7 to v20.3.0-302-g53e32489fb72
- Upkeep Timestamp changed from 2025-07-08T18:12:13+00:00 to 2025-07-14T15:22:25+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.3.0-302-g53e32489fb72 to v20.3.0-302-g53e32489fb
- Upkeep Timestamp changed from 2025-07-14T15:22:25+00:00 to 2025-07-14T21:08:27+00:00
Updated by Jos Collin 8 months ago
- Status changed from Pending Backport to Resolved