qa: Add retry logic to remove most sleeps in mirroring tests #67305

Merged
vshankar merged 2 commits into ceph:main from kotreshhr:qa-mirror
Feb 16, 2026

Conversation

@kotreshhr
Contributor

@kotreshhr kotreshhr commented Feb 11, 2026

The mirroring tests contain a lot of sleeps, adding up to ~1 hour.
This patch adds retry logic and removes most of them.
This is cleaner and saves considerable test time for mirroring.

Fixes: https://tracker.ceph.com/issues/74878
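The retry pattern in question is teuthology's `safe_while` loop (visible later in this thread's tracebacks: `while proceed():` raising `MaxWhileTries`). As a rough sketch of the idea, here is a simplified, self-contained stand-in for `teuthology.contextutil.safe_while` — not the real implementation; names and defaults are illustrative — showing how a bounded polling loop replaces a fixed worst-case sleep:

```python
import time

class MaxWhileTries(Exception):
    """Raised when the condition never became true within the retry budget."""

class safe_while:
    """Simplified stand-in for teuthology.contextutil.safe_while (illustrative only)."""
    def __init__(self, sleep=1, tries=60, action=None):
        self.sleep = sleep            # seconds between attempts
        self.tries = tries            # maximum number of attempts
        self.action = action or 'wait'
        self.attempt = 0

    def __enter__(self):
        return self._proceed

    def __exit__(self, *exc):
        return False

    def _proceed(self):
        self.attempt += 1
        if self.attempt > self.tries:
            raise MaxWhileTries(
                f'{self.action!r} reached maximum tries ({self.tries})')
        if self.attempt > 1:          # no delay before the first attempt
            time.sleep(self.sleep)
        return True

def wait_for(condition, sleep=1, tries=60):
    # Replaces `time.sleep(<worst case>)` with bounded polling: return as
    # soon as the condition holds, or raise MaxWhileTries on timeout.
    with safe_while(sleep=sleep, tries=tries,
                    action='wait for condition') as proceed:
        while proceed():
            if condition():
                return True
```

The tests can then poll mirror daemon state (idle, synced snapshot count, etc.) and exit early instead of sleeping for the worst case every time.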

Contribution Guidelines

  • To sign and title your commits, please refer to Submitting Patches to Ceph.

  • If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.

  • When filling out the below checklist, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.

Checklist

  • Tracker (select at least one)
    • References tracker ticket
    • Very recent bug; references commit where it was introduced
    • New feature (ticket optional)
    • Doc update (no ticket needed)
    • Code cleanup (no ticket needed)
  • Component impact
    • Affects Dashboard, opened tracker ticket
    • Affects Orchestrator, opened tracker ticket
    • No impact that needs to be tracked
  • Documentation (select at least one)
    • Updates relevant documentation
    • No doc update is appropriate
  • Tests (select at least one)
You must only issue one Jenkins command per comment. Jenkins does not understand
comments with more than one command.

@kotreshhr kotreshhr requested a review from vshankar February 11, 2026 13:29
@github-actions github-actions bot added the cephfs (Ceph File System) and tests labels Feb 11, 2026
@kotreshhr kotreshhr requested a review from joscollin February 11, 2026 13:30
@kotreshhr
Contributor Author

Scheduled teuthology job with main branch:
https://pulpito.ceph.com/khiremat-2026-02-11_13:41:17-fs-main-distro-default-trial/

@kotreshhr
Contributor Author

Scheduled teuthology job with main branch: https://pulpito.ceph.com/khiremat-2026-02-11_13:41:17-fs-main-distro-default-trial/

The test case test_cephfs_mirror_remote_snap_corrupt_fails_synced_snapshot failed; from /teuthology/khiremat-2026-02-11_13:41:17-fs-main-distro-default-trial/45471/teuthology.log:

2026-02-11T18:26:58.368 INFO:tasks.cephfs_test_runner:
2026-02-11T18:26:58.368 INFO:tasks.cephfs_test_runner:======================================================================
2026-02-11T18:26:58.368 INFO:tasks.cephfs_test_runner:ERROR: test_cephfs_mirror_remote_snap_corrupt_fails_synced_snapshot (tasks.cephfs.test_mirroring.TestMirroring.test_cephfs_mirror_remote_snap_corrupt_fails_synced_snapshot)
2026-02-11T18:26:58.368 INFO:tasks.cephfs_test_runner:That making changes to the remote .snap directory shows 'peer status' state: "failed"
2026-02-11T18:26:58.368 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2026-02-11T18:26:58.368 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2026-02-11T18:26:58.368 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_kotreshhr_ceph_a1f7622bac1204cbc5a33be93c88796b8ac961af/qa/tasks/cephfs/test_mirroring.py", line 1569, in test_cephfs_mirror_remote_snap_corrupt_fails_synced_snapshot
2026-02-11T18:26:58.368 INFO:tasks.cephfs_test_runner:    while proceed():
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:          ^^^^^^^^^
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_c433f1062990a0488dc29a553589bc609a460691/teuthology/contextutil.py", line 134, in __call__
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:    raise MaxWhileTries(error_msg)
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:teuthology.exceptions.MaxWhileTries: 'wait for idle status: client.mirror_remote@ceph' reached maximum tries (60) after waiting for 60 seconds
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:Ran 14 tests in 1927.560s
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:FAILED (errors=1)
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:
2026-02-11T18:26:58.369 INFO:tasks.cephfs_test_runner:======================================================================
2026-02-11T18:26:58.370 INFO:tasks.cephfs_test_runner:ERROR: test_cephfs_mirror_remote_snap_corrupt_fails_synced_snapshot (tasks.cephfs.test_mirroring.TestMirroring.test_cephfs_mirror_remote_snap_corrupt_fails_synced_snapshot)

The failure was an existing timing issue, not related to this PR. See the analysis below.

The mirror daemon found the state to be failed at 2026-02-11T18:25:43.382. Check the logs below (/teuthology/khiremat-2026-02-11_13:41:17-fs-main-distro-default-trial/45471/remote/trial047/log/ceph-client.mirror.18521.log.gz)

...
2026-02-11T18:25:43.382+0000 7f428f5e4640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) build_snap_map: entry=.
2026-02-11T18:25:43.382+0000 7f428f5e4640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) build_snap_map: entry=..
2026-02-11T18:25:43.382+0000 7f428f5e4640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) build_snap_map: entry=snap_a
2026-02-11T18:25:43.382+0000 7f428f5e4640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) build_snap_map: entry=snap_b
2026-02-11T18:25:43.382+0000 7f428f5e4640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) build_snap_map: snap_path=/d0/.snap/snap_a, metadata={primary_snap_id=2}
2026-02-11T18:25:43.382+0000 7f428f5e4640 -1 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) build_snap_map: snapshot 'snap_b' has invalid metadata
2026-02-11T18:25:43.382+0000 7f428f5e4640 10 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) build_snap_map: remote snap_map={2=snap_a}
2026-02-11T18:25:43.382+0000 7f428f5e4640 -1 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) do_sync_snaps: failed to build remote snap map
2026-02-11T18:25:43.382+0000 7f428f5e4640 -1 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) sync_snaps: failed to sync snapshots for dir_root=/d0
2026-02-11T18:25:43.382+0000 7f428f5e4640 10 cephfs::mirror::ServiceDaemon: 0x55b72ce4a1a0 add_or_update_peer_attribute: fscid=28
2026-02-11T18:25:43.382+0000 7f428f5e4640 10 cephfs::mirror::ServiceDaemon: 0x55b72ce4a1a0 schedule_update_status
2026-02-11T18:25:43.382+0000 7f428f5e4640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) unregister_directory: dir_root=/d0
2026-02-11T18:25:43.382+0000 7f428f5e4640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) unlock_directory: dir_root=/d0
2026-02-11T18:25:43.382+0000 7f428f5e4640 10 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) unlock_directory: dir_root=/d0 unlocked
2026-02-11T18:25:43.546+0000 7f42a8e3c640  1 -- 10.20.193.47:0/3758246215 --> [v2:10.20.193.47:6831/2690813572,v1:10.20.193.47:6835/2690813572] -- mgrreport(cephfs-mirror.4345 +0-0 packed 214) -- 0x55b72df69880 con 0x55b72f488400
...

The test case test_cephfs_mirror_remote_snap_corrupt_fails_synced_snapshot waits 60 seconds for the state to come back to idle after removal of the remote snap, but the mirror daemon only picked the directory for resync at 2026-02-11T18:26:47.374, a few seconds past the 60s window. See the corresponding logs below:

...
2026-02-11T18:26:47.374+0000 7f428fde5640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) run: trying to pick from 1 directories
2026-02-11T18:26:47.374+0000 7f428fde5640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) pick_directory
2026-02-11T18:26:47.374+0000 7f428fde5640  5 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) run: picked dir_root=/d0
2026-02-11T18:26:47.374+0000 7f428fde5640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) register_directory: dir_root=/d0
2026-02-11T18:26:47.374+0000 7f428fde5640 20 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) try_lock_directory: dir_root=/d0
2026-02-11T18:26:47.374+0000 7f428fde5640 10 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) try_lock_directory: dir_root=/d0 locked
2026-02-11T18:26:47.374+0000 7f428fde5640  5 cephfs::mirror::PeerReplayer(e922dcc3-aae9-4894-b6aa-47cd854a0393) register_directory: dir_root=/d0 registered with replayer=0x55b72e53ce80
2026-02-11T18:26:47.374+0000 7f428fde5640 10 client.8522 path_walk: cur=0x1.head(faked_ino=0 nref=8 ll_ref=1 cap_refs={} open={} mode=41777 size=0/0 nlink=1 btime=2026-02-11T18:24:06.137490+0000 mtime=2026-02-11T18:24:50.924774+0000 ctime=2026-02-11T18:24:50
....

I will increase the max_retry seconds to 100, but it should complete well before that.
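For reference, the relationship between the loop parameters and the total wait is simple arithmetic (assuming a fixed, non-incrementing polling interval, so the window is roughly sleep × tries); the failing run's 60 × 1s window was shorter than the ~64s the daemon actually took:

```python
# Illustrative arithmetic only: with a fixed polling interval, the total
# retry window is approximately sleep * tries.
def wait_window(sleep_s: float, tries: int) -> float:
    return sleep_s * tries

old_window = wait_window(1, 60)    # the window that timed out: 60s
new_window = wait_window(1, 100)   # the proposed bump: 100s, > ~64s observed
```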

@kotreshhr
Contributor Author

I will increase the max_retry seconds to 100, but it should complete well before that.

Addressed with b5d02cd

@kotreshhr
Contributor Author

kotreshhr commented Feb 12, 2026

Latest scheduled teuthology run with safe_while - https://pulpito.ceph.com/khiremat-2026-02-12_14:18:34-fs-main-distro-default-trial/

@kotreshhr
Contributor Author

Latest scheduled teuthology run with safe_while - https://pulpito.ceph.com/khiremat-2026-02-12_14:18:34-fs-main-distro-default-trial/

Two failures -

  1. I have added an assert in this commit expecting a full sync to take more time than an incremental sync, but with a very limited dataset it sometimes takes almost the same time.
2026-02-12T15:00:30.009 INFO:tasks.cephfs_test_runner:======================================================================
2026-02-12T15:00:30.009 INFO:tasks.cephfs_test_runner:FAIL: test_cephfs_mirror_incremental_sync (tasks.cephfs.test_mirroring.TestMirroring.test_cephfs_mirror_incremental_sync)
2026-02-12T15:00:30.009 INFO:tasks.cephfs_test_runner:Test incremental snapshot synchronization (based on mtime differences).
2026-02-12T15:00:30.009 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2026-02-12T15:00:30.010 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2026-02-12T15:00:30.010 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_kotreshhr_ceph_25e1648ccec6b660c76b05e7479372cc63a27130/qa/tasks/cephfs/test_mirroring.py", line 1229, in test_cephfs_mirror_incremental_sync
2026-02-12T15:00:30.010 INFO:tasks.cephfs_test_runner:    self.assertGreater(float(full_sync_duration), float(inc_sync_duration2))
2026-02-12T15:00:30.010 INFO:tasks.cephfs_test_runner:AssertionError: 13.0 not greater than 13.0
2026-02-12T15:00:30.010 INFO:tasks.cephfs_test_runner:
2026-02-12T15:00:30.010 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------

@vshankar should we modify the data set or remove this assert for now?
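A side note on why this assert is knife-edged: the reported durations are coarse (both came back as 13.0), so a strict `>` comparison can fail even when the behavior is correct. A hedged sketch of a more tolerant check — not the test's actual code, and the helper name is made up:

```python
# Hypothetical tolerant check: require the incremental sync to be no
# slower than the full sync (optionally with some slack), instead of
# requiring it to be strictly faster.
def inc_sync_not_slower(full_sync_duration, inc_sync_duration, slack=0.0):
    return float(inc_sync_duration) <= float(full_sync_duration) + slack
```

With the run above, `inc_sync_not_slower(13.0, 13.0)` passes where `assertGreater(13.0, 13.0)` fails.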

  2. I think this one is not related to these changes
2026-02-12T15:11:02.704 INFO:tasks.cephfs_test_runner:======================================================================
2026-02-12T15:11:02.704 INFO:tasks.cephfs_test_runner:ERROR: test_cephfs_mirror_restart_sync_on_blocklist (tasks.cephfs.test_mirroring.TestMirroring.test_cephfs_mirror_restart_sync_on_blocklist)
2026-02-12T15:11:02.704 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2026-02-12T15:11:02.704 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2026-02-12T15:11:02.704 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_kotreshhr_ceph_25e1648ccec6b660c76b05e7479372cc63a27130/qa/tasks/cephfs/test_mirroring.py", line 838, in test_cephfs_mirror_restart_sync_on_blocklist
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:    self.check_peer_status(self.primary_fs_name, self.primary_fs_id,
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_kotreshhr_ceph_25e1648ccec6b660c76b05e7479372cc63a27130/qa/tasks/cephfs/test_mirroring.py", line 36, in wrapper
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:    return func(*args, **kwargs)
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:           ^^^^^^^^^^^^^^^^^^^^^
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_kotreshhr_ceph_25e1648ccec6b660c76b05e7479372cc63a27130/qa/tasks/cephfs/test_mirroring.py", line 273, in check_peer_status
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:    res = self.mirror_daemon_command(f'peer status for fs: {fs_name}',
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_kotreshhr_ceph_25e1648ccec6b660c76b05e7479372cc63a27130/qa/tasks/cephfs/test_mirroring.py", line 396, in mirror_daemon_command
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:    p = self.mount_a.client_remote.run(args=
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_8bec0da71becad44414c54979f64c9ef0e7099c6/teuthology/orchestra/remote.py", line 575, in run
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:    r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-02-12T15:11:02.705 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_8bec0da71becad44414c54979f64c9ef0e7099c6/teuthology/orchestra/run.py", line 461, in run
2026-02-12T15:11:02.706 INFO:tasks.cephfs_test_runner:    r.wait()
2026-02-12T15:11:02.706 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_8bec0da71becad44414c54979f64c9ef0e7099c6/teuthology/orchestra/run.py", line 161, in wait
2026-02-12T15:11:02.706 INFO:tasks.cephfs_test_runner:    self._raise_for_status()
2026-02-12T15:11:02.706 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_teuthology_8bec0da71becad44414c54979f64c9ef0e7099c6/teuthology/orchestra/run.py", line 181, in _raise_for_status
2026-02-12T15:11:02.706 INFO:tasks.cephfs_test_runner:    raise CommandFailedError(
2026-02-12T15:11:02.706 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed (peer status for fs: cephfs) on trial015 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@32 b706d3f2-e19b-4cf0-839f-3836e4561d54'
2026-02-12T15:11:02.706 INFO:tasks.cephfs_test_runner:
2026-02-12T15:11:02.706 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2026-02-12T15:11:02.706 INFO:tasks.cephfs_test_runner:Ran 16 tests in 2218.853s

I think this is because the admin socket init failed?

2026-02-12T15:10:50.572+0000 192a4640 20 cephfs::mirror::InstanceWatcher handle_remove_instance: r=-108
2026-02-12T15:10:50.572+0000 e409640 20 cephfs::mirror::FSMirror handle_shutdown_instance_watcher: r=-108
2026-02-12T15:10:50.572+0000 e409640 20 cephfs::mirror::FSMirror cleanup
2026-02-12T15:10:51.556+0000 e409640 20 cephfs::mirror::FSMirror ~FSMirror
2026-02-12T15:10:51.556+0000 e409640 10 cephfs::mirror::Mirror enable_mirroring: starting FSMirror: filesystem={fscid=32, fs_name=cephfs}
2026-02-12T15:10:51.556+0000 e409640 10 cephfs::mirror::ServiceDaemon: 0x11c57840 add_or_update_fs_attribute: fscid=32
2026-02-12T15:10:51.556+0000 e409640 10 cephfs::mirror::ServiceDaemon: 0x11c57840 schedule_update_status
2026-02-12T15:10:51.556+0000 e409640 20 cephfs::mirror::FSMirror init
2026-02-12T15:10:51.556+0000 e409640 20 cephfs::mirror::Utils connect: connecting to cluster=ceph, client=client.mirror, mon_host=
2026-02-12T15:10:51.711+0000 e409640 10 cephfs::mirror::Utils connect: using mon addr=10.20.193.15
2026-02-12T15:10:51.763+0000 e409640 -1 asok(0x15a08dd0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/cephfs-mirror.asok': (17) File exists
2026-02-12T15:10:51.827+0000 e409640 10 cephfs::mirror::Utils connect: connected to cluster=ceph using client=client.mirror
2026-02-12T15:10:51.838+0000 e409640 20 cephfs::mirror::Utils mount: filesystem={fscid=32, fs_name=cephfs}
2026-02-12T15:10:52.003+0000 e409640 10 cephfs::mirror::Utils mount: mounted filesystem={fscid=32, fs_name=cephfs}
2026-02-12T15:10:52.003+0000 e409640 10 cephfs::mirror::FSMirror init: rados addrs=10.20.193.15:0/2782324879
2026-02-12T15:10:52.003+0000 e409640 20 cephfs::mirror::FSMirror init_instance_watcher

The mirroring tests contain a lot of sleeps, adding up to ~1hr.
This patch adds retry logic and removes most of them. This
is cleaner and saves considerable test time for mirroring.

Fixes: https://tracker.ceph.com/issues/74878
Signed-off-by: Kotresh HR <khiremat@redhat.com>
@kotreshhr
Contributor Author

Latest scheduled teuthology run with safe_while - https://pulpito.ceph.com/khiremat-2026-02-12_14:18:34-fs-main-distro-default-trial/

Two failures -

  1. I have added an assert in this commit expecting a full sync to take more time than an incremental sync, but with a very limited dataset it sometimes takes almost the same time. (test_cephfs_mirror_incremental_sync failure quoted in full above)

@vshankar should we modify the data set or remove this assert for now?

I have reduced the git reset HEAD count; hopefully this fixes it.
Earlier it was HEAD~{5..20}, now it is HEAD~{5..10}.

  2. I think this one is not related to these changes. (test_cephfs_mirror_restart_sync_on_blocklist traceback quoted in full above)

I think this is because the admin socket init failed? (asok bind failure logs quoted in full above)

@kotreshhr
Contributor Author

Latest QA Run (test_cephfs_mirror_incremental_sync fix - HEAD~{5..10}) - https://pulpito.ceph.com/khiremat-2026-02-12_18:18:49-fs-main-distro-default-trial/

@vshankar
Contributor

@vshankar should we modify the data set or remove this assert for now?

For small file sizes, using blockdiff will certainly not result in better sync times, so the sync time will be closer to a full transfer. IMO, let's remove the assert.

@vshankar
Contributor

Latest QA Run (test_cephfs_mirror_incremental_sync fix - HEAD~{5..10}) - https://pulpito.ceph.com/khiremat-2026-02-12_18:18:49-fs-main-distro-default-trial/

Nice 👍

@kotreshhr
Contributor Author

kotreshhr commented Feb 13, 2026

@vshankar should we modify the data set or remove this assert for now?

For small file sizes, using blockdiff will certainly not result in better sync times, so the sync time will be closer to a full transfer. IMO, let's remove the assert.

With the change to git reset HEAD~{5..10} for the test, I think it should be fine?

@vshankar
Contributor

@vshankar should we modify the data set or remove this assert for now?

For small file sizes, using blockdiff will certainly not result in better sync times, so the sync time will be closer to a full transfer. IMO, let's remove the assert.

With the change to git reset HEAD~{5..10} for the test, I think it should be fine?

Yeh.

@vshankar vshankar merged commit 55f8a3e into ceph:main Feb 16, 2026
13 checks passed
@github-actions

This is an automated message by src/script/redmine-upkeep.py.

I have resolved the following tracker ticket due to the merge of this PR:

No backports are pending for the ticket. If this is incorrect, please update the tracker
ticket and reset to Pending Backport state.

Update Log: https://github.com/ceph/ceph/actions/runs/22051934083
