Bug #64711
Status: Closed
Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring)
Description
/a/vshankar-2024-03-04_08:26:39-fs-wip-vshankar-testing-20240304.042522-testing-default-smithi/7580933
Probably a racy check
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:======================================================================
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:FAIL: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring)
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:Test adding a directory path for synchronization post removal of already added directory paths
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_717ce3cce8d0166e4e577faa8004e6f9cb4128c0/qa/tasks/cephfs/test_mirroring.py", line 1276, in test_cephfs_mirror_cancel_mirroring_and_readd
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner: self.check_peer_snap_in_progress(self.primary_fs_name, self.primary_fs_id,
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_717ce3cce8d0166e4e577faa8004e6f9cb4128c0/qa/tasks/cephfs/test_mirroring.py", line 231, in check_peer_snap_in_progress
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner: self.assertTrue('syncing' == res[dir_name]['state'])
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner:AssertionError: False is not true
Updated by Jos Collin about 2 years ago
check_peer_snap_in_progress() doesn't wait for the 'syncing' state to appear. It just checks the state at that moment and returns. It probably returned before the replayer set the state to 'syncing'.
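A test-side fix would be to poll until the expected state appears instead of sampling it once. A minimal sketch of the idea (a hypothetical helper, not the actual qa code; the real check lives in check_peer_snap_in_progress in qa/tasks/cephfs/test_mirroring.py):

```python
import time

def wait_for_dir_state(get_status, dir_name, want='syncing',
                       timeout=60, interval=2):
    """Poll the mirror peer status until dir_name reaches the wanted state.

    get_status is a callable returning the parsed peer-status dict
    (e.g. the output of 'fs mirror peer status'). Returns True once
    the state matches, False if the timeout expires first.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        res = get_status()
        if res.get(dir_name, {}).get('state') == want:
            return True
        time.sleep(interval)
    return False
```

The assertion would then only fire after the timeout, rather than on the first sample.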
Updated by Jos Collin about 2 years ago
2024-03-04T10:55:59.104 INFO:teuthology.orchestra.run.smithi105.stdout:{
2024-03-04T10:55:59.104 INFO:teuthology.orchestra.run.smithi105.stdout: "/d0": {
2024-03-04T10:55:59.104 INFO:teuthology.orchestra.run.smithi105.stdout: "state": "idle",
2024-03-04T10:55:59.104 INFO:teuthology.orchestra.run.smithi105.stdout: "snaps_synced": 0,
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "snaps_deleted": 0,
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "snaps_renamed": 0
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: },
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "/d1": {
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "state": "idle",
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "snaps_synced": 0,
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "snaps_deleted": 0,
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "snaps_renamed": 0
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: },
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "/d2": {
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "state": "syncing",
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "current_sycning_snap": {
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "id": 4,
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: "name": "snap0"
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout: },
2024-03-04T10:55:59.106 INFO:teuthology.orchestra.run.smithi105.stdout: "snaps_synced": 0,
2024-03-04T10:55:59.106 INFO:teuthology.orchestra.run.smithi105.stdout: "snaps_deleted": 0,
2024-03-04T10:55:59.106 INFO:teuthology.orchestra.run.smithi105.stdout: "snaps_renamed": 0
2024-03-04T10:55:59.106 INFO:teuthology.orchestra.run.smithi105.stdout: }
2024-03-04T10:55:59.106 INFO:teuthology.orchestra.run.smithi105.stdout:}
The snapshots are taken in the below order:
# take snapshots
log.debug('taking snapshots')
self.mount_a.run_shell(["mkdir", "d0/.snap/snap0"])
self.mount_a.run_shell(["mkdir", "d1/.snap/snap0"])
self.mount_a.run_shell(["mkdir", "d2/.snap/snap0"])
Thus d2 started "syncing" first, while d0 and d1 had not yet started syncing when the test checked the status of d0 first, so check_peer_snap_in_progress asserted.
Updated by Jos Collin about 2 years ago
- File unnamed.png added
2024-03-04T10:55:50.942+0000 2c66b640 -1 cephfs::mirror::PeerReplayer(e9c703c7-88c9-40ed-b3f8-ff678976a731) should_sync_entry: \
failed to stat prev entry= ./file.2: (9) Bad file descriptor
2024-03-04T10:55:52.370+0000 2ce6c640 -1 cephfs::mirror::PeerReplayer(e9c703c7-88c9-40ed-b3f8-ff678976a731) sync_snaps: failed \
to sync snapshots for dir_root=/d0
Notice (in the screenshot) the difference in should_sync_entry. First it had r=-2, which is expected when syncing d2. The next call to should_sync_entry, for syncing d1, reports 'Bad file descriptor'. The one after that, for syncing d0, shows r=-2 again. Looks like there's a race. Should we take a lock at the start of should_sync_entry?
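The actual fix would be in the C++ PeerReplayer, but the shape of the suggested change can be sketched abstractly: if one replayer thread can invalidate or reopen a shared handle between another thread's validity check and its use, serializing the whole check-then-use sequence closes that window. A toy illustration (all names here are hypothetical, not Ceph code):

```python
import threading

class SharedHandle:
    """Toy stand-in for a shared fd that another thread may invalidate."""
    def __init__(self):
        self.valid = True

    def stat(self):
        # Mimics fstatat() failing with EBADF on a stale descriptor.
        if not self.valid:
            raise OSError(9, "Bad file descriptor")
        return {"size": 0}

lock = threading.Lock()

def should_sync_entry(handle):
    # Hypothetical analogue of guarding PeerReplayer::should_sync_entry:
    # the validity check, any reopen, and the stat happen under one lock,
    # so no other thread can invalidate the handle mid-sequence.
    with lock:
        if not handle.valid:
            handle.valid = True   # "reopen" before use
        return handle.stat()
```

Whether a lock is the right granularity for the real code depends on what the replayer threads actually share, which the logs alone don't settle.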
Updated by Jos Collin about 2 years ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 56118
Updated by Venky Shankar almost 2 years ago
Also seen in a Quincy run: /a/vshankar-2024-03-14_16:52:41-fs-wip-vshankar-testing1-quincy-2024-03-14-0655-quincy-testing-default-smithi/7600853
Updated by Xiubo Li over 1 year ago
Updated by Milind Changire over 1 year ago
Updated by Venky Shankar over 1 year ago
- Category set to Testing
- Status changed from Fix Under Review to Pending Backport
- Source set to Q/A
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68407: quincy: Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring) added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68408: squid: Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring) added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #68409: reef: Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring) added
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Jos Collin over 1 year ago
Updated by Jos Collin about 1 year ago
Milind Changire wrote in #note-19:
This is a different failure, which is tracked here: https://tracker.ceph.com/issues/68567
Updated by Jos Collin about 1 year ago
Updated by Jos Collin about 1 year ago
Milind Changire wrote in #note-23:
This^ is not a related failure. Checking the logs, I see the below:
2025-01-10T15:08:06.775 DEBUG:teuthology.orchestra.run.smithi130:peer status for fs: cephfs> ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@14 8b7edff2-9c26-4472-a85f-7ec328bfca99
2025-01-10T15:08:06.957 INFO:teuthology.orchestra.run.smithi130.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2025-01-10T15:08:06.966 DEBUG:teuthology.orchestra.run:got remote process result: 22
2025-01-10T15:08:06.967 WARNING:tasks.cephfs.test_mirroring:mirror daemon command with label "peer status for fs: cephfs" failed: Command failed (peer status for fs: cephfs) on smithi130 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@14 8b7edff2-9c26-4472-a85f-7ec32\ 8bfca99'
Updated by Jos Collin about 1 year ago
- Status changed from Pending Backport to Resolved
Updated by Upkeep Bot 9 months ago
- Merge Commit set to 635b5541074bfded98ec1029942f11f07171edef
- Fixed In set to v19.3.0-5374-g635b5541074
- Upkeep Timestamp set to 2025-07-02T03:31:45+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-5374-g635b5541074 to v19.3.0-5374-g635b554107
- Upkeep Timestamp changed from 2025-07-02T03:31:45+00:00 to 2025-07-14T16:45:20+00:00
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~1863
- Upkeep Timestamp changed from 2025-07-14T16:45:20+00:00 to 2025-11-01T01:26:56+00:00