Project

General

Profile

Actions

Bug #64711

closed

Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring)

Added by Venky Shankar about 2 years ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Testing
Target version:
% Done:

0%

Source:
Q/A
Backport:
quincy,reef,squid
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
cephfs-mirror
Labels (FS):
Pull request ID:
Tags (freeform):
backport_processed
Fixed In:
v19.3.0-5374-g635b554107
Released In:
v20.2.0~1863
Upkeep Timestamp:
2025-11-01T01:26:56+00:00

Description

/a/vshankar-2024-03-04_08:26:39-fs-wip-vshankar-testing-20240304.042522-testing-default-smithi/7580933

Probably a racy check

2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:======================================================================
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:FAIL: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring)
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:Test adding a directory path for synchronization post removal of already added directory paths
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-03-04T10:56:28.494 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_717ce3cce8d0166e4e577faa8004e6f9cb4128c0/qa/tasks/cephfs/test_mirroring.py", line 1276, in test_cephfs_mirror_cancel_mirroring_and_readd
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner:    self.check_peer_snap_in_progress(self.primary_fs_name, self.primary_fs_id,
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/git.ceph.com_ceph-c_717ce3cce8d0166e4e577faa8004e6f9cb4128c0/qa/tasks/cephfs/test_mirroring.py", line 231, in check_peer_snap_in_progress
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner:    self.assertTrue('syncing' == res[dir_name]['state'])
2024-03-04T10:56:28.495 INFO:tasks.cephfs_test_runner:AssertionError: False is not true

Files

unnamed.png (335 KB) unnamed.png Jos Collin, 03/06/2024 10:23 AM

Related issues 3 (0 open3 closed)

Copied to CephFS - Backport #68407: quincy: Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring)RejectedJos CollinActions
Copied to CephFS - Backport #68408: squid: Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring)ResolvedJos CollinActions
Copied to CephFS - Backport #68409: reef: Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring)ResolvedJos CollinActions
Actions #1

Updated by Jos Collin about 2 years ago

The check_peer_snap_in_progress() doesn't wait for 'syncing' to appear. It just checks the state at the moment and returns. Probably, it returned before the replayer sets the 'syncing'.

Actions #2

Updated by Jos Collin about 2 years ago

  • Status changed from New to In Progress
Actions #3

Updated by Jos Collin about 2 years ago

2024-03-04T10:55:59.104 INFO:teuthology.orchestra.run.smithi105.stdout:{
2024-03-04T10:55:59.104 INFO:teuthology.orchestra.run.smithi105.stdout:    "/d0": {
2024-03-04T10:55:59.104 INFO:teuthology.orchestra.run.smithi105.stdout:        "state": "idle",
2024-03-04T10:55:59.104 INFO:teuthology.orchestra.run.smithi105.stdout:        "snaps_synced": 0,
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:        "snaps_deleted": 0,
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:        "snaps_renamed": 0
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:    },
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:    "/d1": {
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:        "state": "idle",
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:        "snaps_synced": 0,
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:        "snaps_deleted": 0,
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:        "snaps_renamed": 0
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:    },
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:    "/d2": {
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:        "state": "syncing",
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:        "current_sycning_snap": {
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:            "id": 4,
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:            "name": "snap0" 
2024-03-04T10:55:59.105 INFO:teuthology.orchestra.run.smithi105.stdout:        },
2024-03-04T10:55:59.106 INFO:teuthology.orchestra.run.smithi105.stdout:        "snaps_synced": 0,
2024-03-04T10:55:59.106 INFO:teuthology.orchestra.run.smithi105.stdout:        "snaps_deleted": 0,
2024-03-04T10:55:59.106 INFO:teuthology.orchestra.run.smithi105.stdout:        "snaps_renamed": 0
2024-03-04T10:55:59.106 INFO:teuthology.orchestra.run.smithi105.stdout:    }
2024-03-04T10:55:59.106 INFO:teuthology.orchestra.run.smithi105.stdout:}
2024-03-04T10:55:59.124 DEBUG:tasks.cephfs.test_mirroring:command returned={
    "/d0": {
        "state": "idle",
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    },
    "/d1": {
        "state": "idle",
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    },
    "/d2": {
        "state": "syncing",
        "current_sycning_snap": {
            "id": 4,
            "name": "snap0" 
        },
        "snaps_synced": 0,
        "snaps_deleted": 0,
        "snaps_renamed": 0
    }
}

The snapshots are taken in the below order:

# take snapshots                                                                                                        
log.debug('taking snapshots')
self.mount_a.run_shell(["mkdir", "d0/.snap/snap0"])
self.mount_a.run_shell(["mkdir", "d1/.snap/snap0"])
self.mount_a.run_shell(["mkdir", "d2/.snap/snap0"])

Thus the d2 started "syncing" first, but the others (d0 and d1) didn't start syncing when we check the status of d0 first and check_peer_snap_in_progress asserts.

Actions #4

Updated by Jos Collin about 2 years ago

2024-03-04T10:55:50.942+0000 2c66b640 -1 cephfs::mirror::PeerReplayer(e9c703c7-88c9-40ed-b3f8-ff678976a731) should_sync_entry: \
      failed to stat prev entry= ./file.2: (9) Bad file descriptor
2024-03-04T10:55:52.370+0000 2ce6c640 -1 cephfs::mirror::PeerReplayer(e9c703c7-88c9-40ed-b3f8-ff678976a731) sync_snaps: failed \
      to sync snapshots for dir_root=/d0

Notice (the screenshot) the difference in should_sync_entry. First, it had r=-2, which is good for syncing d2. The next call to should_sync_entry for syncing d1, says 'Bad file descriptor'. The next one for syncing d0 shows r=-2. Looks like there's a race. Should we take a lock at the start of should_sync_entry?

Actions #5

Updated by Jos Collin about 2 years ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 56118
Actions #6

Updated by Venky Shankar almost 2 years ago

Also seen in a Quincy run: /a/vshankar-2024-03-14_16:52:41-fs-wip-vshankar-testing1-quincy-2024-03-14-0655-quincy-testing-default-smithi/7600853

Actions #12

Updated by Venky Shankar over 1 year ago

  • Category set to Testing
  • Status changed from Fix Under Review to Pending Backport
  • Source set to Q/A
Actions #13

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #68407: quincy: Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring) added
Actions #14

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #68408: squid: Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring) added
Actions #15

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #68409: reef: Test failure: test_cephfs_mirror_cancel_mirroring_and_readd (tasks.cephfs.test_mirroring.TestMirroring) added
Actions #16

Updated by Upkeep Bot over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #18

Updated by Jos Collin over 1 year ago

Patrick Donnelly wrote in #note-17:

not fixed: https://pulpito.ceph.com/pdonnell-2024-10-04_18:42:30-fs-wip-pdonnell-testing-20241004.144202-debug-distro-default-smithi/7933584/

not rebased: wip-pdonnell-testing-20241004.144202-debug

Actions #24

Updated by Jos Collin about 1 year ago

Milind Changire wrote in #note-23:

squid:
https://pulpito.ceph.com/vshankar-2025-01-05_17:50:50-fs-wip-vshankar-testing-20250105.135958-squid-debug-testing-default-smithi/8062626

This^ is not a related failure. As I checked the logs, I see the below:

2025-01-10T15:08:06.775 DEBUG:teuthology.orchestra.run.smithi130:peer status for fs: cephfs> ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@14 8b7edff2-9c26-4472-a85f-7ec328bfca99
2025-01-10T15:08:06.957 INFO:teuthology.orchestra.run.smithi130.stderr:admin_socket: exception getting command descriptions: [Errno 111] Connection refused
2025-01-10T15:08:06.966 DEBUG:teuthology.orchestra.run:got remote process result: 22
2025-01-10T15:08:06.967 WARNING:tasks.cephfs.test_mirroring:mirror daemon command with label "peer status for fs: cephfs" failed: Command failed (peer status for fs: cephfs) on smithi130 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror peer status cephfs@14 8b7edff2-9c26-4472-a85f-7ec32\
8bfca99'

Actions #25

Updated by Jos Collin about 1 year ago

  • Status changed from Pending Backport to Resolved
Actions #26

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to 635b5541074bfded98ec1029942f11f07171edef
  • Fixed In set to v19.3.0-5374-g635b5541074
  • Upkeep Timestamp set to 2025-07-02T03:31:45+00:00
Actions #27

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-5374-g635b5541074 to v19.3.0-5374-g635b554107
  • Upkeep Timestamp changed from 2025-07-02T03:31:45+00:00 to 2025-07-14T16:45:20+00:00
Actions #28

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~1863
  • Upkeep Timestamp changed from 2025-07-14T16:45:20+00:00 to 2025-11-01T01:26:56+00:00
Actions

Also available in: Atom PDF