Bug #63089
qa: tasks/mirror times out
Description
/a/vshankar-2023-09-28_07:23:59-fs-wip-vshankar-testing-20230926.081818-testing-default-smithi/7405363
2023-09-28T11:15:33.524 DEBUG:teuthology.orchestra.run.smithi105:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs mirror enable cephfs
2023-09-28T11:15:33.549 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.549+0000 7f1d69c56040 -1 mgr[py] Module zabbix has missing NOTIFY_TYPES member
2023-09-28T11:15:33.604 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.605+0000 7f1d69c56040 -1 mgr[py] Module balancer has missing NOTIFY_TYPES member
2023-09-28T11:15:33.657 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.657+0000 7f1d69c56040 -1 mgr[py] Module influx has missing NOTIFY_TYPES member
2023-09-28T11:15:33.721 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.721+0000 7f1d69c56040 -1 mgr[py] Module alerts has missing NOTIFY_TYPES member
2023-09-28T11:15:33.794 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.794+0000 7f1d69c56040 -1 mgr[py] Module iostat has missing NOTIFY_TYPES member
2023-09-28T11:15:33.935 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:33.935+0000 7f1d69c56040 -1 mgr[py] Module rgw has missing NOTIFY_TYPES member
2023-09-28T11:15:34.002 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:34.002+0000 7f1d69c56040 -1 mgr[py] Module rbd_support has missing NOTIFY_TYPES member
2023-09-28T11:15:34.056 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:34.056+0000 7f1d69c56040 -1 mgr[py] Module progress has missing NOTIFY_TYPES member
2023-09-28T11:15:34.118 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:34.118+0000 7f1d69c56040 -1 mgr[py] Module pg_autoscaler has missing NOTIFY_TYPES member
2023-09-28T11:15:34.172 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:34.172+0000 7f1d69c56040 -1 mgr[py] Module devicehealth has missing NOTIFY_TYPES member
2023-09-28T11:15:34.534 INFO:teuthology.orchestra.run:Running command with timeout 30
2023-09-28T11:15:34.534 DEBUG:teuthology.orchestra.run.smithi105:mirror status for fs: cephfs> ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@56
2023-09-28T11:15:34.572 INFO:tasks.ceph.mgr.x.smithi105.stderr:2023-09-28T11:15:34.572+0000 7f1d69c56040 -1 mgr[py] Module rook has missing NOTIFY_TYPES member
2023-09-28T11:15:34.726 INFO:teuthology.orchestra.run.smithi105.stderr:no valid command found; 1 closest matches:
2023-09-28T11:15:34.726 INFO:teuthology.orchestra.run.smithi105.stderr:fs mirror status cephfs@54
2023-09-28T11:15:34.726 INFO:teuthology.orchestra.run.smithi105.stderr:admin_socket: invalid command
2023-09-28T11:15:34.729 DEBUG:teuthology.orchestra.run:got remote process result: 22
2023-09-28T11:15:34.730 WARNING:tasks.cephfs.test_mirroring:mirror daemon command with label "mirror status for fs: cephfs" failed: Command failed (mirror status for fs: cephfs) on smithi105 with status 22: 'ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok fs mirror status cephfs@56'
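In the log above, the test asks the admin socket for `fs mirror status cephfs@56`, while the closest match the socket reports is `fs mirror status cephfs@54`. The following is a minimal Python sketch, assuming the stderr layout shown in this log, that extracts both filesystem IDs so the mismatch is easy to spot. The helper name `find_fscid_mismatch` is hypothetical and is not part of teuthology or the Ceph qa suite.

```python
import re

# Matches "fs mirror status <fs_name>@<fscid>" as it appears in the log above.
STATUS_CMD = re.compile(r"fs mirror status (?P<fs>[\w.-]+)@(?P<fscid>\d+)")


def find_fscid_mismatch(requested_cmd, stderr_text):
    """Return (requested_fscid, fscids_offered_by_socket) for the same fs name.

    Hypothetical helper for illustration only.
    """
    req = STATUS_CMD.search(requested_cmd)
    if req is None:
        raise ValueError("requested command is not an 'fs mirror status' call")
    offered = [
        int(m.group("fscid"))
        for m in STATUS_CMD.finditer(stderr_text)
        if m.group("fs") == req.group("fs")
    ]
    return int(req.group("fscid")), offered


# Command and stderr text copied from the log excerpt.
requested = (
    "ceph --admin-daemon /var/run/ceph/cephfs-mirror.asok "
    "fs mirror status cephfs@56"
)
stderr = (
    "no valid command found; 1 closest matches:\n"
    "fs mirror status cephfs@54\n"
    "admin_socket: invalid command\n"
)

want, have = find_fscid_mismatch(requested, stderr)
print(f"requested fscid={want}, admin socket offers {have}")
```

A requested ID that is absent from the offered list suggests the daemon has not (yet) set up mirroring for the filesystem instance the test is polling.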
Updated by Venky Shankar over 2 years ago
- Priority changed from Normal to Urgent
Updated by Venky Shankar over 2 years ago
Another instance, this time from reef branch: vshankar-2023-09-27_10:23:33-fs-wip-vshankar-testing-reef-20230927.021134-testing-default-smithi/7402858
From logs:
2023-09-27T13:52:11.070+0000 d0c7640 20 cephfs::mirror::Mirror schedule_mirror_update_task: scheduling fs mirror update (0x7083620) after 2 seconds
2023-09-27T13:52:11.071+0000 c8c6640 20 cephfs::mirror::FSMirror ~FSMirror
2023-09-27T13:52:11.071+0000 c8c6640 10 cephfs::mirror::Mirror enable_mirroring: starting FSMirror: filesystem={fscid=52, fs_name=cephfs}
2023-09-27T13:52:11.071+0000 c8c6640 10 cephfs::mirror::ServiceDaemon: 0x8fdf7e0 add_or_update_fs_attribute: fscid=52
2023-09-27T13:52:11.071+0000 c8c6640 10 cephfs::mirror::ServiceDaemon: 0x8fdf7e0 schedule_update_status
2023-09-27T13:52:11.071+0000 c8c6640 20 cephfs::mirror::FSMirror init
2023-09-27T13:52:11.071+0000 c8c6640 20 cephfs::mirror::Utils connect: connecting to cluster=ceph, client=client.mirror, mon_host=
2023-09-27T13:52:11.465+0000 c8c6640 10 cephfs::mirror::Utils connect: using mon addr=172.21.15.17
2023-09-27T13:52:12.071+0000 110cf640 20 cephfs::mirror::ServiceDaemon: 0x8fdf7e0 update_status: 1 filesystem(s)
2023-09-27T13:52:13.070+0000 d0c7640 20 cephfs::mirror::Mirror update_fs_mirrors
2023-09-27T13:52:22.110+0000 c8c6640 10 cephfs::mirror::Utils connect: connected to cluster=ceph using client=client.mirror
2023-09-27T13:52:22.169+0000 c8c6640 20 cephfs::mirror::Utils mount: filesystem={fscid=52, fs_name=cephfs}
2023-09-27T13:52:22.609+0000 c8c6640 10 cephfs::mirror::Utils mount: mounted filesystem={fscid=52, fs_name=cephfs}
2023-09-27T13:52:22.609+0000 c8c6640 10 cephfs::mirror::FSMirror init: rados addrs=172.21.15.17:0/3359552797
2023-09-27T13:52:22.609+0000 c8c6640 20 cephfs::mirror::FSMirror init_instance_watcher
2023-09-27T13:52:22.609+0000 c8c6640 20 cephfs::mirror::InstanceWatcher init
2023-09-27T13:52:22.609+0000 c8c6640 20 cephfs::mirror::InstanceWatcher create_instance
The daemon never returned from creating an instance object. Another observation is that the failures are with valgrind/
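One way to confirm where a thread stopped is to group the log by thread ID and look at the last message and the largest pause for each thread. The sketch below is illustrative only: it assumes every line has the form `<timestamp> <thread id> <level> <message>` as in the excerpt above, and the function names are hypothetical.

```python
from datetime import datetime


def parse_line(line):
    """Split one daemon log line into (timestamp, thread_id, level, message)."""
    ts, thread, level, msg = line.split(" ", 3)
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f%z"), thread, int(level), msg


def last_message_per_thread(lines):
    """Map each thread ID to the (timestamp, message) of its final log line."""
    last = {}
    for line in lines:
        ts, thread, _level, msg = parse_line(line)
        last[thread] = (ts, msg)
    return last


def gaps(lines, thread):
    """Yield (seconds, previous_msg, next_msg) for consecutive lines of one thread."""
    prev = None
    for line in lines:
        ts, tid, _level, msg = parse_line(line)
        if tid != thread:
            continue
        if prev is not None:
            yield (ts - prev[0]).total_seconds(), prev[1], msg
        prev = (ts, msg)


# A few lines copied from the excerpt above.
sample = [
    "2023-09-27T13:52:11.465+0000 c8c6640 10 cephfs::mirror::Utils connect: using mon addr=172.21.15.17",
    "2023-09-27T13:52:12.071+0000 110cf640 20 cephfs::mirror::ServiceDaemon: 0x8fdf7e0 update_status: 1 filesystem(s)",
    "2023-09-27T13:52:22.110+0000 c8c6640 10 cephfs::mirror::Utils connect: connected to cluster=ceph using client=client.mirror",
    "2023-09-27T13:52:22.609+0000 c8c6640 20 cephfs::mirror::InstanceWatcher create_instance",
]

last = last_message_per_thread(sample)
longest = max(gaps(sample, "c8c6640"), key=lambda g: g[0])
print("last line of c8c6640:", last["c8c6640"][1])
print("longest pause (s):", longest[0])
```

On these lines, thread `c8c6640` ends at `create_instance`, and its longest pause is the roughly 10.6 seconds between starting and finishing the cluster connect.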
Updated by Patrick Donnelly almost 2 years ago
- Target version changed from v19.0.0 to v20.0.0
Updated by Jos Collin over 1 year ago
Initially, I was looking for a valgrind failure. On closer inspection, this failure occurs specifically in the tests 'test_mirroring_init_failure_with_recovery' and 'test_mirroring_init_failure', because the mirror daemon failed to restart after the tests marked it blocklisted/failed. The fix is already provided by https://github.com/ceph/ceph/pull/56193. I'm confirming this and will update the tracker soon.
Updated by Venky Shankar over 1 year ago
Thanks for checking. That PR is likely to get merged soon (hopefully today). Will notify when merged.
Updated by Jos Collin over 1 year ago
- Status changed from In Progress to Fix Under Review
- Pull request ID set to 56193
I've verified that this issue shows logs and failures similar to those in:
https://tracker.ceph.com/issues/64927
https://tracker.ceph.com/issues/51964
https://tracker.ceph.com/issues/63931
So https://github.com/ceph/ceph/pull/56193 should fix this too.
Updated by Venky Shankar over 1 year ago
- Status changed from Fix Under Review to Pending Backport
- Source set to Q/A
- Backport changed from reef,quincy to quincy,reef,squid
Updated by Jos Collin over 1 year ago
- Copied to Backport #66969: squid: qa: tasks/mirror times out added
Updated by Jos Collin over 1 year ago
- Copied to Backport #66970: reef: qa: tasks/mirror times out added
Updated by Jos Collin over 1 year ago
- Copied to Backport #66971: quincy: qa: tasks/mirror times out added
Updated by Upkeep Bot over 1 year ago
- Copied to Backport #67145: quincy: qa: tasks/mirror times out added
Updated by Upkeep Bot over 1 year ago
- Tags (freeform) set to backport_processed
Updated by Jos Collin over 1 year ago
- Status changed from Pending Backport to Resolved
Updated by Upkeep Bot 9 months ago
- Merge Commit set to 62eb72731aca5d403ed6239946c6ea66f3be36e7
- Fixed In set to v19.3.0-3451-g62eb72731ac
- Upkeep Timestamp set to 2025-07-02T03:46:21+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v19.3.0-3451-g62eb72731ac to v19.3.0-3451-g62eb72731a
- Upkeep Timestamp changed from 2025-07-02T03:46:21+00:00 to 2025-07-14T16:45:48+00:00
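The two "Fixed In" values above differ only in the length of the abbreviated commit hash. As a small sketch, assuming these values follow the `git describe` layout `<tag>-<commits since tag>-g<abbreviated sha>`, the Python below splits each value and checks that the abbreviation is a prefix of the merge commit recorded in this tracker. The function name `parse_describe` is hypothetical.

```python
import re

# <tag>-<number of commits since tag>-g<abbreviated sha>
DESCRIBE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(version):
    """Split a git-describe style string into (tag, commit_count, short_sha)."""
    m = DESCRIBE.match(version)
    if m is None:
        raise ValueError(f"not a git-describe style version: {version!r}")
    return m.group("tag"), int(m.group("count")), m.group("sha")


# Values copied from the tracker fields above.
merge_commit = "62eb72731aca5d403ed6239946c6ea66f3be36e7"
results = []
for fixed_in in ("v19.3.0-3451-g62eb72731ac", "v19.3.0-3451-g62eb72731a"):
    tag, count, sha = parse_describe(fixed_in)
    results.append((tag, count, sha, merge_commit.startswith(sha)))
    print(tag, count, sha, merge_commit.startswith(sha))
```

Both values resolve to the same tag and commit count, so the later change only shortens the hash abbreviation.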
Updated by Upkeep Bot 5 months ago
- Released In set to v20.2.0~2488
- Upkeep Timestamp changed from 2025-07-14T16:45:48+00:00 to 2025-11-01T01:27:12+00:00