Bug #72892
open
rados/cephadm - rm-cluster hung after mgr daemon was recovered
Description
2025-08-19T21:57:56.259 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:57:55 smithi164 ceph-mon[34295]: Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)
2025-08-19T21:57:56.259 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:57:55 smithi164 ceph-mon[34295]: Health check failed: 1 osds down (OSD_DOWN)
2025-08-19T21:57:56.259 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:57:55 smithi164 ceph-mon[34295]: Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
2025-08-19T21:57:56.259 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:57:55 smithi164 ceph-mon[34295]: Health check failed: 1 root (1 osds) down (OSD_ROOT_DOWN)
2025-08-19T21:58:03.998 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:58:03 smithi164 ceph-mon[34295]: Health check cleared: OSD_DOWN (was: 1 osds down)
2025-08-19T21:58:03.998 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:58:03 smithi164 ceph-mon[34295]: Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down)
2025-08-19T21:58:03.998 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:58:03 smithi164 ceph-mon[34295]: Health check cleared: OSD_ROOT_DOWN (was: 1 root (1 osds) down)
2025-08-19T21:58:06.570 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:58:06 smithi164 ceph-mon[34295]: Health check cleared: CEPHADM_FAILED_DAEMON (was: 1 failed cephadm daemon(s))
2025-08-19T21:58:16.988 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:58:16 smithi164 ceph-mon[34295]: Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)
2025-08-19T21:58:22.258 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:58:22 smithi164 ceph-mon[34295]: Health check cleared: CEPHADM_FAILED_DAEMON (was: 1 failed cephadm daemon(s))
2025-08-19T21:59:14.008 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:59:13 smithi164 ceph-mon[34295]: Health check failed: 1 osds down (OSD_DOWN)
2025-08-19T21:59:22.078 INFO:journalctl@ceph.mon.a.smithi164.stdout:Aug 19 21:59:21 smithi164 ceph-mon[34295]: Health check cleared: OSD_DOWN (was: 1 osds down)

rados/cephadm test found on main

2025-08-19T22:02:29.759 INFO:journalctl@ceph.osd.2.smithi164.stdout:Aug 19 22:02:29 smithi164 systemd[1]: Stopped Ceph osd.2 for 2f87e142-7d47-11f0-8741-adfe0268badd.
2025-08-19T22:02:29.759 INFO:journalctl@ceph.osd.2.smithi164.stdout:Aug 19 22:02:29 smithi164 systemd[1]: ceph-2f87e142-7d47-11f0-8741-adfe0268badd@osd.2.service: Consumed 3.089s CPU time.
2025-08-19T22:02:30.501 DEBUG:teuthology.orchestra.run:got remote process result: None
2025-08-19T22:02:30.501 INFO:tasks.cephadm.osd.2:Stopped osd.2
2025-08-19T22:02:30.502 DEBUG:teuthology.orchestra.run.smithi164:> sudo /home/ubuntu/cephtest/cephadm rm-cluster --fsid 2f87e142-7d47-11f0-8741-adfe0268badd --force --keep-logs
2025-08-20T05:45:00.800 DEBUG:teuthology.exit:Got signal 15; running 1 handler...
2025-08-20T05:45:00.858 DEBUG:teuthology.task.console_log:Killing console logger for smithi164
2025-08-20T05:45:00.859 DEBUG:teuthology.exit:Finished running handlers
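rm-cluster was issued at 22:02:30, after CEPHADM_FAILED_DAEMON was last cleared at 21:58:22 (i.e. after the mgr daemon had recovered).
The rm-cluster command then hung until teuthology received signal 15 at 05:45:00 the next day.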
Not convinced this is a duplicate of https://tracker.ceph.com/issues/68586 or https://tracker.ceph.com/issues/69803
as the mgr daemon was up and running at the time.
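For anyone triaging further occurrences, here is a rough sketch of how the timeline above could be checked mechanically. It replays the "Health check failed/cleared" events from the description and reports which checks were still active when rm-cluster was issued, plus how long the command sat before signal 15 arrived. The log lines are trimmed to the teuthology timestamp and the message (the journalctl prefix is dropped), the helper names are made up for illustration, and treating "no active health checks" as "mgr recovered" is an assumption, not something the log states directly.

```python
import re
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"

# Health check events from the description, trimmed to
# "<teuthology timestamp> <message>" (journalctl prefix removed).
EVENTS = [
    "2025-08-19T21:57:56.259 Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)",
    "2025-08-19T21:57:56.259 Health check failed: 1 osds down (OSD_DOWN)",
    "2025-08-19T21:57:56.259 Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)",
    "2025-08-19T21:57:56.259 Health check failed: 1 root (1 osds) down (OSD_ROOT_DOWN)",
    "2025-08-19T21:58:03.998 Health check cleared: OSD_DOWN (was: 1 osds down)",
    "2025-08-19T21:58:03.998 Health check cleared: OSD_HOST_DOWN (was: 1 host (1 osds) down)",
    "2025-08-19T21:58:03.998 Health check cleared: OSD_ROOT_DOWN (was: 1 root (1 osds) down)",
    "2025-08-19T21:58:06.570 Health check cleared: CEPHADM_FAILED_DAEMON (was: 1 failed cephadm daemon(s))",
    "2025-08-19T21:58:16.988 Health check failed: 1 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)",
    "2025-08-19T21:58:22.258 Health check cleared: CEPHADM_FAILED_DAEMON (was: 1 failed cephadm daemon(s))",
    "2025-08-19T21:59:14.008 Health check failed: 1 osds down (OSD_DOWN)",
    "2025-08-19T21:59:22.078 Health check cleared: OSD_DOWN (was: 1 osds down)",
]

# Timestamps of the rm-cluster invocation and the teuthology signal 15 line.
RM_CLUSTER_ISSUED = "2025-08-19T22:02:30.502"
SIGNAL_RECEIVED = "2025-08-20T05:45:00.800"

# "failed" lines carry the check code in the trailing parentheses;
# "cleared" lines carry it right after the colon.
FAILED_RE = re.compile(r"Health check failed: .*\((\w+)\)$")
CLEARED_RE = re.compile(r"Health check cleared: (\w+) \(was: ")


def parse_ts(text):
    """Parse a teuthology-style timestamp (hypothetical helper)."""
    return datetime.strptime(text, TS_FORMAT)


def active_checks_at(lines, when):
    """Replay events in order and return the checks still failing at `when`."""
    active = set()
    for line in lines:
        stamp, _, message = line.partition(" ")
        if parse_ts(stamp) > when:
            break
        failed = FAILED_RE.search(message)
        cleared = CLEARED_RE.search(message)
        if failed:
            active.add(failed.group(1))
        elif cleared:
            active.discard(cleared.group(1))
    return active


issued = parse_ts(RM_CLUSTER_ISSUED)
active_at_issue = active_checks_at(EVENTS, issued)
hang_duration = parse_ts(SIGNAL_RECEIVED) - issued

print("active health checks when rm-cluster was issued:", sorted(active_at_issue))
print("time between rm-cluster and signal 15:", hang_duration)
```

Running the same replay against the other jobs listed in this tracker would be one way to confirm whether they also issued rm-cluster with no health checks outstanding.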
/a/yuriw-2025-08-19_14:49:40-rados-wip-yuri-testing-2025-08-18-1127-distro-default-smithi/8451583
https://pulpito.ceph.com/yuriw-2025-08-19_14:49:40-rados-wip-yuri-testing-2025-08-18-1127-distro-default-smithi/8451583
Updated by Lee Sanders 4 months ago
- Project changed from Ceph to Orchestrator
Another occurrence here:
/a/skanta-2025-11-01_01:12:28-rados-wip-bharath5-testing-2025-10-31-1454-distro-default-smithi/8578605
Updated by Lee Sanders 4 months ago
/a/skanta-2025-11-01_01:12:28-rados-wip-bharath5-testing-2025-10-31-1454-distro-default-smithi/8578605
Updated by Laura Flores about 2 months ago
/a/lflores-2026-01-26_23:21:06-rados-wip-yuri12-testing-2026-01-22-2045-distro-default-trial/19091
Updated by Nitzan Mordechai about 2 months ago
/a/yuriw-2026-01-29_18:33:05-rados-wip-yuri2-testing-2026-01-28-1643-tentacle-distro-default-trial/26508
Updated by Nitzan Mordechai about 1 month ago
/a/yuriw-2026-02-04_23:08:40-rados-wip-yuri3-testing-2026-02-04-1948-tentacle-distro-default-trial/35396
Updated by Nitzan Mordechai about 1 month ago
/a/skanta-2026-02-02_23:43:28-rados-wip-bharath9-testing-2026-02-02-0839-distro-default-trial/30412
Updated by Nitzan Mordechai 3 days ago
/a/yaarit-2026-03-19_02:36:58-rados:cephadm-wip-rocky10-branch-of-the-day-2026-03-18-1773834065-tentacle-distro-default-trial/
7 jobs: ['108801', '108788', '108763', '108838', '108776', '108851', '108813']