Bug #46036
cephadm: killmode=none: systemd units failed, but containers still running
Status: Closed
% Done: 0%
Regression: No
Severity: 3 - minor
Pull request ID: 35651
Merge Commit: d864a7327604fe085f364eee1361af322c837f27
Fixed In: v16.0.0-2809-gd864a73276
Released In: v16.2.0~2200
Upkeep Timestamp: 2025-07-14T19:40:24+00:00
Description
# ceph orch ps
NAME   HOST         STATUS  REFRESHED  AGE  VERSION     IMAGE NAME  IMAGE ID      CONTAINER ID
osd.0  hostXXXXX-4  error   6m ago     92m  15.2.3.252  ceph/ceph   33194941836f  c5dd2b0cc77d
osd.1  hostXXXXX-4  error   6m ago     90m  15.2.3.252  ceph/ceph   33194941836f  b65dc56c76a2
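Even though the orchestrator reports the daemons as failed, the containers with the IDs above are still running on the host. This can be checked with something like the following (commands only; the container ID is taken from the ceph orch ps output above):

podman ps --filter id=b65dc56c76a2
systemctl is-active ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service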
It turns out the systemd unit failed:
● ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.2.service - Ceph osd.2 for 92d2d4c0-af05-11ea-9578-0cc47aaa2edc
Loaded: loaded (/etc/systemd/system/ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2020-06-16 12:05:49 UTC; 1h 32min ago
Process: 3861 ExecStopPost=/bin/bash /var/lib/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc/osd.2/unit.poststop (code=exited, status=0/SUCCESS)
Process: 3693 ExecStart=/bin/bash /var/lib/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc/osd.2/unit.run (code=exited, status=125)
Process: 3676 ExecStartPre=/usr/bin/podman rm ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-osd.2 (code=exited, status=2)
Main PID: 3693 (code=exited, status=125)
Tasks: 34
CGroup: /system.slice/system-ceph\x2d92d2d4c0\x2daf05\x2d11ea\x2d9578\x2d0cc47aaa2edc.slice/ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.2.service
├─28935 /bin/bash /var/lib/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc/osd.2/unit.run
├─29335 /usr/bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-osd.2 -e CONTAINER_IMAGE=ceph/ceph -e NODE_NAME=hostXXXXX-4 -v /var/run/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc>
└─29396 /usr/bin/conmon --api-version 1 -s -c 2f88b58cb64519fc90842f6a473703da44c5612d2686b5beae86b0ff2a7d50bb -u 2f88b58cb64519fc90842f6a473703da44c5612d2686b5beae86b0ff2a7d50bb -r /usr/sbin/runc -b /var/lib/containers/storage/btrfs-containers/2f88b58cb64519fc90842f6a473703da44c5612d2686b5beae86b0ff2a7d5>
Jun 16 13:32:08 hostXXXXX-4 bash[28935]: Uptime(secs): 5400.0 total, 0.0 interval
Jun 16 13:32:08 hostXXXXX-4 bash[28935]: Flush(GB): cumulative 0.000, interval 0.000
Jun 16 13:32:08 hostXXXXX-4 bash[28935]: AddFile(GB): cumulative 0.000, interval 0.000
Jun 16 13:32:08 hostXXXXX-4 bash[28935]: AddFile(Total Files): cumulative 0, interval 0
Jun 16 13:32:08 hostXXXXX-4 bash[28935]: AddFile(L0 Files): cumulative 0, interval 0
Jun 16 13:32:08 hostXXXXX-4 bash[28935]: AddFile(Keys): cumulative 0, interval 0
Jun 16 13:32:08 hostXXXXX-4 bash[28935]: Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Jun 16 13:32:08 hostXXXXX-4 bash[28935]: Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Jun 16 13:32:08 hostXXXXX-4 bash[28935]: Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
Jun 16 13:32:08 hostXXXXX-4 bash[28935]: ** File Read Latency Histogram By Level [default] **
The journal shows something like:
-- Unit ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service has begun starting up.
Jun 16 12:03:06 hostXXXXX-4 podman[31032]: Error: cannot remove container b65dc56c76a247e9178fa81005a93cf44e502a06fb46bcc79b9bf484128b2907 as it is running ->
Jun 16 12:03:06 hostXXXXX-4 systemd[1]: Started Ceph osd.1 for 92d2d4c0-af05-11ea-9578-0cc47aaa2edc.
-- Subject: Unit ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service has finished starting up.
--
-- The start-up result is done.
Jun 16 12:03:06 hostXXXXX-4 bash[31047]: WARNING: The same type, major and minor should not be used for multiple devices.
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-block-78635646-a4a6-474e->
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/ln -snf /dev/ceph-block-78635646-a4a6-474e-8832-c3a3e668cf9d/osd-block-f645cf27-857b-48ae->
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/chown -R ceph:ceph /dev/mapper/ceph--block--78635646--a4a6--474e--8832--c3a3e668cf9d-osd-->
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/ln -snf /dev/ceph-block-dbs-83a1ab17-f232-4f60-887f-111b89f3f655/osd-block-db-3c9563a7-9ab>
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/chown -h ceph:ceph /dev/ceph-block-dbs-83a1ab17-f232-4f60-887f-111b89f3f655/osd-block-db-3>
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/chown -R ceph:ceph /dev/mapper/ceph--block--dbs--83a1ab17--f232--4f60--887f--111b89f3f655->
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block.db
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: Running command: /usr/bin/chown -R ceph:ceph /dev/mapper/ceph--block--dbs--83a1ab17--f232--4f60--887f--111b89f3f655->
Jun 16 12:03:07 hostXXXXX-4 bash[31047]: --> ceph-volume lvm activate successful for osd ID: 1
Jun 16 12:03:08 hostXXXXX-4 bash[31047]: WARNING: The same type, major and minor should not be used for multiple devices.
Jun 16 12:03:08 hostXXXXX-4 bash[31047]: Error: error creating container storage: the container name "ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-osd.1" is alr>
Jun 16 12:03:08 hostXXXXX-4 systemd[1]: ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service: Main process exited, code=exited, status=125/n/a
Jun 16 12:03:08 hostXXXXX-4 bash[31227]: WARNING: The same type, major and minor should not be used for multiple devices.
Jun 16 12:03:09 hostXXXXX-4 systemd[1]: ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service: Unit entered failed state.
Jun 16 12:03:09 hostXXXXX-4 systemd[1]: ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service: Failed with result 'exit-code'.
Jun 16 12:03:19 hostXXXXX-4 systemd[1]: ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service: Service RestartSec=10s expired, scheduling restart.
Jun 16 12:03:19 hostXXXXX-4 systemd[1]: Stopped Ceph osd.1 for 92d2d4c0-af05-11ea-9578-0cc47aaa2edc.
-- Subject: Unit ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service has finished shutting down
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service has finished shutting down.
Jun 16 12:03:19 hostXXXXX-4 systemd[1]: ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service: Start request repeated too quickly.
Jun 16 12:03:19 hostXXXXX-4 systemd[1]: Failed to start Ceph osd.1 for 92d2d4c0-af05-11ea-9578-0cc47aaa2edc.
-- Subject: Unit ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service has failed
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service has failed.
--
-- The result is failed.
Jun 16 12:03:19 hostXXXXX-4 systemd[1]: ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service: Unit entered failed state.
Jun 16 12:03:19 hostXXXXX-4 systemd[1]: ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service: Failed with result 'exit-code'.
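In other words: the ExecStartPre podman rm fails because the old container is still running, the subsequent podman run exits 125 because the container name is already taken, and since the unit uses KillMode=none, systemd never kills the surviving podman/conmon processes, so every restart attempt hits the same wall. For reference, this is roughly what the generated /etc/systemd/system/ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@.service looks like (a sketch reconstructed from the status output above; the exact template contents may differ):

[Unit]
Description=Ceph %i for 92d2d4c0-af05-11ea-9578-0cc47aaa2edc

[Service]
ExecStartPre=-/usr/bin/podman rm ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-%i
ExecStart=/bin/bash /var/lib/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc/%i/unit.run
ExecStopPost=-/bin/bash /var/lib/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc/%i/unit.poststop
# KillMode=none: systemd does not kill leftover processes in the unit's
# cgroup on failure, which is why podman/conmon (and the OSD) keep running
KillMode=none
Restart=on-failure
RestartSec=10s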
Adding a set -e to /var/lib/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc/osd.1/unit.run changes the output to:
hostXXXXX-4:~ # systemctl status ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1
● ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service - Ceph osd.1 for 92d2d4c0-af05-11ea-9578-0cc47aaa2edc
Loaded: loaded (/etc/systemd/system/ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Tue 2020-06-16 14:07:27 UTC; 4s ago
Process: 10391 ExecStopPost=/bin/bash /var/lib/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc/osd.1/unit.poststop (code=exited, status=0/SUCCESS)
Process: 10216 ExecStart=/bin/bash /var/lib/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc/osd.1/unit.run (code=exited, status=125)
Process: 10201 ExecStartPre=/usr/bin/podman rm ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-osd.1 (code=exited, status=2)
Main PID: 10216 (code=exited, status=125)
Tasks: 29
CGroup: /system.slice/system-ceph\x2d92d2d4c0\x2daf05\x2d11ea\x2d9578\x2d0cc47aaa2edc.slice/ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service
├─25971 /bin/bash /var/lib/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc/osd.1/unit.run
├─26395 /usr/bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-osd.1 -e CON>
└─26452 /usr/bin/conmon --api-version 1 -s -c b65dc56c76a247e9178fa81005a93cf44e502a06fb46bcc79b9bf484128b2907 -u b65dc56c76a247e9178fa81005a93cf4>
Jun 16 14:07:27 hostXXXXX-4 systemd[1]: ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc@osd.1.service: Failed with result 'exit-code'.
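For reference, a minimal sketch of what the top of unit.run looks like with the set -e in place (podman arguments elided; the exact script contents are an assumption based on the process listing above):

set -e
# ceph-volume activation runs first, then the OSD container itself; with
# set -e the script aborts at the first failing command instead of
# continuing on to the next podman run
/usr/bin/podman run --rm ... ceph/ceph ceph-volume lvm activate ...
/usr/bin/podman run --rm --net=host --ipc=host --privileged --group-add=disk --name ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-osd.1 ...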
Now, let's stop the leftover container:
hostXXXXX-4:~ # podman stop ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-osd.1
b65dc56c76a247e9178fa81005a93cf44e502a06fb46bcc79b9bf484128b2907
Then, adding this line to /var/lib/ceph/92d2d4c0-af05-11ea-9578-0cc47aaa2edc/osd.1/unit.run:
! /usr/bin/podman rm --storage ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-osd.1
Now the service is up again.
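With that, the head of unit.run looks roughly like this (a sketch; everything after the container name is elided):

set -e
# clean up any leftover container storage from a previous run; the leading
# "!" negates the exit status, so set -e does not abort the script when
# there is nothing left to remove
! /usr/bin/podman rm --storage ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-osd.1
/usr/bin/podman run --rm ... --name ceph-92d2d4c0-af05-11ea-9578-0cc47aaa2edc-osd.1 ...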
Updated by Sebastian Wagner almost 6 years ago
https://github.com/ceph/ceph/pull/35524 is part of the solution. The other part is adding a set -e.
Updated by Sebastian Wagner almost 6 years ago
- Related to Bug #44990: cephadm: exec: "/usr/bin/ceph-mon": stat /usr/bin/ceph-mon: no such file or directory added
Updated by Sebastian Wagner almost 6 years ago
- Status changed from New to Fix Under Review
- Pull request ID set to 35651
Updated by Sebastian Wagner over 5 years ago
- Status changed from Fix Under Review to Pending Backport
Updated by Sebastian Wagner over 5 years ago
- Status changed from Pending Backport to Resolved
- Target version set to v15.2.5
Updated by Sebastian Wagner over 5 years ago
- Related to Bug #46654: Unsupported podman container configuration via systemd added
Updated by Upkeep Bot 8 months ago
- Merge Commit set to d864a7327604fe085f364eee1361af322c837f27
- Fixed In set to v16.0.0-2809-gd864a73276
- Released In set to v16.2.0~2200
- Upkeep Timestamp set to 2025-07-14T19:40:24+00:00