Bug #62713
sync && sudo umount -f /var/lib/ceph/osd/ceph-6
Description
Description of problem
/a/yuriw-2023-08-16_18:39:08-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7370263/
Traceback:
2023-08-16T22:19:50.948 DEBUG:teuthology.orchestra.run.smithi123:> sync && sudo umount -f /var/lib/ceph/osd/ceph-6
2023-08-16T22:19:51.024 INFO:teuthology.orchestra.run.smithi123.stderr:umount: /var/lib/ceph/osd/ceph-6: target is busy.
We suspect that the problem might have something to do with systemd.
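For triaging further instances, a small helper can pull the busy mount points out of a teuthology log. This is only a sketch: the function name `busy_mounts` and the regular expression are assumptions for illustration, matched against the two log lines quoted above, and are not part of teuthology.

```python
import re

# Matches the stderr line that umount prints when the mount is still in use.
BUSY_RE = re.compile(r"umount: (?P<path>\S+): target is busy")


def busy_mounts(log_text):
    """Return mount points reported as busy, in first-seen order, without duplicates."""
    seen = []
    for match in BUSY_RE.finditer(log_text):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen


# Sample input: the two lines from the traceback in this ticket.
sample = (
    "2023-08-16T22:19:50.948 DEBUG:teuthology.orchestra.run.smithi123:> "
    "sync && sudo umount -f /var/lib/ceph/osd/ceph-6\n"
    "2023-08-16T22:19:51.024 INFO:teuthology.orchestra.run.smithi123.stderr:"
    "umount: /var/lib/ceph/osd/ceph-6: target is busy.\n"
)
print(busy_mounts(sample))
```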
Updated by Laura Flores over 2 years ago
@Junior have you seen any further instances of this?
Updated by Kamoltat (Junior) Sirivadhna over 2 years ago
/a/yuriw-2023-11-02_14:18:00-rados-wip-yuri-testing-2023-11-01-1538-reef-distro-default-smithi/7444612/
Updated by Laura Flores over 1 year ago
/a/lflores-2024-07-23_16:58:04-rados-squid-distro-default-smithi/7814104
/a/lflores-2024-07-23_16:58:04-rados-squid-distro-default-smithi/7814102
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-08-06_00:08:05-rados-wip-yuri6-testing-2024-08-05-0905-squid-distro-default-smithi/7838404
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-08-02_15:32:29-rados-squid-release-distro-default-smithi/7833174
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-08-28_23:20:36-rados-wip-yuri4-testing-2024-08-28-1359-distro-default-smithi/7879426
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-08-29_14:15:40-rados-wip-yuri8-testing-2024-08-28-1632-squid-distro-default-smithi/7880295
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-10-15_14:06:51-rados-wip-yuri8-testing-2024-10-14-1103-distro-default-smithi/7948329
Updated by Kamoltat (Junior) Sirivadhna over 1 year ago
/a/yuriw-2024-10-16_22:34:49-rados-wip-yuri10-testing-2024-10-15-1007-distro-default-smithi/7952607/
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-10-16_14:57:58-rados-wip-yuri2-testing-2024-10-15-0703-distro-default-smithi/7950794
Updated by Laura Flores over 1 year ago
/a/yuriw-2024-11-18_23:57:04-rados-wip-yuri4-testing-2024-11-18-0759-squid-distro-default-smithi/7999083
Updated by Sridhar Seshasayee over 1 year ago
/a/skanta-2024-11-26_00:08:32-rados-wip-bharath8-testing-2024-11-25-1916-squid-distro-default-smithi/8009802
Updated by Nitzan Mordechai about 1 year ago
/a/skanta-2025-01-24_10:43:00-rados-wip-bharath10-testing-2025-01-24-0941-squid-distro-default-smithi/8091987
/a/skanta-2025-01-24_10:43:00-rados-wip-bharath10-testing-2025-01-24-0941-squid-distro-default-smithi/8091815
/a/skanta-2025-01-24_10:43:00-rados-wip-bharath10-testing-2025-01-24-0941-squid-distro-default-smithi/8091962
Updated by Laura Flores about 1 year ago
/a/skanta-2025-03-01_01:42:21-rados-wip-bharath3-testing-2025-03-01-0356-distro-default-smithi/8162042
Updated by Laura Flores about 1 year ago
/a/yuriw-2025-02-26_00:53:37-rados-wip-yuri2-testing-2025-02-25-1331-squid-distro-default-smithi/8154000
Updated by Laura Flores about 1 year ago
/a/yuriw-2025-03-14_20:32:49-rados-wip-yuri7-testing-2025-03-11-0847-distro-default-smithi/8190596
Updated by Nitzan Mordechai about 1 year ago
/a/yuriw-2025-03-18_01:22:46-rados-wip-yuri5-testing-2025-03-17-1441-reef-distro-default-smithi/
['8196387', '8196551', '8196575', '8196409']
Updated by Adam Kupczyk 12 months ago
There are 16 deployed OSDs. All seem to be running fine.
At some point we do:
2025-03-25T10:38:22.772 INFO:teuthology.misc:Shutting down osd daemons...
This initiates the shutdown loop (teuthology/misc.py):
def stop_daemons_of_type(ctx, type_, cluster='ceph', timeout=300):
    """
    :param type_: type of daemons to be stopped.
    :param cluster: Cluster name, default is 'ceph'.
    :param timeout: Timeout in seconds for stopping each daemon.
    """
    log.info('Shutting down %s daemons...' % type_)
    exc = None
    for daemon in ctx.daemons.iter_daemons_of_role(type_, cluster):
        try:
            daemon.stop(timeout)
        except (CommandFailedError,
                CommandCrashedError,
                ConnectionLostError) as e:
            exc = e
            log.exception('Saw exception from %s.%s', daemon.role, daemon.id_)
    if exc is not None:
        raise exc
But somehow not all OSDs are processed by "for daemon in ctx.daemons.iter_daemons_of_role(type_, cluster):"
There are messages for stopping:
0, 4, 8, 12,
1, 5, 9, 13,
2, 6, 10, 14,
3
and that's it.
At this point we continue on to shutting down the monitors.
This is reflected in the sequence of OSDs receiving termination signals.
But 7, 11, 15 are not there.
2025-03-25T10:38:22.776+0000 add6640 -1 osd.0 347 *** Got signal Terminated ***
2025-03-25T10:41:13.732+0000 add6640 -1 osd.4 348 *** Got signal Terminated ***
2025-03-25T10:41:19.794+0000 add6640 -1 osd.8 350 *** Got signal Terminated ***
2025-03-25T10:41:25.899+0000 add6640 -1 osd.12 352 *** Got signal Terminated ***
2025-03-25T10:41:32.000+0000 add6640 -1 osd.1 354 *** Got signal Terminated ***
2025-03-25T10:41:38.103+0000 add6640 -1 osd.5 356 *** Got signal Terminated ***
2025-03-25T10:41:44.204+0000 add6640 -1 osd.9 357 *** Got signal Terminated ***
2025-03-25T10:41:50.307+0000 add6640 -1 osd.13 359 *** Got signal Terminated ***
2025-03-25T10:41:56.411+0000 add6640 -1 osd.2 360 *** Got signal Terminated ***
2025-03-25T10:42:02.514+0000 add6640 -1 osd.6 361 *** Got signal Terminated ***
2025-03-25T10:42:08.616+0000 add6640 -1 osd.10 362 *** Got signal Terminated ***
2025-03-25T10:42:14.718+0000 add6640 -1 osd.14 363 *** Got signal Terminated ***
2025-03-25T10:42:20.822+0000 add6640 -1 osd.3 364 *** Got signal Terminated ***
At this point it seems only natural that we cannot umount the osd.7 device.
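The same check can be scripted for other runs. The sketch below assumes the OSD ids are numbered 0 through 15 (the comment only says 16 OSDs were deployed) and uses a hypothetical helper, `missing_osds`, that scans for the "Got signal Terminated" lines quoted above and reports which ids never received the signal.

```python
import re

# Captures the OSD id from lines like "... osd.12 352 *** Got signal Terminated ***".
SIGNAL_RE = re.compile(r"osd\.(\d+) \d+ \*\*\* Got signal Terminated \*\*\*")


def missing_osds(log_text, total_osds):
    """Return the sorted OSD ids in range(total_osds) that logged no termination signal."""
    stopped = {int(m.group(1)) for m in SIGNAL_RE.finditer(log_text)}
    return sorted(set(range(total_osds)) - stopped)


# The termination lines from this run, copied from the comment above.
log_text = """\
2025-03-25T10:38:22.776+0000 add6640 -1 osd.0 347 *** Got signal Terminated ***
2025-03-25T10:41:13.732+0000 add6640 -1 osd.4 348 *** Got signal Terminated ***
2025-03-25T10:41:19.794+0000 add6640 -1 osd.8 350 *** Got signal Terminated ***
2025-03-25T10:41:25.899+0000 add6640 -1 osd.12 352 *** Got signal Terminated ***
2025-03-25T10:41:32.000+0000 add6640 -1 osd.1 354 *** Got signal Terminated ***
2025-03-25T10:41:38.103+0000 add6640 -1 osd.5 356 *** Got signal Terminated ***
2025-03-25T10:41:44.204+0000 add6640 -1 osd.9 357 *** Got signal Terminated ***
2025-03-25T10:41:50.307+0000 add6640 -1 osd.13 359 *** Got signal Terminated ***
2025-03-25T10:41:56.411+0000 add6640 -1 osd.2 360 *** Got signal Terminated ***
2025-03-25T10:42:02.514+0000 add6640 -1 osd.6 361 *** Got signal Terminated ***
2025-03-25T10:42:08.616+0000 add6640 -1 osd.10 362 *** Got signal Terminated ***
2025-03-25T10:42:14.718+0000 add6640 -1 osd.14 363 *** Got signal Terminated ***
2025-03-25T10:42:20.822+0000 add6640 -1 osd.3 364 *** Got signal Terminated ***
"""

print(missing_osds(log_text, 16))  # [7, 11, 15]
```

For this run it reports 7, 11 and 15, matching the OSDs noted as absent.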
Updated by Adam Kupczyk 12 months ago
Very strange, this run shows exactly the same behavior:
https://pulpito.ceph.com/akupczyk-2025-03-25_11:54:00-rados-aclamk-wip-ifed-fix-bluefs-reserved2-distro-default-smithi/8208400/
Updated by Laura Flores 12 months ago
/a/skanta-2025-04-04_06:10:17-rados-wip-bharath10-testing-2025-04-03-2112-distro-default-smithi/8223483
Updated by Kamoltat (Junior) Sirivadhna 12 months ago
yuriw-2025-03-26_14:16:14-rados-wip-yuri8-testing-2025-03-25-0803-squid-distro-default-smithi/8210861/
Updated by Laura Flores 12 months ago
/a/teuthology-2025-04-06_20:00:14-rados-main-distro-default-smithi/8226707
3 jobs: ['8226707', '8226537', '8226683']
Updated by Laura Flores 12 months ago
- Project changed from Infrastructure to RADOS
- Description updated (diff)
Updated by Laura Flores 12 months ago
/a/skanta-2025-04-05_15:49:33-rados-wip-bharath8-testing-2025-04-05-1439-distro-default-smithi/8225226
Updated by Laura Flores 11 months ago
Pinged in #ceph-devel to see if someone can pick this up.
Updated by Nitzan Mordechai 11 months ago
- Status changed from New to In Progress
- Assignee set to Nitzan Mordechai
@Laura Flores It looks like we are not waiting long enough for the OSD to terminate before the umount. We can use the "stop-daemons-timeout" option that I added recently.
I'll create a PR for that.
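To illustrate the general idea of waiting for a daemon to exit before unmounting its data directory, here is a minimal sketch. It is not the actual fix and does not show how "stop-daemons-timeout" is implemented; `wait_for_exit`, `stop_then_unmount` and the `unmount` callback are hypothetical names, and the demo uses a sleeping Python child process in place of an OSD.

```python
import subprocess
import sys
import time


def wait_for_exit(proc, timeout, poll_interval=0.1):
    """Return True if the process exits within `timeout` seconds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return True
        time.sleep(poll_interval)
    return proc.poll() is not None


def stop_then_unmount(proc, mount_point, unmount, timeout=300):
    """Send SIGTERM, wait for the process to exit, and only then call unmount()."""
    proc.terminate()
    if not wait_for_exit(proc, timeout):
        raise TimeoutError(
            "process %d still running after %ss; not unmounting %s"
            % (proc.pid, timeout, mount_point)
        )
    unmount(mount_point)


# Demo: a sleeping child stands in for the daemon, and the "unmount"
# callback only records the path instead of running a real umount.
unmounted = []
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
stop_then_unmount(child, "/var/lib/ceph/osd/ceph-6", unmounted.append, timeout=10)
print(unmounted)
```

In real use the callback would run the actual umount command; keeping it injectable makes the ordering easy to test without root privileges.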
Updated by Nitzan Mordechai 11 months ago
- Status changed from In Progress to Fix Under Review
Updated by Laura Flores 11 months ago
- Backport set to squid
/a/skanta-2025-04-25_13:44:10-rados-wip-bharath10-testing-2025-04-25-0722-squid-distro-default-smithi/8259017
Updated by Laura Flores 11 months ago
- Backport changed from squid to squid,reef
/a/skanta-2025-04-03_15:46:29-rados-wip-bharath5-testing-2025-04-03-1526-reef-distro-default-smithi/8223045
Updated by Laura Flores 11 months ago
Fix has been approved; waiting for CI to pass.
Updated by Radoslaw Zarzynski 10 months ago
- Status changed from Fix Under Review to Pending Backport
- Backport changed from squid,reef to tentacle,squid,reef
Merged in main.
Updated by Upkeep Bot 10 months ago
- Copied to Backport #71540: tentacle: sync && sudo umount -f /var/lib/ceph/osd/ceph-6 added
Updated by Upkeep Bot 10 months ago
- Copied to Backport #71541: reef: sync && sudo umount -f /var/lib/ceph/osd/ceph-6 added
Updated by Upkeep Bot 10 months ago
- Copied to Backport #71542: squid: sync && sudo umount -f /var/lib/ceph/osd/ceph-6 added
Updated by Upkeep Bot 9 months ago
- Merge Commit set to a7052198d9bd906b8a162745c4604a12a44bc94d
- Fixed In set to v20.3.0-598-ga7052198d9b
- Upkeep Timestamp set to 2025-07-09T16:09:40+00:00
Updated by Upkeep Bot 8 months ago
- Fixed In changed from v20.3.0-598-ga7052198d9b to v20.3.0-598-ga7052198d9
- Upkeep Timestamp changed from 2025-07-09T16:09:40+00:00 to 2025-07-14T17:41:50+00:00
Updated by Lee Sanders 8 months ago
/a/skanta-2025-07-16_13:08:53-rados-wip-bharath3-testing-2025-07-16-1352-squid-distro-default-smithi/8390617
Updated by Lee Sanders 8 months ago
/a/skanta-2025-07-16_13:08:53-rados-wip-bharath3-testing-2025-07-16-1352-squid-distro-default-smithi/8390677/
Updated by Lee Sanders 8 months ago
/a/skanta-2025-07-16_22:44:28-rados-wip-bharath3-testing-2025-07-16-1352-squid-distro-default-smithi/8391937
Updated by Jonathan Bailey 8 months ago
/a/skanta-2025-07-20_23:27:54-rados-wip-bharath6-testing-2025-07-20-0524-reef-distro-default-smithi/8399136
/a/skanta-2025-07-20_23:27:54-rados-wip-bharath6-testing-2025-07-20-0524-reef-distro-default-smithi/8399152
/a/skanta-2025-07-20_02:53:19-rados-wip-bharath6-testing-2025-07-20-0524-reef-distro-default-smithi/8397555
Updated by Kamoltat (Junior) Sirivadhna 6 months ago
/a/skanta-2025-08-16_00:23:08-rados-wip-bharath6-testing-2025-08-15-1555-reef-distro-default-smithi/8445287
Updated by Laura Flores 6 months ago
/a/lflores-2025-09-24_21:18:12-rados-reef-distro-default-smithi/8518886