Bug #62713

open

sync && sudo umount -f /var/lib/ceph/osd/ceph-6

Added by Kamoltat (Junior) Sirivadhna over 2 years ago. Updated 6 months ago.

Status: Pending Backport
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Backport: tentacle,squid,reef
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v20.3.0-598-ga7052198d9
Released In:
Upkeep Timestamp: 2025-07-14T17:41:50+00:00

Description

Description of problem

/a/yuriw-2023-08-16_18:39:08-rados-wip-yuri3-testing-2023-08-15-0955-distro-default-smithi/7370263/

Traceback:

2023-08-16T22:19:50.948 DEBUG:teuthology.orchestra.run.smithi123:> sync && sudo umount -f /var/lib/ceph/osd/ceph-6
2023-08-16T22:19:51.024 INFO:teuthology.orchestra.run.smithi123.stderr:umount: /var/lib/ceph/osd/ceph-6: target is busy.

We suspect that the problem might be related to systemd.


Related issues 3 (3 open, 0 closed)

Copied to RADOS - Backport #71540: tentacle: sync && sudo umount -f /var/lib/ceph/osd/ceph-6 (In Progress, Nitzan Mordechai)
Copied to RADOS - Backport #71541: reef: sync && sudo umount -f /var/lib/ceph/osd/ceph-6 (In Progress, Nitzan Mordechai)
Copied to RADOS - Backport #71542: squid: sync && sudo umount -f /var/lib/ceph/osd/ceph-6 (In Progress, Nitzan Mordechai)

Actions #1

Updated by Laura Flores over 2 years ago

@Junior have you seen any further instances of this?

Actions #2

Updated by Kamoltat (Junior) Sirivadhna over 2 years ago

/a/yuriw-2023-11-02_14:18:00-rados-wip-yuri-testing-2023-11-01-1538-reef-distro-default-smithi/7444612/

Actions #3

Updated by Laura Flores over 1 year ago

/a/lflores-2024-07-23_16:58:04-rados-squid-distro-default-smithi/7814104
/a/lflores-2024-07-23_16:58:04-rados-squid-distro-default-smithi/7814102

Actions #4

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-08-06_00:08:05-rados-wip-yuri6-testing-2024-08-05-0905-squid-distro-default-smithi/7838404

Actions #5

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-08-02_15:32:29-rados-squid-release-distro-default-smithi/7833174

Actions #6

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-08-28_23:20:36-rados-wip-yuri4-testing-2024-08-28-1359-distro-default-smithi/7879426

Actions #7

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-08-29_14:15:40-rados-wip-yuri8-testing-2024-08-28-1632-squid-distro-default-smithi/7880295

Actions #8

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-10-15_14:06:51-rados-wip-yuri8-testing-2024-10-14-1103-distro-default-smithi/7948329

Actions #9

Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

/a/yuriw-2024-10-16_22:34:49-rados-wip-yuri10-testing-2024-10-15-1007-distro-default-smithi/7952607/

Actions #10

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-10-16_14:57:58-rados-wip-yuri2-testing-2024-10-15-0703-distro-default-smithi/7950794

Actions #11

Updated by Laura Flores over 1 year ago

/a/yuriw-2024-11-18_23:57:04-rados-wip-yuri4-testing-2024-11-18-0759-squid-distro-default-smithi/7999083

Actions #12

Updated by Sridhar Seshasayee over 1 year ago

/a/skanta-2024-11-26_00:08:32-rados-wip-bharath8-testing-2024-11-25-1916-squid-distro-default-smithi/8009802

Actions #13

Updated by Nitzan Mordechai about 1 year ago

/a/skanta-2025-01-24_10:43:00-rados-wip-bharath10-testing-2025-01-24-0941-squid-distro-default-smithi/8091987
/a/skanta-2025-01-24_10:43:00-rados-wip-bharath10-testing-2025-01-24-0941-squid-distro-default-smithi/8091815
/a/skanta-2025-01-24_10:43:00-rados-wip-bharath10-testing-2025-01-24-0941-squid-distro-default-smithi/8091962

Actions #14

Updated by Laura Flores about 1 year ago

/a/skanta-2025-03-01_01:42:21-rados-wip-bharath3-testing-2025-03-01-0356-distro-default-smithi/8162042

Actions #15

Updated by Laura Flores about 1 year ago

/a/yuriw-2025-02-26_00:53:37-rados-wip-yuri2-testing-2025-02-25-1331-squid-distro-default-smithi/8154000

Actions #16

Updated by Laura Flores about 1 year ago

/a/yuriw-2025-03-14_20:32:49-rados-wip-yuri7-testing-2025-03-11-0847-distro-default-smithi/8190596

Actions #17

Updated by Nitzan Mordechai about 1 year ago

/a/yuriw-2025-03-18_01:22:46-rados-wip-yuri5-testing-2025-03-17-1441-reef-distro-default-smithi/
['8196387', '8196551', '8196575', '8196409']

Actions #18

Updated by Adam Kupczyk 12 months ago

https://pulpito.ceph.com/akupczyk-2025-03-25_08:05:35-rados-aclamk-wip-ifed-fix-bluefs-reserved2-distro-default-smithi/8207871/

There are 16 OSDs deployed, and all seem to be running fine.
At some point we do:

2025-03-25T10:38:22.772 INFO:teuthology.misc:Shutting down osd daemons...

This starts the shutdown loop (teuthology/misc.py):

def stop_daemons_of_type(ctx, type_, cluster='ceph', timeout=300):
    """ 
    :param type_: type of daemons to be stopped.
    :param cluster: Cluster name, default is 'ceph'.
    :param timeout: Timeout in seconds for stopping each daemon.
    """ 
    log.info('Shutting down %s daemons...' % type_)
    exc = None
    for daemon in ctx.daemons.iter_daemons_of_role(type_, cluster):
        try:
            daemon.stop(timeout)
        except (CommandFailedError,
                CommandCrashedError,
                ConnectionLostError) as e:
            exc = e
            log.exception('Saw exception from %s.%s', daemon.role, daemon.id_)
    if exc is not None:
        raise exc

But somehow not all OSDs are processed by "for daemon in ctx.daemons.iter_daemons_of_role(type_, cluster):".
There are messages for stopping:
0, 4, 8, 12,
1, 5, 9, 13,
2, 6, 10, 14,
3
and that's it.
At this point we continue on to shutting down the monitors.

This is reflected in the sequence of OSDs receiving termination signals.
OSDs 7, 11, and 15 are not among them.

2025-03-25T10:38:22.776+0000 add6640 -1 osd.0 347 *** Got signal Terminated ***
2025-03-25T10:41:13.732+0000 add6640 -1 osd.4 348 *** Got signal Terminated ***
2025-03-25T10:41:19.794+0000 add6640 -1 osd.8 350 *** Got signal Terminated ***
2025-03-25T10:41:25.899+0000 add6640 -1 osd.12 352 *** Got signal Terminated ***
2025-03-25T10:41:32.000+0000 add6640 -1 osd.1 354 *** Got signal Terminated ***
2025-03-25T10:41:38.103+0000 add6640 -1 osd.5 356 *** Got signal Terminated ***
2025-03-25T10:41:44.204+0000 add6640 -1 osd.9 357 *** Got signal Terminated ***
2025-03-25T10:41:50.307+0000 add6640 -1 osd.13 359 *** Got signal Terminated ***
2025-03-25T10:41:56.411+0000 add6640 -1 osd.2 360 *** Got signal Terminated ***
2025-03-25T10:42:02.514+0000 add6640 -1 osd.6 361 *** Got signal Terminated ***
2025-03-25T10:42:08.616+0000 add6640 -1 osd.10 362 *** Got signal Terminated ***
2025-03-25T10:42:14.718+0000 add6640 -1 osd.14 363 *** Got signal Terminated ***
2025-03-25T10:42:20.822+0000 add6640 -1 osd.3 364 *** Got signal Terminated ***

At this point it seems only natural that we cannot umount the osd.7 device.
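
For anyone triaging other instances, the check described above (which OSDs never logged a termination signal) could be automated roughly as follows. This is only a sketch: the helper name missing_osds and the regular expression are assumptions made for illustration, not part of teuthology, and the sample lines are rebuilt from the OSD ids quoted above with a placeholder timestamp and epoch.

```python
import re

# Matches the OSD id in lines such as:
#   "... -1 osd.12 352 *** Got signal Terminated ***"
TERMINATED_RE = re.compile(
    r"\bosd\.(\d+)\s+\d+\s+\*\*\* Got signal Terminated \*\*\*"
)


def missing_osds(log_lines, deployed_count):
    """Return the sorted ids in range(deployed_count) with no termination line.

    Hypothetical helper: assumes OSD ids are numbered 0..deployed_count-1.
    """
    seen = set()
    for line in log_lines:
        match = TERMINATED_RE.search(line)
        if match:
            seen.add(int(match.group(1)))
    return sorted(set(range(deployed_count)) - seen)


# OSD ids in the order they appear in the termination messages above.
# The timestamp and epoch here are placeholders; only the id matters.
sample = [
    "2025-03-25T10:38:22.776+0000 add6640 -1 osd.%d 347 "
    "*** Got signal Terminated ***" % osd_id
    for osd_id in (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3)
]

print(missing_osds(sample, 16))  # prints [7, 11, 15]
```

Running this against the OSD logs of a failed job would show whether the busy mount belongs to an OSD that was never asked to stop.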

Actions #21

Updated by Laura Flores 12 months ago

/a/skanta-2025-04-04_06:10:17-rados-wip-bharath10-testing-2025-04-03-2112-distro-default-smithi/8223483

Actions #22

Updated by Kamoltat (Junior) Sirivadhna 12 months ago

yuriw-2025-03-26_14:16:14-rados-wip-yuri8-testing-2025-03-25-0803-squid-distro-default-smithi/8210861/

Actions #23

Updated by Laura Flores 12 months ago

/a/teuthology-2025-04-06_20:00:14-rados-main-distro-default-smithi/8226707
3 jobs: ['8226707', '8226537', '8226683']

Actions #24

Updated by Laura Flores 12 months ago

  • Project changed from Infrastructure to RADOS
  • Description updated (diff)
Actions #25

Updated by Laura Flores 12 months ago

/a/skanta-2025-04-05_15:49:33-rados-wip-bharath8-testing-2025-04-05-1439-distro-default-smithi/8225226

Actions #26

Updated by Laura Flores 11 months ago

Pinged in #ceph-devel to see if someone can pick this up.

Actions #27

Updated by Nitzan Mordechai 11 months ago

  • Status changed from New to In Progress
  • Assignee set to Nitzan Mordechai

@Laura Flores It looks like we are not waiting long enough for the OSD to terminate before the umount. We can use "stop-daemons-timeout", which I added recently.
I'll create a PR for that.
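
The general idea of waiting for a condition with an upper bound before running umount can be sketched as below. This is not the change in the PR and not how "stop-daemons-timeout" is implemented; wait_until and its parameters are hypothetical names chosen for illustration, and the predicate that decides whether the OSD has exited is left to the caller.

```python
import time


def wait_until(predicate, timeout, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Hypothetical helper. Returns True if the predicate succeeded and
    False if the deadline passed first. The predicate is always checked
    at least once, even with timeout=0.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Usage sketch (osd_has_exited is an assumed, caller-supplied check):
#
#   if wait_until(osd_has_exited, timeout=300):
#       ...  # safe to run: sync && sudo umount -f <osd mount point>
#   else:
#       ...  # report that the daemon is still running instead of umounting
```

Injecting sleep and clock keeps the helper easy to test without real delays.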

Actions #28

Updated by Nitzan Mordechai 11 months ago

  • Pull request ID set to 62823
Actions #29

Updated by Nitzan Mordechai 11 months ago

  • Status changed from In Progress to Fix Under Review
Actions #30

Updated by Laura Flores 11 months ago

Awesome, thanks Nitzan!

Actions #31

Updated by Laura Flores 11 months ago

Approved the fix and added "needs QA".

Actions #32

Updated by Laura Flores 11 months ago

  • Backport set to squid

/a/skanta-2025-04-25_13:44:10-rados-wip-bharath10-testing-2025-04-25-0722-squid-distro-default-smithi/8259017

Actions #33

Updated by Laura Flores 11 months ago

  • Backport changed from squid to squid,reef

/a/skanta-2025-04-03_15:46:29-rados-wip-bharath5-testing-2025-04-03-1526-reef-distro-default-smithi/8223045

Actions #34

Updated by Laura Flores 11 months ago

Fix has been approved; waiting for CI to pass.

Actions #35

Updated by Laura Flores 10 months ago

Retriggered CI again.

Actions #36

Updated by Radoslaw Zarzynski 10 months ago

  • Status changed from Fix Under Review to Pending Backport
  • Backport changed from squid,reef to tentacle,squid,reef

Merged in main.

Actions #37

Updated by Upkeep Bot 10 months ago

  • Copied to Backport #71540: tentacle: sync && sudo umount -f /var/lib/ceph/osd/ceph-6 added
Actions #38

Updated by Upkeep Bot 10 months ago

  • Copied to Backport #71541: reef: sync && sudo umount -f /var/lib/ceph/osd/ceph-6 added
Actions #39

Updated by Upkeep Bot 10 months ago

  • Copied to Backport #71542: squid: sync && sudo umount -f /var/lib/ceph/osd/ceph-6 added
Actions #40

Updated by Upkeep Bot 10 months ago

  • Tags (freeform) set to backport_processed
Actions #41

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to a7052198d9bd906b8a162745c4604a12a44bc94d
  • Fixed In set to v20.3.0-598-ga7052198d9b
  • Upkeep Timestamp set to 2025-07-09T16:09:40+00:00
Actions #42

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v20.3.0-598-ga7052198d9b to v20.3.0-598-ga7052198d9
  • Upkeep Timestamp changed from 2025-07-09T16:09:40+00:00 to 2025-07-14T17:41:50+00:00
Actions #43

Updated by Lee Sanders 8 months ago

/a/skanta-2025-07-16_13:08:53-rados-wip-bharath3-testing-2025-07-16-1352-squid-distro-default-smithi/8390617

Actions #44

Updated by Lee Sanders 8 months ago

/a/skanta-2025-07-16_13:08:53-rados-wip-bharath3-testing-2025-07-16-1352-squid-distro-default-smithi/8390677/

Actions #45

Updated by Lee Sanders 8 months ago

/a/skanta-2025-07-16_22:44:28-rados-wip-bharath3-testing-2025-07-16-1352-squid-distro-default-smithi/8391937

Actions #46

Updated by Jonathan Bailey 8 months ago

/a/skanta-2025-07-20_23:27:54-rados-wip-bharath6-testing-2025-07-20-0524-reef-distro-default-smithi/8399136
/a/skanta-2025-07-20_23:27:54-rados-wip-bharath6-testing-2025-07-20-0524-reef-distro-default-smithi/8399152
/a/skanta-2025-07-20_02:53:19-rados-wip-bharath6-testing-2025-07-20-0524-reef-distro-default-smithi/8397555

Actions #47

Updated by Kamoltat (Junior) Sirivadhna 6 months ago

/a/skanta-2025-08-16_00:23:08-rados-wip-bharath6-testing-2025-08-15-1555-reef-distro-default-smithi/8445287

Actions #48

Updated by Laura Flores 6 months ago

/a/lflores-2025-09-24_21:18:12-rados-reef-distro-default-smithi/8518886
