Bug #66698

open

daemon_watchdog:BARK! but thrasher will continue looping until timeout

Added by Nitzan Mordechai over 1 year ago. Updated 5 months ago.

Status: Pending Backport
Priority: Normal
Category: -
Target version: -
% Done: 0%
Source:
Backport: squid,reef,quincy
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform): backport_processed
Fixed In: v19.3.0-3972-g58a668dba4
Released In: v20.2.0~2301
Upkeep Timestamp: 2025-11-01T01:03:16+00:00

Description

/a/nmordech-2024-06-26_11:08:57-rados:thrash-erasure-code-isa-main-distro-default-smithi/7773254

2024-06-26T11:34:12.906 INFO:tasks.daemonwatchdog.daemon_watchdog:OSDThrasher failed
2024-06-26T11:34:12.906 INFO:tasks.daemonwatchdog.daemon_watchdog:BARK! unmounting mounts and killing all daemons
.
.
2024-06-26T11:34:14.925 INFO:tasks.ceph.osd.2.smithi081.stderr:2024-06-26T11:34:14.938+0000 7f62d9efb640 -1 received  signal: Hangup from /usr/bin/python3 /usr/bin/daemon-helper kill ceph-osd -f --cluster ceph -i 2  (PID: 25394) UID: 0
2024-06-26T11:34:15.025 INFO:tasks.ceph.osd.0:Sent signal 15
2024-06-26T11:34:15.026 INFO:tasks.ceph.osd.4:Sent signal 15
2024-06-26T11:34:15.026 INFO:tasks.ceph.osd.8:Sent signal 15
2024-06-26T11:34:15.027 INFO:tasks.ceph.osd.12:Sent signal 15
2024-06-26T11:34:15.027 INFO:tasks.ceph.osd.1:Sent signal 15
2024-06-26T11:34:15.027 INFO:tasks.ceph.osd.5:Sent signal 15
2024-06-26T11:34:15.027 INFO:tasks.ceph.osd.9:Sent signal 15
2024-06-26T11:34:15.028 INFO:tasks.ceph.osd.13:Sent signal 15
2024-06-26T11:34:15.028 INFO:tasks.ceph.osd.2:Sent signal 15
2024-06-26T11:34:15.028 INFO:tasks.ceph.osd.6:Sent signal 15
2024-06-26T11:34:15.028 INFO:tasks.ceph.osd.10:Sent signal 15
2024-06-26T11:34:15.028 INFO:tasks.ceph.osd.14:Sent signal 15
2024-06-26T11:34:15.029 INFO:tasks.ceph.osd.3:Sent signal 15
2024-06-26T11:34:15.029 INFO:tasks.ceph.osd.7:Sent signal 15
2024-06-26T11:34:15.029 INFO:tasks.ceph.osd.11:Sent signal 15
2024-06-26T11:34:15.029 INFO:tasks.ceph.osd.15:Sent signal 15
2024-06-26T11:34:15.030 INFO:tasks.ceph.mon.a:Sent signal 15
2024-06-26T11:34:15.030 INFO:tasks.ceph.mon.b:Sent signal 15
2024-06-26T11:34:15.030 INFO:tasks.ceph.mon.c:Sent signal 15
2024-06-26T11:34:15.030 INFO:tasks.ceph.mgr.y:Sent signal 15
2024-06-26T11:34:15.031 INFO:tasks.ceph.mgr.x:Sent signal 15

All daemons are terminated, but the thrasher keeps looping and trying to get the op history:

2024-06-26T11:34:20.665 DEBUG:teuthology.orchestra.run.smithi016:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 30 ceph --cluster ceph --admin-daemon /var/run/ceph/ceph-osd.8.asok dump_ops_in_flight
2024-06-26T11:34:20.685 ERROR:teuthology.orchestra.daemon.state:Failed to send signal 1: None
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/teuthology/orchestra/daemon/state.py", line 108, in signal
    self.proc.stdin.write(struct.pack('!b', sig))
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/file.py", line 389, in write
    self._write_all(data)
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/file.py", line 507, in _write_all
    count = self._write(data)
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/channel.py", line 1362, in _write
    self.channel.sendall(data)
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/channel.py", line 844, in sendall
    sent = self.send(s)
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/channel.py", line 799, in send
    return self._send(s, m)
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/channel.py", line 1196, in _send
    raise socket.error("Socket is closed")
OSError: Socket is closed
2024-06-26T11:34:20.787 ERROR:teuthology.orchestra.daemon.state:Failed to send signal 1: None
Traceback (most recent call last):
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/teuthology/orchestra/daemon/state.py", line 108, in signal
    self.proc.stdin.write(struct.pack('!b', sig))
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/file.py", line 389, in write
    self._write_all(data)
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/file.py", line 507, in _write_all
    count = self._write(data)
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/channel.py", line 1362, in _write
    self.channel.sendall(data)
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/channel.py", line 844, in sendall
    sent = self.send(s)
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/channel.py", line 799, in send
    return self._send(s, m)
  File "/home/teuthworker/src/git.ceph.com_teuthology_544fecbcd55f3d2b6f478823823ce40cbefef1d4/virtualenv/lib/python3.10/site-packages/paramiko/channel.py", line 1196, in _send
    raise socket.error("Socket is closed")
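
The behavior one would expect after the BARK is that the thrasher notices the shutdown and leaves its loop instead of retrying against dead daemons until the job times out. The following is a minimal sketch of that idea only, not the change made in the actual fix. All names here (`SketchThrasher`, `bark`, `demo`, `interval`, `max_iterations`) are hypothetical and do not correspond to teuthology or ceph-qa code; the sketch assumes the watchdog can reach each thrasher and call a `stop()` method on it.

```python
import threading
import time


class SketchThrasher:
    """Hypothetical stand-in for a thrasher; not the real OSDThrasher."""

    def __init__(self, interval=0.01, max_iterations=1000):
        self.stopping = threading.Event()  # set when the watchdog barks
        self.interval = interval           # pause between thrash rounds (seconds)
        self.max_iterations = max_iterations
        self.iterations = 0

    def stop(self):
        """Ask the loop to exit at the next check."""
        self.stopping.set()

    def run(self):
        # Check the stop flag before each round so a bark ends the loop
        # instead of letting it spin until the overall timeout.
        while not self.stopping.is_set() and self.iterations < self.max_iterations:
            self.iterations += 1
            # A real thrasher would kill/revive daemons or query them here.
            # Event.wait() returns True as soon as stop() is called, so the
            # pause between rounds is interruptible.
            if self.stopping.wait(self.interval):
                break


def bark(thrashers):
    """Hypothetical watchdog action: tell every thrasher to stop."""
    for thrasher in thrashers:
        thrasher.stop()


def demo():
    thrasher = SketchThrasher()
    worker = threading.Thread(target=thrasher.run)
    worker.start()
    time.sleep(0.05)          # let a few rounds run
    bark([thrasher])          # simulate the watchdog bark
    worker.join(timeout=1.0)  # the loop should exit promptly
    return worker.is_alive(), thrasher.iterations


if __name__ == "__main__":
    still_running, rounds = demo()
    print("thrasher still running:", still_running, "rounds:", rounds)
```

Using an event for both the loop condition and the sleep means the thrasher reacts to the bark within one round rather than after a fixed delay.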


Related issues 3 (1 open, 2 closed)

Copied to RADOS - Backport #67502: reef: daemon_watchdog:BARK! but thrasher will continue looping until timeout (Resolved, Nitzan Mordechai)
Copied to RADOS - Backport #67503: quincy: daemon_watchdog:BARK! but thrasher will continue looping until timeout (Rejected, Nitzan Mordechai)
Copied to RADOS - Backport #67504: squid: daemon_watchdog:BARK! but thrasher will continue looping until timeout (In Progress, Nitzan Mordechai)
Actions #1

Updated by Nitzan Mordechai over 1 year ago

  • Status changed from In Progress to Fix Under Review
  • Pull request ID set to 58282
Actions #2

Updated by Laura Flores over 1 year ago

Note: solution needs work

Actions #3

Updated by Radoslaw Zarzynski over 1 year ago

scrub note: approved and in the QA.

Actions #4

Updated by Radoslaw Zarzynski over 1 year ago

scrub note: under test -- see https://tracker.ceph.com/issues/66822.

Actions #5

Updated by Laura Flores over 1 year ago

Still under test.

Actions #6

Updated by Laura Flores over 1 year ago

Waiting on qa.

Actions #7

Updated by Laura Flores over 1 year ago

Needs to resolve conflicts; otherwise was approved to merge.

Actions #8

Updated by Laura Flores over 1 year ago

  • Status changed from Fix Under Review to Pending Backport
Actions #9

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #67502: reef: daemon_watchdog:BARK! but thrasher will continue looping until timeout added
Actions #10

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #67503: quincy: daemon_watchdog:BARK! but thrasher will continue looping until timeout added
Actions #11

Updated by Upkeep Bot over 1 year ago

  • Copied to Backport #67504: squid: daemon_watchdog:BARK! but thrasher will continue looping until timeout added
Actions #12

Updated by Upkeep Bot over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #13

Updated by Upkeep Bot 9 months ago

  • Merge Commit set to 58a668dba4f84a87593e375f79b813e3978deec6
  • Fixed In set to v19.3.0-3972-g58a668dba4f
  • Upkeep Timestamp set to 2025-07-08T22:38:11+00:00
Actions #14

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-3972-g58a668dba4f to v19.3.0-3972-g58a668dba4f8
  • Upkeep Timestamp changed from 2025-07-08T22:38:11+00:00 to 2025-07-14T15:46:44+00:00
Actions #15

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-3972-g58a668dba4f8 to v19.3.0-3972-g58a668dba4
  • Upkeep Timestamp changed from 2025-07-14T15:46:44+00:00 to 2025-07-14T21:10:45+00:00
Actions #16

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~2301
  • Upkeep Timestamp changed from 2025-07-14T21:10:45+00:00 to 2025-11-01T01:03:16+00:00