Bug #64864

closed

cephadm: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in cluster log

Added by Sridhar Seshasayee about 2 years ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
orchestrator
Target version:
-
% Done:

0%

Source:
Development
Backport:
squid,reef,quincy
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Pull request ID:
Tags (freeform):
backport_processed
Fixed In:
v19.3.0-1248-gbf893680e5
Released In:
v20.2.0~3165
Upkeep Timestamp:
2025-11-01T01:34:01+00:00

Description

The following tests in the cephadm suite failed with the warning:

/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587779
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587855
/a/yuriw-2024-03-08_16:20:46-rados-wip-yuri4-testing-2024-03-05-0854-distro-default-smithi/7587949

All of the tests above add "MON_DOWN" to the ignorelist because that warning is expected. In addition
to the health check warning, however, each test also logs the health detail line shown below:

"cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c" in cluster log

This line does not contain the string "MON_DOWN", so the existing ignorelist entry does not match it,
and every test failed because the warning is not covered by the ignorelist.

Therefore, this tracker may be used to track adding the "mons down" warning
to the ignorelist for these tests as well.

Logs from 7587779 are shown below as an example:

2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.900461+0000 mon.a (mon.0) 274 : cluster [WRN] Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)
2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c
2024-03-10T01:59:07.349 INFO:journalctl@ceph.mon.a.smithi033.stdout:Mar 10 01:59:06 smithi033 bash[21389]: cluster 2024-03-10T01:59:06.908009+0000 mon.a (mon.0) 276 : cluster [WRN] [WRN] MON_DOWN: 1/3 mons down, quorum a,c

...

2024-03-10T02:10:47.804 DEBUG:teuthology.orchestra.run.smithi033:> sudo egrep '\[ERR\]' /var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v MON_DOWN | head -n 1
2024-03-10T02:10:47.859 DEBUG:teuthology.orchestra.run.smithi033:> sudo egrep '\[WRN\]' /var/log/ceph/1bb78214-de81-11ee-95c7-87774f69a715/ceph.log | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | egrep -v MON_DOWN | head -n 1
2024-03-10T02:10:47.915 INFO:teuthology.orchestra.run.smithi033.stdout:2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] Health detail: HEALTH_WARN 1/3 mons down, quorum a,c
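The log scan above can be reproduced offline to see why the "Health detail" line survives the filters. The following is a minimal sketch, not teuthology's actual implementation: the helper name `first_unignored` is hypothetical, the patterns are copied from the `egrep -v` stages in the command above, and the extra "mons down" pattern is only the addition this tracker proposes (the exact entry used by the eventual fix is not shown here).

```python
import re

# Patterns copied from the `egrep -v` stages in the teuthology command above.
IGNORELIST = [r"\(MDS_ALL_DOWN\)", r"\(MDS_UP_LESS_THAN_MAX\)", r"MON_DOWN"]

# The three [WRN] entries from job 7587779, trimmed to the cluster log portion.
CLUSTER_LOG = [
    "2024-03-10T01:59:06.900461+0000 mon.a (mon.0) 274 : cluster [WRN] "
    "Health check failed: 1/3 mons down, quorum a,c (MON_DOWN)",
    "2024-03-10T01:59:06.907964+0000 mon.a (mon.0) 275 : cluster [WRN] "
    "Health detail: HEALTH_WARN 1/3 mons down, quorum a,c",
    "2024-03-10T01:59:06.908009+0000 mon.a (mon.0) 276 : cluster [WRN] "
    "[WRN] MON_DOWN: 1/3 mons down, quorum a,c",
]


def first_unignored(lines, severity, ignorelist):
    """Hypothetical helper: select lines carrying the severity marker,
    drop any line matching an ignorelist pattern, return the first survivor
    (the same idea as the egrep / egrep -v / head -n 1 pipeline)."""
    marker = re.compile(re.escape(f"[{severity}]"))
    excludes = [re.compile(pattern) for pattern in ignorelist]
    for line in lines:
        if marker.search(line) and not any(e.search(line) for e in excludes):
            return line
    return None


# With the original ignorelist, entry 275 is the first line that survives.
print(first_unignored(CLUSTER_LOG, "WRN", IGNORELIST))

# Assumption: adding a "mons down" pattern, as this tracker proposes.
print(first_unignored(CLUSTER_LOG, "WRN", IGNORELIST + [r"mons down"]))
```

Entries 274 and 276 both contain the literal string MON_DOWN and are filtered out, while entry 275 only contains the human-readable summary, which is why it is the line reported as the failure reason.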

Related issues 3 (0 open, 3 closed)

Copied to Orchestrator - Backport #66475: squid: cephadm: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in cluster log (Status: Resolved, Assignee: Laura Flores)
Copied to Orchestrator - Backport #66476: reef: cephadm: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in cluster log (Status: Resolved, Assignee: Adam King)
Copied to Orchestrator - Backport #66477: quincy: cephadm: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in cluster log (Status: Resolved, Assignee: Adam King)

Actions #1

Updated by Sridhar Seshasayee about 2 years ago

  • Description updated (diff)
Actions #2

Updated by Sridhar Seshasayee about 2 years ago

  • Category changed from cephadm to orchestrator
Actions #3

Updated by Sridhar Seshasayee about 2 years ago

  • Tags set to test-failure
Actions #4

Updated by Aishwarya Mathuria about 2 years ago

/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609867
/a/yuriw-2024-03-19_00:09:45-rados-wip-yuri5-testing-2024-03-18-1144-distro-default-smithi/7609907

Actions #5

Updated by Nitzan Mordechai almost 2 years ago

/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620793
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620804
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620848
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620903
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620939
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7620978
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7621027
/a/yuriw-2024-03-25_00:22:23-rados-wip-yuri3-testing-2024-03-24-1519-distro-default-smithi/7621054

Actions #6

Updated by Laura Flores almost 2 years ago

/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623410
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623421
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623458
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623475
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623536
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623550
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623575
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623597
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623612
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623620
/a/yuriw-2024-03-26_14:32:05-rados-wip-yuri8-testing-2024-03-25-1419-distro-default-smithi/7623624

Actions #7

Updated by Laura Flores almost 2 years ago

  • Assignee set to Laura Flores
Actions #8

Updated by Laura Flores almost 2 years ago

  • Status changed from New to Fix Under Review
  • Pull request ID set to 56619
Actions #9

Updated by Laura Flores almost 2 years ago

  • Status changed from Fix Under Review to Resolved
Actions #10

Updated by Laura Flores almost 2 years ago

  • Status changed from Resolved to Pending Backport
  • Backport set to squid
Will also need to be backported to reef and quincy once the mon_cluster_log_to_file backports merge:
Actions #11

Updated by Laura Flores almost 2 years ago

  • Copied to Backport #66475: squid: cephadm: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in cluster log added
Actions #13

Updated by Laura Flores almost 2 years ago

  • Backport changed from squid to squid,reef,quincy
Actions #14

Updated by Laura Flores almost 2 years ago

  • Copied to Backport #66476: reef: cephadm: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in cluster log added
Actions #15

Updated by Laura Flores almost 2 years ago

  • Copied to Backport #66477: quincy: cephadm: Health detail: HEALTH_WARN 1/3 mons down, quorum a,c in cluster log added
Actions #17

Updated by Konstantin Shalygin over 1 year ago

  • Source set to Development
Actions #18

Updated by Upkeep Bot over 1 year ago

  • Tags (freeform) set to backport_processed
Actions #19

Updated by Adam King about 1 year ago

  • Status changed from Pending Backport to Resolved
Actions #20

Updated by Naveen Naidu 10 months ago

/a/skanta-2025-05-10_21:45:05-rados-wip-bharath8-testing-2025-05-10-1334-reef-distro-default-smithi

12 jobs: ['8279541', '8279532', '8279534', '8279522', '8279545', '8279543', '8279535', '8279538', '8279531', '8279526', '8279517', '8279528']

Actions #21

Updated by Upkeep Bot 8 months ago

  • Merge Commit set to bf893680e5fdbc89bb915ba97e1a33a134bca832
  • Fixed In set to v19.3.0-1248-gbf893680e5f
  • Upkeep Timestamp set to 2025-07-11T14:17:44+00:00
Actions #22

Updated by Upkeep Bot 8 months ago

  • Fixed In changed from v19.3.0-1248-gbf893680e5f to v19.3.0-1248-gbf893680e5
  • Upkeep Timestamp changed from 2025-07-11T14:17:44+00:00 to 2025-07-14T23:10:21+00:00
Actions #23

Updated by Upkeep Bot 5 months ago

  • Released In set to v20.2.0~3165
  • Upkeep Timestamp changed from 2025-07-14T23:10:21+00:00 to 2025-11-01T01:34:01+00:00