Bug #65521: Add expected warnings in cluster log to ignorelists

Added by Laura Flores almost 2 years ago. Updated about 1 year ago.

Status: Closed
Priority: Urgent
Assignee:
Category: -
Target version: -
% Done: 0%
Source:
Backport:
Regression: No
Severity: 3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(RADOS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

Relevant Slack conversation:

Hey all, as I brought up in today's RADOS call, there has been a surge of cluster warnings in the rados and upgrade suites due to the merge of https://github.com/ceph/ceph/pull/54312 to main and squid.

Here are recent main baselines, where we have a huge percentage of failures due to cluster warnings:

Squid doesn't look nearly as bad, but still needs some attention, especially in the upgrade suite:

I've been making tracker issues to fix a lot of these warnings, but since there are so many and they are non-deterministic, I think this will need to be a group effort.
Here are some I've opened lately:

Any ideas on how we can effectively divide up the work and fix the suites are welcome. The idea is to go through each failure, identify whether the warning is expected (e.g., OSD_DOWN warnings are expected in thrash tests), and add it to the correct ignorelist in a PR like this: https://github.com/ceph/ceph/pull/56619

The mon_cluster_log_to_file change has not yet been backported to Quincy or Reef, but the same work will need to be done for these. I think we should run all suites against these patches and merge them along with ignorelist changes, rather than merging first and fixing second.
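
For illustration, the ignorelists referenced above live in the qa suite YAML fragments as log-ignorelist entries, each a regular expression matched against cluster log lines. A minimal sketch of the kind of change such a PR makes (the file path and exact patterns below are hypothetical; real entries are scoped per suite):

# qa/suites/rados/thrash/thrashers/default.yaml (hypothetical path)
# Each entry is a regex; a cluster log line matching any entry no
# longer fails the run.
overrides:
  ceph:
    log-ignorelist:
      - \(OSD_DOWN\)       # expected while the thrasher marks OSDs down
      - \(PG_DEGRADED\)    # expected during recovery after thrashing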

Related issues: 22 (12 open, 10 closed)

  • Related to RADOS - Bug #65422: upgrade/quincy-x: "1 pg degraded (PG_DEGRADED)" in cluster log (Resolved, Nitzan Mordechai)
  • Related to Orchestrator - Bug #64868: cephadm/osds, cephadm/workunits: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) in cluster log (New, Laura Flores)
  • Related to RADOS - Bug #65235: upgrade/reef-x/stress-split: "OSDMAP_FLAGS: noscrub flag(s) set" warning in cluster log (Resolved, Brad Hubbard)
  • Related to RADOS - Bug #62776: rados/basic: cluster [WRN] overall HEALTH_WARN - do not have an application enabled (Pending Backport, Laura Flores)
  • Related to Dashboard - Bug #64870: Health check failed: "1 osds down (OSD_DOWN)" in cluster log (Pending Backport, Nitzan Mordechai)
  • Related to Orchestrator - Bug #64872: rados/cephadm: Health check failed: 1 stray daemon(s) not managed by cephadm (CEPHADM_STRAY_DAEMON) in cluster log (Pending Backport, Nitzan Mordechai)
  • Related to Orchestrator - Bug #65728: Daemon managed by cephadm in an unknown state (CEPHADM_FAILED_DAEMON) (New, Adam King)
  • Related to RADOS - Bug #65768: rados/verify: Health check failed: "1 osds down (OSD_DOWN)" in cluster log (Resolved, Sridhar Seshasayee)
  • Related to Orchestrator - Bug #65824: rados/thrash-old-clients: cluster [WRN] Health detail: "HEALTH_WARN noscrub flag(s) set" in cluster log (Pending Backport, Kamoltat (Junior) Sirivadhna)
  • Related to RADOS - Bug #66474: rados/thrash-old-clients: HEALTH_WARN noscrub,nodeep-scrub flag(s) set; Degraded data redundancy (Duplicate, Laura Flores)
  • Related to RADOS - Bug #66602: rados/upgrade: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) (Pending Backport, Brad Hubbard)
  • Related to Orchestrator - Bug #66603: rados/cephadm/smoke: CEPHADM_AGENT_DOWN: 2 Cephadm Agent(s) are not reporting. Hosts may be offline (New, Adam King)
  • Related to RADOS - Bug #66604: rados/thrash-old-clients: SLOW_OPS: 17 slow ops, oldest one blocked for 213 sec, osd.11 has slow ops (Resolved, Nitzan Mordechai)
  • Related to RADOS - Bug #66809: upgrade/quincy-x; upgrade/reef-x: Health check failed: "Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log (Pending Backport, Laura Flores)
  • Related to RADOS - Bug #66811: upgrade/reef-x/stress-split: Health check failed: "1/3 mons down, quorum a,b (MON_DOWN)" in cluster log (Duplicate)
  • Related to RADOS - Bug #67181: rados/verify: Health check failed: "1 osds down (OSD_DOWN)" in cluster log (Pending Backport, Laura Flores)
  • Related to RADOS - Bug #67182: rados/upgrade: Health check failed: "Degraded data redundancy: 2/6 objects degraded (33.333%), 1 pg degraded (PG_DEGRADED)" in cluster log (Resolved, Pere Díaz Bou)
  • Related to RADOS - Bug #67281: rados/upgrade/parallel - Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY) (Pending Backport, Laura Flores)
  • Related to RADOS - Bug #67584: upgrade:quincy-x: cluster [WRN] Health check failed: "1 osds down (OSD_DOWN)" in cluster log (Won't Fix - EOL, Laura Flores)
  • Related to RADOS - Bug #67879: upgrade/cephfs/mds_upgrade_sequence: Health detail: "HEALTH_WARN 1 osds down" in cluster log (Pending Backport, Kamoltat (Junior) Sirivadhna)
  • Related to RADOS - Bug #67970: rados/thrash-old-clients: "HEALTH_WARN Degraded data redundancy: 7/134 objects degraded (5.224%), 1 pg degraded" in cluster log (Resolved, Nitzan Mordechai)
  • Related to RADOS - Bug #68602: rados/thrash-old-clients: [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg peering (Resolved, Kamoltat (Junior) Sirivadhna)

Actions #1

Updated by Laura Flores almost 2 years ago

  • Related to Bug #65422: upgrade/quincy-x: "1 pg degraded (PG_DEGRADED)" in cluster log added
Actions #2

Updated by Laura Flores almost 2 years ago

  • Related to Bug #64868: cephadm/osds, cephadm/workunits: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) in cluster log added
Actions #3

Updated by Laura Flores almost 2 years ago

  • Related to Bug #65235: upgrade/reef-x/stress-split: "OSDMAP_FLAGS: noscrub flag(s) set" warning in cluster log added
  • Related to Bug #62776: rados/basic: cluster [WRN] overall HEALTH_WARN - do not have an application enabled added
  • Related to Bug #64870: Health check failed: "1 osds down (OSD_DOWN)" in cluster log added
Actions #4

Updated by Matan Breizman almost 2 years ago

/a/yuriw-2024-04-16_23:25:35-rados-wip-yuriw-testing-20240416.150233-distro-default-smithi/7659305

Actions #6

Updated by Laura Flores almost 2 years ago

/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664685

"2024-04-20T16:10:00.000158+0000 mon.a (mon.0) 1407 : cluster [WRN] Health detail: HEALTH_WARN nodeep-scrub flag(s) set; Reduced data availability: 1 pg peering" in cluster log
Actions #7

Updated by Laura Flores almost 2 years ago

In this one, we are intentionally setting OSDs down, so the warning is expected.

/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664689

2024-04-20T16:04:18.136 INFO:journalctl@ceph.mon.a.smithi012.stdout:Apr 20 16:04:17 smithi012 ceph-mon[17195]: from='mgr.14152 172.21.15.12:0/2573228170' entity='mgr.a' cmd=[{"prefix": "config generate-minimal-conf"}]: dispatch
2024-04-20T16:04:18.136 INFO:journalctl@ceph.mon.a.smithi012.stdout:Apr 20 16:04:17 smithi012 ceph-mon[17195]: from='mgr.14152 172.21.15.12:0/2573228170' entity='mgr.a' cmd=[{"prefix": "auth get", "entity": "client.admin"}]: dispatch
2024-04-20T16:04:18.136 INFO:journalctl@ceph.mon.a.smithi012.stdout:Apr 20 16:04:17 smithi012 ceph-mon[17195]: from='mgr.14152 172.21.15.12:0/2573228170' entity='mgr.a' cmd=[{"prefix": "osd df", "format": "json"}]: dispatch
2024-04-20T16:04:18.136 INFO:journalctl@ceph.mon.a.smithi012.stdout:Apr 20 16:04:17 smithi012 ceph-mon[17195]: from='mgr.14152 172.21.15.12:0/2573228170' entity='mgr.a' cmd=[{"prefix": "osd df", "format": "json"}]: dispatch
2024-04-20T16:04:18.136 INFO:journalctl@ceph.mon.a.smithi012.stdout:Apr 20 16:04:17 smithi012 ceph-mon[17195]: from='mgr.14152 172.21.15.12:0/2573228170' entity='mgr.a' cmd=[{"prefix": "osd safe-to-destroy", "ids": ["3"]}]: dispatch
2024-04-20T16:04:18.136 INFO:journalctl@ceph.mon.a.smithi012.stdout:Apr 20 16:04:17 smithi012 ceph-mon[17195]: from='mgr.14152 172.21.15.12:0/2573228170' entity='mgr.a' cmd=[{"prefix": "osd down", "ids": ["3"]}]: dispatch
2024-04-20T16:04:19.135 INFO:journalctl@ceph.mon.a.smithi012.stdout:Apr 20 16:04:18 smithi012 ceph-mon[17195]: from='mon.0 -' entity='mon.' cmd=[{"prefix": "osd df", "format": "json"}]: dispatch
2024-04-20T16:04:19.135 INFO:journalctl@ceph.mon.a.smithi012.stdout:Apr 20 16:04:18 smithi012 ceph-mon[17195]: from='mon.0 -' entity='mon.' cmd=[{"prefix": "osd df", "format": "json"}]: dispatch
2024-04-20T16:04:19.135 INFO:journalctl@ceph.mon.a.smithi012.stdout:Apr 20 16:04:18 smithi012 ceph-mon[17195]: from='mon.0 -' entity='mon.' cmd=[{"prefix": "osd safe-to-destroy", "ids": ["3"]}]: dispatch
2024-04-20T16:04:19.135 INFO:journalctl@ceph.mon.a.smithi012.stdout:Apr 20 16:04:18 smithi012 ceph-mon[17195]: Health check failed: 1 osds down (OSD_DOWN)
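
Since the test itself marks osd.3 down (the "osd down" dispatch above), an entry in that suite's ignorelist is the right fix rather than a code change. A hedged sketch of such an entry (the pattern and its placement are assumptions, not the contents of any actual PR):

# Sketch only: either the bare health code \(OSD_DOWN\) or a fuller
# pattern would match this line; entries are Python regexes.
overrides:
  ceph:
    log-ignorelist:
      - 'Health check failed: \d+ osds down \(OSD_DOWN\)'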

Actions #8

Updated by Laura Flores almost 2 years ago

  • Related to Bug #64872: rados/cephadm: Health check failed: 1 stray daemon(s) not managed by cephadm (CEPHADM_STRAY_DAEMON) in cluster log added
Actions #9

Updated by Laura Flores almost 2 years ago

/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664686

2024-04-20T16:26:04.659 INFO:teuthology.orchestra.run.smithi144.stdout:2024-04-20T16:10:00.000158+0000 mon.a (mon.0) 1407 : cluster [WRN] Health detail: HEALTH_WARN nodeep-scrub flag(s) set; Reduced data availability: 1 pg peering

Actions #10

Updated by Laura Flores almost 2 years ago · Edited

/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664765
/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664810

Actions #11

Updated by Laura Flores almost 2 years ago · Edited

/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664854
/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664891

POOL_APP_NOT_ENABLED

Actions #12

Updated by Laura Flores almost 2 years ago

/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664903

2024-04-20T17:46:51.770 INFO:teuthology.orchestra.run.smithi012.stdout:2024-04-20T17:44:38.893501+0000 mon.a (mon.0) 1023 : cluster [WRN] Health check failed: 2 Cephadm Agent(s) are not reporting. Hosts may be offline (CEPHADM_AGENT_DOWN)

Actions #13

Updated by Laura Flores almost 2 years ago

/a/yuriw-2024-04-20_15:32:38-rados-wip-yuriw-testing-20240419.185239-main-distro-default-smithi/7664940

OSD_DOWN

Actions #14

Updated by Laura Flores almost 2 years ago

  • Related to Bug #65728: Daemon managed by cephadm in an unknown state (CEPHADM_FAILED_DAEMON) added
Actions #15

Updated by Matan Breizman almost 2 years ago · Edited

/a/yuriw-2024-04-20_01:10:46-rados-wip-yuri7-testing-2024-04-18-1351-reef-distro-default-smithi/7664127
/a/yuriw-2024-04-20_01:10:46-rados-wip-yuri7-testing-2024-04-18-1351-reef-distro-default-smithi/7664245

Actions #16

Updated by Laura Flores almost 2 years ago

Partial fix for some of the warnings: https://github.com/ceph/ceph/pull/57218

Actions #17

Updated by Laura Flores almost 2 years ago

  • Related to Bug #65768: rados/verify: Health check failed: "1 osds down (OSD_DOWN)" in cluster log added
Actions #18

Updated by Kamoltat (Junior) Sirivadhna almost 2 years ago

  • Related to Bug #65824: rados/thrash-old-clients: cluster [WRN] Health detail: "HEALTH_WARN noscrub flag(s) set" in cluster log added
Actions #19

Updated by Nitzan Mordechai almost 2 years ago

/a/yuriw-2024-05-04_16:45:43-rados-wip-yuriw-testing-20240503.213524-main-distro-default-smithi/7691265

Actions #20

Updated by Laura Flores almost 2 years ago

/a/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652461

Actions #21

Updated by Laura Flores almost 2 years ago

/a/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652465

Actions #22

Updated by Laura Flores almost 2 years ago

/a/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652467

Actions #23

Updated by Laura Flores almost 2 years ago

/a/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652474

Actions #24

Updated by Laura Flores almost 2 years ago

/a/yuriw-2024-04-11_17:03:54-rados-wip-yuri6-testing-2024-04-02-1310-distro-default-smithi/7652477

Actions #26

Updated by Laura Flores almost 2 years ago

  • Related to Bug #66474: rados/thrash-old-clients: HEALTH_WARN noscrub,nodeep-scrub flag(s) set; Degraded data redundancy added
Actions #27

Updated by Laura Flores almost 2 years ago

  • Priority changed from Normal to Urgent
Actions #28

Updated by Laura Flores almost 2 years ago

  • Tracker changed from Cleanup to Bug
  • Regression set to No
  • Severity set to 3 - minor
Actions #30

Updated by Laura Flores over 1 year ago

  • Related to Bug #66602: rados/upgrade: Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED) added
Actions #31

Updated by Laura Flores over 1 year ago

  • Related to Bug #66603: rados/cephadm/smoke: CEPHADM_AGENT_DOWN: 2 Cephadm Agent(s) are not reporting. Hosts may be offline added
Actions #32

Updated by Laura Flores over 1 year ago

  • Related to Bug #66604: rados/thrash-old-clients: SLOW_OPS: 17 slow ops, oldest one blocked for 213 sec, osd.11 has slow ops added
Actions #33

Updated by Radoslaw Zarzynski over 1 year ago

  • Status changed from New to In Progress
  • Assignee set to Laura Flores

Note from bug scrub: I think @Laura Flores was already working on this.

Actions #34

Updated by Laura Flores over 1 year ago

  • Related to Bug #66809: upgrade/quincy-x; upgrade/reef-x: Health check failed: "Reduced data availability: 1 pg peering (PG_AVAILABILITY)" in cluster log added
Actions #35

Updated by Laura Flores over 1 year ago

  • Related to Bug #66810: upgrade/reef-x: "1 pg degraded (PG_DEGRADED)" in cluster log added
Actions #36

Updated by Laura Flores over 1 year ago

  • Related to deleted (Bug #66810: upgrade/reef-x: "1 pg degraded (PG_DEGRADED)" in cluster log)
Actions #37

Updated by Laura Flores over 1 year ago

  • Related to Bug #66811: upgrade/reef-x/stress-split: Health check failed: "1/3 mons down, quorum a,b (MON_DOWN)" in cluster log added
Actions #38

Updated by Laura Flores over 1 year ago

  • Related to Bug #67181: rados/verify: Health check failed: "1 osds down (OSD_DOWN)" in cluster log added
Actions #39

Updated by Laura Flores over 1 year ago

  • Related to Bug #67182: rados/upgrade: Health check failed: "Degraded data redundancy: 2/6 objects degraded (33.333%), 1 pg degraded (PG_DEGRADED)" in cluster log added
Actions #40

Updated by Nitzan Mordechai over 1 year ago

  • Related to Bug #67281: rados/upgrade/parallel - Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY) added
Actions #41

Updated by Laura Flores over 1 year ago

  • Related to Bug #67584: upgrade:quincy-x: cluster [WRN] Health check failed: "1 osds down (OSD_DOWN)" in cluster log added
Actions #42

Updated by Brad Hubbard over 1 year ago

@Laura Flores I've submitted a PR for https://tracker.ceph.com/issues/65235 which should address the reef-x tests. Might need your help to review which trackers need to be updated, thanks.

Actions #43

Updated by Laura Flores over 1 year ago

  • Related to Bug #67879: upgrade/cephfs/mds_upgrade_sequence: Health detail: "HEALTH_WARN 1 osds down" in cluster log added
Actions #44

Updated by Laura Flores over 1 year ago

Thanks @Brad Hubbard, will take a look.

Actions #45

Updated by Laura Flores over 1 year ago

  • Related to Bug #67970: rados/thrash-old-clients: "HEALTH_WARN Degraded data redundancy: 7/134 objects degraded (5.224%), 1 pg degraded" in cluster log added
Actions #46

Updated by Kamoltat (Junior) Sirivadhna over 1 year ago

  • Related to Bug #68602: rados/thrash-old-clients: [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive, 1 pg peering added
Actions #47

Updated by Laura Flores over 1 year ago

  • Status changed from In Progress to Resolved

Marking this as Resolved since most remaining issues are tracked individually.

Actions #48

Updated by Laura Flores about 1 year ago

  • Status changed from Resolved to In Progress

Reusing this tracker to track warnings whose ignorelist entries still need to be backported to Reef.

Take this wip run as an example: https://pulpito.ceph.com/yuriw-2025-03-07_23:09:12-rados-wip-yuri5-testing-2025-03-07-1307-reef-distro-default-smithi/

Actions #49

Updated by Laura Flores about 1 year ago

  • Status changed from In Progress to Closed