qa/rgw: add new POOL_APP_NOT_ENABLED failures to log-ignorelist#53074
alimaredia merged 1 commit into ceph:main
Conversation
- \(PG_AVAILABILITY\)
- \(PG_DEGRADED\)
- \(POOL_APP_NOT_ENABLED\)
- not have an application enabled
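For context on what these entries do: as I understand it, teuthology treats each log-ignorelist entry as a regular expression searched against cluster log lines. A minimal sketch of what the new entries catch (the sample log lines below are illustrative, not copied from a real run):

```python
import re

# the new log-ignorelist entries from this PR (each one is a regex)
ignorelist = [
    r"\(PG_AVAILABILITY\)",
    r"\(PG_DEGRADED\)",
    r"\(POOL_APP_NOT_ENABLED\)",
    r"not have an application enabled",
]

def ignored(line):
    """True if any ignorelist regex matches somewhere in the line."""
    return any(re.search(pat, line) for pat in ignorelist)

# illustrative health-check line, with the designator appended
health_check = ("Health check failed: 1 pool(s) do not have an "
                "application enabled (POOL_APP_NOT_ENABLED)")
# illustrative periodic summary line, without the designator
summary = "overall HEALTH_WARN 1 pool(s) do not have an application enabled"

print(ignored(health_check))  # True: matches the designator entry
print(ignored(summary))       # True: matches the description-string entry
```

Having both the designator entry and the description-string entry is what lets this PR cover both forms of the warning without a catch-all.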
I think this is typically achieved by a generic overall HEALTH_ override because any of the other warnings that are being ignored above can "strike" the same way.
hmm, i do see tons of overall HEALTH_ suppressions under qa/suites/; but in recent examples of our POOL_APP_NOT_ENABLED failures, i don't see any matches for overall HEALTH_ in the teuthology.log
i'm also hesitant to add any catch-alls here, because i want the rados team to know about any new/unexpected cluster warnings from the rgw suite as soon as they show up. am i misunderstanding how that overall HEALTH_ thing works?
My understanding (I could be wrong!) is that the same health check description string (in this case "X pool(s) do not have an application enabled") can be output in two different ways. One, via Monitor::log_health(), would have the health check designator (in this case POOL_APP_NOT_ENABLED) appended. The other is the periodic summary output coming from Monitor::do_health_to_clog(). In that case the health check designator is appended only if
if (g_conf()->mon_health_detail_to_clog &&
summary != health_status_cache.summary &&
level != HEALTH_OK) {
condition holds. This goes for all health checks, so e.g. PG_DEGRADED that you are also ignoring could show up in the periodic summary output as "Degraded data redundancy: X/Y objects degraded (Z%)", without the designator that you are matching against. overall HEALTH_ in the ignorelist gets around that.
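To make that point concrete, a sketch (assuming ignorelist entries are plain regexes; the summary line below is illustrative) showing that a designator-only pattern misses the periodic summary output from Monitor::do_health_to_clog(), while an overall HEALTH_ entry matches it:

```python
import re

# illustrative periodic summary line: in this form the (PG_DEGRADED)
# designator is not appended to the description string
summary = ("cluster [WRN] overall HEALTH_WARN Degraded data redundancy: "
           "10/100 objects degraded (10.000%)")

print(bool(re.search(r"\(PG_DEGRADED\)", summary)))  # False: not matched
print(bool(re.search(r"overall HEALTH_", summary)))  # True: catch-all matches
```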
okay, thanks; in the one failure that didn't mention POOL_APP_NOT_ENABLED, i did find this in the teuthology.log: cluster [WRN] overall HEALTH_WARN 1 pool(s) do not have an application enabled
> this in the teuthology.log:
> cluster [WRN] overall HEALTH_WARN 1 pool(s) do not have an application enabled
Yup, I believe this is that periodic summary output coming from Monitor::do_health_to_clog().
This PR looks good to me, but is the action item here that we should add an overall HEALTH_ suppression in place of the one added here, or in supplement to it?
i'd prefer to keep this suppression specific to not have an application enabled until we start seeing others
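The tradeoff being chosen here, sketched with a hypothetical future warning (the log line below is made up for illustration): the specific description-string entry hides only this known warning, while an overall HEALTH_ catch-all would also hide new, unexpected warnings from the rgw suite:

```python
import re

specific = r"not have an application enabled"  # the entry kept in this PR
catch_all = r"overall HEALTH_"                 # the suggested alternative

# hypothetical new warning that the rados team should still see
new_warning = "cluster [WRN] overall HEALTH_WARN 1 osds down"

print(bool(re.search(specific, new_warning)))   # False: still surfaces
print(bool(re.search(catch_all, new_warning)))  # True: would be silenced
```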
passed qa after rerun https://pulpito.ceph.com/cbodley-2023-08-23_19:56:40-rgw-main-distro-default-smithi/
Is there a reason you didn't add a symlink for
Fixes: https://tracker.ceph.com/issues/62504
Signed-off-by: Casey Bodley <cbodley@redhat.com>
@alimaredia i had only added it where i saw the failures, but i updated the commit to cover the rest of our subsuites (dbstore was also missing)
silence new cluster warnings added in #47560
Fixes: https://tracker.ceph.com/issues/62504