qa/rgw: add new POOL_APP_NOT_ENABLED failures to log-ignorelist #53074

Merged
alimaredia merged 1 commit into ceph:main from cbodley:wip-62504 on Aug 24, 2023
@cbodley cbodley commented Aug 21, 2023

silence new cluster warnings added in #47560

Fixes: https://tracker.ceph.com/issues/62504

- \(PG_AVAILABILITY\)
- \(PG_DEGRADED\)
- \(POOL_APP_NOT_ENABLED\)
- not have an application enabled

@idryomov idryomov Aug 23, 2023

I think this is typically achieved by a generic overall HEALTH_ override because any of the other warnings that are being ignored above can "strike" the same way.

Contributor Author

hmm, i do see tons of overall HEALTH_ suppressions under qa/suites/; but in recent examples of our POOL_APP_NOT_ENABLED failures, i don't see any matches for overall HEALTH_ in the teuthology.log

i'm also hesitant to add any catch-alls here, because i want the rados team to know about any new/unexpected cluster warnings from the rgw suite as soon as they show up. am i misunderstanding how that overall HEALTH_ thing works?

Contributor

My understanding (I could be wrong!) is that the same health check description string (in this case "X pool(s) do not have an application enabled") can be output in two different ways. One, via Monitor::log_health(), would have the health check designator (in this case POOL_APP_NOT_ENABLED) appended. The other is the periodic summary output coming from Monitor::do_health_to_clog(). In that case the health check designator is appended only if

  if (g_conf()->mon_health_detail_to_clog &&
      summary != health_status_cache.summary &&
      level != HEALTH_OK) {

condition holds. This goes for all health checks, so e.g. PG_DEGRADED that you are also ignoring could show up in the periodic summary output as "Degraded data redundancy: X/Y objects degraded (Z%)", without the designator that you are matching against. overall HEALTH_ in the ignorelist gets around that.
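
To make the distinction concrete, here is a minimal Python sketch of how the two output forms interact with the ignorelist entries from the diff. It assumes the entries are applied as regular-expression searches against each log line; the `is_ignored` helper is hypothetical, and the first sample line (the one carrying the designator) uses assumed wording rather than a copied log entry. The second sample is the summary line reported from teuthology.log later in this conversation.

```python
import re

# Entries from the diff under review (regex syntax, parentheses escaped).
DESIGNATOR_ONLY = [
    r"\(PG_AVAILABILITY\)",
    r"\(PG_DEGRADED\)",
    r"\(POOL_APP_NOT_ENABLED\)",
]
WITH_DESCRIPTION = DESIGNATOR_ONLY + [r"not have an application enabled"]


def is_ignored(line, patterns):
    """Hypothetical helper: True if any pattern matches somewhere in the line."""
    return any(re.search(p, line) for p in patterns)


# Assumed wording for a line that includes the health check designator.
with_designator = (
    "cluster [WRN] Health check failed: 1 pool(s) do not have "
    "an application enabled (POOL_APP_NOT_ENABLED)"
)
# Periodic summary line, as quoted from teuthology.log in this thread.
summary = (
    "cluster [WRN] overall HEALTH_WARN 1 pool(s) do not have "
    "an application enabled"
)

print(is_ignored(with_designator, DESIGNATOR_ONLY))  # designator matches
print(is_ignored(summary, DESIGNATOR_ONLY))          # no designator to match
print(is_ignored(summary, WITH_DESCRIPTION))         # description string matches
```

Under these assumptions, the designator-only entries miss the summary line, which is why the description string (or an overall HEALTH_ entry) is needed to cover it.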

Contributor Author

okay, thanks; in the one failure that didn't mention POOL_APP_NOT_ENABLED, i did find this in the teuthology.log: cluster [WRN] overall HEALTH_WARN 1 pool(s) do not have an application enabled

Contributor

> this in the teuthology.log: cluster [WRN] overall HEALTH_WARN 1 pool(s) do not have an application enabled

Yup, I believe this is that periodic summary output coming from Monitor::do_health_to_clog().

Contributor

This PR looks good to me, but is the action item here that we should add an overall HEALTH_ suppression in place of the one added here, or in addition to it?

Contributor Author

i'd prefer to keep this suppression specific to not have an application enabled until we start seeing others
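
The trade-off behind this preference can be sketched in Python, again assuming ignorelist entries behave like regular-expression searches over log lines. The `surviving` helper is hypothetical, and the second sample line is a made-up summary for an unrelated warning, included only to show what a catch-all entry would hide.

```python
import re

SPECIFIC = r"not have an application enabled"
CATCH_ALL = r"overall HEALTH_"

lines = [
    # Summary line quoted from teuthology.log in this thread.
    "cluster [WRN] overall HEALTH_WARN 1 pool(s) do not have an application enabled",
    # Hypothetical unrelated warning (assumed wording, for illustration only).
    "cluster [WRN] overall HEALTH_WARN 1 osds down",
]


def surviving(pattern, log_lines):
    """Hypothetical helper: lines that would still be reported as failures."""
    return [line for line in log_lines if not re.search(pattern, line)]


# The specific entry silences only the known warning.
print(surviving(SPECIFIC, lines))
# The catch-all silences every periodic summary, including the unrelated one.
print(surviving(CATCH_ALL, lines))
```

With the specific entry, a new or unexpected summary warning would still surface in the rgw suite; with the catch-all, it would not.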

@cbodley
Contributor Author

cbodley commented Aug 24, 2023

@alimaredia
Contributor

Is there a reason you didn't add a symlink for ignore-pg-availability.yaml to the d4n, notifications, thrash, and upgrade sub-suites?

@cbodley
Contributor Author

cbodley commented Aug 24, 2023

> Is there a reason you didn't add a symlink for ignore-pg-availability.yaml to the d4n, notifications, thrash, and upgrade sub-suites?

@alimaredia i had only added it where i saw the failures, but i updated the commit to cover the rest of our subsuites (dbstore was also missing)
