qa/cephfs: ignore when specific OSD is reported down during upgrade #58486

Merged
rishabh-d-dave merged 1 commit into ceph:main from rishabh-d-dave:ignore-osd-down
Oct 18, 2024

@rishabh-d-dave rishabh-d-dave commented Jul 9, 2024

We already ignore the health warning about OSDs being down during upgrade,
but the health warning about a specific OSD being down is not in the
ignorelist, which causes upgrade jobs to be marked as failed even though
they were successful.

Fixes: https://tracker.ceph.com/issues/66877

@rishabh-d-dave rishabh-d-dave requested a review from a team July 9, 2024 18:48
@rishabh-d-dave rishabh-d-dave force-pushed the ignore-osd-down branch 3 times, most recently from 45ecf64 to acdbe87 Compare July 11, 2024 16:41
@rishabh-d-dave
Contributor Author

Same - the ceph API job failed due to an error unrelated to this PR. https://jenkins.ceph.com/job/ceph-api/77616/

@rishabh-d-dave
Contributor Author

jenkins test api

ceph:
  log-ignorelist:
    - OSD_DOWN
    - cluster *\[WRN\] *osd.*is down
Contributor

Oh wow, regexes are allowed. That's because these strings are used in grep, yes?

Contributor Author

Yes, grep allows regex so it should be fine. There's no other way to catch that error message otherwise.
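
The filtering described in this exchange can be sketched roughly as follows. This is an illustration only, not the test framework's actual code: it assumes each ignorelist entry is treated as a regular expression and that any log line matching an entry is dropped, much like chained `grep -v`. Python's `re` stands in for grep, and the helper name `unignored` and the sample log lines are made up for the example.

```python
import re

# Ignorelist entries as they appear in the YAML under review.
IGNORELIST = [
    r"OSD_DOWN",
    r"cluster *\[WRN\] *osd.*is down",
]

def unignored(lines, patterns):
    """Return the lines that match none of the patterns (similar to chained `grep -v`)."""
    compiled = [re.compile(p) for p in patterns]
    return [ln for ln in lines if not any(c.search(ln) for c in compiled)]

# Hypothetical log lines, invented for illustration only.
sample = [
    "cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)",
    "cluster [WRN] osd.3 is down",
    "cluster [WRN] slow request on mds.a",
]

# Only lines that survive the ignorelist would count against the job.
remaining = unignored(sample, IGNORELIST)
print(remaining)
```

In this sketch the first sample line is dropped by the existing `OSD_DOWN` entry and the second by the new per-OSD pattern, so only the unrelated warning is left.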

Member

what's wrong with osd.*is down?

Contributor Author

The current pattern ensures that the regex matches nothing other than the cluster warning message.

Member

But why do we need to be that specific? We've often had to put the same type of message in the ignorelist twice because it appears as a health warning and a cluster message.
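
The trade-off between the two patterns can be seen in a small comparison. This is a sketch under assumptions: the log lines below are invented for illustration and are not taken from a real run, and Python's `re` is used in place of grep.

```python
import re

# Pattern from the diff under review, and the broader alternative suggested in review.
NARROW = re.compile(r"cluster *\[WRN\] *osd.*is down")
BROAD = re.compile(r"osd.*is down")

# Hypothetical log lines, invented for illustration only.
lines = {
    "cluster_message": "mon.a (mon.0) 101 : cluster [WRN] osd.3 is down",
    "health_detail": "    osd.3 (root=default,host=node1) is down",
    "unrelated": "cluster [WRN] mds.a is down",
}

def matches(pattern):
    """Map each sample name to whether the pattern is found anywhere in the line."""
    return {name: bool(pattern.search(text)) for name, text in lines.items()}

narrow_hits = matches(NARROW)
broad_hits = matches(BROAD)
print("narrow:", narrow_hits)
print("broad: ", broad_hits)
```

With these samples, the narrow pattern matches only the line carrying the `cluster [WRN]` prefix, while the broad pattern also matches the same message without that prefix. Neither matches the unrelated line.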

Contributor Author

Okay, I didn't want to ignore messages unnecessarily. Will change this.

@rishabh-d-dave
Contributor Author

jenkins test api

We already ignore the health warning about OSDs being down during upgrade,
but the health warning about a specific OSD being down is not in the
ignorelist, which causes upgrade jobs to be marked as failed even though
they were successful.

Fixes: https://tracker.ceph.com/issues/66877
Signed-off-by: Rishabh Dave <ridave@redhat.com>
@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 60 days. It will be closed if no further activity occurs for another 30 days.
If you are a maintainer or core committer, please follow-up on this pull request to identify what steps should be taken by the author to move this proposed change forward.
If you are the author of this pull request, thank you for your proposed contribution. If you believe this change is still appropriate, please ensure that any feedback has been addressed and ask for a code review.

@github-actions github-actions bot added the stale label Sep 21, 2024
@rishabh-d-dave rishabh-d-dave added wip-rishabh-testing Rishabh's testing label and removed stale needs-review labels Sep 24, 2024
@rishabh-d-dave
Contributor Author

jenkins test make check arm64

@rishabh-d-dave
Contributor Author

This PR is under test in https://tracker.ceph.com/issues/68354.

@rishabh-d-dave
Contributor Author

jenkins test make check arm64

Labels

cephfs Ceph File System tests wip-rishabh-testing Rishabh's testing label
