pacific: qa/suites/orch: whitelist warnings that are expected in test environments #55523
57bd6ab to 84e7142
84e7142 to bedfc49
@ljflores looks like a bigger set of changes than for main? Happy to approve this once it passes QA. Anything you need from me in the meantime?
@markhpc yeah, this changeset is a bit different since there are some tests that are specific to pacific. I wrote that this is a "partial" backport in the commit message. I'll link test results here once I have them. (Trello ref: https://trello.com/c/3cEnuGqr/1952-wip-yuri10-testing-2024-02-08-0854-pacific)
ronen-fr left a comment
LGTM (apart from one question)
- mons down
- flag\(s\) set
- out of quorum
- PG_
Is that one restrictive enough?
Thrash tests really trigger a lot of pg related warnings. @ronen-fr Is there one specifically you want to disallow?
@ronen-fr I think Sam's comment makes sense. Let's keep it as is, especially since we whitelist that exact string in many of our other tests:
$ git grep "PG_"
basic/tasks/rados_api_tests.yaml: - \(PG_AVAILABILITY\)
basic/tasks/rados_api_tests.yaml: - \(PG_DEGRADED\)
basic/tasks/rados_cls_all.yaml: - \(PG_AVAILABILITY\)
basic/tasks/rados_python.yaml: - \(PG_
basic/tasks/repair_test.yaml: - \(PG_
basic/tasks/scrub_test.yaml: - \(PG_
dashboard/tasks/dashboard.yaml: - \(PG_
mgr/tasks/crash.yaml: - \(PG_
mgr/tasks/failover.yaml: - \(PG_
mgr/tasks/insights.yaml: - \(PG_
mgr/tasks/module_selftest.yaml: - \(PG_
mgr/tasks/progress.yaml: - \(PG_
mgr/tasks/prometheus.yaml: - \(PG_
mgr/tasks/workunits.yaml: - \(PG_
monthrash/ceph.yaml:# slow mons -> slow peering -> PG_AVAILABILITY
monthrash/ceph.yaml: - \(PG_AVAILABILITY\)
monthrash/workloads/rados_api_tests.yaml: - \(PG_
monthrash/workloads/rados_mon_workunits.yaml: - \(PG_
multimon/tasks/mon_clock_with_skews.yaml: - \(PG_
multimon/tasks/mon_recovery.yaml: - \(PG_AVAILABILITY\)
objectstore/backends/ceph_objectstore_tool.yaml: - \(PG_
perf/ceph.yaml: - \(PG_
rest/mgr-restful.yaml: - \(PG_
singleton-bluestore/all/cephtool.yaml: - \(PG_
singleton-bluestore/all/cephtool.yaml: - \(SMALLER_PG_NUM\)
singleton-nomsgr/all/balancer.yaml: - \(PG_AVAILABILITY\)
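The restrictiveness question above can be sketched in a few lines of Python. This is only an illustration, assuming each ignorelist entry is treated as a regular expression searched against a cluster log line; the sample log lines and the `ignored` helper are hypothetical and not taken from the test runs.

```python
import re

# Illustrative cluster-log lines (made up for this sketch, not from the actual runs).
sample_lines = [
    "cluster [WRN] Health check failed: Reduced data availability: 1 pg inactive (PG_AVAILABILITY)",
    "cluster [WRN] Health check failed: Degraded data redundancy: 2 pgs degraded (PG_DEGRADED)",
    "cluster [WRN] Health check failed: 1 pools have pg_num > pgp_num (SMALLER_PG_NUM)",
    "cluster [WRN] Health check failed: 1/3 mons down, quorum a,b (MON_DOWN)",
]


def ignored(pattern, lines):
    """Return the lines an ignorelist entry would suppress (assumes re.search semantics)."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]


broad = ignored(r"PG_", sample_lines)                  # bare substring, as in the diff above
narrow = ignored(r"\(PG_", sample_lines)               # anchored on the opening parenthesis
exact = ignored(r"\(PG_AVAILABILITY\)", sample_lines)  # a single health code

print(len(broad), len(narrow), len(exact))  # prints: 3 2 1
```

Under these assumptions the bare `PG_` entry also suppresses `(SMALLER_PG_NUM)`, whereas `\(PG_` only covers health codes that begin with `PG_`.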
Changes look fine, but waiting for RCA on fs related issue I detailed in https://trello.com/c/3cEnuGqr/1952-wip-yuri10-testing-2024-02-08-0854-pacific. On PTO today - will have a look tomorrow.
bedfc49 to 41aa181
Made some final adjustments for the rados suite based on the latest test results. All results can be found here: https://pulpito.ceph.com/?branch=wip-yuri10-testing-2024-02-08-0854-pacific
All expected warnings on the core side have been addressed (unless there's something I missed due to a nondeterministic test scenario). Remaining warnings are from MDS or Cephadm daemons. Will have a final summary posted soon.
Failures look acceptable on the core side: https://tracker.ceph.com/projects/rados/wiki/PACIFIC
There were some new warnings from cephadm, but @adk3798 had a look and didn't view them as problematic to testing the release, so I simply raised some new tickets to track them here:
@vshankar on the latest run (https://pulpito.ceph.com/lflores-2024-02-15_17:32:20-rados-wip-yuri10-testing-2024-02-08-0854-pacific-distro-default-smithi/), I see these new MDS/filesystem warnings. Do you want to take care of whitelisting those in this PR, or raise tracker tickets and handle them separately?
I'm going through the failures now - If they aren't related to any underlying cephfs issue, we can add those to ignore list in this change.
test_nfs.py::test_cluster_set_reset_user_config() creates a cephfs volume @adk3798 I don't see
See: #55601 (review) (under discussion, but looks like we might have to silence this warning)
All warnings seem like fallout from #54312 to me.
We'll want to tackle that separately: it's about time we fix such unnecessary warnings, since this also shows up in main branch runs. So, I'll create a tracker for that. As far as this PR is concerned, we have two options:
If time permits, can we add these warnings to the ignore list? @ljflores
Yeah, if cephadm deployed it, it should have run
We scale the fs down to a single MDS during upgrade, so I think this warning is expected; it is only popping up now because of the change to get log scraping working, as you were thinking.
Adding to ignorelist...
41aa181 to 6472e62
vshankar left a comment
Thx @ljflores - LGTM.
Need to run failed tests from https://trello.com/c/3cEnuGqr/1952-wip-yuri10-testing-2024-02-08-0854-pacific
…ents
Semi-backport of 00fc796. Some changes had to be made though for yaml files and warnings that are specific to pacific.
Fixes: https://tracker.ceph.com/issues/64343
Signed-off-by: Laura Flores <lflores@ibm.com>
6472e62 to 275f1a4
Latest test results are pretty clean, barring accepted cephadm warnings and one more FS warning. I've tracked the FS warning here (https://tracker.ceph.com/issues) so that final QA for 16.2.15 is no longer blocked and we can still reference it in the result summary.
- \(CACHE_POOL_NO_HIT_SET\)
- \(PG_
- \(OSD_
- mons down:
So this causes https://tracker.ceph.com/issues/64452
I don't think we need a colon after mons down.
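One plausible reading of this comment, sketched in Python: if the entry is matched as a regular expression against the log line, a trailing colon only matches when the message itself has a colon right after "mons down". The log line below is illustrative and assumed, not copied from the tracker issue.

```python
import re

# Illustrative log line; the exact wording in a real run may differ.
line = "cluster [WRN] Health check failed: 1/3 mons down, quorum a,b (MON_DOWN)"

with_colon = re.search(r"mons down:", line)    # entry as written in the diff
without_colon = re.search(r"mons down", line)  # entry without the trailing colon

print(bool(with_colon), bool(without_colon))  # prints: False True
```

In this assumed case the colon variant never matches, so the warning would still be reported even though an ignorelist entry appears to cover it.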
Semi-backport of 00fc796 (#55507). Some changes had to be made, though, for yaml files and warnings that are specific to pacific.
The motivation is that we are still testing stuff for pacific, i.e. in https://trello.com/c/3cEnuGqr/1952-wip-yuri10-testing-2024-02-08-0854-pacific, so we'll need clean results.
Fixes: https://tracker.ceph.com/issues/64343