
qa/distros: add ubuntu 22 as supported distro #49443

Merged — cbodley merged 4 commits into ceph:main from cbodley:wip-qa-supported-distros on Mar 17, 2023

Conversation

@cbodley (Contributor) commented Dec 14, 2022

we're doing builds for these distros. if we're going to support them in reef, we need to start the testing
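For context, teuthology picks a job's distro from small yaml fragments under qa/distros/. A supported-distro entry for Ubuntu 22.04 would look roughly like the sketch below; the exact filename is not shown on this page, so treat the path as an assumption:

```yaml
# hypothetical qa/distros/supported/ubuntu_22.04.yaml
os_type: ubuntu
os_version: "22.04"
```

Adding such a fragment to the "supported" set is what pulls the distro into the rotation for the qa suites discussed below.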


@cbodley cbodley added the tests label Dec 14, 2022
@cbodley cbodley force-pushed the wip-qa-supported-distros branch from c5b1662 to 7efe384 Compare December 14, 2022 21:45
@cbodley cbodley force-pushed the wip-qa-supported-distros branch from 7efe384 to 061e0cb Compare January 30, 2023 17:18
@cbodley cbodley force-pushed the wip-qa-supported-distros branch from 061e0cb to c422012 Compare February 22, 2023 18:26
@cbodley cbodley force-pushed the wip-qa-supported-distros branch from c422012 to a1d9efc Compare March 3, 2023 19:42
@yuriw (Contributor) commented Mar 8, 2023

jenkins test make check

@cbodley (Contributor, Author) commented Mar 8, 2023

centos9 testing seems to be blocked on #47501 at the moment; would it help if i split the centos changes out to another PR so we can try to test/merge the ubuntu changes?

cbodley added 3 commits March 8, 2023 10:28
Signed-off-by: Casey Bodley <cbodley@redhat.com>
…yaml

Signed-off-by: Casey Bodley <cbodley@redhat.com>
…_20.04.yaml

Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley cbodley force-pushed the wip-qa-supported-distros branch from a1d9efc to 62e520c Compare March 8, 2023 15:38
@cbodley cbodley changed the title qa/distros: add centos stream 9 and ubuntu 22 as supported distros qa/distros: add ubuntu 22 as supported distro Mar 8, 2023
@cbodley (Contributor, Author) commented Mar 8, 2023

the smoke suite was mostly successful on ubuntu22: https://pulpito.ceph.com/cbodley-2023-03-08_18:24:26-smoke-main-distro-default-smithi/

the only failure was one cluster log warning: [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)

@cbodley (Contributor, Author) commented Mar 10, 2023

jenkins test api

@yuriw (Contributor) commented Mar 11, 2023

@ljflores (Member) commented:

@cbodley @neha-ojha here is the rados suite review:

https://pulpito.ceph.com/?sha1=63c2dce869c8f63c396d3b6505a21c44088ff500

Failures, unrelated:
1. https://tracker.ceph.com/issues/58969
2. https://tracker.ceph.com/issues/58585
3. https://tracker.ceph.com/issues/57755
4. https://tracker.ceph.com/issues/49287
5. https://tracker.ceph.com/issues/58560
6. https://tracker.ceph.com/issues/49727

Details:
1. test_full_health: _ValError: In input['fs_map']['filesystems'][0]['mdsmap']: missing keys: {'max_xattr_size'} - Ceph - Mgr - Dashboard
2. rook: failed to pull kubelet image - Ceph - Orchestrator
3. task/test_orch_cli: test_cephfs_mirror times out - Ceph - Orchestrator
4. SELinux Denials during cephadm/workunits/test_cephadm - Ceph - Orchestrator
5. test_envlibrados_for_rocksdb.sh failed to subscribe to repo - Infrastructure
6. lazy_omap_stats_test: "ceph osd deep-scrub all" hangs - Ceph - RADOS

The only new failure that caught my attention was "1678520463.0682712 osd.2 (osd.2) 4 : cluster [WRN] WaitReplicas::react(const DigestUpdate&): Unexpected DigestUpdate event" in cluster log, which appeared twice on 22.04 tests:

/a/yuriw-2023-03-10_22:46:37-rados-reef-distro-default-smithi/7203358

2023-03-11T10:28:47.074 DEBUG:teuthology.orchestra.run.smithi149:> sudo egrep '\[ERR\]|\[WRN\]|\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(OBJECT_' | egrep -v '\(PG_' | egrep -v '\(SLOW_OPS\)' | egrep -v 'overall HEALTH' | egrep -v 'slow request' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-03-11T10:28:47.263 INFO:teuthology.orchestra.run.smithi149.stdout:1678530338.9736485 osd.3 (osd.3) 104 : cluster [WRN] WaitReplicas::react(const DigestUpdate&): Unexpected DigestUpdate event
2023-03-11T10:28:47.263 WARNING:tasks.ceph:Found errors (ERR|WRN|SEC) in cluster log
2023-03-11T10:28:47.264 DEBUG:teuthology.orchestra.run.smithi149:> sudo egrep '\[SEC\]' /var/log/ceph/ceph.log | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(OBJECT_' | egrep -v '\(PG_' | egrep -v '\(SLOW_OPS\)' | egrep -v 'overall HEALTH' | egrep -v 'slow request' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-03-11T10:28:47.280 DEBUG:teuthology.orchestra.run.smithi149:> sudo egrep '\[ERR\]' /var/log/ceph/ceph.log | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(OBJECT_' | egrep -v '\(PG_' | egrep -v '\(SLOW_OPS\)' | egrep -v 'overall HEALTH' | egrep -v 'slow request' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-03-11T10:28:47.338 DEBUG:teuthology.orchestra.run.smithi149:> sudo egrep '\[WRN\]' /var/log/ceph/ceph.log | egrep -v '\(POOL_APP_NOT_ENABLED\)' | egrep -v '\(OSDMAP_FLAGS\)' | egrep -v '\(OSD_' | egrep -v '\(OBJECT_' | egrep -v '\(PG_' | egrep -v '\(SLOW_OPS\)' | egrep -v 'overall HEALTH' | egrep -v 'slow request' | egrep -v '\(MDS_ALL_DOWN\)' | egrep -v '\(MDS_UP_LESS_THAN_MAX\)' | head -n 1
2023-03-11T10:28:47.392 INFO:teuthology.orchestra.run.smithi149.stdout:1678530338.9736485 osd.3 (osd.3) 104 : cluster [WRN] WaitReplicas::react(const DigestUpdate&): Unexpected DigestUpdate event

This doesn't seem like it would be caused by a change in distro, but @cbodley please take a look. Otherwise, I did not see anything related.
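The cluster-log scan in the excerpt above can be sketched as a standalone script. This is a minimal approximation of what teuthology runs, not its actual code: the sample log lines and the ignorelist are trimmed to two entries for illustration, and `grep -E`/`grep -Ev` stand in for the deprecated `egrep`:

```shell
#!/bin/sh
# Minimal sketch of the teuthology cluster-log scan shown above.
# Writes a tiny sample ceph.log, then prints the first ERR/WRN/SEC
# entry that is not covered by the ignorelist.
log=./ceph.log
cat > "$log" <<'EOF'
1678530338 osd.3 (osd.3) 104 : cluster [WRN] WaitReplicas::react(const DigestUpdate&): Unexpected DigestUpdate event
1678530339 mon.a (mon.0) 12 : cluster [WRN] Health check failed: 1 pool(s) do not have an application enabled (POOL_APP_NOT_ENABLED)
EOF
# grep -E matches any ERR/WRN/SEC entry; each grep -Ev drops one
# whitelisted health warning (the real run chains many more exclusions).
grep -E '\[ERR\]|\[WRN\]|\[SEC\]' "$log" \
  | grep -Ev '\(POOL_APP_NOT_ENABLED\)' \
  | grep -Ev '\(SLOW_OPS\)' \
  | head -n 1
```

With this sample input, only the WaitReplicas line survives the filters, which is why the run above was flagged with "Found errors (ERR|WRN|SEC) in cluster log".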

@vshankar (Contributor) commented:
Hmmm... fs suite has failures which I haven't seen before, so those could be related. I'll have a look.

@vshankar (Contributor) commented:

@yuriw - were you using a custom test suite (--suite-repo, --suite-branch) by any chance?

FWIW, could we run the fs suite with the latest main (and changes in this PR)?

@ljflores (Member) commented:
> @yuriw - were you using a custom test suite (--suite-repo, --suite-branch) by any chance?
>
> FWIW, could we run the fs suite with the latest main (and changes in this PR)?

@vshankar yes, a custom --suite-branch was used. (cbodley:wip-qa-supported-distros)

@vshankar (Contributor) commented:
> > @yuriw - were you using a custom test suite (--suite-repo, --suite-branch) by any chance?
> > FWIW, could we run the fs suite with the latest main (and changes in this PR)?
>
> @vshankar yes, a custom --suite-branch was used. (cbodley:wip-qa-supported-distros)

OK. That might explain some of the failures, but not all. I'll rerun the failed jobs in the fs suite and see how it looks.

@ljflores (Member) commented:
> @cbodley @neha-ojha here is the rados suite review: […]

I reran the jobs that failed from this bug, and they passed on a second round. So, I don't believe this is directly related to ubuntu 22.04.

I created a tracker for the bug here, for future investigation: https://tracker.ceph.com/issues/59049

@cbodley rados approved

Same as in commit 2de2146 ("qa/workunits/rbd: use bionic version
of qemu-iotests for focal").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
@idryomov idryomov requested a review from a team as a code owner March 16, 2023 12:39
@github-actions github-actions bot added the rbd label Mar 16, 2023
@idryomov (Contributor) commented:
One of the RBD failures was real (caused by this PR). I have pushed a fix to Casey's branch.

A rerun succeeded: https://pulpito.ceph.com/dis-2023-03-15_16:39:35-rbd-reef-distro-default-smithi/

@vshankar (Contributor) commented:
Apologies for the delay in looking into fs suite failures - I'm on it today.

@vshankar (Contributor) commented:
I reran the fs suite failures here: https://pulpito.ceph.com/vshankar-2023-03-17_02:45:20-fs-reef-testing-default-smithi/ (10 jobs, vs 25 failed/dead jobs from https://pulpito.ceph.com/yuriw-2023-03-10_22:52:36-fs-reef-distro-default-smithi/).

Of the 10 jobs, most are known issues. The remaining failure: the qa suite has test code, but the ceph-mds binary does not have the required functionality. Looks like the custom suite branch was forked from main? Did the missing 15 jobs also fail to get scheduled for the same reason? @cbodley

@cbodley (Contributor, Author) commented Mar 17, 2023

@vshankar you're right that this branch targets main

@vshankar (Contributor) commented:
> @vshankar you're right that this branch targets main

In that case, I'm good with merging this change. I'll keep an eye on the nightly runs for reef for any new failures in fs suite.

@cbodley (Contributor, Author) commented Mar 17, 2023

jenkins test make check

@cbodley (Contributor, Author) commented Mar 17, 2023

jenkins test api

@idryomov (Contributor) commented:

> @vshankar you're right that this branch targets main

But the intent is to cherry pick this to reef, right?

@vshankar (Contributor) commented:
> > @vshankar you're right that this branch targets main
>
> But the intent is to cherry pick this to reef, right?

Right. The qa suite that was run was the main branch qa suite plus this change, while the ceph binaries were from the reef branch. Ideally, reef binaries + reef qa suite would have been better; not sure why that wasn't done.

@cbodley (Contributor, Author) commented Mar 17, 2023

> Ideally, reef binaries + reef qa suite would have been better. Not sure why that wasn't done.

sorry. i've only been testing this against the rgw suite on main, and Yuri has only been running baselines for reef. that was close enough for most suites

if you'd like to do more reef testing in the meantime, i pushed a reef-based version of this branch to https://github.com/cbodley/ceph/commits/wip-qa-reef-ubuntu22

@cbodley (Contributor, Author) commented Mar 17, 2023

jenkins test api

@cbodley cbodley merged commit ee1ae6c into ceph:main Mar 17, 2023
@cbodley cbodley deleted the wip-qa-supported-distros branch March 18, 2023 13:44