
qa/distros: bump container host distros from centos 8->9#53901

Merged
cbodley merged 10 commits into ceph:main from cbodley:wip-qa-container-distros-s on Feb 1, 2024

Conversation

@cbodley (Contributor) commented Oct 9, 2023

  • bump the centos versions under qa/distros/podman/ and qa/distros/container-hosts/ and update symlinks under qa/suites/
  • remove references to rhel as a container host. only seemed to affect the fs/workload suite

why? we're going to stop building ceph packages for centos 8 and rhel for the squid release. these distros were removed from most suites in #53517. i didn't touch container host distros there, but testing showed that jobs relying on the container host distros also installed ceph packages. these jobs would fail once we stop building them

so we either need to stop using these old distros as the container host, or change the suites to stop installing ceph packages there. this pr takes the former (easier) approach
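The mechanical part of a bump like this is mostly symlink surgery under qa/suites/. A rough sketch of the retargeting, run in a throwaway sandbox with made-up file names (this is illustrative, not the actual PR diff):

```shell
#!/bin/sh
# Sandbox demo of retargeting suite symlinks from centos 8 yamls to centos 9.
# Paths loosely mimic the qa/ tree layout; file names are hypothetical.
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p distros/container-hosts suites/orch
touch distros/container-hosts/centos_8.stream_container_tools.yaml
touch distros/container-hosts/centos_9.stream.yaml
ln -s ../../distros/container-hosts/centos_8.stream_container_tools.yaml \
    suites/orch/0-distro.yaml

# retarget every suite symlink that still points at a centos_8.stream yaml
find suites -type l | while read -r link; do
    target=$(readlink "$link")
    case "$target" in
    *centos_8.stream*)
        ln -sfn "$(printf '%s' "$target" \
            | sed 's/centos_8\.stream_container_tools/centos_9.stream/')" "$link"
        ;;
    esac
done

readlink suites/orch/0-distro.yaml
```

In the real tree the same loop would run from qa/, and the ubuntu/rhel yamls would be left alone.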

Show available Jenkins commands
  • jenkins retest this please
  • jenkins test classic perf
  • jenkins test crimson perf
  • jenkins test signed
  • jenkins test make check
  • jenkins test make check arm64
  • jenkins test submodules
  • jenkins test dashboard
  • jenkins test dashboard cephadm
  • jenkins test api
  • jenkins test docs
  • jenkins render docs
  • jenkins test ceph-volume all
  • jenkins test ceph-volume tox
  • jenkins test windows

@adk3798 (Contributor) left a comment

container_hosts also has an ubuntu 20.04 that I see you removed in your other PR. Do we want to remove that here as well?

@cbodley force-pushed the wip-qa-container-distros-s branch from c98109f to 7d90c18 on October 10, 2023 14:38
@cbodley cbodley requested a review from a team as a code owner October 10, 2023 14:38
@github-actions github-actions bot added the rook label Oct 10, 2023
@cbodley (Contributor, Author) commented Oct 19, 2023

@adk3798 i believe that our shaman builds for centos8 are the only ones that build container images. are you familiar with how that works? can you help me figure out what needs to happen to move that to the centos9 builds?

edit: i don't think that will block progress on this pr, but we'll need to switch the base container image to centos9 before we can stop doing shaman builds for these old distros

@adk3798 (Contributor) commented Oct 19, 2023

> @adk3798 i believe that our shaman builds for centos8 are the only ones that build container images. are you familiar with how that works? can you help me figure out what needs to happen to move that to the centos9 builds?
>
> edit: i don't think that will block progress on this pr, but we'll need to switch the base container image to centos9 before we can stop doing shaman builds for these old distros

I'm pretty sure it's some script from https://github.com/ceph/ceph-build doing the centos 8 builds that clones the ceph-container repo and runs https://github.com/ceph/ceph-container/blob/main/contrib/build-push-ceph-container-imgs.sh, but I'm not sure what in the ceph-build repo we'd have to update to get it to do that for the centos 9 builds.

@dmick (Member) commented Oct 19, 2023

See http://wiki.front.sepia.ceph.com/doku.php?id=production:jenkins.ceph.com#how_jobs_are_run for a top-level description of the builds. The actual building is done with a script from ceph-container, which is why those jobs also clone that repo. There might also be some code in the ceph-build job .yml itself for limiting containers to centos8; I can't recall. I remember being sad that there needed to be more than one test. It won't be that hard to unravel.

@cbodley (Contributor, Author) commented Nov 7, 2023

@yuriw could i please ask your help with qa on this one? i believe it just needs fs, rados, and orch suites

@cbodley (Contributor, Author) commented Nov 10, 2023

@adk3798 have you guys put any more thought into building centos9 containers?

@dmick (Member) commented Nov 11, 2023

fwiw, I've been experimenting with ceph-container lately for other purposes (trying to allow for building "staging" containers when preparing a Ceph release) and may have a better idea of how to go about adding CentOS9/making it the default

@cbodley (Contributor, Author) commented Nov 13, 2023

Yuri ran several suites against main over the weekend in https://pulpito.ceph.com/?branch=wip-yuri5-testing-2023-11-10-0828. there were several failures, but the most consistent one was Command failed on smithi186 with status 1: 'TESTDIR=/home/ubuntu/cephtest bash -s' (example teuthology.log):

2023-11-11T17:22:03.248 DEBUG:teuthology.orchestra.run.smithi087:> TESTDIR=/home/ubuntu/cephtest bash -s
2023-11-11T17:22:03.884 INFO:teuthology.orchestra.run.smithi087.stdout:Last metadata expiration check: 0:00:54 ago on Sat 11 Nov 2023 05:21:09 PM UTC.
2023-11-11T17:22:03.973 INFO:teuthology.orchestra.run.smithi087.stderr:Unable to resolve argument container-tools
2023-11-11T17:22:04.004 INFO:teuthology.orchestra.run.smithi087.stderr:Error: Problems in request:
2023-11-11T17:22:04.004 INFO:teuthology.orchestra.run.smithi087.stderr:missing groups or modules: container-tools
2023-11-11T17:22:04.026 DEBUG:teuthology.orchestra.run:got remote process result: 1

i tried removing this container-tools module stuff from these container-distro yaml files (with this suite-branch) but it goes on to fail later with:

2023-11-13T15:48:03.093 INFO:tasks.cephadm:Pulling image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:298abd03623204cc79b0c92c1ba318ccde53812a on all hosts...
2023-11-13T15:48:03.093 DEBUG:teuthology.orchestra.run.smithi141:> sudo /home/ubuntu/cephtest/cephadm --image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:298abd03623204cc79b0c92c1ba318ccde53812a pull
2023-11-13T15:48:03.109 DEBUG:teuthology.orchestra.run.smithi160:> sudo /home/ubuntu/cephtest/cephadm --image quay-quay-quay.apps.os.sepia.ceph.com/ceph-ci/ceph:298abd03623204cc79b0c92c1ba318ccde53812a pull
2023-11-13T15:48:03.619 INFO:teuthology.orchestra.run.smithi160.stderr:Traceback (most recent call last):
2023-11-13T15:48:03.620 INFO:teuthology.orchestra.run.smithi160.stderr:  File "/usr/lib64/python3.9/runpy.py", line 197, in _run_module_as_main
2023-11-13T15:48:03.625 INFO:teuthology.orchestra.run.smithi160.stderr:    return _run_code(code, main_globals, None,
2023-11-13T15:48:03.625 INFO:teuthology.orchestra.run.smithi160.stderr:  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
2023-11-13T15:48:03.625 INFO:teuthology.orchestra.run.smithi160.stderr:    exec(code, run_globals)
2023-11-13T15:48:03.625 INFO:teuthology.orchestra.run.smithi160.stderr:  File "/tmp/tmp0xxf0e3i.cephadm.build/__main__.py", line 7643, in <module>
2023-11-13T15:48:03.626 INFO:teuthology.orchestra.run.smithi160.stderr:  File "/tmp/tmp0xxf0e3i.cephadm.build/__main__.py", line 7629, in main
2023-11-13T15:48:03.626 INFO:teuthology.orchestra.run.smithi160.stderr:  File "/tmp/tmp0xxf0e3i.cephadm.build/cephadmlib/container_engines.py", line 137, in check_container_engine
2023-11-13T15:48:03.626 INFO:teuthology.orchestra.run.smithi160.stderr:  File "/tmp/tmp0xxf0e3i.cephadm.build/cephadmlib/container_engines.py", line 33, in get_version
2023-11-13T15:48:03.626 INFO:teuthology.orchestra.run.smithi160.stderr:  File "/tmp/tmp0xxf0e3i.cephadm.build/cephadmlib/call_wrappers.py", line 307, in call_throws
2023-11-13T15:48:03.626 INFO:teuthology.orchestra.run.smithi160.stderr:RuntimeError: Failed command: /bin/podman version --format {{.Client.Version}}: time="2023-11-13T15:48:03Z" level=error msg="reading system config \"/usr/share/containers/containers.conf\": decode configuration /usr/share/containers/containers.conf: toml: line 600 (last key \"engine.runtime\"): Key 'engine.runtime' has already been defined."

could i ask someone's help to work through these failures in the orch suite? i assume that will resolve the failures in other suites
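For context on the first failure: on centos 8, container-tools is an AppStream module that the container-host yamls enable via a pexec step, while centos 9 stream ships podman and friends as ordinary packages, so the module enable has nothing to resolve. A hypothetical sketch of the shape of the fragments involved (assumed, not the exact file contents):

```yaml
# hypothetical sketch of a qa/distros/container-hosts yaml for el8
tasks:
- pexec:
    all:
    # el8: container-tools is an AppStream module that must be enabled
    - sudo dnf -y module enable container-tools

# a centos 9 stream counterpart would drop the module step entirely,
# since podman/buildah/skopeo install as plain packages there, e.g.:
# - sudo dnf -y install podman
```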

@cbodley (Contributor, Author) commented Nov 29, 2023

> could i ask someone's help to work through these failures in the orch suite? i assume that will resolve the failures in other suites

@adk3798 who would be a good person to help me follow up on this?

@adk3798 (Contributor) commented Nov 29, 2023

> could i ask someone's help to work through these failures in the orch suite? i assume that will resolve the failures in other suites
>
> @adk3798 who would be a good person to help me follow up on this?

I can take a look and do some testing around this PR. I've never seen the Key 'engine.runtime' has already been defined error it's printing, but I assume we're setting up some podman config file with something that's invalid for the podman version installed on centos 9.

@cbodley (Contributor, Author) commented Jan 19, 2024

@vshankar i've replaced qa/suites/fs/workload/0-rhel_8.yaml with 0-centos_9.stream.yaml, but i still saw a bunch of those jobs scheduled against rhel_8 in https://pulpito.ceph.com/yuriw-2024-01-18_21:19:14-fs-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/

description: fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate}
clusters/1a11s-mds-1c-client-3node conf/{client mds mon osd} mount/kclient/{base/{mount-syntax/{v1}
mount overrides/{distro/stock/{k-stock rhel_8} ms-die-on-skipped}} ms_mode/legacy
wsync/no} objectstore-ec/bluestore-comp-ec-root omap_limit/10 overrides/{cephsqlite-timeout
frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts session_timeout}
ranks/multi/{balancer/random export-check n/3 replication/default} standby-replay
tasks/{0-subvolume/{with-namespace-isolated} 1-check-counter 2-scrub/yes 3-snaps/yes
4-flush/no 5-workunit/suites/fsx}}

the description shows that rhel 8 is sneaking back in via qa/cephfs/mount/kclient/overrides/distro/stock/rhel_8.yaml. can this be safely removed? we won't be building rhel8 packages anymore

@vshankar (Contributor)

> @vshankar i've replaced qa/suites/fs/workload/0-rhel_8.yaml with 0-centos_9.stream.yaml, but i still saw a bunch of those jobs scheduled against rhel_8 in https://pulpito.ceph.com/yuriw-2024-01-18_21:19:14-fs-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/
>
> description: fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate}
> clusters/1a11s-mds-1c-client-3node conf/{client mds mon osd} mount/kclient/{base/{mount-syntax/{v1}
> mount overrides/{distro/stock/{k-stock rhel_8} ms-die-on-skipped}} ms_mode/legacy
> wsync/no} objectstore-ec/bluestore-comp-ec-root omap_limit/10 overrides/{cephsqlite-timeout
> frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts session_timeout}
> ranks/multi/{balancer/random export-check n/3 replication/default} standby-replay
> tasks/{0-subvolume/{with-namespace-isolated} 1-check-counter 2-scrub/yes 3-snaps/yes
> 4-flush/no 5-workunit/suites/fsx}}
>
> the description shows that rhel 8 is sneaking back in via qa/cephfs/mount/kclient/overrides/distro/stock/rhel_8.yaml. can this be safely removed? we won't be building rhel8 packages anymore

Thx @cbodley for catching this. That yaml can be replaced by an equivalent rhel9 (9.3) yaml. I'll have a change up for that.

@vshankar (Contributor)

> @vshankar i've replaced qa/suites/fs/workload/0-rhel_8.yaml with 0-centos_9.stream.yaml, but i still saw a bunch of those jobs scheduled against rhel_8 in https://pulpito.ceph.com/yuriw-2024-01-18_21:19:14-fs-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/
>
> description: fs/workload/{0-centos_9.stream begin/{0-install 1-cephadm 2-logrotate}
> clusters/1a11s-mds-1c-client-3node conf/{client mds mon osd} mount/kclient/{base/{mount-syntax/{v1}
> mount overrides/{distro/stock/{k-stock rhel_8} ms-die-on-skipped}} ms_mode/legacy
> wsync/no} objectstore-ec/bluestore-comp-ec-root omap_limit/10 overrides/{cephsqlite-timeout
> frag ignorelist_health ignorelist_wrongly_marked_down osd-asserts session_timeout}
> ranks/multi/{balancer/random export-check n/3 replication/default} standby-replay
> tasks/{0-subvolume/{with-namespace-isolated} 1-check-counter 2-scrub/yes 3-snaps/yes
> 4-flush/no 5-workunit/suites/fsx}}
>
> the description shows that rhel 8 is sneaking back in via qa/cephfs/mount/kclient/overrides/distro/stock/rhel_8.yaml. can this be safely removed? we won't be building rhel8 packages anymore

Update: PR #55233 should fix this automatically (it's a change to fix another issue related to cephfs-shell packaging though).

@cbodley (Contributor, Author) commented Jan 22, 2024

> Update: PR #55233 should fix this automatically (it's a change to fix another issue related to cephfs-shell packaging though).

thanks @vshankar

i added that pr to a suite-branch based on wip-yuri8-testing-2024-01-18-0823 and scheduled a --rerun at https://pulpito.ceph.com/cbodley-2024-01-22_19:46:51-fs-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/

@ljflores (Member) commented Jan 22, 2024

@cbodley on a second run, the failures from #53901 (comment) are gone, which is good!

I noticed two others that weren't visible before, since these tests were already failing for different known reasons. However, we should fix these new issues before the PR is merged:

  1. This dashboard test fails from missing groups or modules: nodejs:16.
    /a/yuriw-2024-01-18_21:18:17-rados-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/7521035
2024-01-19T05:00:11.136 INFO:tasks.workunit:Running workunit cephadm/test_dashboard_e2e.sh...
2024-01-19T05:00:11.137 DEBUG:teuthology.orchestra.run.smithi052:workunit test cephadm/test_dashboard_e2e.sh> mkdir -p -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && cd -- /home/ubuntu/cephtest/mnt.0/client.0/tmp && CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=0edf41a622739843d6b978b179ff3227b476dd9d TESTDIR="/home/ubuntu/cephtest" CEPH_ARGS="--cluster ceph" CEPH_ID="0" PATH=$PATH:/usr/sbin CEPH_BASE=/home/ubuntu/cephtest/clone.client.0 CEPH_ROOT=/home/ubuntu/cephtest/clone.client.0 CEPH_MNT=/home/ubuntu/cephtest/mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephadm/test_dashboard_e2e.sh
2024-01-19T05:00:11.205 INFO:tasks.workunit.client.0.smithi052.stderr:+++ dirname /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephadm/test_dashboard_e2e.sh
2024-01-19T05:00:11.206 INFO:tasks.workunit.client.0.smithi052.stderr:++ cd /home/ubuntu/cephtest/clone.client.0/qa/workunits/cephadm
2024-01-19T05:00:11.206 INFO:tasks.workunit.client.0.smithi052.stderr:++ pwd
2024-01-19T05:00:11.206 INFO:tasks.workunit.client.0.smithi052.stderr:+ SCRIPT_DIR=/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephadm
2024-01-19T05:00:11.206 INFO:tasks.workunit.client.0.smithi052.stderr:+ DASHBOARD_FRONTEND_DIR=/home/ubuntu/cephtest/clone.client.0/qa/workunits/cephadm/../../../src/pybind/mgr/dashboard/frontend
2024-01-19T05:00:11.207 INFO:tasks.workunit.client.0.smithi052.stderr:+ '[' -z '' ']'
2024-01-19T05:00:11.207 INFO:tasks.workunit.client.0.smithi052.stderr:+ SUDO=sudo
2024-01-19T05:00:11.207 INFO:tasks.workunit.client.0.smithi052.stderr:+ install_common
2024-01-19T05:00:11.207 INFO:tasks.workunit.client.0.smithi052.stderr:+ NODEJS_VERSION=16
2024-01-19T05:00:11.207 INFO:tasks.workunit.client.0.smithi052.stderr:+ grep -q debian /etc/centos-release /etc/os-release /etc/redhat-release /etc/system-release
2024-01-19T05:00:11.207 INFO:tasks.workunit.client.0.smithi052.stderr:+ grep -q rhel /etc/centos-release /etc/os-release /etc/redhat-release /etc/system-release
2024-01-19T05:00:11.208 INFO:tasks.workunit.client.0.smithi052.stderr:+ sudo yum module -y enable nodejs:16
2024-01-19T05:00:11.775 INFO:tasks.workunit.client.0.smithi052.stdout:Last metadata expiration check: 0:08:05 ago on Fri 19 Jan 2024 04:52:06 AM UTC.
2024-01-19T05:00:11.904 INFO:tasks.workunit.client.0.smithi052.stderr:Error: Problems in request:
2024-01-19T05:00:11.904 INFO:tasks.workunit.client.0.smithi052.stderr:missing groups or modules: nodejs:16
2024-01-19T05:00:11.927 DEBUG:teuthology.orchestra.run:got remote process result: 1

On a centos 9 container, I ran the following command, which shows that nodejs 16 is not available for centos9 (only 18 or 20).

# yum module list nodejs
Last metadata expiration check: 0:03:21 ago on Mon Jan 22 20:20:34 2024.
CentOS Stream 9 - AppStream
Name                                     Stream                                   Profiles                                                               Summary                                            
nodejs                                   18                                       common [d], development, minimal, s2i                                  Javascript runtime                                 
nodejs                                   20                                       common [d], development, minimal, s2i                                  Javascript runtime                                 

Running yum module -y enable nodejs:18 in my container succeeded, so we need to modify the test to enable either 18 or 20. Pretty sure this can just be fixed with a small commit by modifying the dashboard script (https://github.com/ceph/ceph/blob/main/qa/workunits/cephadm/test_dashboard_e2e.sh) and rerunning the failed jobs with a new qa suite.

I created this commit to address the issue, which I verified on a centos 9 container if you'd like to use it: c18b763

  2. The second failure I noticed was missing groups or modules: container-tools, although this seems to have already been discussed in #53901 (comment). Perhaps @adk3798's fix in the orch suite needs to also be applied to the rados suite?
    /a/yuriw-2024-01-18_21:18:17-rados-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/7521377
    /a/yuriw-2024-01-18_21:18:17-rados-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/7520845
2024-01-19T11:09:43.032 INFO:teuthology.run_tasks:Running task pexec...
2024-01-19T11:09:43.041 INFO:teuthology.task.pexec:Executing custom commands...
2024-01-19T11:09:43.041 INFO:teuthology.task.pexec:Running commands on host ubuntu@smithi080.front.sepia.ceph.com
2024-01-19T11:09:43.041 DEBUG:teuthology.orchestra.run.smithi080:> TESTDIR=/home/ubuntu/cephtest bash -s
2024-01-19T11:09:43.659 INFO:teuthology.orchestra.run.smithi080.stdout:Last metadata expiration check: 0:00:47 ago on Fri 19 Jan 2024 11:08:56 AM UTC.
2024-01-19T11:09:43.755 INFO:teuthology.orchestra.run.smithi080.stderr:Unable to resolve argument container-tools
2024-01-19T11:09:43.789 INFO:teuthology.orchestra.run.smithi080.stderr:Error: Problems in request:
2024-01-19T11:09:43.789 INFO:teuthology.orchestra.run.smithi080.stderr:missing groups or modules: container-tools
2024-01-19T11:09:43.815 DEBUG:teuthology.orchestra.run:got remote process result: 1
2024-01-19T11:09:43.816 ERROR:teuthology.run_tasks:Saw exception from tasks.
Traceback (most recent call last):

Otherwise, things are looking good! Current summary: https://tracker.ceph.com/projects/rados/wiki/MAIN#httpstrellocomcYjrx9ygD1911-wip-yuri8-testing-2024-01-18-0823-old-wip-yuri8-testing-2023-12-15-0911
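The nodejs fix Laura describes (not hard-coding a module stream the distro no longer ships) can be sketched as a small fallback loop. This is an illustrative stand-in, not the contents of commit c18b763: try_enable here fakes `sudo yum module -y enable nodejs:<v>` so the logic is runnable anywhere, succeeding only for the streams centos 9 stream actually carries (18 and 20).

```shell
#!/bin/sh
# Hypothetical sketch of a fallback for install_common in
# qa/workunits/cephadm/test_dashboard_e2e.sh (not the real commit).
# try_enable stands in for `sudo yum module -y enable nodejs:$1`;
# it succeeds only for the streams centos 9 stream ships.
try_enable() {
    case "$1" in 18|20) return 0 ;; *) return 1 ;; esac
}

NODEJS_VERSION=""
for v in 16 18 20; do
    if try_enable "$v"; then
        NODEJS_VERSION=$v
        break
    fi
done
echo "enabled nodejs:$NODEJS_VERSION"   # -> enabled nodejs:18
```

The real script would keep the dnf/yum call and simply try each candidate stream until one enables cleanly.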

@vshankar (Contributor)

> > Update: PR #55233 should fix this automatically (it's a change to fix another issue related to cephfs-shell packaging though).
>
> thanks @vshankar
>
> i added that pr to a suite-branch based on wip-yuri8-testing-2024-01-18-0823 and scheduled a --rerun at https://pulpito.ceph.com/cbodley-2024-01-22_19:46:51-fs-wip-yuri8-testing-2024-01-18-0823-distro-default-smithi/

I'll have a look at the run tomorrow when it's done (it's nearly done, btw)...

@ljflores (Member)

@cbodley was there a resolution for missing groups or modules: container-tools? That's the last thing blocking things from the rados side.

@ljflores (Member)

jenkins retest this please

@cbodley (Contributor, Author) commented Jan 24, 2024

> @cbodley was there a resolution for missing groups or modules: container-tools? That's the last thing blocking things from the rados side.

no, i haven't had a chance yet to figure out what's missing in rados

@vshankar (Contributor) commented Jan 24, 2024

@cbodley - There are a few more references to centos8 in fs suite. Need to also fix:

qa/suites/fs/upgrade/featureful_client/old_client/centos_8.stream.yaml
qa/suites/fs/upgrade/featureful_client/upgraded_client/centos_8.stream.yaml
qa/suites/fs/upgrade/upgraded_client/centos_8.stream.yaml
qa/suites/fs/upgrade/nofs/centos_8.stream.yaml

.. and also https://github.com/ceph/ceph/pull/55233/files#r1465005933 @batrick

@cbodley (Contributor, Author) commented Jan 24, 2024

i pushed an update that keeps the old centos8_* container distro files, and reverts the changes under qa/suites/rados/thrash-old-clients/0-distro$ so it continues to run on centos8

oops, i think my change there broke @adk3798's earlier fix for missing groups or modules: container-tools. we need to keep the old centos8 yamls like qa/distros/container-hosts/centos_8.stream_container_tools.yaml for use in those thrash-old-clients tests

however, there's a qa/suites/orch/cephadm/workunits/0-distro where 0-distro -> .qa/distros/container-hosts so cephadm is still trying to run on centos8. the rados suite includes this too under qa/suites/rados/cephadm/workunits/0-distro

@adk3798 how would you like to manage that set of container distros you run against? should that symlink point to a separate qa/distros/supported-container-hosts/ directory that excludes the centos8 files? that'd be analogous to how we have qa/distros/supported/ as a subset of qa/distros/all/
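The layout being proposed would mirror how qa/distros/supported/ filters qa/distros/all/. A hypothetical sketch of building such a filtered directory, sandboxed with made-up file names (not the actual commit):

```shell
#!/bin/sh
# Sandbox demo: a supported-container-hosts/ dir that symlinks every
# container-host yaml except the centos 8 ones. File names are hypothetical.
set -e
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p qa/distros/container-hosts qa/distros/supported-container-hosts
touch qa/distros/container-hosts/centos_8.stream_container_tools.yaml
touch qa/distros/container-hosts/centos_9.stream.yaml
touch qa/distros/container-hosts/ubuntu_22.04.yaml

cd qa/distros/supported-container-hosts
for f in ../container-hosts/*.yaml; do
    case "$(basename "$f")" in
    centos_8.*) ;;            # excluded: no squid packages for el8
    *) ln -s "$f" . ;;
    esac
done
ls
```

Suite 0-distro symlinks could then point at supported-container-hosts/ instead of the unfiltered directory.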

@cbodley (Contributor, Author) commented Jan 24, 2024

> @cbodley - There are a few more references to centos8 in fs suite. Need to also fix:
>
> qa/suites/fs/upgrade/featureful_client/old_client/centos_8.stream.yaml
> qa/suites/fs/upgrade/featureful_client/upgraded_client/centos_8.stream.yaml
> qa/suites/fs/upgrade/upgraded_client/centos_8.stream.yaml
> qa/suites/fs/upgrade/nofs/centos_8.stream.yaml

thanks @vshankar. it looks like all of those upgrade tests start on old releases (i see nautilus, octopus, and pacific) which don't have centos9 packages, so they'd have to start on centos8. but once we stop building centos8 packages for main/squid, we won't be able to install the upgraded packages there

for squid we'll only support upgrades from quincy and reef, both of which can start on centos9. so if you're able to drop these old releases, you could replace the centos_8.stream.yaml links with centos_9.stream.yaml

but if you need to keep testing these old clients, you'll probably need to use the rados/thrash-old-clients strategy where the clients are installed on a centos8 container distro, with upgraded servers running in centos9 containers

@vshankar (Contributor)

> > @cbodley - There are a few more references to centos8 in fs suite. Need to also fix:
> > qa/suites/fs/upgrade/featureful_client/old_client/centos_8.stream.yaml
> > qa/suites/fs/upgrade/featureful_client/upgraded_client/centos_8.stream.yaml
> > qa/suites/fs/upgrade/upgraded_client/centos_8.stream.yaml
> > qa/suites/fs/upgrade/nofs/centos_8.stream.yaml
>
> thanks @vshankar. it looks like all of those upgrade tests start on old releases (i see nautilus, octopus, and pacific) which don't have centos9 packages, so they'd have to start on centos8. but once we stop building centos8 packages for main/squid, we won't be able to install the upgraded packages there
>
> for squid we'll only support upgrades from quincy and reef, both of which can start on centos9. so if you're able to drop these old releases, you could replace the centos_8.stream.yaml links with centos_9.stream.yaml

The basis for testing older releases is to catch any bugs before changes hit downstream.

> but if you need to keep testing these old clients, you'll probably need to use the rados/thrash-old-clients strategy where the clients are installed on a centos8 container distro, with upgraded servers running in centos9 containers

ACK. That's something I haven't looked into yet. Thanks for that info!

So, as far as this change is concerned, the only pending item from cephfs pov is to test with #55233. I'm on it.

@cbodley (Contributor, Author) commented Jan 25, 2024

> @adk3798 how would you like to manage that set of container distros you run against? should that symlink point to a separate qa/distros/supported-container-hosts/ directory that excludes the centos8 files? that'd be analogous to how we have qa/distros/supported/ as a subset of qa/distros/all/

@adk3798 i pushed a commit for this, does it look ok?

@adk3798 (Contributor) commented Jan 29, 2024

> @adk3798 how would you like to manage that set of container distros you run against? should that symlink point to a separate qa/distros/supported-container-hosts/ directory that excludes the centos8 files? that'd be analogous to how we have qa/distros/supported/ as a subset of qa/distros/all/
>
> @adk3798 i pushed a commit for this, does it look ok?

The setup of supported-container-hosts itself looks fine, but do we need to update all the things pointing to regular container-hosts to point there instead? For example, I know orch/cephadm itself has

./suites/orch/cephadm/smoke-small/0-distro
./suites/orch/cephadm/workunits/0-distro
./suites/orch/cephadm/smoke-roleless/0-distro
./suites/orch/cephadm/with-work/0-distro
./suites/orch/cephadm/osds/0-distro
./suites/orch/cephadm/thrash/0-distro
./suites/orch/cephadm/smoke/0-distro

and a number of them (but not all I think) just symlink to container-hosts. Same goes for some of the 0-random-distro$ like the one in upgrade/quincy-x/parallel/ and some 0-distro that are outside of suites/orch/cephadm. If we don't update those as well, they could still pick centos 8 for their tests I think.
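Finding the stragglers adk3798 describes is scriptable. A sandboxed sketch (made-up suite names, not the real tree) of listing every 0-distro symlink that still resolves to the unfiltered container-hosts directory:

```shell
#!/bin/sh
# Sandbox demo: flag 0-distro symlinks still pointing at the unfiltered
# container-hosts dir. Suite names here are hypothetical.
set -e
sb=$(mktemp -d)
cd "$sb"
mkdir -p qa/distros/container-hosts qa/distros/supported-container-hosts
mkdir -p qa/suites/orch/cephadm/smoke qa/suites/orch/cephadm/osds
# smoke still points at the unfiltered dir; osds was already fixed
ln -s ../../../../distros/container-hosts qa/suites/orch/cephadm/smoke/0-distro
ln -s ../../../../distros/supported-container-hosts qa/suites/orch/cephadm/osds/0-distro

stale=$(find qa/suites -type l -name '0-distro' | while read -r l; do
    case "$(readlink "$l")" in
    */container-hosts) echo "$l" ;;   # matches only the unfiltered dir
    esac
done)
echo "$stale"
```

A real pass would also cover the `0-random-distro$` links and any 0-distro outside suites/orch/cephadm.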

@cbodley force-pushed the wip-qa-container-distros-s branch from 9f7889d to 849a58b on January 30, 2024 15:03
@cbodley (Contributor, Author) commented Jan 30, 2024

thanks @adk3798, i amended that last commit to fix the remaining symlinks to qa/distros/container-hosts

there was a single qa/suites/upgrade/telemetry-upgrade/pacific-x/0-random-distro$ that won't be able to use centos9, so i left it as-is. someone will need to remove upgrade/telemetry-upgrade/pacific-x (and add a reef-x?) for squid

@ljflores (Member) commented Feb 1, 2024

jenkins test docs

@cbodley (Contributor, Author) commented Feb 1, 2024

@vshankar i think this has all the approvals it needs to merge; do we need to wait for #55233?

@vshankar (Contributor) commented Feb 1, 2024

> @vshankar i think this has all the approvals it needs to merge; do we need to wait for #55233?

This is good to merge. My tests ran fine with #55233. Couldn't get time to post the results here. Please go ahead and merge.

Edit: I'll merge the cephfs qa change tomorrow.
