qa/rgw/upgrade: exclude ceph-osd-classic/crimson on squid and tentacle#66341

Merged
cbodley merged 2 commits into ceph:main from cbodley:wip-73943
Dec 12, 2025

@cbodley
Contributor

cbodley commented Nov 20, 2025

split packages for ceph-osd-classic and ceph-osd-crimson were added on main, but don't exist on squid and tentacle. exclude these packages from their install tasks

Fixes: https://tracker.ceph.com/issues/73943
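
The exclusion described above amounts to filtering the package list by release branch. A minimal Python sketch of that idea follows. The helper `packages_for_branch` and both constant sets are hypothetical names made up for illustration; this is not the teuthology implementation or the change made in this PR.

```python
# Hypothetical illustration only: none of these names are teuthology APIs.
SPLIT_OSD_PACKAGES = {"ceph-osd-classic", "ceph-osd-crimson"}
BRANCHES_WITHOUT_SPLIT = {"squid", "tentacle"}


def packages_for_branch(branch, packages):
    """Return the package list, dropping split OSD packages on older branches."""
    if branch in BRANCHES_WITHOUT_SPLIT:
        return [p for p in packages if p not in SPLIT_OSD_PACKAGES]
    return list(packages)


pkgs = ["ceph-radosgw", "ceph-osd", "ceph-osd-classic", "rbd-nbd"]
print(packages_for_branch("squid", pkgs))
print(packages_for_branch("main", pkgs))
```

On squid or tentacle the split packages are dropped from the list; on main the list is returned unchanged.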

@ivancich
Member

Thanks for looking into this, @cbodley .

@cbodley
Contributor Author

cbodley commented Nov 20, 2025

qa https://pulpito.ceph.com/cbodley-2025-11-20_17:55:41-rgw:upgrade-main-distro-default-gibba/
rerun https://pulpito.ceph.com/cbodley-2025-11-20_18:41:44-rgw:upgrade-main-distro-default-gibba/

3 of 4 jobs passed, but the squid job for centos failed both times with "timeout expired in wait_for_all_osds_up"

@cbodley
Contributor Author

cbodley commented Dec 5, 2025

rebased, but rerun https://pulpito.ceph.com/cbodley-2025-12-05_20:42:32-rgw:upgrade-main-distro-default-smithi/ is still failing with:

timeout expired in wait_for_all_osds_up

from https://qa-proxy.ceph.com/teuthology/cbodley-2025-12-05_20:42:32-rgw:upgrade-main-distro-default-smithi/8642596/teuthology.log, the yum upgrade command logs some errors when installing the new ceph-osd-classic and ceph-osd-crimson packages:

2025-12-05T21:09:48.089 DEBUG:teuthology.orchestra.run.smithi196:> sudo yum -y upgrade ceph-radosgw ceph-test ceph ceph-base cephadm ceph-immutable-object-cache ceph-mgr ceph-mgr-dashboard ceph-mgr-diskprediction-local ceph-mgr-rook ceph-mgr-cephadm ceph-osd ceph-osd-classic ceph-fuse ceph-volume librados-devel libcephfs2 libcephfs-devel librados2 librbd1 python3-rados python3-rgw python3-cephfs python3-rbd rbd-fuse rbd-mirror rbd-nbd
...
2025-12-05T21:10:47.896 INFO:teuthology.orchestra.run.smithi196.stdout: Installing : ceph-osd-crimson-2:20.3.0-4434.g8611241d.el9.x86_6 21/87
2025-12-05T21:10:47.913 INFO:teuthology.orchestra.run.smithi196.stdout: Running scriptlet: ceph-osd-crimson-2:20.3.0-4434.g8611241d.el9.x86_6 21/87
2025-12-05T21:10:47.913 INFO:teuthology.orchestra.run.smithi196.stdout:failed to link /usr/bin/ceph-osd -> /etc/alternatives/ceph-osd: /usr/bin/ceph-osd exists and it is not a symlink
2025-12-05T21:10:47.914 INFO:teuthology.orchestra.run.smithi196.stdout:
2025-12-05T21:10:48.106 INFO:teuthology.orchestra.run.smithi196.stdout: Upgrading : ceph-volume-2:20.3.0-4434.g8611241d.el9.noarch 22/87
2025-12-05T21:10:48.129 INFO:teuthology.orchestra.run.smithi196.stdout: Running scriptlet: ceph-volume-2:20.3.0-4434.g8611241d.el9.noarch 22/87
2025-12-05T21:10:48.166 INFO:teuthology.orchestra.run.smithi196.stdout: Upgrading : ceph-osd-2:20.3.0-4434.g8611241d.el9.x86_64 23/87
2025-12-05T21:10:49.568 INFO:teuthology.orchestra.run.smithi196.stdout: Running scriptlet: ceph-osd-2:20.3.0-4434.g8611241d.el9.x86_64 23/87
2025-12-05T21:10:49.666 INFO:teuthology.orchestra.run.smithi196.stdout: Installing : ceph-osd-classic-2:20.3.0-4434.g8611241d.el9.x86_6 24/87
2025-12-05T21:10:49.672 INFO:teuthology.orchestra.run.smithi196.stdout: Running scriptlet: ceph-osd-classic-2:20.3.0-4434.g8611241d.el9.x86_6 24/87
2025-12-05T21:10:49.672 INFO:teuthology.orchestra.run.smithi196.stdout:failed to link /usr/bin/ceph-osd -> /etc/alternatives/ceph-osd: /usr/bin/ceph-osd exists and it is not a symlink
2025-12-05T21:10:49.672 INFO:teuthology.orchestra.run.smithi196.stdout:

later, while restarting the upgraded osds, commands fail to find ceph-osd:

2025-12-05T21:11:33.076 INFO:teuthology.task.print:restarting upgraded osds
2025-12-05T21:11:33.076 INFO:teuthology.run_tasks:Running task ceph.restart...
2025-12-05T21:11:33.130 DEBUG:tasks.ceph.osd.0:waiting for process to exit
2025-12-05T21:11:33.131 INFO:teuthology.orchestra.run:waiting for 300
2025-12-05T21:11:33.155 INFO:tasks.ceph.osd.0:Stopped
2025-12-05T21:11:33.156 DEBUG:teuthology.orchestra.run.smithi077:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd down 0
2025-12-05T21:11:33.537 INFO:tasks.ceph.mon.a.smithi077.stderr:2025-12-05T21:11:33.534+0000 7f2471907640 -1 mon.a@0(leader).osd e40 definitely_dead 0
2025-12-05T21:11:33.659 INFO:teuthology.orchestra.run.smithi077.stderr:marked down osd.0.
2025-12-05T21:11:33.674 INFO:tasks.ceph.osd.0:Restarting daemon
2025-12-05T21:11:33.675 DEBUG:teuthology.orchestra.run.smithi077:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f --cluster ceph -i 0
2025-12-05T21:11:33.677 INFO:tasks.ceph.osd.0:Started
2025-12-05T21:11:33.677 DEBUG:tasks.ceph.osd.2:waiting for process to exit
2025-12-05T21:11:33.677 INFO:teuthology.orchestra.run:waiting for 300
2025-12-05T21:11:33.710 INFO:tasks.ceph.osd.2:Stopped
2025-12-05T21:11:33.711 DEBUG:teuthology.orchestra.run.smithi077:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph osd down 2
2025-12-05T21:11:33.734 INFO:tasks.ceph.osd.0.smithi077.stderr:Traceback (most recent call last):
2025-12-05T21:11:33.734 INFO:tasks.ceph.osd.0.smithi077.stderr: File "/bin/daemon-helper", line 65, in <module>
2025-12-05T21:11:33.734 INFO:tasks.ceph.osd.0.smithi077.stderr: proc = subprocess.Popen(
2025-12-05T21:11:33.734 INFO:tasks.ceph.osd.0.smithi077.stderr: File "/usr/lib64/python3.9/subprocess.py", line 951, in __init__
2025-12-05T21:11:33.745 INFO:tasks.ceph.osd.0.smithi077.stderr: self._execute_child(args, executable, preexec_fn, close_fds,
2025-12-05T21:11:33.746 INFO:tasks.ceph.osd.0.smithi077.stderr: File "/usr/lib64/python3.9/subprocess.py", line 1837, in _execute_child
2025-12-05T21:11:33.746 INFO:tasks.ceph.osd.0.smithi077.stderr: raise child_exception_type(errno_num, err_msg, err_filename)
2025-12-05T21:11:33.746 INFO:tasks.ceph.osd.0.smithi077.stderr:FileNotFoundError: [Errno 2] No such file or directory: 'ceph-osd'
2025-12-05T21:11:34.075 INFO:tasks.ceph.mon.a.smithi077.stderr:2025-12-05T21:11:34.072+0000 7f2471907640 -1 mon.a@0(leader).osd e41 definitely_dead 0
2025-12-05T21:11:34.666 INFO:teuthology.orchestra.run.smithi077.stderr:marked down osd.2.
2025-12-05T21:11:34.691 INFO:tasks.ceph.osd.2:Restarting daemon
2025-12-05T21:11:34.691 DEBUG:teuthology.orchestra.run.smithi077:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f --cluster ceph -i 2
2025-12-05T21:11:34.693 INFO:tasks.ceph.osd.2:Started
2025-12-05T21:11:34.693 INFO:tasks.ceph:Waiting until ceph daemons up and pgs clean...
2025-12-05T21:11:34.694 INFO:tasks.ceph.ceph_manager.ceph:waiting for mgr available
2025-12-05T21:11:34.694 DEBUG:teuthology.orchestra.run.smithi077:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph mgr dump --format=json
2025-12-05T21:11:34.751 INFO:tasks.ceph.osd.2.smithi077.stderr:Traceback (most recent call last):
2025-12-05T21:11:34.751 INFO:tasks.ceph.osd.2.smithi077.stderr: File "/bin/daemon-helper", line 65, in <module>
2025-12-05T21:11:34.751 INFO:tasks.ceph.osd.2.smithi077.stderr: proc = subprocess.Popen(
2025-12-05T21:11:34.751 INFO:tasks.ceph.osd.2.smithi077.stderr: File "/usr/lib64/python3.9/subprocess.py", line 951, in __init__
2025-12-05T21:11:34.751 INFO:tasks.ceph.osd.2.smithi077.stderr: self._execute_child(args, executable, preexec_fn, close_fds,
2025-12-05T21:11:34.752 INFO:tasks.ceph.osd.2.smithi077.stderr: File "/usr/lib64/python3.9/subprocess.py", line 1837, in _execute_child
2025-12-05T21:11:34.752 INFO:tasks.ceph.osd.2.smithi077.stderr: raise child_exception_type(errno_num, err_msg, err_filename)
2025-12-05T21:11:34.752 INFO:tasks.ceph.osd.2.smithi077.stderr:FileNotFoundError: [Errno 2] No such file or directory: 'ceph-osd'

this ultimately leads to timeouts:

AssertionError: timeout expired in wait_for_all_osds_up
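
The two symptoms quoted above (the failed link during install, then the missing binary at restart) can be pulled out of a teuthology log mechanically. Here is a rough Python sketch, assuming the messages keep the exact wording shown in this log. `scan_log` is a hypothetical helper, not part of teuthology.

```python
import re

# Patterns copied from the two error messages quoted in the log above.
LINK_FAILURE = re.compile(r"failed to link (\S+) -> (\S+): .* is not a symlink")
MISSING_BINARY = re.compile(
    r"FileNotFoundError: \[Errno 2\] No such file or directory: '([^']+)'"
)


def scan_log(lines):
    """Collect link failures and missing binaries from teuthology log lines."""
    findings = {"link_failures": [], "missing_binaries": []}
    for line in lines:
        link = LINK_FAILURE.search(line)
        if link:
            findings["link_failures"].append(link.group(1))
        missing = MISSING_BINARY.search(line)
        if missing:
            findings["missing_binaries"].append(missing.group(1))
    return findings


sample = [
    "2025-12-05T21:10:49.672 INFO:teuthology.orchestra.run.smithi196.stdout:"
    "failed to link /usr/bin/ceph-osd -> /etc/alternatives/ceph-osd: "
    "/usr/bin/ceph-osd exists and it is not a symlink",
    "2025-12-05T21:11:33.746 INFO:tasks.ceph.osd.0.smithi077.stderr:"
    "FileNotFoundError: [Errno 2] No such file or directory: 'ceph-osd'",
]
print(scan_log(sample))
```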

@Matan-B
Contributor

Matan-B commented Dec 8, 2025

The failed jobs above appear to occur only on centos9; the green ones are from ubuntu.
Looking at the log:

2025-12-05T21:34:54.214 INFO:teuthology.task.print:installing upgraded packages
2025-12-05T21:35:19.065 INFO:teuthology.orchestra.run.smithi103.stderr:Package ceph-osd-classic available, but not installed.
2025-12-05T21:35:19.066 INFO:teuthology.orchestra.run.smithi103.stdout:No match for argument: ceph-osd-classic
2025-12-05T21:35:19.190 INFO:teuthology.orchestra.run.smithi103.stdout: ceph-osd-classic                x86_64  2:20.3.0-4434.g8611241d.el9 ceph          16 M
2025-12-05T21:35:19.190 INFO:teuthology.orchestra.run.smithi103.stdout: ceph-osd-crimson                x86_64  2:20.3.0-4434.g8611241d.el9 ceph          19 M
2025-12-05T21:35:56.181 INFO:teuthology.orchestra.run.smithi103.stdout:failed to link /usr/bin/ceph-osd -> /etc/alternatives/ceph-osd: /usr/bin/ceph-osd exists and it is not a symlink
2025-12-05T21:35:58.017 INFO:teuthology.orchestra.run.smithi103.stdout:failed to link /usr/bin/ceph-osd -> /etc/alternatives/ceph-osd: /usr/bin/ceph-osd exists and it is not a symlink
2025-12-05T21:37:28.227 INFO:teuthology.orchestra.run.smithi135.stdout:failed to link /usr/bin/ceph-osd -> /etc/alternatives/ceph-osd: /usr/bin/ceph-osd exists and it is not a symlink
2025-12-05T21:37:30.058 INFO:teuthology.orchestra.run.smithi135.stdout:failed to link /usr/bin/ceph-osd -> /etc/alternatives/ceph-osd: /usr/bin/ceph-osd exists and it is not a symlink
2025-12-05T21:37:56.969 INFO:teuthology.task.print:ragweed prepare before upgrade
2025-12-05T21:38:13.699 INFO:teuthology.task.print:restarting upgraded osds
2025-12-05T21:38:14.373 INFO:tasks.ceph.osd.0.smithi103.stderr:FileNotFoundError: [Errno 2] No such file or directory: 'ceph-osd'

For some reason we fail to link ceph-osd. The failing scriptlet comes from the spec file:

%post osd-crimson
%{_sbindir}/update-alternatives --install %{_bindir}/ceph-osd ceph-osd \
    %{_bindir}/ceph-osd-crimson 50

Which would later be overridden with:

%post osd-classic
%{_sbindir}/update-alternatives --install %{_bindir}/ceph-osd ceph-osd \
    %{_bindir}/ceph-osd-classic 100

I'm trying to understand where "ceph-osd exists and it is not a symlink" comes from, since this looks like the origin of the issue. Are you aware of other suites hitting this? The upgrade tasks were fixed in rados/fs/upgrade.
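
For reference, the two `%post` scriptlets above register competing alternatives for the same `ceph-osd` link, and in automatic mode update-alternatives selects the one with the highest priority. Below is a toy Python model of that selection rule. It is a simplification rather than the real tool, and the `/usr/bin` paths assume `%{_bindir}` expands to `/usr/bin`.

```python
# Toy model of priority selection; not the real update-alternatives tool.
alternatives = {}


def install(name, path, priority):
    """Register `path` as a candidate for the alternative group `name`."""
    alternatives.setdefault(name, {})[path] = priority


def current(name):
    """Return the candidate with the highest priority (automatic mode)."""
    candidates = alternatives[name]
    return max(candidates, key=candidates.get)


# Same order and priorities as the two scriptlets quoted above.
install("ceph-osd", "/usr/bin/ceph-osd-crimson", 50)
install("ceph-osd", "/usr/bin/ceph-osd-classic", 100)
print(current("ceph-osd"))  # /usr/bin/ceph-osd-classic
```

With priorities 50 and 100, classic wins whichever scriptlet runs first. That only matters once the `ceph-osd` link itself can be created.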

@cbodley
Contributor Author

cbodley commented Dec 8, 2025

trying to understand where ceph-osd exists and it is not a symlink comes from since this looks like the origin of this issue.

in the output of the yum upgrade command, i see those update-alternatives commands running before the old ceph-osd package's cleanup:

  Installing       : ceph-osd-classic-2:20.3.0-4434.g8611241d.el9.x86_6   24/87
  Running scriptlet: ceph-osd-classic-2:20.3.0-4434.g8611241d.el9.x86_6   24/87
failed to link /usr/bin/ceph-osd -> /etc/alternatives/ceph-osd: /usr/bin/ceph-osd exists and it is not a symlink

...

  Running scriptlet: ceph-osd-2:20.2.0-384.g36bf3900.el9.x86_64           59/87
  Cleanup          : ceph-osd-2:20.2.0-384.g36bf3900.el9.x86_64           59/87
  Running scriptlet: ceph-osd-2:20.2.0-384.g36bf3900.el9.x86_64           59/87

is "cleanup" where the old ceph-osd binary would be removed?

@Matan-B
Contributor

Matan-B commented Dec 8, 2025

is "cleanup" where the old ceph-osd binary would be removed?

ceph-osd still exists:

%if 0%{with crimson}
mv %{buildroot}%{_bindir}/crimson-osd %{buildroot}%{_bindir}/ceph-osd-crimson
%endif
mv %{buildroot}%{_bindir}/ceph-osd %{buildroot}%{_bindir}/ceph-osd-classic

There are now 3 packages:

- ceph-osd: Contains shared components (systemd units, sysctl configs,
  common executables like ceph-erasure-code-tool) and depends on exactly
  one OSD implementation
- ceph-osd-classic: Contains the classic OSD implementation binary and
  classic-specific tools
- ceph-osd-crimson: Contains the crimson OSD implementation binary and
  crimson-specific tools

@Matan-B
Contributor

Matan-B commented Dec 9, 2025

I might have a clue about what's going on; still looking, will edit this comment:
The issue seems to come from the install.upgrade task (upgrade_common), which seems to work differently from the plain install task that was fixed in recent PRs:

2025-12-05T21:08:15.872 INFO:teuthology.run_tasks:Running task install.upgrade...
2025-12-05T21:08:15.913 DEBUG:teuthology.task.install:Package list is: {'deb': ['ceph', 'cephadm', 'ceph-mds', 'ceph-mgr', 'ceph-osd', 'ceph-osd-classic', 'ceph-common', 'ceph-fuse', 'ceph-test', 'ceph-volume', 'radosgw', 'python3-rados', 'python3-rgw', 'python3-cephfs', 'python3-rbd', 'libcephfs2', 'libcephfs-dev', 'librados2', 'librbd1', 'rbd-fuse'], 'rpm': ['ceph-radosgw', 'ceph-test', 'ceph', 'ceph-base', 'cephadm', 'ceph-immutable-object-cache', 'ceph-mgr', 'ceph-mgr-dashboard', 'ceph-mgr-diskprediction-local', 'ceph-mgr-rook', 'ceph-mgr-cephadm', 'ceph-osd', 'ceph-osd-classic', 'ceph-fuse', 'ceph-volume', 'librados-devel', 'libcephfs2', 'libcephfs-devel', 'librados2', 'librbd1', 'python3-rados', 'python3-rgw', 'python3-cephfs', 'python3-rbd', 'rbd-fuse', 'rbd-mirror', 'rbd-nbd']}
2025-12-05T21:08:15.913 INFO:teuthology.task.install:Upgrading ceph rpm packages: ceph-radosgw, ceph-test, ceph, ceph-base, cephadm, ceph-immutable-object-cache, ceph-mgr, ceph-mgr-dashboard, ceph-mgr-diskprediction-local, ceph-mgr-rook, ceph-mgr-cephadm, ceph-osd, ceph-osd-classic, ceph-fuse, ceph-volume, librados-devel, libcephfs2, libcephfs-devel, librados2, librbd1, python3-rados, python3-rgw, python3-cephfs, python3-rbd, rbd-fuse, rbd-mirror, rbd-nbd
2025-12-05T21:08:16.086 INFO:teuthology.task.install:Ceph rpm upgrade from 20.2.0-384.g36bf3900.el9 to 20.3.0-4434.g8611241d

We might need to adjust the yum upgrade step: it tries to upgrade ceph-osd-classic even though that package was not installed previously, since it was excluded:

2025-12-05T21:08:16.086 INFO:teuthology.task.install:Ceph rpm upgrade from 20.2.0-384.g36bf3900.el9 to 20.3.0-4434.g8611241d
2025-12-05T21:08:16.861 DEBUG:teuthology.orchestra.run.smithi077:> sudo yum -y upgrade ceph-radosgw ceph-test ceph ceph-base cephadm ceph-immutable-object-cache ceph-mgr ceph-mgr-dashboard ceph-mgr-diskprediction-local ceph-mgr-rook ceph-mgr-cephadm ceph-osd ceph-osd-classic ceph-fuse ceph-volume librados-devel libcephfs2 libcephfs-devel librados2 librbd1 python3-rados python3-rgw python3-cephfs python3-rbd rbd-fuse rbd-mirror rbd-nbd
2025-12-05T21:08:37.641 INFO:teuthology.orchestra.run.smithi077.stderr:Package ceph-osd-classic available, but not installed.
2025-12-05T21:08:37.641 INFO:teuthology.orchestra.run.smithi077.stdout:No match for argument: ceph-osd-classic

Summarizing smithi077's upgrade attempt:
yum can't find ceph-osd-classic to upgrade -> it upgrades ceph-osd -> ceph-osd-crimson and ceph-osd-classic come in as dependencies -> the symlink step fails:

2025-12-05T21:08:16.861 DEBUG:teuthology.orchestra.run.smithi077:> sudo yum -y upgrade ceph-radosgw ceph-test ceph ceph-base cephadm ceph-immutable-object-cache ceph-mgr ceph-mgr-dashboard ceph-mgr-diskprediction-local ceph-mgr-rook ceph-mgr-cephadm ceph-osd ceph-osd-classic ceph-fuse ceph-volume librados-devel libcephfs2 libcephfs-devel librados2 librbd1 python3-rados python3-rgw python3-cephfs python3-rbd rbd-fuse rbd-mirror rbd-nbd
2025-12-05T21:08:37.641 INFO:teuthology.orchestra.run.smithi077.stderr:Package ceph-osd-classic available, but not installed.
2025-12-05T21:08:37.641 INFO:teuthology.orchestra.run.smithi077.stdout:No match for argument: ceph-osd-classic
2025-12-05T21:08:37.756 INFO:teuthology.orchestra.run.smithi077.stdout:Upgrading:
2025-12-05T21:08:37.758 INFO:teuthology.orchestra.run.smithi077.stdout: ceph-osd                        x86_64  2:20.3.0-4434.g8611241d.el9 ceph         170 k
2025-12-05T21:08:37.760 INFO:teuthology.orchestra.run.smithi077.stdout:Installing dependencies:
2025-12-05T21:08:37.760 INFO:teuthology.orchestra.run.smithi077.stdout: ceph-osd-classic                x86_64  2:20.3.0-4434.g8611241d.el9 ceph          16 M
2025-12-05T21:08:37.760 INFO:teuthology.orchestra.run.smithi077.stdout: ceph-osd-crimson                x86_64  2:20.3.0-4434.g8611241d.el9 ceph          19 M
2025-12-05T21:08:37.762 INFO:teuthology.orchestra.run.smithi077.stdout:Downloading Packages:
2025-12-05T21:09:18.715 INFO:teuthology.orchestra.run.smithi077.stdout:  Running scriptlet: ceph-osd-crimson-2:20.3.0-4434.g8611241d.el9.x86_6   21/87
2025-12-05T21:09:18.715 INFO:teuthology.orchestra.run.smithi077.stdout:failed to link /usr/bin/ceph-osd -> /etc/alternatives/ceph-osd: /usr/bin/ceph-osd exists and it is not a symlink
2025-12-05T21:09:20.478 INFO:teuthology.orchestra.run.smithi077.stdout:  Running scriptlet: ceph-osd-classic-2:20.3.0-4434.g8611241d.el9.x86_6   24/87
2025-12-05T21:09:20.479 INFO:teuthology.orchestra.run.smithi077.stdout:failed to link /usr/bin/ceph-osd -> /etc/alternatives/ceph-osd: /usr/bin/ceph-osd exists and it is not a symlink
2025-12-05T21:09:39.614 INFO:teuthology.orchestra.run.smithi077.stdout:  Running scriptlet: ceph-osd-2:20.2.0-384.g36bf3900.el9.x86_64           59/87
2025-12-05T21:09:39.614 INFO:teuthology.orchestra.run.smithi077.stdout:  Cleanup          : ceph-osd-2:20.2.0-384.g36bf3900.el9.x86_64           59/87
2025-12-05T21:09:39.650 INFO:teuthology.orchestra.run.smithi077.stdout:  Running scriptlet: ceph-osd-2:20.2.0-384.g36bf3900.el9.x86_64           59/87
2025-12-05T21:09:46.575 INFO:teuthology.orchestra.run.smithi077.stdout:Upgraded:
2025-12-05T21:09:46.576 INFO:teuthology.orchestra.run.smithi077.stdout:  ceph-osd-2:20.3.0-4434.g8611241d.el9.x86_64
2025-12-05T21:09:46.579 INFO:teuthology.orchestra.run.smithi077.stdout:Complete!

We might need to add an Obsoletes: tag that removes the old ceph-osd version on upgrade and lets the symlink step succeed. I'll push a PR.
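
To illustrate why the order of "link" and "remove the old file" matters, here is a small Python simulation in a temporary directory. It assumes, as the log message suggests, that the link step refuses to replace a regular file. `try_link` and `upgrade` are hypothetical helpers, and the sketch makes no claim about how Obsoletes: actually reorders an RPM transaction.

```python
import os
import tempfile


def try_link(link_path, target):
    """Create link_path -> target unless a non-symlink already sits there."""
    if os.path.lexists(link_path) and not os.path.islink(link_path):
        return False  # mirrors "exists and it is not a symlink"
    if os.path.islink(link_path):
        os.unlink(link_path)
    os.symlink(target, link_path)
    return True


def upgrade(workdir, cleanup_first):
    """Return whether a ceph-osd entry exists once both steps have run."""
    link = os.path.join(workdir, "ceph-osd")
    target = os.path.join(workdir, "ceph-osd-classic")
    open(link, "w").close()    # old package: ceph-osd is a regular file
    open(target, "w").close()  # new package: the classic binary
    if cleanup_first:
        os.remove(link)         # old file removed before linking
        try_link(link, target)
    else:
        try_link(link, target)  # refused: a regular file is in the way
        os.remove(link)         # later cleanup removes the old file
    return os.path.lexists(link)


with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    observed_order = upgrade(d1, cleanup_first=False)
    reversed_order = upgrade(d2, cleanup_first=True)
print(observed_order, reversed_order)
```

In the observed order nothing is left at the `ceph-osd` path, which is consistent with the FileNotFoundError seen when the osds restart. With the old file removed first, the link is created.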

@Matan-B
Contributor

Matan-B commented Dec 9, 2025

@cbodley, PTAL: #66568

@cbodley
Contributor Author

cbodley commented Dec 10, 2025

split packages for ceph-osd-classic and ceph-osd-crimson were added on
main, but don't exist on squid and tentacle. exclude these packages from
their install tasks

Fixes: https://tracker.ceph.com/issues/73943

Signed-off-by: Casey Bodley <cbodley@redhat.com>
@cbodley
Contributor Author

cbodley commented Dec 12, 2025

rebased after merge of #66568 and rgw/upgrade is green again 👍

https://pulpito.ceph.com/cbodley-2025-12-12_16:52:22-rgw:upgrade-main-distro-default-gibba/

@cbodley cbodley merged commit 6a2ed21 into ceph:main Dec 12, 2025
13 checks passed
@cbodley cbodley deleted the wip-73943 branch December 12, 2025 17:45