quincy: qa/cephfs: no reliance on centos#59037
Conversation
cc @lxbsz @joscollin
@vshankar There are still two jobs that will fail, and you missed:
If I filter these two tests with
I'll fix that up in quincy. I likely just checked the centos yamls and missed rhel8.
I've just scheduled 185/204 jobs, and will trigger the remaining ones once this is fixed.
Signed-off-by: Venky Shankar <vshankar@redhat.com>
And switch to ubuntu. Signed-off-by: Venky Shankar <vshankar@redhat.com>
Force-pushed from b0ca4b3 to e954116
@lxbsz Try scheduling with this update. I've kept the fix in a separate commit for now.
Trying it now. Thanks!
Hi @vshankar. I think the following failures are related to this PR -
2024-08-13T05:20:16.778 DEBUG:teuthology.orchestra.run.smithi028:> sudo nsenter --net=/var/run/netns/ceph-ns--home-ubuntu-cephtest-mnt.0 adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage /bin/mount -t ceph :/ /home/ubuntu/cephtest/mnt.0 -v -o norequire_active_mds,conf=/etc/ceph/ceph.conf,norbytes,name=0,mds_namespace=cephfs,ms_mode=legacy,wsync
2024-08-13T05:20:16.821 INFO:teuthology.orchestra.run.smithi028.stdout:parsing options: rw,norequire_active_mds,conf=/etc/ceph/ceph.conf,norbytes,name=0,mds_namespace=cephfs,ms_mode=legacy,wsync
2024-08-13T05:20:16.822 INFO:teuthology.orchestra.run.smithi028.stdout:mount.ceph: options "norequire_active_mds,norbytes,name=0,mds_namespace=cephfs,ms_mode=legacy,wsync".
2024-08-13T05:20:16.822 INFO:teuthology.orchestra.run.smithi028.stdout:invalid new device string format
2024-08-13T05:20:16.969 INFO:teuthology.orchestra.run.smithi028.stderr:mount error 22 = Invalid argument
2024-08-13T05:20:16.970 INFO:teuthology.orchestra.run.smithi028.stdout:parsing options: rw,norequire_active_mds,conf=/etc/ceph/ceph.conf,norbytes,name=0,mds_namespace=cephfs,ms_mode=legacy,wsync
2024-08-13T05:20:16.970 INFO:teuthology.orchestra.run.smithi028.stdout:mount.ceph: options "norequire_active_mds,norbytes,name=0,mds_namespace=cephfs,ms_mode=legacy,wsync".
2024-08-13T05:20:16.970 INFO:teuthology.orchestra.run.smithi028.stdout:invalid new device string format
2024-08-13T05:20:16.970 INFO:teuthology.orchestra.run.smithi028.stdout:mount.ceph: resolved to: "172.21.15.28:6789,172.21.15.53:6789,172.21.15.155:6789"
2024-08-13T05:20:16.970 INFO:teuthology.orchestra.run.smithi028.stdout:mount.ceph: trying mount with old device syntax: 172.21.15.28:6789,172.21.15.53:6789,172.21.15.155:6789:/
2024-08-13T05:20:16.970 INFO:teuthology.orchestra.run.smithi028.stdout:mount.ceph: options "norequire_active_mds,norbytes,name=0,mds_namespace=cephfs,ms_mode=legacy,wsync,key=0,fsid=cc3ed6a8-5930-11ef-bcce-c7b262605968" will pass to kernel
2024-08-13T05:20:16.972 DEBUG:teuthology.orchestra.run:got remote process result: 32
2024-08-13T05:20:16.973 INFO:tasks.cephfs.kernel_mount:mount command failed
2024-08-13T05:20:16.974 ERROR:teuthology.run_tasks:Saw exception from tasks.
2024-08-13T05:29:58.237 INFO:tasks.cephfs_test_runner:======================================================================
2024-08-13T05:29:58.238 INFO:tasks.cephfs_test_runner:ERROR: test_cap_acquisition_throttle_readdir (tasks.cephfs.test_client_limits.TestClientLimits)
2024-08-13T05:29:58.238 INFO:tasks.cephfs_test_runner:Mostly readdir acquires caps faster than the mds recalls, so the cap
2024-08-13T05:29:58.238 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-08-13T05:29:58.238 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-08-13T05:29:58.238 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_e0e452b3a276271196a54dadb3ac706afad6f142/qa/tasks/cephfs/test_client_limits.py", line 189, in test_cap_acquisition_throttle_readdir
2024-08-13T05:29:58.238 INFO:tasks.cephfs_test_runner: cap_acquisition_value = self.get_session(mount_a_client_id)['cap_acquisition']['value']
2024-08-13T05:29:58.238 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_e0e452b3a276271196a54dadb3ac706afad6f142/qa/tasks/cephfs/cephfs_test_case.py", line 257, in get_session
2024-08-13T05:29:58.238 INFO:tasks.cephfs_test_runner: return self._session_by_id(session_ls)[client_id]
2024-08-13T05:29:58.238 INFO:tasks.cephfs_test_runner:KeyError: '5358'
2024-08-13T06:29:02.519 INFO:tasks.cephfs_test_runner:======================================================================
2024-08-13T06:29:02.519 INFO:tasks.cephfs_test_runner:ERROR: test_client_metrics_and_metadata (tasks.cephfs.test_mds_metrics.TestMDSMetrics)
2024-08-13T06:29:02.519 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-08-13T06:29:02.519 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-08-13T06:29:02.519 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_e0e452b3a276271196a54dadb3ac706afad6f142/qa/tasks/cephfs/test_mds_metrics.py", line 541, in test_client_metrics_and_metadata
2024-08-13T06:29:02.519 INFO:tasks.cephfs_test_runner: raise RuntimeError("valid_metrics of fs1 not found!")
2024-08-13T06:29:02.519 INFO:tasks.cephfs_test_runner:RuntimeError: valid_metrics of fs1 not found!
2024-08-13T06:29:02.519 INFO:tasks.cephfs_test_runner:
2024-08-13T06:29:02.519 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-08-13T06:29:02.519 INFO:tasks.cephfs_test_runner:Ran 1 test in 94.801s
There are more new failures in this run, and other QA runs including this PR also have the failures mentioned above -
https://pulpito.ceph.com/xiubli-2024-08-13_04:53:58-fs-wip-xiubli-testing-20240812.051138-quincy-distro-default-smithi/
https://pulpito.ceph.com/xiubli-2024-08-13_04:48:23-fs-wip-jcollin-testing-20240812.053224-quincy-distro-default-smithi/
https://pulpito.ceph.com/xiubli-2024-08-13_10:32:13-fs-wip-xiubli-testing-20240813.052545-quincy-distro-default-smithi/
I checked the failures, but I don't know how this failure is related to this PR. Possibly a Python issue on a different distro?
We see this failure on multiple different QA runs, and this PR was the only PR present on all of them. Plus, this failure is seen only for … Spending some time digging deeper into this, the code itself looks alright to me: it builds a dict of session_id/session key-value pairs and returns the session only for the client_id it was passed. The same code is present on the main branch too. Probably some extra debug code needs to be added to find out exactly what went wrong.
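To make the KeyError above concrete, here is a minimal sketch of the lookup just described (simplified from qa/tasks/cephfs/cephfs_test_case.py; the session entries are made up for illustration):

```python
# Simplified sketch of the session lookup that raised KeyError: '5358';
# the session data below is invented.

def _session_by_id(session_ls):
    # Index the "session ls" output by session id.
    return {s['id']: s for s in session_ls}

def get_session(client_id, session_ls):
    # Raises KeyError if the client's session is no longer listed,
    # e.g. after an eviction or remount changed the session id.
    return _session_by_id(session_ls)[client_id]

sessions = [{'id': '5357', 'state': 'open'}]
try:
    get_session('5358', sessions)
except KeyError as e:
    print("lookup failed:", e)  # mirrors the KeyError: '5358' in the log
```

So the code path itself is straightforward; the interesting question is why the client's session id was absent from "session ls" at that moment.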
For this run, the kernel ring buffer shows:
and this is ubuntu 20.04 (focal) with the stock kernel -- so this is trying to use the v1 messenger protocol, which is getting denied by the kernel driver. cc @lxbsz
This kernel doesn't include the following commit yet:
This failure again is related to the ubuntu 20.04 kernel driver, which is sending an empty metric feature bitset. It's kclient version 5.4.0, which probably does not have the metric changes, @lxbsz?
Yeah, correct.
In that case, let's track this in redmine, as these are known issues with using the stock kernel in ubuntu 20.04.
Or should we also just skip these tests?
That's fine too, but it would involve a custom quincy patch which needs to be reverted once we get back to testing the latest distros.
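If skipping is the route taken, a hedged sketch of what that could look like, in the unittest style these qa tests use (the kernel-version gate and the minimum-version threshold here are assumptions for illustration, not an existing qa helper):

```python
import unittest

# Assumed values for illustration; a real patch would query the running
# kernel (e.g. via os.uname()) rather than hard-code these.
KERNEL_VERSION = (5, 4, 0)        # ubuntu 20.04 (focal) stock kernel
METRICS_MIN_KERNEL = (5, 9, 0)    # assumed first kernel with client metrics

class TestMDSMetrics(unittest.TestCase):
    def test_client_metrics_and_metadata(self):
        if KERNEL_VERSION < METRICS_MIN_KERNEL:
            self.skipTest("stock kernel lacks client metrics support")
        # ... the real metric assertions would run here ...

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMDSMetrics)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("skipped:", len(result.skipped))  # skipped: 1
```

A skip like this is self-documenting in the test report, but as noted above it would still be a quincy-only patch to revert later.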
jenkins test api |
So, I've been going over https://pulpito.ceph.com/xiubli-2024-08-13_04:29:48-fs-wip-xiubli-testing-20240812.103236-quincy-distro-default-smithi/ (QA tracker: https://tracker.ceph.com/issues/67495) and these failures are popping up. However, given that switching to Ubuntu is the only choice we have right now (due to the centos mess), we should just track these issues in redmine tickets and revisit them once we have the relevant distros for testing. @rishabh-d-dave
Okay. After some delay, I'm getting back to this -- I will create quincy-specific redmine tickets and merge the set of PRs that are pending for quincy.
I'll start merging quincy backport PRs after preparing the run wiki, if no other new failures related to the PRs being tested are seen.
https://tracker.ceph.com/projects/cephfs/wiki/Quincy#wip-xiubli-testing-20240812103236-quincy requires approval @ceph/cephfs |
Although this is off-putting, this is probably the way forward to get tests running (esp. fs:upgrade) without relying on centos8.
Contribution Guidelines
To sign and title your commits, please refer to Submitting Patches to Ceph.
If you are submitting a fix for a stable branch (e.g. "quincy"), please refer to Submitting Patches to Ceph - Backports for the proper workflow.
When filling out the checklist below, you may click boxes directly in the GitHub web UI. When entering or editing the entire PR message in the GitHub web UI editor, you may also select a checklist item by adding an x between the brackets: [x]. Spaces and capitalization matter when checking off items this way.
Checklist
Available Jenkins commands:
jenkins retest this please
jenkins test classic perf
jenkins test crimson perf
jenkins test signed
jenkins test make check
jenkins test make check arm64
jenkins test submodules
jenkins test dashboard
jenkins test dashboard cephadm
jenkins test api
jenkins test docs
jenkins render docs
jenkins test ceph-volume all
jenkins test ceph-volume tox
jenkins test windows
jenkins test rook e2e