QA Run #71431
openwip-vshankar-testing-20250525.122301-debug
Updated by Venky Shankar 10 months ago
- Status changed from QA Testing to QA Needs Approval
- Assignee changed from Venky Shankar to Kotresh Hiremath Ravishankar
Kotresh, handing this over to you to analyze the fs suite run for the PR mentioned in the description (as a second pair of eyes on the test results).
Updated by Kotresh Hiremath Ravishankar 10 months ago
Analysed the failures:
1. test_dir_merge_with_snap_items (https://pulpito.ceph.com/vshankar-2025-05-26_08:10:42-fs-wip-vshankar-testing-20250525.122301-debug-testing-default-smithi/8296019)
Not related. Known issue: https://tracker.ceph.com/issues/71060
2. test_journal_smoke (https://pulpito.ceph.com/vshankar-2025-05-26_08:10:42-fs-wip-vshankar-testing-20250525.122301-debug-testing-default-smithi/8295810)
Not related. The job ran using the global snaprealm, so the fix is not exercised. The failure itself is due to a failed git checkout. See below.
2025-05-26T08:45:23.565 INFO:tasks.cephfs_test_runner:======================================================================
2025-05-26T08:45:23.565 INFO:tasks.cephfs_test_runner:ERROR: test_journal_smoke (tasks.cephfs.test_journal_repair.TestJournalRepair)
2025-05-26T08:45:23.565 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2025-05-26T08:45:23.565 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2025-05-26T08:45:23.565 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6623f908411ee935603e31ca18a04df330fbb125/qa/tasks/workunit.py", line 345, in _run_tests
2025-05-26T08:45:23.565 INFO:tasks.cephfs_test_runner: remote.run(logger=log.getChild(role),
2025-05-26T08:45:23.565 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/remote.py", line 535, in run
2025-05-26T08:45:23.565 INFO:tasks.cephfs_test_runner: r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2025-05-26T08:45:23.565 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/run.py", line 461, in run
2025-05-26T08:45:23.565 INFO:tasks.cephfs_test_runner: r.wait()
2025-05-26T08:45:23.566 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/run.py", line 161, in wait
2025-05-26T08:45:23.566 INFO:tasks.cephfs_test_runner: self._raise_for_status()
2025-05-26T08:45:23.566 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/run.py", line 181, in _raise_for_status
2025-05-26T08:45:23.566 INFO:tasks.cephfs_test_runner: raise CommandFailedError(
2025-05-26T08:45:23.566 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError:
Command failed on smithi059 with status 128: 'rm -rf /home/ubuntu/cephtest/clone.client.0 && git clone https://git.ceph.com/ceph-ci.git /home/ubuntu/cephtest/clone.client.0 && cd /home/ubuntu/cephtest/clone.client.0 && git checkout 6623f908411ee935603e31ca18a04df330fbb125'
3. test_root_snapshot_with_use_global_snaprealm_seq_config_disabled (https://pulpito.ceph.com/vshankar-2025-05-26_08:10:42-fs-wip-vshankar-testing-20250525.122301-debug-testing-default-smithi/8295925)
The failure is in the clean-up; see below. I will fix this up and refresh.
2025-05-26T17:22:06.810 INFO:tasks.cephfs_test_runner:======================================================================
2025-05-26T17:22:06.810 INFO:tasks.cephfs_test_runner:ERROR: test_root_snapshot_with_use_global_snaprealm_seq_config_disabled (tasks.cephfs.test_volumes.TestSubvolumeSnapshots)
2025-05-26T17:22:06.810 INFO:tasks.cephfs_test_runner:To verify that the snapshots between root and subvolume snapshot directory triggers cow of
2025-05-26T17:22:06.810 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2025-05-26T17:22:06.810 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2025-05-26T17:22:06.810 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6623f908411ee935603e31ca18a04df330fbb125/qa/tasks/cephfs/test_volumes.py", line 6657, in test_root_snapshot_with_use_global_snaprealm_seq_config_disabled
2025-05-26T17:22:06.810 INFO:tasks.cephfs_test_runner: self._cleanup_subvolumes_and_snapshots(group, subvolname, snapshot, True)
2025-05-26T17:22:06.810 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6623f908411ee935603e31ca18a04df330fbb125/qa/tasks/cephfs/test_volumes.py", line 441, in _cleanup_subvolumes_and_snapshots
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: self.mount_a.run_shell(['rmdir', './.snap/root_s1'])
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_ceph-c_6623f908411ee935603e31ca18a04df330fbb125/qa/tasks/cephfs/mount.py", line 782, in run_shell
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: return self.client_remote.run(args=args, **kwargs)
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/remote.py", line 535, in run
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/run.py", line 461, in run
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: r.wait()
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/run.py", line 161, in wait
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: self._raise_for_status()
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/git.ceph.com_teuthology_eaeb97003cfc43fc86754e4e45e7b398c784dedf/teuthology/orchestra/run.py", line 181, in _raise_for_status
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner: raise CommandFailedError(
2025-05-26T17:22:06.811 INFO:tasks.cephfs_test_runner:teuthology.exceptions.CommandFailedError: Command failed on smithi076 with status 1: '(cd /home/ubuntu/cephtest/mnt.0 && exec rmdir ./.snap/root_s1)'
Updated by Kotresh Hiremath Ravishankar 10 months ago
Dug further into the failure of test_root_snapshot_with_use_global_snaprealm_seq_config_disabled: https://pulpito.ceph.com/vshankar-2025-05-26_08:10:42-fs-wip-vshankar-testing-20250525.122301-debug-testing-default-smithi/8295925
1. The job used kclient
2. The clean-up code in the qa testcase is as below:
def _cleanup_subvolumes_and_snapshots(self, group, subvolname, snapshot, root_snapped=False):
    for i in range(1, 26):
        self._fs_cmd("subvolume", "snapshot", "rm", self.volname, f"{subvolname}_{i}", f"{snapshot}_1", group)
        #self._fs_cmd("subvolume", "snapshot", "rm", self.volname, f"{subvolname}_{i}", f"{snapshot}_2", group)
    for i in range(1, 26):
        self._fs_cmd("subvolume", "rm", self.volname, f"{subvolname}_{i}", group)
    self._fs_cmd("subvolumegroup", "rm", self.volname, group)
    if root_snapped:
        self.mount_a.run_shell(['rmdir', './.snap/root_s1'])  # <<<<<--- Received "Operation not permitted" here
        self.mount_a.run_shell(['rmdir', './.snap/root_s2'])
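A minimal, self-contained sketch of the eventual qa fix (prepending 'sudo' to the rmdir argv). The helper name `snapshot_rmdir_args` is hypothetical and only illustrates how the command list changes; the real fix lives in the test's clean-up code:

```python
def snapshot_rmdir_args(snap_path, use_sudo=True):
    # Hypothetical helper: builds the argv for removing a snapshot directory.
    # Prepending 'sudo' is what made the test pass on the kernel client,
    # since removing the root-level snapshot needed elevated privileges.
    args = ['rmdir', snap_path]
    return ['sudo'] + args if use_sudo else args

# The variant that failed with "Operation not permitted":
print(snapshot_rmdir_args('./.snap/root_s1', use_sudo=False))
# The variant the later run confirmed passing:
print(snapshot_rmdir_args('./.snap/root_s1'))
```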
3. Error snippet of 'Operation not permitted' received on root snapshot removal, from the teuthology log:
2025-05-26T17:21:58.480 DEBUG:teuthology.orchestra.run.smithi076:> sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs subvolumegroup rm cephfs subvol_grp34630 ...
2025-05-26T17:21:58.857 DEBUG:teuthology.orchestra.run.smithi076:> (cd /home/ubuntu/cephtest/mnt.0 && exec rmdir ./.snap/root_s1)
2025-05-26T17:21:58.873 DEBUG:teuthology.orchestra.run:got remote process result: 1
2025-05-26T17:21:58.874 INFO:teuthology.orchestra.run.smithi076.stderr:rmdir: failed to remove './.snap/root_s1': Operation not permitted
2025-05-26T17:21:58.874 INFO:teuthology.nuke:Clearing teuthology firewall rules...
2025-05-26T17:21:58.874 DEBUG:teuthology.orchestra.run.smithi076:> sudo sh -c 'iptables-save | grep -v teuthology | iptables-restore'
4. The rmsnap request never reached the MDS:
[khiremat@vossi04 8295925]$ pwd
/teuthology/vshankar-2025-05-26_08:10:42-fs-wip-vshankar-testing-20250525.122301-debug-testing-default-smithi/8295925
[khiremat@vossi04 8295925]$ find . | xargs zgrep "rmsnap" | grep handle_client_request | wc -l
28
[khiremat@vossi04 8295925]$ find . | xargs zgrep "rmsnap" | grep handle_client_request | grep root_s1 | wc -l
0
5. Kclient logs around that time show no evidence of failure. There is an entry for the subvolumegroup removal, which is the command run just before the root snapshot removal, and no errors after that.
May 26 17:21:58 smithi076 sudo[74774]: pam_unix(sudo:session): session opened for user root(uid=0) by ubuntu(uid=1000)
May 26 17:21:58 smithi076 sudo[74774]: pam_unix(sudo:session): session closed for user root
May 26 17:21:58 smithi076 sudo[74823]: ubuntu : PWD=/home/ubuntu ; USER=root ; COMMAND=/bin/adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 120 ceph --cluster ceph fs subvolumegroup rm cephfs subvol_grp34630
May 26 17:21:58 smithi076 sudo[74823]: pam_unix(sudo:session): session opened for user root(uid=0) by ubuntu(uid=1000)
May 26 17:21:58 smithi076 sudo[74823]: pam_unix(sudo:session): session closed for user root
May 26 17:21:58 smithi076 sudo[74896]: ubuntu : PWD=/home/ubuntu ; USER=root ; COMMAND=/bin/sh -c 'iptables-save | grep -v teuthology | iptables-restore'
May 26 17:21:58 smithi076 sudo[74896]: pam_unix(sudo:session): session opened for user root(uid=0) by ubuntu(uid=1000)
May 26 17:21:58 smithi076 sudo[74896]: pam_unix(sudo:session): session closed for user root
Based on the available logs above, it's not clear why the kclient threw 'Operation not permitted'
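The MDS-side log check above (counting 'rmsnap' handle_client_request entries, overall and for 'root_s1') can be sketched as a small self-contained helper. The sample log lines below are made up for illustration and are not taken from the actual teuthology archive:

```python
def count_client_requests(lines, op, needle=None):
    # Count MDS handle_client_request log lines for a given operation,
    # optionally filtered by an extra substring (e.g. a snapshot name).
    hits = [l for l in lines if "handle_client_request" in l and op in l]
    if needle is not None:
        hits = [l for l in hits if needle in l]
    return len(hits)

# Made-up sample lines standing in for the zgrep over the MDS logs:
sample = [
    "mds.0 ... handle_client_request client_request(rmsnap #0x1/sub_s1 ...)",
    "mds.0 ... handle_client_request client_request(mkdir #0x1/dir ...)",
]
print(count_client_requests(sample, "rmsnap"))             # rmsnap requests seen
print(count_client_requests(sample, "rmsnap", "root_s1"))  # none mention root_s1
```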
Updated by Kotresh Hiremath Ravishankar 10 months ago
Further updates on the failures of test_root_snapshot_with_use_global_snaprealm_seq_config_disabled
1. It's consistently reproducible. Please check below.
https://pulpito.ceph.com/khiremat-2025-05-28_12:41:11-fs:volumes-wip-vshankar-testing-20250525.122301-debug-distro-default-smithi/
https://pulpito.ceph.com/khiremat-2025-05-28_14:11:47-fs:volumes-wip-vshankar-testing-20250525.122301-debug-distro-default-smithi/
Updated by Kotresh Hiremath Ravishankar 10 months ago
The following run confirms that the issue is not related to the PR in question here.
It is reproducible on the main branch as well (I carved out the test, pushed it to an upstream branch, and ran it). Please check the run below.
https://pulpito.ceph.com/khiremat-2025-05-28_17:34:43-fs:volumes-main-distro-default-smithi/
Let me just add 'sudo' to the snapshot removal command; the fix could be as trivial as that. I will update the results with 'sudo'.
Updated by Kotresh Hiremath Ravishankar 10 months ago
The test passes with 'sudo rmdir <snapshot>'. It's as simple as this :)
https://pulpito.ceph.com/khiremat-2025-05-28_18:11:40-fs:volumes-main-distro-default-smithi/
I will refresh the PR with the qa fix.
Updated by Venky Shankar 10 months ago · Edited
Kotresh Hiremath Ravishankar wrote in #note-6:
The test passes with 'sudo rmdir <snapshot>'. It's as simple as this :)
ha! why did it not fail in the earlier run I did on a separate branch :thinking:
I think I added the cleanup in the latest patchset.
Updated by Venky Shankar 10 months ago
Venky Shankar wrote in #note-7:
Kotresh Hiremath Ravishankar wrote in #note-6:
The test passes with 'sudo rmdir <snapshot>'. It's as simple as this :)
ha! why did it not fail in the earlier run I did on a separate branch :thinking:
I think I added the cleanup in the latest patchset.
(maybe you edited my comment rather than replying :P)
Updated by Kotresh Hiremath Ravishankar 10 months ago
Venky Shankar wrote in #note-8:
Venky Shankar wrote in #note-7:
Kotresh Hiremath Ravishankar wrote in #note-6:
The test passes with 'sudo rmdir <snapshot>'. It's as simple as this :)
ha! why did it not fail in the earlier run I did on a separate branch :thinking:
I think I added the cleanup in the latest patchset.
(maybe you edited my comment rather than replying :P)
Oh! yeah I never realized!!
Updated by Venky Shankar 10 months ago
- Status changed from QA Needs Approval to QA Approved
fs approved. Need to update the run wiki before merging the change.