Bug #65136
openQA failure: test_fscrypt_dummy_encryption_with_quick_group
0%
Description
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:======================================================================
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:FAIL: test_fscrypt_dummy_encryption_with_quick_group (tasks.cephfs.test_fscrypt.TestFscrypt)
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner: File "/home/teuthworker/src/github.com_ceph_ceph-c_234d532354068e06d01621fd032c3b663cead394/qa/tasks/cephfs/test_fscrypt.py", line 76, in test_fscrypt_dummy_encryption_with_quick_group
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner: self.assertEqual(proc.returncode, 0)
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:AssertionError: 1 != 0
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:Ran 1 test in 4466.457s
On the surface, the job failure above looks similar to https://tracker.ceph.com/issues/59684, because the same test case from the same suite failed with the same traceback. However, the function names present in the dmesg log of https://tracker.ceph.com/issues/59684 ("ceph_con_v1_try_read", for example) do not appear anywhere in this job's directory: grep -rni ceph_con_v1_try_read /a/yuriw-2024-03-14_15:28:28-fs-wip-yuri4-testing-2024-03-13-0733-reef-distro-default-smithi/7600356 returned nothing. So I suspected that this is a different failure, and talking with Xiubo confirmed that it looks like a different issue.
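For anyone who wants to repeat the check on another job directory, here is a minimal Python sketch of the same case-insensitive recursive search. The helper name find_symbol is hypothetical and not part of teuthology. Unlike plain grep -r, it also looks inside .gz files, on the assumption that some archived logs may be compressed.

```python
import gzip
import os


def find_symbol(root, symbol):
    """Return (path, line_number) pairs for lines containing symbol.

    The match is case-insensitive, like grep -i. Files ending in .gz are
    read through gzip; everything else is read as plain text.
    """
    needle = symbol.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            opener = gzip.open if name.endswith(".gz") else open
            try:
                with opener(path, "rt", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if needle in line.lower():
                            hits.append((path, lineno))
            except (OSError, EOFError):
                # Unreadable or truncated file: skip it and keep searching.
                continue
    return hits
```

An empty list from find_symbol(job_dir, "ceph_con_v1_try_read") corresponds to the empty grep result reported above.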
Updated by Venky Shankar almost 2 years ago
- Category set to Correctness/Safety
- Assignee set to Xiubo Li
- Target version set to v20.0.0
Xiubo, handing this out to you as Rishabh mentioned that he has run the failure through you and it seems like a different issue than https://tracker.ceph.com/issues/59684.
Updated by Venky Shankar almost 2 years ago
- Related to Bug #59684: Test failure: test_fscrypt_dummy_encryption_with_quick_group (tasks.cephfs.test_fscrypt.TestFscrypt) added
Updated by Xiubo Li almost 2 years ago
Venky Shankar wrote:
Xiubo, handing this out to you as Rishabh mentioned that he has run the failure through you and it seems like a different issue than https://tracker.ceph.com/issues/59684.
Yeah, sure. I will work on it later.
Updated by Jos Collin almost 2 years ago · Edited
reef:
https://pulpito.ceph.com/leonidus-2024-06-14_06:25:37-fs-wip-lusov-testing-20240613.155007-reef-distro-default-smithi/7754979
https://pulpito.ceph.com/leonidus-2024-06-16_14:19:11-fs-wip-lusov-testing-20240616.100042-reef-distro-default-smithi/7758502
https://pulpito.ceph.com/leonidus-2024-06-16_14:17:14-fs-wip-leonidus-testing-20240616.070940-reef-distro-default-smithi/7758390
Updated by Igor Golikov about 1 year ago
I observed the following failure:
The test_fscrypt_dummy_encryption_with_quick_group (within the TestFSCryptXFS test suite) fails repeatedly with the following error:
FSTYP -- ceph
PLATFORM -- Linux/x86_64 smithi110 6.13.0-rc7-gc4d7838ae54d #1 SMP PREEMPT_DYNAMIC Mon Jan 27 16:39:20 UTC 2025
MKFS_OPTIONS -- 172.21.15.64:6789:/scratch
MOUNT_OPTIONS -- -o name=admin,secret=AQA5rrtnE5LIJxAAOCa1SRptmtTgjr7aGhxO0Q==,test_dummy_encryption -o context=system_u:object_r:root_t:s0 172.21.15.64:6789:/scratch /tmp/tmp.gZIQpl7FOWscratch
ceph/001 [expunged]
ceph/002 [not run] mount option "test_dummy_encryption" not allowed in this test
ceph/003 [not run] mount option "test_dummy_encryption" not allowed in this test
ceph/004 0s
ceph/005 [not run] mount option "test_dummy_encryption" not allowed in this test
generic/001 9s
generic/002 1s
generic/003 [not run] atime not maintained by ceph
generic/004 [not run] O_TMPFILE is not supported
generic/005 1s
generic/006 31s
generic/007 58s
generic/008 [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/009 [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/011 31s
generic/012 [not run] xfs_io fpunch failed (old kernel/wrong fs?)
generic/013
2025-02-23T23:52:09.555 INFO:tasks.cephfs.test_fscrypt:Command stderr -
Link to the run: https://qa-proxy.ceph.com/teuthology/igolikov-2025-02-23_18:38:01-fs:fscrypt-main-testing-default-smithi/8148540/teuthology.log
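To make it easier to spot where the run stopped, here is a rough Python sketch that classifies result lines of the form pasted above. The function name classify and the three buckets are my own; it only handles the line shapes visible in this log (a duration such as "9s", "[not run]", "[expunged]", or a bare test name with no status) and puts anything else into "other".

```python
import re

# A result line starts with "<group>/<3-digit number>", optionally followed by a status.
RESULT_RE = re.compile(r"^(?P<test>[a-z0-9]+/\d{3})(?:\s+(?P<status>.+))?$")


def classify(lines):
    """Sort xfstests result lines into passed/skipped/unfinished/other."""
    result = {"passed": [], "skipped": [], "unfinished": [], "other": []}
    for raw in lines:
        match = RESULT_RE.match(raw.strip())
        if not match:
            continue  # header lines such as "FSTYP -- ceph"
        test, status = match.group("test"), match.group("status")
        if status is None:
            # Test name printed but no status followed: the run stopped here.
            result["unfinished"].append(test)
        elif status.startswith(("[not run]", "[expunged]")):
            result["skipped"].append(test)
        elif re.match(r"\d+s\b", status):
            result["passed"].append(test)
        else:
            result["other"].append(test)
    return result
```

On the output above, generic/013 is the only entry that ends up in "unfinished", which is the test that was running when the output stopped.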
When I check the remote logs (smithi110 in this case), there is only a crash log there and it is empty. There are no other logs at all for this machine.
So I checked the console_logs, and there is a log file for smithi110. What I found there:
[ 1053.725754] kernel BUG at net/ceph/messenger.c:1070!
Entering kdb (current=0xffff888100a40000, pid 9) on processor 0 Oops: (null)
due to oops at 0xffffffffa09a2974
CPU: 0 UID: 0 PID: 9 Comm: kworker/0:1 Kdump: loaded Not tainted 6.13.0-rc7-gc4d7838ae54d #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
Workqueue: ceph-msgr ceph_con_workfn [libceph]
RIP: 0010:ceph_msg_data_cursor_init+0x34/0x40 [libceph]
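When comparing console logs across several failed runs, it can help to pull out just the BUG location and the faulting function. Below is a small Python sketch that does this for the two line formats shown above. The name summarize_oops and the returned keys are hypothetical, and the regular expressions are written only against this excerpt, so other kernels or oops formats may need adjustments.

```python
import re

# "kernel BUG at <file>:<line>!"
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# "RIP: <segment>:<function>+<offset>/<size> [<module>]" (module is optional)
RIP_RE = re.compile(
    r"RIP: \w+:(?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)


def summarize_oops(text):
    """Extract the BUG source location and faulting function from console text."""
    bug = BUG_RE.search(text)
    rip = RIP_RE.search(text)
    return {
        "file": bug.group("file") if bug else None,
        "line": int(bug.group("line")) if bug else None,
        "function": rip.group("func") if rip else None,
        "module": rip.group("module") if rip else None,
    }
```

For the excerpt above this yields net/ceph/messenger.c, line 1070, with the faulting function ceph_msg_data_cursor_init in libceph.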
Updated by Venky Shankar 10 months ago
- Assignee changed from Xiubo Li to Igor Golikov
(reassigning to Igor)
Igor Golikov wrote in #note-11:
I observed the following failure:
The test_fscrypt_dummy_encryption_with_quick_group (within the TestFSCryptXFS test suite) fails repeatedly with the following error:
FSTYP -- ceph
PLATFORM -- Linux/x86_64 smithi110 6.13.0-rc7-gc4d7838ae54d #1 SMP PREEMPT_DYNAMIC Mon Jan 27 16:39:20 UTC 2025
MKFS_OPTIONS -- 172.21.15.64:6789:/scratch
MOUNT_OPTIONS -- -o name=admin,secret=AQA5rrtnE5LIJxAAOCa1SRptmtTgjr7aGhxO0Q==,test_dummy_encryption -o context=system_u:object_r:root_t:s0 172.21.15.64:6789:/scratch /tmp/tmp.gZIQpl7FOWscratch
ceph/001 [expunged]
ceph/002 [not run] mount option "test_dummy_encryption" not allowed in this test
ceph/003 [not run] mount option "test_dummy_encryption" not allowed in this test
ceph/004 0s
ceph/005 [not run] mount option "test_dummy_encryption" not allowed in this test
generic/001 9s
generic/002 1s
generic/003 [not run] atime not maintained by ceph
generic/004 [not run] O_TMPFILE is not supported
generic/005 1s
generic/006 31s
generic/007 58s
generic/008 [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/009 [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/011 31s
generic/012 [not run] xfs_io fpunch failed (old kernel/wrong fs?)
Maybe these operations (fzero/fpunch) aren't supported by the cephfs kernel driver. @Alex Markuze ?
generic/013
2025-02-23T23:52:09.555 INFO:tasks.cephfs.test_fscrypt:Command stderr -
Link to the run: https://qa-proxy.ceph.com/teuthology/igolikov-2025-02-23_18:38:01-fs:fscrypt-main-testing-default-smithi/8148540/teuthology.log
When I check the remote logs (smithi110 in this case), there is only a crash log there and it is empty. There are no other logs at all for this machine.
So I checked the console_logs, and there is a log file for smithi110. What I found there:
[ 1053.725754] kernel BUG at net/ceph/messenger.c:1070!
Entering kdb (current=0xffff888100a40000, pid 9) on processor 0 Oops: (null)
due to oops at 0xffffffffa09a2974
CPU: 0 UID: 0 PID: 9 Comm: kworker/0:1 Kdump: loaded Not tainted 6.13.0-rc7-gc4d7838ae54d #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
Workqueue: ceph-msgr ceph_con_workfn [libceph]
RIP: 0010:ceph_msg_data_cursor_init+0x34/0x40 [libceph]
This kernel oops looks unrelated to the failure above. Do we know what happened? @Alex Markuze ?
I'm seeing this failed run in my test branch: https://pulpito.ceph.com/vshankar-2025-05-12_08:22:09-fs-wip-vshankar-testing-20250508.200127-debug-testing-default-smithi/8281104
Updated by Alex Markuze 10 months ago
I think @Viacheslav Dubeyko was seeing some issue with test 13.