Bug #65136

open

QA failure: test_fscrypt_dummy_encryption_with_quick_group

Added by Rishabh Dave almost 2 years ago. Updated 6 months ago.

Status:
New
Priority:
Normal
Assignee:
Category:
Correctness/Safety
Target version:
-
% Done:
0%

Source:
Backport:
Regression:
No
Severity:
3 - minor
Reviewed:
Affected Versions:
ceph-qa-suite:
Component(FS):
Labels (FS):
Pull request ID:
Tags (freeform):
Merge Commit:
Fixed In:
Released In:
Upkeep Timestamp:

Description

https://pulpito.ceph.com/yuriw-2024-03-14_15:28:28-fs-wip-yuri4-testing-2024-03-13-0733-reef-distro-default-smithi/7600356

2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:======================================================================
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:FAIL: test_fscrypt_dummy_encryption_with_quick_group (tasks.cephfs.test_fscrypt.TestFscrypt)
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:Traceback (most recent call last):
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:  File "/home/teuthworker/src/github.com_ceph_ceph-c_234d532354068e06d01621fd032c3b663cead394/qa/tasks/cephfs/test_fscrypt.py", line 76, in test_fscrypt_dummy_encryption_with_quick_group
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:    self.assertEqual(proc.returncode, 0)
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:AssertionError: 1 != 0
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:----------------------------------------------------------------------
2024-03-14T21:26:21.391 INFO:tasks.cephfs_test_runner:Ran 1 test in 4466.457s

On the surface, the job failure above looks similar to https://tracker.ceph.com/issues/59684, because the same test case from the same suite failed with the same traceback. However, the function names present in the dmesg log of https://tracker.ceph.com/issues/59684 ("ceph_con_v1_try_read", for example) do not appear anywhere in this job's directory: grep -rni ceph_con_v1_try_read /a/yuriw-2024-03-14_15:28:28-fs-wip-yuri4-testing-2024-03-13-0733-reef-distro-default-smithi/7600356 returned nothing. So I suspected that this is a different failure, and talking with Xiubo confirmed that it looks like a different issue.
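The grep check described above can also be scripted when several symbols or job directories need to be compared. The following is only a minimal sketch of the same case-insensitive recursive search; the function name find_symbol is hypothetical and is not part of teuthology or the Ceph QA suite. Like plain grep -r, it does not look inside compressed files.

```python
import os
import re


def find_symbol(root, symbol):
    """Return (path, line_number, line) for each case-insensitive match under root.

    Hypothetical helper: mirrors `grep -rni SYMBOL ROOT` for plain-text files.
    """
    pattern = re.compile(re.escape(symbol), re.IGNORECASE)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly encoded logs from aborting the scan
                with open(path, "r", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if pattern.search(line):
                            hits.append((path, lineno, line.rstrip("\n")))
            except OSError:
                # Unreadable files are skipped, as grep would report and move on
                continue
    return hits
```

An empty list from find_symbol(job_dir, "ceph_con_v1_try_read") corresponds to the "returned nothing" result reported here.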


Related issues 1 (0 open, 1 closed)

Related to Linux kernel client - Bug #59684: Test failure: test_fscrypt_dummy_encryption_with_quick_group (tasks.cephfs.test_fscrypt.TestFscrypt) - Duplicate - Xiubo Li

Actions #1

Updated by Venky Shankar almost 2 years ago

  • Category set to Correctness/Safety
  • Assignee set to Xiubo Li
  • Target version set to v20.0.0

Xiubo, handing this over to you, as Rishabh mentioned that he has run the failure by you and it seems to be a different issue from https://tracker.ceph.com/issues/59684.

Actions #2

Updated by Venky Shankar almost 2 years ago

  • Related to Bug #59684: Test failure: test_fscrypt_dummy_encryption_with_quick_group (tasks.cephfs.test_fscrypt.TestFscrypt) added
Actions #3

Updated by Xiubo Li almost 2 years ago

Venky Shankar wrote:

Xiubo, handing this over to you, as Rishabh mentioned that he has run the failure by you and it seems to be a different issue from https://tracker.ceph.com/issues/59684.

Yeah, sure. I will work on it later.

Actions #11

Updated by Igor Golikov about 1 year ago

I observed the following failure:
The test_fscrypt_dummy_encryption_with_quick_group (within the TestFSCryptXFS test suite) fails repeatedly with the following error:

FSTYP -- ceph
PLATFORM -- Linux/x86_64 smithi110 6.13.0-rc7-gc4d7838ae54d #1 SMP PREEMPT_DYNAMIC Mon Jan 27 16:39:20 UTC 2025
MKFS_OPTIONS -- 172.21.15.64:6789:/scratch
MOUNT_OPTIONS -- -o name=admin,secret=AQA5rrtnE5LIJxAAOCa1SRptmtTgjr7aGhxO0Q==,test_dummy_encryption -o context=system_u:object_r:root_t:s0 172.21.15.64:6789:/scratch /tmp/tmp.gZIQpl7FOWscratch


ceph/001 [expunged]
ceph/002 [not run] mount option "test_dummy_encryption" not allowed in this test
ceph/003 [not run] mount option "test_dummy_encryption" not allowed in this test
ceph/004 0s
ceph/005 [not run] mount option "test_dummy_encryption" not allowed in this test
generic/001 9s
generic/002 1s
generic/003 [not run] atime not maintained by ceph
generic/004 [not run] O_TMPFILE is not supported
generic/005 1s
generic/006 31s
generic/007 58s
generic/008 [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/009 [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/011 31s
generic/012 [not run] xfs_io fpunch failed (old kernel/wrong fs?)
generic/013
2025-02-23T23:52:09.555 INFO:tasks.cephfs.test_fscrypt:Command stderr -

Link to the run: https://qa-proxy.ceph.com/teuthology/igolikov-2025-02-23_18:38:01-fs:fscrypt-main-testing-default-smithi/8148540/teuthology.log
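To see at a glance where a run like the one above stopped, the per-test lines can be grouped by outcome. This is only a sketch that handles the line shapes visible in this excerpt (a duration such as "9s", "[not run] reason", "[expunged]", or a bare test name); real xfstests output has other forms that land in "other" here. The names RESULT_RE and summarize are hypothetical.

```python
import re

# Matches lines such as "generic/013" or "ceph/002 [not run] reason" (hypothetical pattern)
RESULT_RE = re.compile(r"^(?P<test>[a-z0-9]+/\d{3})(?:\s+(?P<rest>.*))?$")


def summarize(lines):
    """Group xfstests result lines by outcome; only forms seen in this report are handled."""
    summary = {"passed": [], "not_run": {}, "expunged": [], "unfinished": [], "other": {}}
    for raw in lines:
        match = RESULT_RE.match(raw.strip())
        if not match:
            continue  # header lines (FSTYP, PLATFORM, ...) and teuthology log lines
        test = match.group("test")
        rest = (match.group("rest") or "").strip()
        if rest.startswith("[not run]"):
            summary["not_run"][test] = rest[len("[not run]"):].strip()
        elif rest.startswith("[expunged]"):
            summary["expunged"].append(test)
        elif re.fullmatch(r"\d+s", rest):
            summary["passed"].append(test)
        elif rest == "":
            # A test name with no result: output stopped while this test was in progress
            summary["unfinished"].append(test)
        else:
            summary["other"][test] = rest
    return summary
```

Applied to the output above, generic/013 is the only entry under "unfinished", which suggests it was the test in progress when the output stopped.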

When I check the remote logs (smithi110 in this case), there is only a crash log there and it's empty. There are no logs at all for this machine.
So I checked the console_logs, and there's a log file for smithi110. This is what I found there:

[ 1053.725754] kernel BUG at net/ceph/messenger.c:1070!
Entering kdb (current=0xffff888100a40000, pid 9) on processor 0 Oops: (null)
due to oops at 0xffffffffa09a2974
CPU: 0 UID: 0 PID: 9 Comm: kworker/0:1 Kdump: loaded Not tainted 6.13.0-rc7-gc4d7838ae54d #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
Workqueue: ceph-msgr ceph_con_workfn [libceph]
RIP: 0010:ceph_msg_data_cursor_init+0x34/0x40 [libceph]
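Because the remote logs were empty and the oops only showed up in the console log, it can help to scan console logs for "kernel BUG at" lines automatically. The sketch below pulls out the source location and the faulting function from lines shaped like the excerpt above. The names BUG_RE, RIP_RE and find_kernel_bugs are hypothetical, and the patterns are written against this excerpt only, not against every oops format the kernel can emit.

```python
import re

# Patterns modeled on the console excerpt in this report (assumptions, not a full oops parser)
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
RIP_RE = re.compile(
    r"RIP: [0-9a-f]+:(?P<func>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def find_kernel_bugs(lines):
    """Return one dict per 'kernel BUG at' line, with the first following RIP attached."""
    reports = []
    current = None
    for line in lines:
        bug = BUG_RE.search(line)
        if bug:
            current = {
                "file": bug.group("file"),
                "line": int(bug.group("line")),
                "function": None,
                "module": None,
            }
            reports.append(current)
            continue
        rip = RIP_RE.search(line)
        # Only the first RIP line after a BUG line is recorded for that report
        if rip and current is not None and current["function"] is None:
            current["function"] = rip.group("func")
            current["module"] = rip.group("module")
    return reports
```

For the excerpt above this yields one report pointing at net/ceph/messenger.c line 1070, in ceph_msg_data_cursor_init from the libceph module.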

Actions #12

Updated by Venky Shankar 10 months ago

  • Assignee changed from Xiubo Li to Igor Golikov

(reassigning to Igor)

Igor Golikov wrote in #note-11:

I observed the following failure:
The test_fscrypt_dummy_encryption_with_quick_group (within the TestFSCryptXFS test suite) fails repeatedly with the following error:

FSTYP -- ceph
PLATFORM -- Linux/x86_64 smithi110 6.13.0-rc7-gc4d7838ae54d #1 SMP PREEMPT_DYNAMIC Mon Jan 27 16:39:20 UTC 2025
MKFS_OPTIONS -- 172.21.15.64:6789:/scratch
MOUNT_OPTIONS -- -o name=admin,secret=AQA5rrtnE5LIJxAAOCa1SRptmtTgjr7aGhxO0Q==,test_dummy_encryption -o context=system_u:object_r:root_t:s0 172.21.15.64:6789:/scratch /tmp/tmp.gZIQpl7FOWscratch

ceph/001 [expunged]
ceph/002 [not run] mount option "test_dummy_encryption" not allowed in this test
ceph/003 [not run] mount option "test_dummy_encryption" not allowed in this test
ceph/004 0s
ceph/005 [not run] mount option "test_dummy_encryption" not allowed in this test
generic/001 9s
generic/002 1s
generic/003 [not run] atime not maintained by ceph
generic/004 [not run] O_TMPFILE is not supported
generic/005 1s
generic/006 31s
generic/007 58s
generic/008 [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/009 [not run] xfs_io fzero failed (old kernel/wrong fs?)
generic/011 31s
generic/012 [not run] xfs_io fpunch failed (old kernel/wrong fs?)

Maybe these operations (fzero/fpunch) aren't supported by the cephfs kernel driver. @Alex Markuze ?

generic/013
2025-02-23T23:52:09.555 INFO:tasks.cephfs.test_fscrypt:Command stderr -
Link to the run: https://qa-proxy.ceph.com/teuthology/igolikov-2025-02-23_18:38:01-fs:fscrypt-main-testing-default-smithi/8148540/teuthology.log

When I check the remote logs (smithi110 in this case), there is only a crash log there and it's empty. There are no logs at all for this machine.
So I checked the console_logs, and there's a log file for smithi110. This is what I found there:

[ 1053.725754] kernel BUG at net/ceph/messenger.c:1070!
Entering kdb (current=0xffff888100a40000, pid 9) on processor 0 Oops: (null)
due to oops at 0xffffffffa09a2974
CPU: 0 UID: 0 PID: 9 Comm: kworker/0:1 Kdump: loaded Not tainted 6.13.0-rc7-gc4d7838ae54d #1
Hardware name: Supermicro SYS-5018R-WR/X10SRW-F, BIOS 2.0 12/17/2015
Workqueue: ceph-msgr ceph_con_workfn [libceph]
RIP: 0010:ceph_msg_data_cursor_init+0x34/0x40 [libceph]

This kernel oops looks unrelated to the failure above. Do we know what happened? @Alex Markuze ?

I'm seeing this failed run in my test branch: https://pulpito.ceph.com/vshankar-2025-05-12_08:22:09-fs-wip-vshankar-testing-20250508.200127-debug-testing-default-smithi/8281104

Actions #13

Updated by Alex Markuze 10 months ago

I think @Viacheslav Dubeyko was seeing some issue with test 13.

Actions #14

Updated by Patrick Donnelly 9 months ago

  • Target version deleted (v20.0.0)