Bug #73152
closedCephFS client side, stops working causing machine partial lockup when using fscrypt - multiple OSes / kernels affected
100%
Description
Expected Behavior:
The system should allow access to fscrypt-encrypted directories on CephFS without causing kernel oops or system instability.
Actual Behavior:
Accessing fscrypt-encrypted directories on CephFS leads to a kernel oops, partial system lockup and failure to access CephFS file system.
Note: without the FScrypt area decrypted there does not appear to be issues / errors.
Client System Information:
Operating Systems: Ubuntu 22.04,24.04 and Debian 13
Ubuntu client kernel Version: 6.5.0-44-generic / 6.8.0-40-generic / 6.8.0-83-generic
Ubuntu client Ceph versions: ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) Ubuntu 22.04, ceph version 19.2.1 (9efac4a81335940925dd17dbf407bfd6d3860d28) squid (stable) Ubuntu 24.04
Debian client kernel Version: 6.14.0-29-generic
Debian Ceph version: ceph version 18.2.7 (6b0e988052ec84cf2d4a54ff9bbbc5e720b621ad) reef (stable)
Server System Information:
Proxmox Cluster backed on Proxmox Virtual Environment 9.0.6
Ceph server version: ceph version 19.2.3 (ad1eecf4042e0ce72f382f60c97b709fd6f16a51) squid (stable)
More info / logs:
Significantly more info including DMESG dumps can be found on my original ticket filed under Ubuntu: https://bugs.launchpad.net/kernel/+bug/2073679
Files
Updated by Ater Fax 4 months ago
What happens:
A kernel NULL pointer dereference occurred during CephFS operations.
Triggered in libceph, specifically:
get_bvec_at
process_v2_sparse_read
decrypt_tail
populate_in_iter
ceph_con_v2_try_read
ceph_msg_data_advance
The bug occurrs in a kworker thread (kworker/2:0) within the ceph-msgr workqueue.
Kernel flagged supervisor read access at 0x0 → standard NULL dereference.
Updated by Venky Shankar 4 months ago
- Project changed from CephFS to Linux kernel client
- Category changed from Correctness/Safety to fs/ceph
- Assignee set to Viacheslav Dubeyko
- Component(FS) deleted (
libcephfs) - Labels (FS) deleted (
crash)
Slava, can you please take a look.
Updated by Viacheslav Dubeyko 4 months ago
@aterfax Could you please explain the reproducing path more clearly?
As far as I can follow, we need:
(1) Create fscrypt-encrypted directory.
(2) Create file in fscrypt-encrypted directory.
(3) Try to access the file content in fscrypt-encrypted directory.
What is mount options on your side?
How did you created the fscrypt-encrypted directory?
How did you created the file?
Which size this file has?
How did you tried to access the file's content?
Thanks,
Slava.
Updated by Ater Fax 4 months ago
Hi Slava,
I believe your steps may work but I would mention that this may not be a perfect replication or trigger the bug. The data in my storage area was created and working (still does work on kernel 6.5.0-41-generic) as a network filesystem for Docker Swarmed containers on Ubuntu 22.04 with kernel 6.5.0-41-generic (this is still working) and prior for a long time. If the bug itself is some artifact of the files being created on with the earlier Ceph version (17.2.7) with FScrypt on that Ubuntu version, we may not successfully replicate as above.
The mount options are as follows:
username@3f5a8c29-91d6-47f1-ae4e-2db63e1c4429.cephfs-docker-shared=/ /mnt/pve/cephfs-docker-shared/ ceph noatime,_netdev,ms_mode=secure 0 0
The files under this area are largely created by the running docker containers with a side dish of the version controlled directory I use to orchestrate it all. When the issue triggers, it will occur when Docker starts and tries to access data in the area, or when Docker is disabled, I can trigger it myself by attempting to access (as far as I can tell) any file in the encrypted area. In terms of accessing the data or files in question, I'm fairly sure I trigger it with any access attempt, be that using 'cat', 'less' any typical editor or 'ls -al' (although I don't think using 'ls' actually causes a lock up, just shows malformed metadata display).
I don't believe the file size, type or anything along those lines has any affect on the bug triggering - it seems that any access / usage attempt = issues. (Although I can't say I have tried doing anything that special yet, e.g. prodding only at metadata or inodes etc...)
Prior to the machines falling over or stopping working entirely I see the 'df' command start to hang infinitely and files under the encrypted directory will begin to have malformed POSIX metadata, e.g. instead of a user and group it may show "???" etc...
Notably, only the client side is affected - not server side. That is to say I have multiple functional clients happily transacting with the Ceph server even if one on a newer kernel has locked up and died. I'm going to clone some of my VMs and do some exploratory work seeing what does and doesn't seem to trigger the issue. Will report back soonish.
Updated by Viacheslav Dubeyko 4 months ago
I have:
uname -a
Linux ceph-0005 6.17.0-rc6-dirty #46 SMP PREEMPT_DYNAMIC Thu Sep 18 16:56:08 PDT 2025 x86_64 x86_64 x86_64 GNU/Linux
I executed such steps:
sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs
mkdir /mnt/cephfs/fscrypt-test
fscrypt setup
fscrypt setup /mnt/cephfs
dd if=/dev/urandom of=/mnt/cephfs/my.key bs=32 count=1
fscrypt encrypt --source=raw_key --key=./my.key /mnt/cephfs/fscrypt-test/
dd if=/dev/urandom of=./00001.bin bs=4096 count=10
fscrypt lock /mnt/cephfs/fscrypt-test/
fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test/
ls lah 2 root root 1 Sep 22 14:32 .
total 40K
drwx-----
drwxr-xr-x 9 root root 22 Sep 22 14:30 ..rw-r--r- 1 root root 40K Sep 22 14:32 00001.bin
I tried ls and cat commands. Both commands works perfectly on my side.
So, we need to find a proper way to reproduce the issue. I assumed that, maybe, unaligned file size can trigger the issue. Maybe, 6.5.0-44 doesn't include last fixes in CephFS kernel client.
Updated by Viacheslav Dubeyko 4 months ago
[ 148.568906] fscrypt: AES-256-XTS using implementation "xts-aes-aesni" [ 148.576166] ------------[ cut here ]------------ [ 148.576185] WARNING: CPU: 1 PID: 36 at net/ceph/messenger_v2.c:869 get_bvec_at+0x98/0xb0 [libceph] [ 148.576224] Modules linked in: ip_vs_rr xt_ipvs ip_vs xt_nat esp4 veth vxlan ip6_udp_tunnel udp_tunnel xt_policy xt_mark xt_bpf nft_chain_ nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge stp llc ceph libceph fscache netfs overlay ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrac k nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink binfmt_misc nls_iso8859_1 input_leds joydev serio_raw mac_hid qemu_fw_cfg vmge nid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid45 6 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear qxl drm_ttm_helper ttm sha25 6_ssse3 sha1_ssse3 drm_kms_helper aesni_intel crypto_simd cryptd psmouse drm i2c_piix4 pata_acpi floppy [ 148.576285] CPU: 1 PID: 36 Comm: kworker/1:1 Not tainted 6.5.0-44-generic #44~22.04.1-Ubuntu [ 148.576287] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 148.576289] Workqueue: ceph-msgr ceph_con_workfn [libceph] [ 148.576316] RIP: 0010:get_bvec_at+0x98/0xb0 [libceph] [ 148.576343] Code: d8 41 89 44 24 0c 48 8b 45 e8 65 48 2b 04 25 28 00 00 00 75 19 48 83 c4 18 5b 41 5c 5d 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc <0f> 0b eb ab e8 9f 9b 99 f2 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f [ 148.576345] RSP: 0018:ffffab308013bc58 EFLAGS: 00010246 [ 148.576347] RAX: 0000000000000000 RBX: ffff973d49a05378 RCX: ffff973d57bb2920 [ 148.576349] RDX: 0000000000000000 RSI: ffffab308013bca8 RDI: ffff973d49a05378 [ 148.576350] RBP: ffffab308013bc80 R08: 0000000000000f38 R09: 0000000000000000 [ 148.576351] R10: 0000000000000000 R11: 0000000000001000 R12: ffffab308013bca8 [ 148.576352] R13: 0000000000000f38 R14: 00000000000000c8 R15: 0000000000001000 [ 148.576354] FS: 0000000000000000(0000) GS:ffff973e71d00000(0000) knlGS:0000000000000000 [ 148.576355] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 148.576357] CR2: 0000777a5e965d40 CR3: 0000000116514000 CR4: 00000000000006e0 [ 148.576361] Call Trace: [ 148.576380] <TASK> [ 148.576383] ? show_regs+0x6d/0x80 [ 148.576388] ? __warn+0x89/0x160 [ 148.576391] ? get_bvec_at+0x98/0xb0 [libceph] [ 148.576419] ? report_bug+0x17e/0x1b0 [ 148.576424] ? handle_bug+0x46/0x90 [ 148.576427] ? exc_invalid_op+0x18/0x80 [ 148.576429] ? asm_exc_invalid_op+0x1b/0x20 [ 148.576433] ? get_bvec_at+0x98/0xb0 [libceph] [ 148.576458] ? osd_sparse_read+0x1a6/0x290 [libceph] [ 148.576512] process_v2_sparse_read.constprop.0+0xfb/0x220 [libceph] [ 148.576538] decrypt_tail+0x201/0x220 [libceph] [ 148.576564] populate_in_iter+0x592/0x700 [libceph] [ 148.576588] ? do_recvmsg+0x80/0xe0 [libceph] [ 148.576613] ceph_con_v2_try_read+0x49/0x160 [libceph] [ 148.576638] ceph_con_workfn+0x1d9/0x450 [libceph] [ 148.576662] process_one_work+0x240/0x450 [ 148.576665] worker_thread+0x50/0x3f0 [ 148.576668] ? __pfx_worker_thread+0x10/0x10 [ 148.576670] kthread+0xf2/0x120 [ 148.576673] ? __pfx_kthread+0x10/0x10 [ 148.576675] ret_from_fork+0x47/0x70 [ 148.576678] ? __pfx_kthread+0x10/0x10 [ 148.576681] ret_from_fork_asm+0x1b/0x30 [ 148.576684] </TASK> [ 148.576684] ---[ end trace 0000000000000000 ]--- [ 148.576694] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 148.576727] #PF: supervisor read access in kernel mode [ 148.576736] #PF: error_code(0x0000) - not-present page [ 148.576744] PGD 0 P4D 0 [ 148.576750] Oops: 0000 [#1] PREEMPT SMP PTI [ 148.576758] CPU: 1 PID: 36 Comm: kworker/1:1 Tainted: G W 6.5.0-44-generic #44~22.04.1-Ubuntu [ 148.576772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 148.576787] Workqueue: ceph-msgr ceph_con_workfn [libceph] [ 148.576820] RIP: 0010:ceph_msg_data_advance+0x2d/0x380 [libceph] [ 148.576853] Code: 00 00 48 8b 4f 10 48 39 f1 0f 82 f3 02 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 57 08 <8b> 02 83 f8 03 0f 84 a3 02 00 00 0f 87 9c 00 00 00 83 f8 01 0f 84 [ 148.576876] RSP: 0018:ffffab308013bc18 EFLAGS: 00010296 [ 148.576885] RAX: 0000000000000000 RBX: ffff973d49a05378 RCX: 0000000000000000 [ 148.576895] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff973d49a05378 [ 148.576905] RBP: ffffab308013bc48 R08: 0000000000000f38 R09: 0000000000000000 [ 148.576915] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000000000000 [ 148.576924] R13: 0000000000000f38 R14: 00000000000000c8 R15: 0000000000001000 [ 148.576934] FS: 0000000000000000(0000) GS:ffff973e71d00000(0000) knlGS:0000000000000000 [ 148.576945] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 148.576954] CR2: 0000000000000000 CR3: 0000000116514000 CR4: 00000000000006e0 [ 148.576967] Call Trace: [ 148.576972] <TASK> [ 148.576977] ? show_regs+0x6d/0x80 [ 148.576985] ? __die+0x24/0x80 [ 148.576992] ? page_fault_oops+0x99/0x1b0 [ 148.577000] ? kernelmode_fixup_or_oops+0xb2/0x140 [ 148.577009] ? __bad_area_nosemaphore+0x1a5/0x2c0 [ 148.577017] ? bad_area_nosemaphore+0x16/0x30 [ 148.577024] ? do_user_addr_fault+0x2c4/0x6b0 [ 148.577032] ? get_bvec_at+0x98/0xb0 [libceph] [ 148.577064] ? exc_page_fault+0x83/0x1b0 [ 148.577073] ? asm_exc_page_fault+0x27/0x30 [ 148.577082] ? ceph_msg_data_advance+0x2d/0x380 [libceph] [ 148.577114] get_bvec_at+0x47/0xb0 [libceph] [ 148.577168] process_v2_sparse_read.constprop.0+0xfb/0x220 [libceph] [ 148.577203] decrypt_tail+0x201/0x220 [libceph] [ 148.577236] populate_in_iter+0x592/0x700 [libceph] [ 148.577269] ? do_recvmsg+0x80/0xe0 [libceph] [ 148.577301] ceph_con_v2_try_read+0x49/0x160 [libceph] [ 148.577333] ceph_con_workfn+0x1d9/0x450 [libceph] [ 148.577365] process_one_work+0x240/0x450 [ 148.577373] worker_thread+0x50/0x3f0 [ 148.577381] ? __pfx_worker_thread+0x10/0x10 [ 148.577389] kthread+0xf2/0x120 [ 148.577711] ? __pfx_kthread+0x10/0x10 [ 148.577970] ret_from_fork+0x47/0x70 [ 148.578267] ? __pfx_kthread+0x10/0x10 [ 148.578533] ret_from_fork_asm+0x1b/0x30 [ 148.578781] </TASK> [ 148.579010] Modules linked in: ip_vs_rr xt_ipvs ip_vs xt_nat esp4 veth vxlan ip6_udp_tunnel udp_tunnel xt_policy xt_mark xt_bpf nft_chain_ nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge stp llc ceph libceph fscache netfs overlay ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrac k nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink binfmt_misc nls_iso8859_1 input_leds joydev serio_raw mac_hid qemu_fw_cfg vmge nid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid45 6 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear qxl drm_ttm_helper ttm sha25 6_ssse3 sha1_ssse3 drm_kms_helper aesni_intel crypto_simd cryptd psmouse drm i2c_piix4 pata_acpi floppy [ 148.581388] CR2: 0000000000000000 [ 148.581652] ---[ end trace 0000000000000000 ]--- [ 148.581906] RIP: 0010:ceph_msg_data_advance+0x2d/0x380 [libceph] [ 148.582223] Code: 00 00 48 8b 4f 10 48 39 f1 0f 82 f3 02 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 57 08 <8b> 02 83 f8 03 0f 84 a3 02 00 00 0f 87 9c 00 00 00 83 f8 01 0f 84 [ 148.582760] RSP: 0018:ffffab308013bc18 EFLAGS: 00010296 [ 148.583064] RAX: 0000000000000000 RBX: ffff973d49a05378 RCX: 0000000000000000 [ 148.583350] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff973d49a05378 [ 148.583628] RBP: ffffab308013bc48 R08: 0000000000000f38 R09: 0000000000000000 [ 148.583905] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000000000000 [ 148.584185] R13: 0000000000000f38 R14: 00000000000000c8 R15: 0000000000001000 [ 148.584456] FS: 0000000000000000(0000) GS:ffff973e71d00000(0000) knlGS:0000000000000000 [ 148.584732] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 148.585011] CR2: 0000000000000000 CR3: 0000000116514000 CR4: 00000000000006e0 [ 148.585303] note: kworker/1:1[36] exited with irqs disabled
As far as I can see, the warning has been triggered at first and we are trying to dereference the NULL pointer then:
static void get_bvec_at(struct ceph_msg_data_cursor *cursor,
struct bio_vec *bv)
{
struct page *page;
size_t off, len;
WARN_ON(!cursor->total_resid); <-- We have warning here, probably, total_resid == 0
/* skip zero-length data items */
while (!cursor->resid)
ceph_msg_data_advance(cursor, 0); <-- We are trying to dereference NULL pointer in ceph_msg_data_advance()
/* get a piece of data, cursor isn't advanced */
page = ceph_msg_data_next(cursor, &off, &len);
bvec_set_page(bv, page, len, off);
}
The struct ceph_msg_data_cursor:
struct ceph_msg_data_cursor {
size_t total_resid; /* across all data items */
struct ceph_msg_data *data; /* current data item */
size_t resid; /* bytes not yet consumed */
int sr_resid; /* residual sparse_read len */
bool need_crc; /* crc update needed */
union {
#ifdef CONFIG_BLOCK
struct ceph_bio_iter bio_iter;
#endif /* CONFIG_BLOCK */
struct bvec_iter bvec_iter;
struct { /* pages */
unsigned int page_offset; /* offset in page */
unsigned short page_index; /* index in array */
unsigned short page_count; /* pages in array */
};
struct { /* pagelist */
struct page *page; /* page from list */
size_t offset; /* bytes from list */
};
struct {
struct iov_iter iov_iter;
unsigned int lastlen;
};
};
};
Probably, cursor->data is NULL because cursor->total_resid == 0. Technically speaking, it is possible to protect code from trying to dereference the NULL pointer. But how do we have such situation? It is not clear yet at all.
Updated by Ater Fax 4 months ago
- Investigating using newer kernel but not FSCrypt decrypting
Had a play about in Ubuntu 22.04 under 6.8.0-83-generic with FScrypt left encrypted, not seeing any issues on the area and can 'df', list metadata and open files as normal.
- Investigating using newer kernel and FSCrypt and decrypting the directory
Before attempting to open any files in the encrypted area, listing metadata seems fine showing the correct user:group etc... but after I attempt to 'cat' a file the 'cat' command locks up and does not return info but can be escaped with Ctrl+C , the 'df' command hangs and cannot be escaped, 'ls' onto files hangs and cannot be escaped. Attempting to sudo to elevate permissions hangs and cannot be escaped.
DMESG shows the same Oops/issue as before but I also see it reporting that 1 mds closed the session and has denied reconnect and the other "hung".
[Mon Sep 22 22:39:44 2025] ceph: [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 114485188]: mds1 closed our session
[Mon Sep 22 22:39:44 2025] ceph: [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 114485188]: mds1 reconnect start
[Mon Sep 22 22:39:44 2025] ceph: [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 114485188]: mds1 reconnect denied
[Mon Sep 22 22:39:57 2025] ceph: [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 114485188]: mds0 hung
Checking the MDS/Mons etc... and Ceph Mds/Mgr/Mon logs don't show either of them hung but I also don't see them deny the reconnect.
I also see 'top' showing the load average zoom up to double the core count (4 core, 8 load avg).
- How the area was created
I don't actually remember the exact commands used to create encrypted directory and the bash history is long gone however I can see:
/etc/fscrypt.conf:
{
"source": "custom_passphrase",
"hash_costs": {
"time": "4",
"memory": "131072",
"parallelism": "2"
},
"options": {
"padding": "32",
"contents": "AES_256_XTS",
"filenames": "AES_256_CTS",
"policy_version": "2"
},
"use_fs_keyring_for_v1_policies": false,
"allow_cross_user_metadata": false
}
/etc/fscrypt.conf
Pretty good chance I followed the arch wiki - https://wiki.archlinux.org/title/Fscrypt
Updated by Ater Fax 4 months ago
Did some more prodding. I created a new file from a node running Ubuntu 22.04 with 6.8.0-83-generic on the area without error. I could also 'cat' that new file without error.
So it seems the files created by the older version/s of the kernel are a component to this bug. Perhaps this will be reproducible if you create a file with kernel 6.5.0-41-generic and then try to open it with 6.5.0-44-generic or higher.
Updated by Viacheslav Dubeyko 4 months ago
I have created the file under Ubuntu 22.04.5 LTS (GNU/Linux 6.5.0-41-generic x86_64). I successfully accessed the file under 6.5.0-41 and 6.17.0-rc6 without any issue. Unfortunately, I didn't reproduce the issue.
Updated by Viacheslav Dubeyko 4 months ago
@aterfax Could you able to enable the debug output and to reproduce the issue again? If you didn't compile the kernel, then you cannot enable the debug output.
Updated by Ater Fax 4 months ago
If you provide some specific instructions on how I can I'm happy to.
Am I needing to compile a new kernel with `scripts/config --enable CEPH_LIB_PRETTYDEBUG` and then edit the node's /etc/ceph/ceph.conf to add the following in the [global] section?
[global]
debug_client = 20
debug_ms = 1Updated by Viacheslav Dubeyko 4 months ago
I have this lines in .config:
CONFIG_CEPH_LIB=y
CONFIG_CEPH_LIB_PRETTYDEBUG=y
CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y
CONFIG_CEPH_FS=y
CONFIG_CEPH_FSCACHE=y
CONFIG_CEPH_FS_POSIX_ACL=y
CONFIG_CEPH_FS_SECURITY_LABEL=y
You can enable output from CephFS kernel client:
echo 'module ceph +p' > /proc/dynamic_debug/control
File:
echo 'file super.c +p' > /proc/dynamic_debug/control
Function:
echo 'func ceph_submit_write +p' > /proc/dynamic_debug/control
You can enable output from libceph:
echo 'module libceph +p' > /proc/dynamic_debug/control
You can disable output:
echo 'module libceph -p' > /proc/dynamic_debug/control
The most complicated is to select the proper files/functions to debug.
Let's try:
(1) net/ceph/messenger_v2.c
echo 'file net/ceph/messenger_v2.c +p' > /proc/dynamic_debug/control
(2) net/ceph/messenger_v1.c
echo 'net/ceph/messenger_v1.c +p' > /proc/dynamic_debug/control
(3) net/ceph/osd_client.c
echo 'net/ceph/osd_client.c +p' > /proc/dynamic_debug/control
(4) net/ceph/messenger.c
echo 'net/ceph/messenger.c +p' > /proc/dynamic_debug/control
It will already generate a lot of debug output. But I am not sure that we will have enough details to analyze the issue deeper. Probably, we will need to select functions then for more clean analysis.
[1] https://www.kernel.org/doc/html/v4.14/admin-guide/dynamic-debug-howto.html
Updated by Ater Fax 4 months ago
Yep. Certainly quite noisy. Snipped large amounts of what appear to be superfluous lines.
Dumped using
root@node003:/root# uname -a
Linux node003 6.5.0-45-generic #45~22.04.1 SMP PREEMPT_DYNAMIC Wed Sep 24 02:50:54 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
root@node003:/root# echo 'module ceph +p' > /proc/dynamic_debug/control
root@node003:/root# echo 'file super.c +p' > /proc/dynamic_debug/control
root@node003:/root# echo 'func ceph_submit_write +p' > /proc/dynamic_debug/control
root@node003:/root# echo 'module libceph +p' > /proc/dynamic_debug/control
root@node003:/root# echo 'module libceph -p' > /proc/dynamic_debug/control
root@node003:/root# echo 'file net/ceph/messenger_v2.c +p' > /proc/dynamic_debug/control
root@node003:/root# echo 'net/ceph/messenger_v1.c +p' > /proc/dynamic_debug/control
bash: echo: write error: Invalid argument
root@node003:/root# echo 'net/ceph/osd_client.c +p' > /proc/dynamic_debug/control
bash: echo: write error: Invalid argument
root@node003:/root# echo 'net/ceph/messenger.c +p' > /proc/dynamic_debug/control
bash: echo: write error: Invalid argument
Decrypt started at Sep 24 03:37:20 and cat onto file '.area-decrypted' and call trace at Sep 24 03:37:20
See attached log.
Please me know if anything should be redacted as I do see some info leakage in terms of paths / file names.
Updated by Viacheslav Dubeyko 4 months ago
As far as I can see, we have several instances of decrypt_tail() call:
Sep 24 03:37:05 node003 kernel: libceph: messenger_v2:1158 : decrypt_tail con 000000009819c90d msg 00000000d0c44c26 enc_page_cnt 1 sg_cnt 3 Sep 24 03:37:05 node003 kernel: libceph: messenger.c:1415 : ===== 00000000d0c44c26 17 from mds0 26=client_reply len 2526+0+0 (0 0 0) ===== Sep 24 03:37:05 node003 kernel: ceph: mds_client.c:3760 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] handle_reply 000000000a785d65
Sep 24 03:37:05 node003 kernel: libceph: messenger_v2:1158 : decrypt_tail con 000000003af9e106 msg 00000000d20938e1 enc_page_cnt 1 sg_cnt 3 Sep 24 03:37:05 node003 kernel: libceph: messenger.c:1415 : ===== 00000000d20938e1 21 from mds1 25=client_request_forward len 9+0+0 (0 0 0) ===== Sep 24 03:37:05 node003 kernel: ceph: mds_client.c:3989 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] forward tid 13 to mds0 (we resend)
Sep 24 03:37:05 node003 kernel: libceph: messenger_v2:1158 : decrypt_tail con 000000009819c90d msg 00000000d20938e1 enc_page_cnt 1 sg_cnt 3 Sep 24 03:37:05 node003 kernel: libceph: messenger.c:1415 : ===== 00000000d20938e1 18 from mds0 26=client_reply len 1092+0+0 (0 0 0) ===== Sep 24 03:37:05 node003 kernel: ceph: mds_client.c:3760 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] handle_reply 000000000a785d65
Sep 24 03:37:17 node003 kernel: libceph: messenger_v2:1158 : decrypt_tail con 000000003af9e106 msg 00000000df4cdbac enc_page_cnt 1 sg_cnt 3 Sep 24 03:37:17 node003 kernel: libceph: messenger.c:1415 : ===== 00000000df4cdbac 22 from mds1 22=client_session len 28+0+0 (0 0 0) ===== Sep 24 03:37:17 node003 kernel: ceph: mds_client.c:4111 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] mds1 renewcaps 00000000f68d2503 state open seq 18 Sep 24 03:37:17 node003 kernel: ceph: mds_client.c:2083 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] mds1 ttl now 4295015512, was fresh, now stale
Sep 24 03:37:17 node003 kernel: libceph: messenger_v2:1158 : decrypt_tail con 000000009819c90d msg 000000009d61b09c enc_page_cnt 1 sg_cnt 3 Sep 24 03:37:17 node003 kernel: libceph: messenger.c:1415 : ===== 000000009d61b09c 19 from mds0 22=client_session len 28+0+0 (0 0 0) ===== Sep 24 03:37:17 node003 kernel: ceph: mds_client.c:4111 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] mds0 renewcaps 000000009951cbe1 state open seq 6 Sep 24 03:37:17 node003 kernel: ceph: mds_client.c:2083 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] mds0 ttl now 4295015512, was fresh, now stale
Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1158 : decrypt_tail con 000000009819c90d msg 000000009d61b09c enc_page_cnt 1 sg_cnt 3 Sep 24 03:37:20 node003 kernel: libceph: messenger.c:1415 : ===== 000000009d61b09c 20 from mds0 26=client_reply len 2526+0+0 (0 0 0) ===== Sep 24 03:37:20 node003 kernel: ceph: mds_client.c:3760 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] handle_reply 0000000053b59cce
Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1158 : decrypt_tail con 00000000cba01a13 msg 00000000ee65444e enc_page_cnt 2 sg_cnt 6 Sep 24 03:37:20 node003 kernel: libceph: osd_client.c:5780 : prep_next_sparse_read: [0] starting new sparse read req Sep 24 03:37:20 node003 kernel: libceph: osd_client.c:5819 : prep_next_sparse_read: [0] new sparse read op at idx 0 0x0~0x1000 Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1104 : process_v2_sparse_read: sparse_read return 4 buf 000000009ad0bb93 Sep 24 03:37:20 node003 kernel: libceph: osd_client.c:5878 : [0] got 1 extents Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1104 : process_v2_sparse_read: sparse_read return 10 buf 000000008b76aea3 Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1104 : process_v2_sparse_read: sparse_read return 4 buf 00000000f62fddb2 Sep 24 03:37:20 node003 kernel: libceph: osd_client.c:5923 : [0] ext 0 off 0x0 len 0x1000 Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1104 : process_v2_sparse_read: sparse_read return 1000 buf 0000000000000000 Sep 24 03:37:20 node003 kernel: ------------[ cut here ]------------ Sep 24 03:37:20 node003 kernel: WARNING: CPU: 11 PID: 1376 at net/ceph/messenger_v2.c:869 get_bvec_at+0x98/0xb0 [libceph]
However, only latest one works in different way. In all previous cases, we have enc_page_cnt 1 sg_cnt 3, but latest one has enc_page_cnt 2 sg_cnt 6.
Updated by Viacheslav Dubeyko 4 months ago
In the case of issue, we have such sequence of actions:
(1) Sep 24 03:37:20 node003 kernel: libceph: osd_client.c:5923 : [0] ext 0 off 0x0 len 0x1000 [1]
static int osd_sparse_read(struct ceph_connection *con,
struct ceph_msg_data_cursor *cursor,
char **pbuf)
{
<skipped>
case CEPH_SPARSE_READ_DATA:
if (sr->sr_index >= count) {
sr->sr_state = CEPH_SPARSE_READ_HDR;
goto next_op;
}
eoff = sr->sr_extent[sr->sr_index].off;
elen = sr->sr_extent[sr->sr_index].len;
dout("[%d] ext %d off 0x%llx len 0x%llx\n",
o->o_osd, sr->sr_index, eoff, elen);
if (elen > INT_MAX) {
dout("Sparse read extent length too long (0x%llx)\n",
elen);
return -EREMOTEIO;
}
/* zero out anything from sr_pos to start of extent */
if (sr->sr_pos < eoff)
advance_cursor(cursor, eoff - sr->sr_pos, true);
/* Set position to end of extent */
sr->sr_pos = eoff + elen;
/* send back the new length and nullify the ptr */
cursor->sr_resid = elen;
ret = elen;
*pbuf = NULL;
/* Bump the array index */
++sr->sr_index;
break;
}
return ret;
}
(2) Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1104 : process_v2_sparse_read: sparse_read return 1000 buf 0000000000000000 [2]
static int process_v2_sparse_read(struct ceph_connection *con,
struct page **pages, int spos)
{
struct ceph_msg_data_cursor *cursor = &con->v2.in_cursor;
int ret;
for (;;) {
char *buf = NULL;
ret = con->ops->sparse_read(con, cursor, &buf);
if (ret <= 0)
return ret;
dout("%s: sparse_read return %x buf %p\n", __func__, ret, buf);
do {
int idx = spos >> PAGE_SHIFT;
int soff = offset_in_page(spos);
struct page *spage = con->v2.in_enc_pages[idx];
int len = min_t(int, ret, PAGE_SIZE - soff);
if (buf) {
memcpy_from_page(buf, spage, soff, len);
buf += len;
} else {
struct bio_vec bv;
get_bvec_at(cursor, &bv);
len = min_t(int, len, bv.bv_len);
memcpy_page(bv.bv_page, bv.bv_offset,
spage, soff, len);
ceph_msg_data_advance(cursor, len);
}
spos += len;
ret -= len;
} while (ret);
}
}
So, it's clear that osd_sparse_read() returned ret 1000 and buf 1000. This is why we are going to call get_bvec_at().
However, prep_next_sparse_read() has been prepared the read for 0x0~0x1000:
Sep 24 03:37:20 node003 kernel: libceph: osd_client.c:5819 : prep_next_sparse_read: [0] new sparse read op at idx 0 0x0~0x1000
And we had three process_v2_sparse_read() before the failed one:
Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1104 : process_v2_sparse_read: sparse_read return 4 buf 000000009ad0bb93 Sep 24 03:37:20 node003 kernel: libceph: osd_client.c:5878 : [0] got 1 extents Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1104 : process_v2_sparse_read: sparse_read return 10 buf 000000008b76aea3 Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1104 : process_v2_sparse_read: sparse_read return 4 buf 00000000f62fddb2 Sep 24 03:37:20 node003 kernel: libceph: osd_client.c:5923 : [0] ext 0 off 0x0 len 0x1000 Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1104 : process_v2_sparse_read: sparse_read return 1000 buf 0000000000000000
Potentially, other process_v2_sparse_read() could be called in other threads. Otherwise, the whole prepared sparsed read has been processed in these three process_v2_sparse_read() before problematic one. Also, osd_sparse_read() could did something wrong in preparing sparse read.
[1] https://elixir.bootlin.com/linux/v6.17-rc7/source/net/ceph/osd_client.c#L5871
[2] https://elixir.bootlin.com/linux/v6.17-rc7/source/net/ceph/messenger_v2.c#L1100
Updated by Viacheslav Dubeyko 4 months ago
I think I need to spend some time for debugging the situation on my side. I need to build more clear understanding how code is working before making any conclusions about the issue nature.
Updated by Viacheslav Dubeyko 4 months ago
Ater Fax wrote in #note-13:
Please me know if anything should be redacted as I do see some info leakage in terms of paths / file names.
If you believe that some output in the log represent a security leakage, then, I think you need to delete the attachment, correct it, and upload corrected file again.
It's hard for me to make the conclusion what could represent such security leakage. File names look pretty neutral to me.
Updated by Ater Fax 4 months ago
Aye, I had a quick skim through last night and there wasn't anything that I saw that worried me.
Please let me know if you want me to do some further debugging / prodding of the affected files. Happy to help!
For now it sounds like my easiest path around the issue is likely to be mirroring the FSCrypted (decrypted) area to a different filesystem, then deleting and re-staging everything back into the CephFS FScrypted area on the newer kernel/s. I'll start the work for that, but retain at least a subset of files untouched so we have something to keep triggering the bug.
Thanks for the effort!
Updated by Ater Fax 4 months ago
Quick update on my attempt to re-stage all files onto a completely new separate CephFS filesystem this weekend:
- Using Ubuntu 22.04 with 6.8.0-84-generic (node003).
- A totally new Ceph FS file system.
- Initializing a new FSCrypt area inside the new CephFS area with defaults.
- rsync copying the existing data across from a running node (Ubuntu 22.04 - 6.5.0-41-generic) with the existing data area
I am seeing the same error occurring when attempting to read certain files. e.g. one was an AdGuard "querylog.json".
I am none the wiser what about my setup might be causing this.
Updated by Ater Fax 4 months ago
- File area_decrypted.tar area_decrypted.tar added
As far as an MVP goes for reproducing this in the new area and a new FSCrypt within it goes, the attached hidden file I use just to indicate the area is decrypted was able to consistently reproduce the error (at least on my nodes).
tl;dr on making it happen
- Have Ubuntu 22.04 with 6.8.0-84-generic (or anything newer than 6.5.0-41-generic I bet).
- Initialize a new FSCrypt area inside the new CephFS area with default settings.
- Copy the file into the unlocked FSCrypt directory.
- Lock the FSCrypt directory.
- Unlock the FSCrypt directory.
- 'cat' the file while observing 'journalctl -k -f'
Same Ceph versions on the servers as detailed in the first report post.
Name: .area-decrypted
Size: 44 bytes
MD5: 6b5a13afab6ee4dc081a456c3f67ed0f
Name: area_decrypted.tar
Size: 10240 bytes : 10 KiB
MD5: 8bd563f222b844c44c3ae832a704907b
Updated by Viacheslav Dubeyko 4 months ago
Thanks a lot for sharing the file. I will try to reproduce the issue with it. Sorry, I was busy last two business days this testing snapshot related issue patchset. I should have time and Ceph cluster available today/tomorrow for debugging the issue.
Updated by Ater Fax 4 months ago
No worries! To be frank, I am a bit gobsmacked that the issue was replicated after mirroring the data store, let alone with that particular file. I'm really surprised the issue was preserved and communicable given I was rsyncing via SSH between nodes.
More so given it was created with a "touch" and "echo 'If you can read this the area is decrypted.' > .area-decrypted" originally. It is far from being special as far as I know and files go, yet crash things it still does.
Updated by Viacheslav Dubeyko 4 months ago
- Status changed from New to In Progress
- % Done changed from 0 to 20
I has been reproduced the issue. I missed the really important mount option initially (ms_mode=secure) and this is why I wasn't been able to reproduce the issue. The client on my side tried to use messenger v1 protocol instead of v2. Finally, I was able to reproduce the issue with such mount options:
sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs,ms_mode=secure
The reproducing path:
(1) mkdir /mnt/cephfs/fscrypt-test-3
(2) cp area_decrypted.tar /mnt/cephfs/fscrypt-test-3
(3) fscrypt encrypt --source=raw_key --key=./my.key /mnt/cephfs/fscrypt-test-3
(4) fscrypt lock /mnt/cephfs/fscrypt-test-3
(5) fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test-3
(6) cat /mnt/cephfs/fscrypt-test-3/area_decrypted.tar
(7) Issue has been triggered
And I have such (pretty interesting) call trace:
[ 408.072247] ------------[ cut here ]------------
[ 408.072251] WARNING: CPU: 1 PID: 392 at net/ceph/messenger_v2.c:865 ceph_con_v2_try_read+0x4b39/0x72f0
[ 408.072267] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[ 408.072304] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Not tainted 6.17.0-rc7+ #1 PREEMPT(voluntary)
[ 408.072307] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 408.072310] Workqueue: ceph-msgr ceph_con_workfn
[ 408.072314] RIP: 0010:ceph_con_v2_try_read+0x4b39/0x72f0
[ 408.072317] Code: c7 c1 20 f0 d4 ae 50 31 d2 48 c7 c6 60 27 d5 ae 48 c7 c7 f8 8e 6f b0 68 60 38 d5 ae e8 00 47 61 fe 48 83 c4 18 e9 ac fc ff ff <0f> 0b e9 06 fe ff ff 4c 8b 9d 98 fd ff ff 0f 84 64 e7 ff ff 89 85
[ 408.072319] RSP: 0018:ffff88811c3e7a30 EFLAGS: 00010246
[ 408.072322] RAX: ffffed1024874c6f RBX: ffffea00042c2b40 RCX: 0000000000000f38
[ 408.072324] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 408.072325] RBP: ffff88811c3e7ca8 R08: 0000000000000000 R09: 00000000000000c8
[ 408.072326] R10: 00000000000000c8 R11: 0000000000000000 R12: 00000000000000c8
[ 408.072327] R13: dffffc0000000000 R14: ffff8881243a6030 R15: 0000000000003000
[ 408.072329] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000
[ 408.072331] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 408.072332] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[ 408.072336] PKRU: 55555554
[ 408.072337] Call Trace:
[ 408.072338] <TASK>
[ 408.072340] ? sched_clock_noinstr+0x9/0x10
[ 408.072344] ? __pfx_ceph_con_v2_try_read+0x10/0x10
[ 408.072347] ? _raw_spin_unlock+0xe/0x40
[ 408.072349] ? finish_task_switch.isra.0+0x15d/0x830
[ 408.072353] ? __kasan_check_write+0x14/0x30
[ 408.072357] ? mutex_lock+0x84/0xe0
[ 408.072359] ? __pfx_mutex_lock+0x10/0x10
[ 408.072361] ceph_con_workfn+0x27e/0x10e0
[ 408.072364] ? metric_delayed_work+0x311/0x2c50
[ 408.072367] process_one_work+0x611/0xe20
[ 408.072371] ? __kasan_check_write+0x14/0x30
[ 408.072373] worker_thread+0x7e3/0x1580
[ 408.072375] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 408.072378] ? __pfx_worker_thread+0x10/0x10
[ 408.072381] kthread+0x381/0x7a0
[ 408.072383] ? __pfx__raw_spin_lock_irq+0x10/0x10
[ 408.072385] ? __pfx_kthread+0x10/0x10
[ 408.072387] ? __kasan_check_write+0x14/0x30
[ 408.072389] ? recalc_sigpending+0x160/0x220
[ 408.072392] ? _raw_spin_unlock_irq+0xe/0x50
[ 408.072394] ? calculate_sigpending+0x78/0xb0
[ 408.072395] ? __pfx_kthread+0x10/0x10
[ 408.072397] ret_from_fork+0x2b6/0x380
[ 408.072400] ? __pfx_kthread+0x10/0x10
[ 408.072402] ret_from_fork_asm+0x1a/0x30
[ 408.072406] </TASK>
[ 408.072407] ---[ end trace 0000000000000000 ]---
[ 408.072418] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
[ 408.072984] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 408.073350] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Tainted: G W 6.17.0-rc7+ #1 PREEMPT(voluntary)
[ 408.073886] Tainted: [W]=WARN
[ 408.074042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 408.074468] Workqueue: ceph-msgr ceph_con_workfn
[ 408.074694] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80
[ 408.074976] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03
[ 408.075884] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246
[ 408.076305] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000
[ 408.076909] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378
[ 408.077466] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8
[ 408.078034] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71
[ 408.078575] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378
[ 408.079159] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000
[ 408.079736] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 408.080039] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[ 408.080376] PKRU: 55555554
[ 408.080513] Call Trace:
[ 408.080630] <TASK>
[ 408.080729] ceph_con_v2_try_read+0x49b9/0x72f0
[ 408.081115] ? __pfx_ceph_con_v2_try_read+0x10/0x10
[ 408.081348] ? _raw_spin_unlock+0xe/0x40
[ 408.081538] ? finish_task_switch.isra.0+0x15d/0x830
[ 408.081768] ? __kasan_check_write+0x14/0x30
[ 408.081986] ? mutex_lock+0x84/0xe0
[ 408.082160] ? __pfx_mutex_lock+0x10/0x10
[ 408.082343] ceph_con_workfn+0x27e/0x10e0
[ 408.082529] ? metric_delayed_work+0x311/0x2c50
[ 408.082737] process_one_work+0x611/0xe20
[ 408.082948] ? __kasan_check_write+0x14/0x30
[ 408.083156] worker_thread+0x7e3/0x1580
[ 408.083331] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 408.083557] ? __pfx_worker_thread+0x10/0x10
[ 408.083751] kthread+0x381/0x7a0
[ 408.083922] ? __pfx__raw_spin_lock_irq+0x10/0x10
[ 408.084139] ? __pfx_kthread+0x10/0x10
[ 408.084310] ? __kasan_check_write+0x14/0x30
[ 408.084510] ? recalc_sigpending+0x160/0x220
[ 408.084708] ? _raw_spin_unlock_irq+0xe/0x50
[ 408.084917] ? calculate_sigpending+0x78/0xb0
[ 408.085138] ? __pfx_kthread+0x10/0x10
[ 408.085335] ret_from_fork+0x2b6/0x380
[ 408.085525] ? __pfx_kthread+0x10/0x10
[ 408.085720] ret_from_fork_asm+0x1a/0x30
[ 408.085922] </TASK>
[ 408.086036] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[ 408.087778] ---[ end trace 0000000000000000 ]---
[ 408.088007] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80
[ 408.088260] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03
[ 408.089118] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246
[ 408.089357] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000
[ 408.089678] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378
[ 408.090020] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8
[ 408.090360] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71
[ 408.090687] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378
[ 408.091035] FS: 0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000
[ 408.091452] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 408.092015] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[ 408.092530] PKRU: 55555554
[ 417.112915] ==================================================================
[ 417.113491] BUG: KASAN: slab-use-after-free in __mutex_lock.constprop.0+0x1522/0x1610
[ 417.114014] Read of size 4 at addr ffff888124870034 by task kworker/2:0/4951
[ 417.114587] CPU: 2 UID: 0 PID: 4951 Comm: kworker/2:0 Tainted: G D W 6.17.0-rc7+ #1 PREEMPT(voluntary)
[ 417.114592] Tainted: [D]=DIE, [W]=WARN
[ 417.114593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[ 417.114596] Workqueue: events handle_timeout
[ 417.114601] Call Trace:
[ 417.114602] <TASK>
[ 417.114604] dump_stack_lvl+0x5c/0x90
[ 417.114610] print_report+0x171/0x4dc
[ 417.114613] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 417.114617] ? kasan_complete_mode_report_info+0x80/0x220
[ 417.114621] kasan_report+0xbd/0x100
[ 417.114625] ? __mutex_lock.constprop.0+0x1522/0x1610
[ 417.114628] ? __mutex_lock.constprop.0+0x1522/0x1610
[ 417.114630] __asan_report_load4_noabort+0x14/0x30
[ 417.114633] __mutex_lock.constprop.0+0x1522/0x1610
[ 417.114635] ? queue_con_delay+0x8d/0x200
[ 417.114638] ? __pfx___mutex_lock.constprop.0+0x10/0x10
[ 417.114641] ? __send_subscribe+0x529/0xb20
[ 417.114644] __mutex_lock_slowpath+0x13/0x20
[ 417.114646] mutex_lock+0xd4/0xe0
[ 417.114649] ? __pfx_mutex_lock+0x10/0x10
[ 417.114652] ? ceph_monc_renew_subs+0x2a/0x40
[ 417.114654] ceph_con_keepalive+0x22/0x110
[ 417.114656] handle_timeout+0x6b3/0x11d0
[ 417.114659] ? _raw_spin_unlock_irq+0xe/0x50
[ 417.114662] ? __pfx_handle_timeout+0x10/0x10
[ 417.114664] ? queue_delayed_work_on+0x8e/0xa0
[ 417.114669] process_one_work+0x611/0xe20
[ 417.114672] ? __kasan_check_write+0x14/0x30
[ 417.114676] worker_thread+0x7e3/0x1580
[ 417.114678] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 417.114682] ? __pfx_sched_setscheduler_nocheck+0x10/0x10
[ 417.114687] ? __pfx_worker_thread+0x10/0x10
[ 417.114689] kthread+0x381/0x7a0
[ 417.114692] ? __pfx__raw_spin_lock_irq+0x10/0x10
[ 417.114694] ? __pfx_kthread+0x10/0x10
[ 417.114697] ? __kasan_check_write+0x14/0x30
[ 417.114699] ? recalc_sigpending+0x160/0x220
[ 417.114703] ? _raw_spin_unlock_irq+0xe/0x50
[ 417.114705] ? calculate_sigpending+0x78/0xb0
[ 417.114707] ? __pfx_kthread+0x10/0x10
[ 417.114710] ret_from_fork+0x2b6/0x380
[ 417.114713] ? __pfx_kthread+0x10/0x10
[ 417.114715] ret_from_fork_asm+0x1a/0x30
[ 417.114720] </TASK>
[ 417.125171] Allocated by task 2:
[ 417.125333] kasan_save_stack+0x26/0x60
[ 417.125522] kasan_save_track+0x14/0x40
[ 417.125742] kasan_save_alloc_info+0x39/0x60
[ 417.125945] __kasan_slab_alloc+0x8b/0xb0
[ 417.126133] kmem_cache_alloc_node_noprof+0x13b/0x460
[ 417.126381] copy_process+0x320/0x6250
[ 417.126595] kernel_clone+0xb7/0x840
[ 417.126792] kernel_thread+0xd6/0x120
[ 417.126995] kthreadd+0x85c/0xbe0
[ 417.127176] ret_from_fork+0x2b6/0x380
[ 417.127378] ret_from_fork_asm+0x1a/0x30
[ 417.127692] Freed by task 0:
[ 417.127851] kasan_save_stack+0x26/0x60
[ 417.128057] kasan_save_track+0x14/0x40
[ 417.128267] kasan_save_free_info+0x3b/0x60
[ 417.128491] __kasan_slab_free+0x6c/0xa0
[ 417.128708] kmem_cache_free+0x182/0x550
[ 417.128906] free_task+0xeb/0x140
[ 417.129070] __put_task_struct+0x1d2/0x4f0
[ 417.129259] __put_task_struct_rcu_cb+0x15/0x20
[ 417.129480] rcu_do_batch+0x3d3/0xe70
[ 417.129681] rcu_core+0x549/0xb30
[ 417.129839] rcu_core_si+0xe/0x20
[ 417.130005] handle_softirqs+0x160/0x570
[ 417.130190] __irq_exit_rcu+0x189/0x1e0
[ 417.130369] irq_exit_rcu+0xe/0x20
[ 417.130531] sysvec_apic_timer_interrupt+0x9f/0xd0
[ 417.130768] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 417.131082] Last potentially related work creation:
[ 417.131305] kasan_save_stack+0x26/0x60
[ 417.131484] kasan_record_aux_stack+0xae/0xd0
[ 417.131695] __call_rcu_common+0xcd/0x14b0
[ 417.131909] call_rcu+0x31/0x50
[ 417.132071] delayed_put_task_struct+0x128/0x190
[ 417.132295] rcu_do_batch+0x3d3/0xe70
[ 417.132478] rcu_core+0x549/0xb30
[ 417.132658] rcu_core_si+0xe/0x20
[ 417.132808] handle_softirqs+0x160/0x570
[ 417.132993] __irq_exit_rcu+0x189/0x1e0
[ 417.133181] irq_exit_rcu+0xe/0x20
[ 417.133353] sysvec_apic_timer_interrupt+0x9f/0xd0
[ 417.133584] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 417.133921] Second to last potentially related work creation:
[ 417.134183] kasan_save_stack+0x26/0x60
[ 417.134362] kasan_record_aux_stack+0xae/0xd0
[ 417.134566] __call_rcu_common+0xcd/0x14b0
[ 417.134782] call_rcu+0x31/0x50
[ 417.134929] put_task_struct_rcu_user+0x58/0xb0
[ 417.135143] finish_task_switch.isra.0+0x5d3/0x830
[ 417.135366] __schedule+0xd30/0x5100
[ 417.135534] schedule_idle+0x5a/0x90
[ 417.135712] do_idle+0x25f/0x410
[ 417.135871] cpu_startup_entry+0x53/0x70
[ 417.136053] start_secondary+0x216/0x2c0
[ 417.136233] common_startup_64+0x13e/0x141
[ 417.136894] The buggy address belongs to the object at ffff888124870000
which belongs to the cache task_struct of size 10504
[ 417.138122] The buggy address is located 52 bytes inside of
freed 10504-byte region [ffff888124870000, ffff888124872908)
[ 417.139465] The buggy address belongs to the physical page:
[ 417.140016] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x124870
[ 417.140789] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 417.141519] memcg:ffff88811aa20e01
[ 417.141874] anon flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[ 417.142600] page_type: f5(slab)
[ 417.142922] raw: 0017ffffc0000040 ffff88810094f040 0000000000000000 dead000000000001
[ 417.143554] raw: 0000000000000000 0000000000030003 00000000f5000000 ffff88811aa20e01
[ 417.143954] head: 0017ffffc0000040 ffff88810094f040 0000000000000000 dead000000000001
[ 417.144329] head: 0000000000000000 0000000000030003 00000000f5000000 ffff88811aa20e01
[ 417.144710] head: 0017ffffc0000003 ffffea0004921c01 00000000ffffffff 00000000ffffffff
[ 417.145106] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
[ 417.145485] page dumped because: kasan: bad access detected
[ 417.145859] Memory state around the buggy address:
[ 417.146094] ffff88812486ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 417.146439] ffff88812486ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 417.146791] >ffff888124870000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 417.147145] ^
[ 417.147387] ffff888124870080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 417.147751] ffff888124870100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 417.148123] ==================================================================
Updated by Viacheslav Dubeyko 4 months ago
First of all, we have warning:
Oct 1 13:07:57 ceph-0006 kernel: [ 67.092868] ------------[ cut here ]------------ Oct 1 13:07:57 ceph-0006 kernel: [ 67.092871] WARNING: CPU: 3 PID: 234 at net/ceph/messenger_v2.c:865 get_bvec_at+0x1ca/0x230 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092886] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds floppy psmouse mac_hid i2c_piix4 vga16fb i2c_smbus pata_acpi serio_raw qemu_fw_cfg bochs vgastate sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore Oct 1 13:07:57 ceph-0006 kernel: [ 67.092943] CPU: 3 UID: 0 PID: 234 Comm: kworker/3:2 Not tainted 6.17.0-rc7+ #6 PREEMPT(voluntary) Oct 1 13:07:57 ceph-0006 kernel: [ 67.092948] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092951] Workqueue: ceph-msgr ceph_con_workfn Oct 1 13:07:57 ceph-0006 kernel: [ 67.092958] RIP: 0010:get_bvec_at+0x1ca/0x230 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092961] Code: 65 48 2b 05 d0 a9 bc 02 75 77 48 81 c4 88 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b e9 c6 fe ff ff e8 6a e9 2d fd eb a4 e8 63 e9 2d fd e9 6c ff Oct 1 13:07:57 ceph-0006 kernel: [ 67.092965] RSP: 0018:ffff888102f67920 EFLAGS: 00010246 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092968] RAX: 1ffff11029f3b86f RBX: ffff88814f9dc378 RCX: 0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092971] RDX: dffffc0000000000 RSI: ffff888102f67ba0 RDI: ffff88814f9dc378 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092973] RBP: ffff888102f679d0 R08: 0000000000000000 R09: 0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092975] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000003000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092978] R13: 1ffff110205ecf25 R14: ffff888102f67ba0 R15: ffff88814f9dc030 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092980] FS: 0000000000000000(0000) GS:ffff888248bde000(0000) knlGS:0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092983] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092985] CR2: 00007ea02c000020 CR3: 000000013015d005 CR4: 0000000000772ef0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092990] PKRU: 55555554 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092992] Call Trace: Oct 1 13:07:57 ceph-0006 kernel: [ 67.092994] <TASK> Oct 1 13:07:57 ceph-0006 kernel: [ 67.092995] ? __pfx__printk+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093002] ? __pfx_get_bvec_at+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093006] ceph_con_v2_try_read.cold+0x19d5/0x1ac7 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093011] ? update_entity_lag+0x49/0x190 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093017] ? __pfx_ceph_con_v2_try_read+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093021] ? finish_task_switch.isra.0+0x15d/0x830 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093025] ? __kasan_check_write+0x14/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093030] ? mutex_lock+0x84/0xe0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093034] ? __pfx_mutex_lock+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093037] ceph_con_workfn+0x27e/0x10e0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093041] process_one_work+0x611/0xe20 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093045] ? __kasan_check_write+0x14/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093049] worker_thread+0x7e3/0x1580 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093053] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093058] ? __pfx_worker_thread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093061] kthread+0x381/0x7a0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093065] ? __pfx__raw_spin_lock_irq+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093067] ? __pfx_kthread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093070] ? __kasan_check_write+0x14/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093073] ? recalc_sigpending+0x160/0x220 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093077] ? _raw_spin_unlock_irq+0xe/0x50 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093080] ? calculate_sigpending+0x78/0xb0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093082] ? __pfx_kthread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093085] ret_from_fork+0x2b6/0x380 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093089] ? __pfx_kthread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093105] ret_from_fork_asm+0x1a/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093111] </TASK> Oct 1 13:07:57 ceph-0006 kernel: [ 67.093112] ---[ end trace 0000000000000000 ]---
static void get_bvec_at(struct ceph_msg_data_cursor *cursor,
struct bio_vec *bv)
{
struct page *page;
size_t off, len;
WARN_ON(!cursor->total_resid); <-- We have warning here!
/* skip zero-length data items */
while (!cursor->resid)
ceph_msg_data_advance(cursor, 0);
/* get a piece of data, cursor isn't advanced */
page = ceph_msg_data_next(cursor, &off, &len);
bvec_set_page(bv, page, len, off);
}
And the warning has been triggered because: cursor->total_resid has not been initialized. As far as I can see, cursor->total_resid should be initialized in ceph_msg_data_cursor_init():
void ceph_msg_data_cursor_init(struct ceph_msg_data_cursor *cursor,
struct ceph_msg *msg, size_t length)
{
BUG_ON(!length);
BUG_ON(length > msg->data_length);
BUG_ON(!msg->num_data_items);
cursor->total_resid = length;
cursor->data = msg->data;
cursor->sr_resid = 0;
__ceph_msg_data_cursor_init(cursor);
}
And it is possible to see that it is usually called:
Oct 1 13:07:52 ceph-0006 kernel: [ 61.313466] libceph: pid 234:net/ceph/messenger_v2.c:1222 decrypt_tail(): data_length 0, data 0000000000000000, num_data_items 0, max_data_items 0, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 0, front_alloc_len 805 Oct 1 13:07:52 ceph-0006 kernel: [ 61.321089] libceph: pid 234:net/ceph/messenger_v2.c:1013 setup_message_sgs(): starting... Oct 1 13:07:52 ceph-0006 kernel: [ 61.322694] libceph: pid 234:net/ceph/messenger_v2.c:1013 setup_message_sgs(): starting... Oct 1 13:07:52 ceph-0006 kernel: [ 61.323200] libceph: pid 234:net/ceph/messenger_v2.c:1033 setup_message_sgs(): dlen 124 Oct 1 13:07:52 ceph-0006 kernel: [ 61.323691] libceph: pid 234:net/ceph/messenger.c:1079 ceph_msg_data_cursor_init(): length 124, msg->data_length 124, msg->num_data_items 1 Oct 1 13:07:52 ceph-0006 kernel: [ 61.324500] libceph: pid 234:net/ceph/messenger.c:1040 __ceph_msg_data_cursor_init(): cursor->total_resid 124, cursor->data->type 0x5
However, we haven't such call for the case of issue:
Oct 1 13:07:57 ceph-0006 kernel: [ 67.056014] libceph: pid 234:net/ceph/messenger_v2.c:1013 setup_message_sgs(): starting... Oct 1 13:07:57 ceph-0006 kernel: [ 67.058569] libceph: pid 234:net/ceph/messenger_v2.c:1013 setup_message_sgs(): starting... Oct 1 13:07:57 ceph-0006 kernel: [ 67.060054] libceph: pid 234:net/ceph/messenger_v2.c:1222 decrypt_tail(): data_length 12288, data 00000000ff24bbcb, num_data_items 1, max_data_items 1, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 12288, front_alloc_len 570 Oct 1 13:07:57 ceph-0006 kernel: [ 67.062329] libceph: pid 234:net/ceph/messenger_v2.c:1109 process_v2_sparse_read(): BEFORE sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.063731] libceph: pid 234:net/ceph/osd_client.c:5805 osd_sparse_read(): sr->sr_state 0x0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.064254] libceph: pid 234:net/ceph/osd_client.c:5813 osd_sparse_read(): CEPH_SPARSE_READ_HDR Oct 1 13:07:57 ceph-0006 kernel: [ 67.065655] libceph: pid 234:net/ceph/messenger_v2.c:1126 process_v2_sparse_read(): AFTER sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.068180] libceph: pid 234:net/ceph/messenger_v2.c:1143 process_v2_sparse_read(): data_length 12288, data 00000000ff24bbcb, num_data_items 1, max_data_items 1, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 12288, front_alloc_len 570 Oct 1 13:07:57 ceph-0006 kernel: [ 67.071938] libceph: pid 234:net/ceph/messenger_v2.c:1157 process_v2_sparse_read(): sparse_read return 4 buf 000000004fc26be8 Oct 1 13:07:57 ceph-0006 kernel: [ 67.072806] libceph: pid 234:net/ceph/messenger_v2.c:1109 process_v2_sparse_read(): BEFORE sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.074821] libceph: pid 234:net/ceph/osd_client.c:5805 osd_sparse_read(): sr->sr_state 0x1 Oct 1 13:07:57 ceph-0006 kernel: [ 67.075553] libceph: pid 234:net/ceph/osd_client.c:5831 osd_sparse_read(): CEPH_SPARSE_READ_EXTENTS: count 1 Oct 1 13:07:57 ceph-0006 kernel: [ 67.076282] libceph: pid 234:net/ceph/messenger_v2.c:1126 process_v2_sparse_read(): AFTER sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.077478] libceph: pid 234:net/ceph/messenger_v2.c:1143 process_v2_sparse_read(): data_length 12288, data 00000000ff24bbcb, num_data_items 1, max_data_items 1, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 12288, front_alloc_len 570 Oct 1 13:07:57 ceph-0006 kernel: [ 67.079144] libceph: pid 234:net/ceph/messenger_v2.c:1157 process_v2_sparse_read(): sparse_read return 10 buf 0000000086bcece0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.080020] libceph: pid 234:net/ceph/messenger_v2.c:1109 process_v2_sparse_read(): BEFORE sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.081020] libceph: pid 234:net/ceph/osd_client.c:5805 osd_sparse_read(): sr->sr_state 0x2 Oct 1 13:07:57 ceph-0006 kernel: [ 67.081733] libceph: pid 234:net/ceph/osd_client.c:5862 osd_sparse_read(): CEPH_SPARSE_READ_DATA_LEN: sr->sr_datalen 0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.082605] libceph: pid 234:net/ceph/messenger_v2.c:1126 process_v2_sparse_read(): AFTER sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.083612] libceph: pid 234:net/ceph/messenger_v2.c:1143 process_v2_sparse_read(): data_length 12288, data 00000000ff24bbcb, num_data_items 1, max_data_items 1, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 12288, front_alloc_len 570 Oct 1 13:07:57 ceph-0006 kernel: [ 67.085178] libceph: pid 234:net/ceph/messenger_v2.c:1157 process_v2_sparse_read(): sparse_read return 4 buf 000000005d398447 Oct 1 13:07:57 ceph-0006 kernel: [ 67.086028] libceph: pid 234:net/ceph/messenger_v2.c:1109 process_v2_sparse_read(): BEFORE sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.087009] libceph: pid 234:net/ceph/osd_client.c:5805 osd_sparse_read(): sr->sr_state 0x3 Oct 1 13:07:57 ceph-0006 kernel: [ 67.087662] libceph: pid 234:net/ceph/osd_client.c:5871 osd_sparse_read(): CEPH_SPARSE_READ_DATA_PRE: sr->sr_datalen 12288 Oct 1 13:07:57 ceph-0006 kernel: [ 67.088625] libceph: pid 234:net/ceph/osd_client.c:5893 osd_sparse_read(): CEPH_SPARSE_READ_DATA: ext 0 off 0x0 len 0x3000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.089516] libceph: pid 234:net/ceph/messenger_v2.c:1126 process_v2_sparse_read(): AFTER sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 12288 Oct 1 13:07:57 ceph-0006 kernel: [ 67.090606] libceph: pid 234:net/ceph/messenger_v2.c:1143 process_v2_sparse_read(): data_length 12288, data 00000000ff24bbcb, num_data_items 1, max_data_items 1, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 12288, front_alloc_len 570 Oct 1 13:07:57 ceph-0006 kernel: [ 67.091797] libceph: pid 234:net/ceph/messenger_v2.c:1157 process_v2_sparse_read(): sparse_read return 3000 buf 0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092868] ------------[ cut here ]------------ Oct 1 13:07:57 ceph-0006 kernel: [ 67.092871] WARNING: CPU: 3 PID: 234 at net/ceph/messenger_v2.c:865 get_bvec_at+0x1ca/0x230 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092886] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds floppy psmouse mac_hid i2c_piix4 vga16fb i2c_smbus pata_acpi serio_raw qemu_fw_cfg bochs vgastate sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore Oct 1 13:07:57 ceph-0006 kernel: [ 67.092943] CPU: 3 UID: 0 PID: 234 Comm: kworker/3:2 Not tainted 6.17.0-rc7+ #6 PREEMPT(voluntary) Oct 1 13:07:57 ceph-0006 kernel: [ 67.092948] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092951] Workqueue: ceph-msgr ceph_con_workfn Oct 1 13:07:57 ceph-0006 kernel: [ 67.092958] RIP: 0010:get_bvec_at+0x1ca/0x230 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092961] Code: 65 48 2b 05 d0 a9 bc 02 75 77 48 81 c4 88 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b e9 c6 fe ff ff e8 6a e9 2d fd eb a4 e8 63 e9 2d fd e9 6c ff Oct 1 13:07:57 ceph-0006 kernel: [ 67.092965] RSP: 0018:ffff888102f67920 EFLAGS: 00010246 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092968] RAX: 1ffff11029f3b86f RBX: ffff88814f9dc378 RCX: 0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092971] RDX: dffffc0000000000 RSI: ffff888102f67ba0 RDI: ffff88814f9dc378 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092973] RBP: ffff888102f679d0 R08: 0000000000000000 R09: 0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092975] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000003000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092978] R13: 1ffff110205ecf25 R14: ffff888102f67ba0 R15: ffff88814f9dc030 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092980] FS: 0000000000000000(0000) GS:ffff888248bde000(0000) knlGS:0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092983] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092985] CR2: 00007ea02c000020 CR3: 000000013015d005 CR4: 0000000000772ef0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092990] PKRU: 55555554 Oct 1 13:07:57 ceph-0006 kernel: [ 67.092992] Call Trace: Oct 1 13:07:57 ceph-0006 kernel: [ 67.092994] <TASK> Oct 1 13:07:57 ceph-0006 kernel: [ 67.092995] ? __pfx__printk+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093002] ? __pfx_get_bvec_at+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093006] ceph_con_v2_try_read.cold+0x19d5/0x1ac7 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093011] ? update_entity_lag+0x49/0x190 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093017] ? __pfx_ceph_con_v2_try_read+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093021] ? finish_task_switch.isra.0+0x15d/0x830 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093025] ? __kasan_check_write+0x14/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093030] ? mutex_lock+0x84/0xe0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093034] ? __pfx_mutex_lock+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093037] ceph_con_workfn+0x27e/0x10e0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093041] process_one_work+0x611/0xe20 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093045] ? __kasan_check_write+0x14/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093049] worker_thread+0x7e3/0x1580 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093053] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093058] ? __pfx_worker_thread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093061] kthread+0x381/0x7a0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093065] ? __pfx__raw_spin_lock_irq+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093067] ? __pfx_kthread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093070] ? __kasan_check_write+0x14/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093073] ? recalc_sigpending+0x160/0x220 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093077] ? _raw_spin_unlock_irq+0xe/0x50 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093080] ? calculate_sigpending+0x78/0xb0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093082] ? __pfx_kthread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093085] ret_from_fork+0x2b6/0x380 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093089] ? __pfx_kthread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093105] ret_from_fork_asm+0x1a/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.093111] </TASK> Oct 1 13:07:57 ceph-0006 kernel: [ 67.093112] ---[ end trace 0000000000000000 ]--- Oct 1 13:07:57 ceph-0006 kernel: [ 67.093128] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI Oct 1 13:07:57 ceph-0006 kernel: [ 67.094240] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] Oct 1 13:07:57 ceph-0006 kernel: [ 67.094677] CPU: 3 UID: 0 PID: 234 Comm: kworker/3:2 Tainted: G W 6.17.0-rc7+ #6 PREEMPT(voluntary) Oct 1 13:07:57 ceph-0006 kernel: [ 67.095460] Tainted: [W]=WARN Oct 1 13:07:57 ceph-0006 kernel: [ 67.095664] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 Oct 1 13:07:57 ceph-0006 kernel: [ 67.096625] Workqueue: ceph-msgr ceph_con_workfn Oct 1 13:07:57 ceph-0006 kernel: [ 67.097354] RIP: 0010:ceph_msg_data_advance+0x79/0x1a40 Oct 1 13:07:57 ceph-0006 kernel: [ 67.098337] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 d6 10 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e de 0d 00 00 8b 13 83 fa 03 Oct 1 13:07:57 ceph-0006 kernel: [ 67.099217] RSP: 0018:ffff888102f67880 EFLAGS: 00010246 Oct 1 13:07:57 ceph-0006 kernel: [ 67.099488] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.099821] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88814f9dc378 Oct 1 13:07:57 ceph-0006 kernel: [ 67.100145] RBP: ffff888102f67910 R08: 0000000000000000 R09: 0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.100517] R10: ffff88814f9dc388 R11: 0000000000000000 R12: ffffed1029f3b871 Oct 1 13:07:57 ceph-0006 kernel: [ 67.100846] R13: 1ffff110205ecf25 R14: ffff888102f67ba0 R15: ffff88814f9dc378 Oct 1 13:07:57 ceph-0006 kernel: [ 67.101175] FS: 0000000000000000(0000) GS:ffff888248bde000(0000) knlGS:0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.101575] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 1 13:07:57 ceph-0006 kernel: [ 67.101840] CR2: 00007ea02c000020 CR3: 000000013015d005 CR4: 0000000000772ef0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.102160] PKRU: 55555554 Oct 1 13:07:57 ceph-0006 kernel: [ 67.102292] Call Trace: Oct 1 13:07:57 ceph-0006 kernel: [ 67.102411] <TASK> Oct 1 13:07:57 ceph-0006 kernel: [ 67.102536] get_bvec_at+0xdb/0x230 Oct 1 13:07:57 ceph-0006 kernel: [ 67.102705] ? __pfx__printk+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.102883] ? __pfx_get_bvec_at+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.103068] ceph_con_v2_try_read.cold+0x19d5/0x1ac7 Oct 1 13:07:57 ceph-0006 kernel: [ 67.103299] ? update_entity_lag+0x49/0x190 Oct 1 13:07:57 ceph-0006 kernel: [ 67.103533] ? __pfx_ceph_con_v2_try_read+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.103761] ? finish_task_switch.isra.0+0x15d/0x830 Oct 1 13:07:57 ceph-0006 kernel: [ 67.104003] ? __kasan_check_write+0x14/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.104207] ? mutex_lock+0x84/0xe0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.104371] ? __pfx_mutex_lock+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.104570] ceph_con_workfn+0x27e/0x10e0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.104765] process_one_work+0x611/0xe20 Oct 1 13:07:57 ceph-0006 kernel: [ 67.105001] ? __kasan_check_write+0x14/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.105246] worker_thread+0x7e3/0x1580 Oct 1 13:07:57 ceph-0006 kernel: [ 67.105654] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.105934] ? __pfx_worker_thread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.106501] kthread+0x381/0x7a0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.106865] ? __pfx__raw_spin_lock_irq+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.107725] ? __pfx_kthread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.108415] ? __kasan_check_write+0x14/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.109047] ? recalc_sigpending+0x160/0x220 Oct 1 13:07:57 ceph-0006 kernel: [ 67.109536] ? _raw_spin_unlock_irq+0xe/0x50 Oct 1 13:07:57 ceph-0006 kernel: [ 67.109895] ? calculate_sigpending+0x78/0xb0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.110389] ? __pfx_kthread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.110626] ret_from_fork+0x2b6/0x380 Oct 1 13:07:57 ceph-0006 kernel: [ 67.111006] ? __pfx_kthread+0x10/0x10 Oct 1 13:07:57 ceph-0006 kernel: [ 67.111227] ret_from_fork_asm+0x1a/0x30 Oct 1 13:07:57 ceph-0006 kernel: [ 67.111465] </TASK> Oct 1 13:07:57 ceph-0006 kernel: [ 67.111768] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds floppy psmouse mac_hid i2c_piix4 vga16fb i2c_smbus pata_acpi serio_raw qemu_fw_cfg bochs vgastate sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore Oct 1 13:07:57 ceph-0006 kernel: [ 67.114216] ---[ end trace 0000000000000000 ]--- Oct 1 13:07:57 ceph-0006 kernel: [ 67.114523] RIP: 0010:ceph_msg_data_advance+0x79/0x1a40 Oct 1 13:07:57 ceph-0006 kernel: [ 67.114951] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 d6 10 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e de 0d 00 00 8b 13 83 fa 03 Oct 1 13:07:57 ceph-0006 kernel: [ 67.116118] RSP: 0018:ffff888102f67880 EFLAGS: 00010246 Oct 1 13:07:57 ceph-0006 kernel: [ 67.116544] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.116973] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88814f9dc378 Oct 1 13:07:57 ceph-0006 kernel: [ 67.117370] RBP: ffff888102f67910 R08: 0000000000000000 R09: 0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.117895] R10: ffff88814f9dc388 R11: 0000000000000000 R12: ffffed1029f3b871 Oct 1 13:07:57 ceph-0006 kernel: [ 67.118276] R13: 1ffff110205ecf25 R14: ffff888102f67ba0 R15: ffff88814f9dc378 Oct 1 13:07:57 ceph-0006 kernel: [ 67.118767] FS: 0000000000000000(0000) GS:ffff888248bde000(0000) knlGS:0000000000000000 Oct 1 13:07:57 ceph-0006 kernel: [ 67.119211] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Oct 1 13:07:57 ceph-0006 kernel: [ 67.119591] CR2: 00007ea02c000020 CR3: 000000013015d005 CR4: 0000000000772ef0 Oct 1 13:07:57 ceph-0006 kernel: [ 67.119983] PKRU: 55555554
Potentially, it could be the reason of the issue.
Updated by Viacheslav Dubeyko 4 months ago
@aterfax I am having a vacation for next 3 weeks. So, it will be no progress on the issue for the next 3 weeks. I need some time for investigating the issue yet. And I don't expect to fix it today or tomorrow.
Thanks,
Slava.
Updated by Viacheslav Dubeyko 2 months ago
My initial assumption was that setup_message_sgs() requires to initialize cursor->total_resid and cursor->data. However, it is wrong assumption because process_v2_sparse_read() reset cursor->total_resid and cursor->data. And we still have crash:
Nov 4 14:36:53 ceph-0006 kernel: [ 73.192827] ------------[ cut here ]------------ Nov 4 14:36:53 ceph-0006 kernel: [ 73.192829] kernel BUG at net/ceph/messenger.c:1143! Nov 4 14:36:53 ceph-0006 kernel: [ 73.193158] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI Nov 4 14:36:53 ceph-0006 kernel: [ 73.193577] CPU: 1 UID: 0 PID: 239 Comm: kworker/1:2 Tainted: G W 6.17.0-rc7+ #10 PREEMPT(voluntary) Nov 4 14:36:53 ceph-0006 kernel: [ 73.194540] Tainted: [W]=WARN Nov 4 14:36:53 ceph-0006 kernel: [ 73.194768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014 Nov 4 14:36:53 ceph-0006 kernel: [ 73.195353] Workqueue: ceph-msgr ceph_con_workfn Nov 4 14:36:53 ceph-0006 kernel: [ 73.195660] RIP: 0010:ceph_msg_data_advance+0xa7c/0x1a50 Nov 4 14:36:53 ceph-0006 kernel: [ 73.196006] Code: ff df 4c 89 d6 48 c1 ee 03 80 3c 16 00 0f 84 6c fc ff ff 4c 89 d7 48 89 4d d0 e8 9f 4c 36 fd 48 8b 4d d0 e9 57 fc ff ff 0f 0b <0f> 0b 41 39 c9 0f 82 02 01 00 00 41 29 c9 41 01 cd 49 8b 73 78 48 Nov 4 14:36:53 ceph-0006 kernel: [ 73.200589] RSP: 0018:ffff8881257a7880 EFLAGS: 00010246 Nov 4 14:36:53 ceph-0006 kernel: [ 73.201775] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000 Nov 4 14:36:53 ceph-0006 kernel: [ 73.203406] RDX: dffffc0000000000 RSI: 1ffff11022195870 RDI: ffff888110cac378 Nov 4 14:36:53 ceph-0006 kernel: [ 73.203857] RBP: ffff8881257a7910 R08: 0000000000000000 R09: 0000000000000000 Nov 4 14:36:53 ceph-0006 kernel: [ 73.204684] R10: ffff888110cac388 R11: 0000000000000000 R12: ffffed1022195871 Nov 4 14:36:53 ceph-0006 kernel: [ 73.205100] R13: 1ffff11024af4f25 R14: ffff8881257a7ba0 R15: ffff888110cac378 Nov 4 14:36:53 ceph-0006 kernel: [ 73.206839] FS: 0000000000000000(0000) GS:ffff888258ade000(0000) knlGS:0000000000000000 Nov 4 14:36:53 ceph-0006 kernel: [ 73.208571] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Nov 4 14:36:53 ceph-0006 kernel: [ 73.209865] CR2: 0000624cfb53ef50 CR3: 000000010bf4f006 CR4: 0000000000772ef0 Nov 4 14:36:53 ceph-0006 kernel: [ 73.210886] PKRU: 55555554 Nov 4 14:36:53 ceph-0006 kernel: [ 73.211113] Call Trace: Nov 4 14:36:53 ceph-0006 kernel: [ 73.211355] <TASK> Nov 4 14:36:53 ceph-0006 kernel: [ 73.211530] get_bvec_at+0xdb/0x230 Nov 4 14:36:53 ceph-0006 kernel: [ 73.211789] ? __pfx__printk+0x10/0x10 Nov 4 14:36:53 ceph-0006 kernel: [ 73.212103] ? __pfx_get_bvec_at+0x10/0x10 Nov 4 14:36:53 ceph-0006 kernel: [ 73.212410] ceph_con_v2_try_read.cold+0x19d5/0x1ac7 Nov 4 14:36:53 ceph-0006 kernel: [ 73.212785] ? update_entity_lag+0x49/0x190 Nov 4 14:36:53 ceph-0006 kernel: [ 73.212947] libceph: pid 889:net/ceph/messenger_v2.c:1013 setup_message_sgs(): starting... Nov 4 14:36:53 ceph-0006 kernel: [ 73.213073] ? __pfx_ceph_con_v2_try_read+0x10/0x10 Nov 4 14:36:53 ceph-0006 kernel: [ 73.213731] libceph: pid 889:net/ceph/messenger_v2.c:1016 setup_message_sgs(): front_len(msg) 328, middle_len(msg) 0, data_len(msg) 0, pages 0000000000000000 Nov 4 14:36:53 ceph-0006 kernel: [ 73.214078] ? finish_task_switch.isra.0+0x15d/0x830 Nov 4 14:36:53 ceph-0006 kernel: [ 73.214083] ? __kasan_check_write+0x14/0x30 Nov 4 14:36:53 ceph-0006 kernel: [ 73.214087] ? mutex_lock+0x84/0xe0 Nov 4 14:36:53 ceph-0006 kernel: [ 73.214091] ? __pfx_mutex_lock+0x10/0x10 Nov 4 14:36:53 ceph-0006 kernel: [ 73.214095] ceph_con_workfn+0x27e/0x10e0 Nov 4 14:36:53 ceph-0006 kernel: [ 73.216918] process_one_work+0x611/0xe20 Nov 4 14:36:53 ceph-0006 kernel: [ 73.217215] ? __kasan_check_write+0x14/0x30 Nov 4 14:36:53 ceph-0006 kernel: [ 73.217541] worker_thread+0x7e3/0x1580 Nov 4 14:36:53 ceph-0006 kernel: [ 73.217775] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 Nov 4 14:36:53 ceph-0006 kernel: [ 73.218063] ? __pfx_worker_thread+0x10/0x10 Nov 4 14:36:53 ceph-0006 kernel: [ 73.218353] kthread+0x381/0x7a0 Nov 4 14:36:53 ceph-0006 kernel: [ 73.218593] ? __pfx__raw_spin_lock_irq+0x10/0x10 Nov 4 14:36:53 ceph-0006 kernel: [ 73.218880] ? __pfx_kthread+0x10/0x10 Nov 4 14:36:53 ceph-0006 kernel: [ 73.219128] ? __kasan_check_write+0x14/0x30 Nov 4 14:36:53 ceph-0006 kernel: [ 73.219452] ? recalc_sigpending+0x160/0x220 Nov 4 14:36:53 ceph-0006 kernel: [ 73.219751] ? _raw_spin_unlock_irq+0xe/0x50 Nov 4 14:36:53 ceph-0006 kernel: [ 73.220044] ? calculate_sigpending+0x78/0xb0 Nov 4 14:36:53 ceph-0006 kernel: [ 73.220350] ? __pfx_kthread+0x10/0x10 Nov 4 14:36:53 ceph-0006 kernel: [ 73.220602] ret_from_fork+0x2b6/0x380 Nov 4 14:36:53 ceph-0006 kernel: [ 73.220869] ? __pfx_kthread+0x10/0x10 Nov 4 14:36:53 ceph-0006 kernel: [ 73.221131] ret_from_fork_asm+0x1a/0x30 Nov 4 14:36:53 ceph-0006 kernel: [ 73.221431] </TASK> Nov 4 14:36:53 ceph-0006 kernel: [ 73.221590] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel kvm joydev irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl floppy input_leds mac_hid i2c_piix4 psmouse vga16fb serio_raw vgastate i2c_smbus pata_acpi bochs qemu_fw_cfg sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore Nov 4 14:36:53 ceph-0006 kernel: [ 73.224209] ---[ end trace 0000000000000000 ]---
Currently, I assume that it needs to initialize cursor->total_resid and cursor->data exactly before CEPH_SPARSE_READ_DATA.
Updated by Viacheslav Dubeyko 2 months ago
- % Done changed from 20 to 50
I am testing this fix now:
git diff
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 9f6d860411cb..e8370e6806fb 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1128,6 +1128,7 @@ void ceph_msg_data_advance(struct ceph_msg_data_cursor *cursor, size_t bytes)
bool new_piece;
BUG_ON(bytes > cursor->resid);
+ BUG_ON(!cursor->data);
switch (cursor->data->type) {
case CEPH_MSG_DATA_PAGELIST:
new_piece = ceph_msg_data_pagelist_advance(cursor, bytes);
diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index 5483b4eed94e..a5fc4576f51f 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -1090,6 +1090,8 @@ static int process_v2_sparse_read(struct ceph_connection *con,
struct ceph_msg_data_cursor *cursor = &con->v2.in_cursor;
int ret;
+ ceph_msg_data_cursor_init(cursor, con->in_msg, con->in_msg->data_length);
+
for (;;) {
char *buf = NULL;
Updated by Viacheslav Dubeyko 2 months ago
As far as I can see, this fix works:
sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs,ms_mode=secure cd /mnt/cephfs/ ls -lah ./fscrypt-test-3 total 10K drwx------ 2 root root 1 Sep 29 14:47 . drwxr-xr-x 16 root root 31 Nov 5 14:37 .. -rw-r--r-- 1 slavad 1001 10K Sep 29 14:34 KsaCwBA0zyUFdVQX5badasKvt9tuxJ7LgvKMvBYBXRA fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test-3 "/mnt/cephfs/fscrypt-test-3" is now unlocked and ready for use. cat ./fscrypt-test-3/area_decrypted.tar .area-decrypted0000644000000000000000000000005414640301620012441 0ustar rootrootIf you can read this the area is decrypted.
Now it needs to run xfstests to check that the fix doesn't break other test-cases.
Updated by Viacheslav Dubeyko 2 months ago
Currently, I can see that several xfstsests have failed:
Failures: generic/198 generic/452 generic/580 generic/639
Failed 4 of 627 tests
However, I never had run xfstsest for messenger protocol v2 before. So, potentially, the reason of the failures could be not the fix but another issues. So, I need to double check the current situation.
Updated by Viacheslav Dubeyko 2 months ago
I see the same issues with generic/198, generic/452, generic/580, generic/639 without applying the patch with fix. So, we have another several issues for the case of messenger protocol v2. I need to create another tickets for these issues investigation.
I am preparing the patch with fix for sending to open-source community for the review.
Updated by Viacheslav Dubeyko 2 months ago
- Status changed from In Progress to Fix Under Review
- % Done changed from 50 to 90
Fix has been sent to open-source community for review:
https://lore.kernel.org/ceph-devel/20251111205627.475128-2-slava@dubeyko.com/T/#u
Updated by Viacheslav Dubeyko 2 months ago
Second version of the patch has been sent to community for review:
https://lore.kernel.org/ceph-devel/20251112195246.495313-2-slava@dubeyko.com/T/#u
Updated by Viacheslav Dubeyko 2 months ago
I created several tickets:
(1) generic/198 issue: https://tracker.ceph.com/issues/73827
(2) generic/452 issue: https://tracker.ceph.com/issues/73828
(3) generic/639 issue: https://tracker.ceph.com/issues/73829
As far as I can see, I cannot reproduce the generic/580 issue. I have checked that generic/580 test-case triggers the kernel crash without the patch. But there is no crash with patch applied.
Updated by Viacheslav Dubeyko 2 months ago
Third version of the patch has been sent to community for review:
https://lore.kernel.org/ceph-devel/20251113223623.508000-2-slava@dubeyko.com/T/#u
Updated by Viacheslav Dubeyko 2 months ago
- % Done changed from 90 to 100
Patch has been applied on testing branch [1]:
commit b33c978f260a00c749f865a7d682f7ea4603cc58 (HEAD -> testing, origin/testing)
Author: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Date: Thu Nov 13 14:36:24 2025 -0800
Updated by Viacheslav Dubeyko about 2 months ago
- Status changed from Fix Under Review to Resolved
Patch has been sent to upstream.
Updated by Ater Fax about 2 months ago
Thanks for the update on progress and for the patch. Happy Thanksgiving if you're US based!
Updated by Viacheslav Dubeyko about 2 months ago
Ater Fax wrote in #note-41:
Thanks for the update on progress and for the patch. Happy Thanksgiving if you're US based!
Thanks. I am enjoying by Thanksgiving. :) Happy Thanksgiving to you too.