Bug #73152: CephFS client side, stops working causing machine partial lockup when using fscrypt - multiple OSes / kernels affected - Linux kernel client - Ceph

Actions

Copy link

Bug #73152

closed

CephFS client side, stops working causing machine partial lockup when using fscrypt - multiple OSes / kernels affected

Added by Ater Fax 4 months ago. Updated about 2 months ago.

Status:

Resolved

Priority:

Normal

Assignee:

Viacheslav Dubeyko

Category:

fs/ceph

Target version:

% Done:

100%

Source:

Community (user)

Backport:

Regression:

Severity:

2 - major

Reviewed:

Affected Versions:

ceph-qa-suite:

Tags (freeform):

Merge Commit:

Fixed In:

Released In:

Upkeep Timestamp:

Description

Expected Behavior:

The system should allow access to fscrypt-encrypted directories on CephFS without causing kernel oops or system instability.

Actual Behavior:

Accessing fscrypt-encrypted directories on CephFS leads to a kernel oops, partial system lockup and failure to access CephFS file system.

Note: without the FScrypt area decrypted there does not appear to be issues / errors.

Client System Information:

Operating Systems: Ubuntu 22.04,24.04 and Debian 13

Ubuntu client kernel Version: 6.5.0-44-generic / 6.8.0-40-generic / 6.8.0-83-generic
Ubuntu client Ceph versions: ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable) Ubuntu 22.04, ceph version 19.2.1 (9efac4a81335940925dd17dbf407bfd6d3860d28) squid (stable) Ubuntu 24.04

Debian client kernel Version: 6.14.0-29-generic
Debian Ceph version: ceph version 18.2.7 (6b0e988052ec84cf2d4a54ff9bbbc5e720b621ad) reef (stable)

Server System Information:

Proxmox Cluster backed on Proxmox Virtual Environment 9.0.6
Ceph server version: ceph version 19.2.3 (ad1eecf4042e0ce72f382f60c97b709fd6f16a51) squid (stable)

More info / logs:
Significantly more info including DMESG dumps can be found on my original ticket filed under Ubuntu: https://bugs.launchpad.net/kernel/+bug/2073679

Files

Download all files

kernel_log_ceph_debug.2025-09-24.log (333 KB) kernel_log_ceph_debug.2025-09-24.log	Kernel log dump with requested settings.	Ater Fax, 09/24/2025 04:05 AM
area_decrypted.tar (10 KB) area_decrypted.tar		Ater Fax, 09/29/2025 04:14 AM

Actions

Copy link

Updated by Ater Fax 4 months ago

What happens:

A kernel NULL pointer dereference occurred during CephFS operations.

Triggered in libceph, specifically:
get_bvec_at
process_v2_sparse_read
decrypt_tail
populate_in_iter
ceph_con_v2_try_read
ceph_msg_data_advance

The bug occurrs in a kworker thread (kworker/2:0) within the ceph-msgr workqueue.

Kernel flagged supervisor read access at 0x0 → standard NULL dereference.

Actions

Copy link

Updated by Venky Shankar 4 months ago

Project changed from CephFS to Linux kernel client
Category changed from Correctness/Safety to fs/ceph
Assignee set to Viacheslav Dubeyko
Component(FS) deleted (~~libcephfs~~)
Labels (FS) deleted (~~crash~~)

Slava, can you please take a look.

Actions

Copy link

Updated by Viacheslav Dubeyko 4 months ago

@aterfax Could you please explain the reproducing path more clearly?

As far as I can follow, we need:
(1) Create fscrypt-encrypted directory.
(2) Create file in fscrypt-encrypted directory.
(3) Try to access the file content in fscrypt-encrypted directory.

What is mount options on your side?
How did you created the fscrypt-encrypted directory?
How did you created the file?
Which size this file has?
How did you tried to access the file's content?

Thanks,
Slava.

Actions

Copy link

Updated by Ater Fax 4 months ago

Hi Slava,

I believe your steps may work but I would mention that this may not be a perfect replication or trigger the bug. The data in my storage area was created and working (still does work on kernel 6.5.0-41-generic) as a network filesystem for Docker Swarmed containers on Ubuntu 22.04 with kernel 6.5.0-41-generic (this is still working) and prior for a long time. If the bug itself is some artifact of the files being created on with the earlier Ceph version (17.2.7) with FScrypt on that Ubuntu version, we may not successfully replicate as above.

The mount options are as follows:
username@3f5a8c29-91d6-47f1-ae4e-2db63e1c4429.cephfs-docker-shared=/ /mnt/pve/cephfs-docker-shared/ ceph noatime,_netdev,ms_mode=secure 0 0

The files under this area are largely created by the running docker containers with a side dish of the version controlled directory I use to orchestrate it all. When the issue triggers, it will occur when Docker starts and tries to access data in the area, or when Docker is disabled, I can trigger it myself by attempting to access (as far as I can tell) any file in the encrypted area. In terms of accessing the data or files in question, I'm fairly sure I trigger it with any access attempt, be that using 'cat', 'less' any typical editor or 'ls -al' (although I don't think using 'ls' actually causes a lock up, just shows malformed metadata display).

I don't believe the file size, type or anything along those lines has any affect on the bug triggering - it seems that any access / usage attempt = issues. (Although I can't say I have tried doing anything that special yet, e.g. prodding only at metadata or inodes etc...)

Prior to the machines falling over or stopping working entirely I see the 'df' command start to hang infinitely and files under the encrypted directory will begin to have malformed POSIX metadata, e.g. instead of a user and group it may show "???" etc...

Notably, only the client side is affected - not server side. That is to say I have multiple functional clients happily transacting with the Ceph server even if one on a newer kernel has locked up and died. I'm going to clone some of my VMs and do some exploratory work seeing what does and doesn't seem to trigger the issue. Will report back soonish.

Actions

Copy link

Updated by Viacheslav Dubeyko 4 months ago

I have:

uname -a
Linux ceph-0005 6.17.0-rc6-dirty #46 SMP PREEMPT_DYNAMIC Thu Sep 18 16:56:08 PDT 2025 x86_64 x86_64 x86_64 GNU/Linux

I executed such steps:

sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs
mkdir /mnt/cephfs/fscrypt-test
fscrypt setup
fscrypt setup /mnt/cephfs
dd if=/dev/urandom of=/mnt/cephfs/my.key bs=32 count=1
fscrypt encrypt --source=raw_key --key=./my.key /mnt/cephfs/fscrypt-test/
dd if=/dev/urandom of=./00001.bin bs=4096 count=10
fscrypt lock /mnt/cephfs/fscrypt-test/
fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test/

ls lah
total 40K
drwx----- 2 root root 1 Sep 22 14:32 .
drwxr-xr-x 9 root root 22 Sep 22 14:30 ..
~~rw-r--r-~~ 1 root root 40K Sep 22 14:32 00001.bin

I tried ls and cat commands. Both commands works perfectly on my side.

So, we need to find a proper way to reproduce the issue. I assumed that, maybe, unaligned file size can trigger the issue. Maybe, 6.5.0-44 doesn't include last fixes in CephFS kernel client.

Actions

Copy link

Updated by Viacheslav Dubeyko 4 months ago

[  148.568906] fscrypt: AES-256-XTS using implementation "xts-aes-aesni" 
[  148.576166] ------------[ cut here ]------------
[  148.576185] WARNING: CPU: 1 PID: 36 at net/ceph/messenger_v2.c:869 get_bvec_at+0x98/0xb0 [libceph]
[  148.576224] Modules linked in: ip_vs_rr xt_ipvs ip_vs xt_nat esp4 veth vxlan ip6_udp_tunnel udp_tunnel xt_policy xt_mark xt_bpf nft_chain_
nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge stp llc ceph libceph fscache netfs overlay ip6t_REJECT 
nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrac
k nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink binfmt_misc nls_iso8859_1 input_leds joydev serio_raw mac_hid qemu_fw_cfg vmge
nid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid45
6 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear qxl drm_ttm_helper ttm sha25
6_ssse3 sha1_ssse3 drm_kms_helper aesni_intel crypto_simd cryptd psmouse drm i2c_piix4 pata_acpi floppy
[  148.576285] CPU: 1 PID: 36 Comm: kworker/1:1 Not tainted 6.5.0-44-generic #44~22.04.1-Ubuntu
[  148.576287] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  148.576289] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[  148.576316] RIP: 0010:get_bvec_at+0x98/0xb0 [libceph]
[  148.576343] Code: d8 41 89 44 24 0c 48 8b 45 e8 65 48 2b 04 25 28 00 00 00 75 19 48 83 c4 18 5b 41 5c 5d 31 c0 31 d2 31 f6 31 ff c3 cc cc 
cc cc <0f> 0b eb ab e8 9f 9b 99 f2 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
[  148.576345] RSP: 0018:ffffab308013bc58 EFLAGS: 00010246
[  148.576347] RAX: 0000000000000000 RBX: ffff973d49a05378 RCX: ffff973d57bb2920
[  148.576349] RDX: 0000000000000000 RSI: ffffab308013bca8 RDI: ffff973d49a05378
[  148.576350] RBP: ffffab308013bc80 R08: 0000000000000f38 R09: 0000000000000000
[  148.576351] R10: 0000000000000000 R11: 0000000000001000 R12: ffffab308013bca8
[  148.576352] R13: 0000000000000f38 R14: 00000000000000c8 R15: 0000000000001000
[  148.576354] FS:  0000000000000000(0000) GS:ffff973e71d00000(0000) knlGS:0000000000000000
[  148.576355] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  148.576357] CR2: 0000777a5e965d40 CR3: 0000000116514000 CR4: 00000000000006e0
[  148.576361] Call Trace:
[  148.576380]  <TASK>
[  148.576383]  ? show_regs+0x6d/0x80
[  148.576388]  ? __warn+0x89/0x160
[  148.576391]  ? get_bvec_at+0x98/0xb0 [libceph]
[  148.576419]  ? report_bug+0x17e/0x1b0
[  148.576424]  ? handle_bug+0x46/0x90
[  148.576427]  ? exc_invalid_op+0x18/0x80
[  148.576429]  ? asm_exc_invalid_op+0x1b/0x20
[  148.576433]  ? get_bvec_at+0x98/0xb0 [libceph]
[  148.576458]  ? osd_sparse_read+0x1a6/0x290 [libceph]
[  148.576512]  process_v2_sparse_read.constprop.0+0xfb/0x220 [libceph]
[  148.576538]  decrypt_tail+0x201/0x220 [libceph]
[  148.576564]  populate_in_iter+0x592/0x700 [libceph]
[  148.576588]  ? do_recvmsg+0x80/0xe0 [libceph]
[  148.576613]  ceph_con_v2_try_read+0x49/0x160 [libceph]
[  148.576638]  ceph_con_workfn+0x1d9/0x450 [libceph]
[  148.576662]  process_one_work+0x240/0x450
[  148.576665]  worker_thread+0x50/0x3f0
[  148.576668]  ? __pfx_worker_thread+0x10/0x10
[  148.576670]  kthread+0xf2/0x120
[  148.576673]  ? __pfx_kthread+0x10/0x10
[  148.576675]  ret_from_fork+0x47/0x70
[  148.576678]  ? __pfx_kthread+0x10/0x10
[  148.576681]  ret_from_fork_asm+0x1b/0x30
[  148.576684]  </TASK>
[  148.576684] ---[ end trace 0000000000000000 ]---
[  148.576694] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  148.576727] #PF: supervisor read access in kernel mode
[  148.576736] #PF: error_code(0x0000) - not-present page
[  148.576744] PGD 0 P4D 0 
[  148.576750] Oops: 0000 [#1] PREEMPT SMP PTI
[  148.576758] CPU: 1 PID: 36 Comm: kworker/1:1 Tainted: G        W          6.5.0-44-generic #44~22.04.1-Ubuntu
[  148.576772] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[  148.576787] Workqueue: ceph-msgr ceph_con_workfn [libceph]
[  148.576820] RIP: 0010:ceph_msg_data_advance+0x2d/0x380 [libceph]
[  148.576853] Code: 00 00 48 8b 4f 10 48 39 f1 0f 82 f3 02 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 57 08 <8b> 02 83 f8 03 0f 84 a3 02 00 00 0f 87 9c 00 00 00 83 f8 01 0f 84
[  148.576876] RSP: 0018:ffffab308013bc18 EFLAGS: 00010296
[  148.576885] RAX: 0000000000000000 RBX: ffff973d49a05378 RCX: 0000000000000000
[  148.576895] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff973d49a05378
[  148.576905] RBP: ffffab308013bc48 R08: 0000000000000f38 R09: 0000000000000000
[  148.576915] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000000000000
[  148.576924] R13: 0000000000000f38 R14: 00000000000000c8 R15: 0000000000001000
[  148.576934] FS:  0000000000000000(0000) GS:ffff973e71d00000(0000) knlGS:0000000000000000
[  148.576945] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  148.576954] CR2: 0000000000000000 CR3: 0000000116514000 CR4: 00000000000006e0
[  148.576967] Call Trace:
[  148.576972]  <TASK>
[  148.576977]  ? show_regs+0x6d/0x80
[  148.576985]  ? __die+0x24/0x80
[  148.576992]  ? page_fault_oops+0x99/0x1b0
[  148.577000]  ? kernelmode_fixup_or_oops+0xb2/0x140
[  148.577009]  ? __bad_area_nosemaphore+0x1a5/0x2c0
[  148.577017]  ? bad_area_nosemaphore+0x16/0x30
[  148.577024]  ? do_user_addr_fault+0x2c4/0x6b0
[  148.577032]  ? get_bvec_at+0x98/0xb0 [libceph]
[  148.577064]  ? exc_page_fault+0x83/0x1b0
[  148.577073]  ? asm_exc_page_fault+0x27/0x30
[  148.577082]  ? ceph_msg_data_advance+0x2d/0x380 [libceph]
[  148.577114]  get_bvec_at+0x47/0xb0 [libceph]
[  148.577168]  process_v2_sparse_read.constprop.0+0xfb/0x220 [libceph]
[  148.577203]  decrypt_tail+0x201/0x220 [libceph]
[  148.577236]  populate_in_iter+0x592/0x700 [libceph]
[  148.577269]  ? do_recvmsg+0x80/0xe0 [libceph]
[  148.577301]  ceph_con_v2_try_read+0x49/0x160 [libceph]
[  148.577333]  ceph_con_workfn+0x1d9/0x450 [libceph]
[  148.577365]  process_one_work+0x240/0x450
[  148.577373]  worker_thread+0x50/0x3f0
[  148.577381]  ? __pfx_worker_thread+0x10/0x10
[  148.577389]  kthread+0xf2/0x120
[  148.577711]  ? __pfx_kthread+0x10/0x10
[  148.577970]  ret_from_fork+0x47/0x70
[  148.578267]  ? __pfx_kthread+0x10/0x10
[  148.578533]  ret_from_fork_asm+0x1b/0x30
[  148.578781]  </TASK>
[  148.579010] Modules linked in: ip_vs_rr xt_ipvs ip_vs xt_nat esp4 veth vxlan ip6_udp_tunnel udp_tunnel xt_policy xt_mark xt_bpf nft_chain_
nat xt_MASQUERADE nf_nat nf_conntrack_netlink xfrm_user xfrm_algo br_netfilter bridge stp llc ceph libceph fscache netfs overlay ip6t_REJECT 
nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrac
k nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables nfnetlink binfmt_misc nls_iso8859_1 input_leds joydev serio_raw mac_hid qemu_fw_cfg vmge
nid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua sch_fq_codel efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid45
6 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear qxl drm_ttm_helper ttm sha25
6_ssse3 sha1_ssse3 drm_kms_helper aesni_intel crypto_simd cryptd psmouse drm i2c_piix4 pata_acpi floppy
[  148.581388] CR2: 0000000000000000
[  148.581652] ---[ end trace 0000000000000000 ]---
[  148.581906] RIP: 0010:ceph_msg_data_advance+0x2d/0x380 [libceph]
[  148.582223] Code: 00 00 48 8b 4f 10 48 39 f1 0f 82 f3 02 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 
57 08 <8b> 02 83 f8 03 0f 84 a3 02 00 00 0f 87 9c 00 00 00 83 f8 01 0f 84
[  148.582760] RSP: 0018:ffffab308013bc18 EFLAGS: 00010296
[  148.583064] RAX: 0000000000000000 RBX: ffff973d49a05378 RCX: 0000000000000000
[  148.583350] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff973d49a05378
[  148.583628] RBP: ffffab308013bc48 R08: 0000000000000f38 R09: 0000000000000000
[  148.583905] R10: 0000000000000000 R11: 0000000000001000 R12: 0000000000000000
[  148.584185] R13: 0000000000000f38 R14: 00000000000000c8 R15: 0000000000001000
[  148.584456] FS:  0000000000000000(0000) GS:ffff973e71d00000(0000) knlGS:0000000000000000
[  148.584732] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  148.585011] CR2: 0000000000000000 CR3: 0000000116514000 CR4: 00000000000006e0
[  148.585303] note: kworker/1:1[36] exited with irqs disabled

As far as I can see, the warning has been triggered at first and we are trying to dereference the NULL pointer then:

static void get_bvec_at(struct ceph_msg_data_cursor *cursor,
            struct bio_vec *bv)
{
    struct page *page;
    size_t off, len;

    WARN_ON(!cursor->total_resid); <-- We have warning here, probably, total_resid == 0

    /* skip zero-length data items */
    while (!cursor->resid)
        ceph_msg_data_advance(cursor, 0); <-- We are trying to dereference NULL pointer in ceph_msg_data_advance()

    /* get a piece of data, cursor isn't advanced */
    page = ceph_msg_data_next(cursor, &off, &len);
    bvec_set_page(bv, page, len, off);
}

The struct ceph_msg_data_cursor:

struct ceph_msg_data_cursor {
    size_t            total_resid;    /* across all data items */

    struct ceph_msg_data    *data;        /* current data item */
    size_t            resid;        /* bytes not yet consumed */
    int            sr_resid;    /* residual sparse_read len */
    bool            need_crc;    /* crc update needed */
    union {
#ifdef CONFIG_BLOCK
        struct ceph_bio_iter    bio_iter;
#endif /* CONFIG_BLOCK */
        struct bvec_iter    bvec_iter;
        struct {                /* pages */
            unsigned int    page_offset;    /* offset in page */
            unsigned short    page_index;    /* index in array */
            unsigned short    page_count;    /* pages in array */
        };
        struct {                /* pagelist */
            struct page    *page;        /* page from list */
            size_t        offset;        /* bytes from list */
        };
        struct {
            struct iov_iter        iov_iter;
            unsigned int        lastlen;
        };
    };
};

Probably, cursor->data is NULL because cursor->total_resid == 0. Technically speaking, it is possible to protect code from trying to dereference the NULL pointer. But how do we have such situation? It is not clear yet at all.

Actions

Copy link

Updated by Ater Fax 4 months ago

Investigating using newer kernel but not FSCrypt decrypting

Had a play about in Ubuntu 22.04 under 6.8.0-83-generic with FScrypt left encrypted, not seeing any issues on the area and can 'df', list metadata and open files as normal.

Investigating using newer kernel and FSCrypt and decrypting the directory

Before attempting to open any files in the encrypted area, listing metadata seems fine showing the correct user:group etc... but after I attempt to 'cat' a file the 'cat' command locks up and does not return info but can be escaped with Ctrl+C , the 'df' command hangs and cannot be escaped, 'ls' onto files hangs and cannot be escaped. Attempting to sudo to elevate permissions hangs and cannot be escaped.

DMESG shows the same Oops/issue as before but I also see it reporting that 1 mds closed the session and has denied reconnect and the other "hung".

[Mon Sep 22 22:39:44 2025] ceph: [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 114485188]: mds1 closed our session
[Mon Sep 22 22:39:44 2025] ceph: [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 114485188]: mds1 reconnect start
[Mon Sep 22 22:39:44 2025] ceph: [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 114485188]: mds1 reconnect denied
[Mon Sep 22 22:39:57 2025] ceph: [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 114485188]: mds0 hung

Checking the MDS/Mons etc... and Ceph Mds/Mgr/Mon logs don't show either of them hung but I also don't see them deny the reconnect.

I also see 'top' showing the load average zoom up to double the core count (4 core, 8 load avg).

How the area was created

I don't actually remember the exact commands used to create encrypted directory and the bash history is long gone however I can see:
/etc/fscrypt.conf: {
"source": "custom_passphrase",
"hash_costs": {
"time": "4",
"memory": "131072",
"parallelism": "2"
},
"options": {
"padding": "32",
"contents": "AES_256_XTS",
"filenames": "AES_256_CTS",
"policy_version": "2"
},
"use_fs_keyring_for_v1_policies": false,
"allow_cross_user_metadata": false
}
/etc/fscrypt.conf

Pretty good chance I followed the arch wiki - https://wiki.archlinux.org/title/Fscrypt

Actions

Copy link

Updated by Ater Fax 4 months ago

Did some more prodding. I created a new file from a node running Ubuntu 22.04 with 6.8.0-83-generic on the area without error. I could also 'cat' that new file without error.

So it seems the files created by the older version/s of the kernel are a component to this bug. Perhaps this will be reproducible if you create a file with kernel 6.5.0-41-generic and then try to open it with 6.5.0-44-generic or higher.

Actions

Copy link

Updated by Viacheslav Dubeyko 4 months ago

I have created the file under Ubuntu 22.04.5 LTS (GNU/Linux 6.5.0-41-generic x86_64). I successfully accessed the file under 6.5.0-41 and 6.17.0-rc6 without any issue. Unfortunately, I didn't reproduce the issue.

Actions

Copy link

#10

Updated by Viacheslav Dubeyko 4 months ago

@aterfax Could you able to enable the debug output and to reproduce the issue again? If you didn't compile the kernel, then you cannot enable the debug output.

Actions

Copy link

#11

Updated by Ater Fax 4 months ago

If you provide some specific instructions on how I can I'm happy to.

Am I needing to compile a new kernel with `scripts/config --enable CEPH_LIB_PRETTYDEBUG` and then edit the node's /etc/ceph/ceph.conf to add the following in the [global] section?

[global]
    debug_client = 20
    debug_ms = 1

Actions

Copy link

#12

Updated by Viacheslav Dubeyko 4 months ago

I have this lines in .config:

CONFIG_CEPH_LIB=y
CONFIG_CEPH_LIB_PRETTYDEBUG=y
CONFIG_CEPH_LIB_USE_DNS_RESOLVER=y
CONFIG_CEPH_FS=y
CONFIG_CEPH_FSCACHE=y
CONFIG_CEPH_FS_POSIX_ACL=y
CONFIG_CEPH_FS_SECURITY_LABEL=y

You can enable output from CephFS kernel client:
echo 'module ceph +p' > /proc/dynamic_debug/control

File:
echo 'file super.c +p' > /proc/dynamic_debug/control

Function:
echo 'func ceph_submit_write +p' > /proc/dynamic_debug/control

You can enable output from libceph:
echo 'module libceph +p' > /proc/dynamic_debug/control

You can disable output:
echo 'module libceph -p' > /proc/dynamic_debug/control

The most complicated is to select the proper files/functions to debug.
Let's try:
(1) net/ceph/messenger_v2.c
echo 'file net/ceph/messenger_v2.c +p' > /proc/dynamic_debug/control
(2) net/ceph/messenger_v1.c
echo 'net/ceph/messenger_v1.c +p' > /proc/dynamic_debug/control
(3) net/ceph/osd_client.c
echo 'net/ceph/osd_client.c +p' > /proc/dynamic_debug/control
(4) net/ceph/messenger.c
echo 'net/ceph/messenger.c +p' > /proc/dynamic_debug/control

It will already generate a lot of debug output. But I am not sure that we will have enough details to analyze the issue deeper. Probably, we will need to select functions then for more clean analysis.

[1] https://www.kernel.org/doc/html/v4.14/admin-guide/dynamic-debug-howto.html

Actions

Copy link

#13

Updated by Ater Fax 4 months ago

File kernel_log_ceph_debug.2025-09-24.log kernel_log_ceph_debug.2025-09-24.log added

Yep. Certainly quite noisy. Snipped large amounts of what appear to be superfluous lines.

Dumped using

root@node003:/root# uname -a
Linux node003 6.5.0-45-generic #45~22.04.1 SMP PREEMPT_DYNAMIC Wed Sep 24 02:50:54 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

root@node003:/root# echo 'module ceph +p' > /proc/dynamic_debug/control
root@node003:/root# echo 'file super.c +p' > /proc/dynamic_debug/control
root@node003:/root# echo 'func ceph_submit_write +p' > /proc/dynamic_debug/control
root@node003:/root# echo 'module libceph +p' > /proc/dynamic_debug/control
root@node003:/root# echo 'module libceph -p' > /proc/dynamic_debug/control
root@node003:/root# echo 'file net/ceph/messenger_v2.c +p' > /proc/dynamic_debug/control
root@node003:/root# echo 'net/ceph/messenger_v1.c +p' > /proc/dynamic_debug/control
bash: echo: write error: Invalid argument
root@node003:/root# echo 'net/ceph/osd_client.c +p' > /proc/dynamic_debug/control
bash: echo: write error: Invalid argument
root@node003:/root# echo 'net/ceph/messenger.c +p' > /proc/dynamic_debug/control
bash: echo: write error: Invalid argument

Decrypt started at Sep 24 03:37:20 and cat onto file '.area-decrypted' and call trace at Sep 24 03:37:20

See attached log.

Please me know if anything should be redacted as I do see some info leakage in terms of paths / file names.

Actions

Copy link

#14

Updated by Ater Fax 4 months ago

Bad paste on my part - Decrypt actually started at Sep 24 03:37:05 at the top where I snipped.

Actions

Copy link

#15

Updated by Viacheslav Dubeyko 4 months ago

As far as I can see, we have several instances of decrypt_tail() call:

Sep 24 03:37:05 node003 kernel: libceph:  messenger_v2:1158 : decrypt_tail con 000000009819c90d msg 00000000d0c44c26 enc_page_cnt 1 sg_cnt 3
Sep 24 03:37:05 node003 kernel: libceph:   messenger.c:1415 : ===== 00000000d0c44c26 17 from mds0 26=client_reply len 2526+0+0 (0 0 0) =====
Sep 24 03:37:05 node003 kernel: ceph:     mds_client.c:3760 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] handle_reply 000000000a785d65

Sep 24 03:37:05 node003 kernel: libceph:  messenger_v2:1158 : decrypt_tail con 000000003af9e106 msg 00000000d20938e1 enc_page_cnt 1 sg_cnt 3
Sep 24 03:37:05 node003 kernel: libceph:   messenger.c:1415 : ===== 00000000d20938e1 21 from mds1 25=client_request_forward len 9+0+0 (0 0 0) =====
Sep 24 03:37:05 node003 kernel: ceph:     mds_client.c:3989 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] forward tid 13 to mds0 (we resend)

Sep 24 03:37:05 node003 kernel: libceph:  messenger_v2:1158 : decrypt_tail con 000000009819c90d msg 00000000d20938e1 enc_page_cnt 1 sg_cnt 3
Sep 24 03:37:05 node003 kernel: libceph:   messenger.c:1415 : ===== 00000000d20938e1 18 from mds0 26=client_reply len 1092+0+0 (0 0 0) =====
Sep 24 03:37:05 node003 kernel: ceph:     mds_client.c:3760 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] handle_reply 000000000a785d65

Sep 24 03:37:17 node003 kernel: libceph:  messenger_v2:1158 : decrypt_tail con 000000003af9e106 msg 00000000df4cdbac enc_page_cnt 1 sg_cnt 3
Sep 24 03:37:17 node003 kernel: libceph:   messenger.c:1415 : ===== 00000000df4cdbac 22 from mds1 22=client_session len 28+0+0 (0 0 0) =====
Sep 24 03:37:17 node003 kernel: ceph:     mds_client.c:4111 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] mds1 renewcaps 00000000f68d2503 state open seq 18
Sep 24 03:37:17 node003 kernel: ceph:     mds_client.c:2083 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] mds1 ttl now 4295015512, was fresh, now stale

Sep 24 03:37:17 node003 kernel: libceph:  messenger_v2:1158 : decrypt_tail con 000000009819c90d msg 000000009d61b09c enc_page_cnt 1 sg_cnt 3
Sep 24 03:37:17 node003 kernel: libceph:   messenger.c:1415 : ===== 000000009d61b09c 19 from mds0 22=client_session len 28+0+0 (0 0 0) =====
Sep 24 03:37:17 node003 kernel: ceph:     mds_client.c:4111 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] mds0 renewcaps 000000009951cbe1 state open seq 6
Sep 24 03:37:17 node003 kernel: ceph:     mds_client.c:2083 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] mds0 ttl now 4295015512, was fresh, now stale

Sep 24 03:37:20 node003 kernel: libceph:  messenger_v2:1158 : decrypt_tail con 000000009819c90d msg 000000009d61b09c enc_page_cnt 1 sg_cnt 3
Sep 24 03:37:20 node003 kernel: libceph:   messenger.c:1415 : ===== 000000009d61b09c 20 from mds0 26=client_reply len 2526+0+0 (0 0 0) =====
Sep 24 03:37:20 node003 kernel: ceph:     mds_client.c:3760 : [3f5a8c29-91d6-47f1-ae4e-2db63e1c4429 117541320] handle_reply 0000000053b59cce

Sep 24 03:37:20 node003 kernel: libceph:  messenger_v2:1158 : decrypt_tail con 00000000cba01a13 msg 00000000ee65444e enc_page_cnt 2 sg_cnt 6
Sep 24 03:37:20 node003 kernel: libceph:  osd_client.c:5780 : prep_next_sparse_read: [0] starting new sparse read req
Sep 24 03:37:20 node003 kernel: libceph:  osd_client.c:5819 : prep_next_sparse_read: [0] new sparse read op at idx 0 0x0~0x1000
Sep 24 03:37:20 node003 kernel: libceph:  messenger_v2:1104 : process_v2_sparse_read: sparse_read return 4 buf 000000009ad0bb93
Sep 24 03:37:20 node003 kernel: libceph:  osd_client.c:5878 : [0] got 1 extents
Sep 24 03:37:20 node003 kernel: libceph:  messenger_v2:1104 : process_v2_sparse_read: sparse_read return 10 buf 000000008b76aea3
Sep 24 03:37:20 node003 kernel: libceph:  messenger_v2:1104 : process_v2_sparse_read: sparse_read return 4 buf 00000000f62fddb2
Sep 24 03:37:20 node003 kernel: libceph:  osd_client.c:5923 : [0] ext 0 off 0x0 len 0x1000
Sep 24 03:37:20 node003 kernel: libceph:  messenger_v2:1104 : process_v2_sparse_read: sparse_read return 1000 buf 0000000000000000
Sep 24 03:37:20 node003 kernel: ------------[ cut here ]------------
Sep 24 03:37:20 node003 kernel: WARNING: CPU: 11 PID: 1376 at net/ceph/messenger_v2.c:869 get_bvec_at+0x98/0xb0 [libceph]

However, only latest one works in different way. In all previous cases, we have enc_page_cnt 1 sg_cnt 3, but latest one has enc_page_cnt 2 sg_cnt 6.

Actions

Copy link

#16

Updated by Viacheslav Dubeyko 4 months ago

In the case of issue, we have such sequence of actions:

(1) Sep 24 03:37:20 node003 kernel: libceph: osd_client.c:5923 : [0] ext 0 off 0x0 len 0x1000 [1]

static int osd_sparse_read(struct ceph_connection *con,
               struct ceph_msg_data_cursor *cursor,
               char **pbuf)
{
<skipped>
    case CEPH_SPARSE_READ_DATA:
        if (sr->sr_index >= count) {
            sr->sr_state = CEPH_SPARSE_READ_HDR;
            goto next_op;
        }

        eoff = sr->sr_extent[sr->sr_index].off;
        elen = sr->sr_extent[sr->sr_index].len;

        dout("[%d] ext %d off 0x%llx len 0x%llx\n",
             o->o_osd, sr->sr_index, eoff, elen);

        if (elen > INT_MAX) {
            dout("Sparse read extent length too long (0x%llx)\n",
                 elen);
            return -EREMOTEIO;
        }

        /* zero out anything from sr_pos to start of extent */
        if (sr->sr_pos < eoff)
            advance_cursor(cursor, eoff - sr->sr_pos, true);

        /* Set position to end of extent */
        sr->sr_pos = eoff + elen;

        /* send back the new length and nullify the ptr */
        cursor->sr_resid = elen;
        ret = elen;
        *pbuf = NULL;

        /* Bump the array index */
        ++sr->sr_index;
        break;
    }
    return ret;
}

(2) Sep 24 03:37:20 node003 kernel: libceph: messenger_v2:1104 : process_v2_sparse_read: sparse_read return 1000 buf 0000000000000000 [2]

static int process_v2_sparse_read(struct ceph_connection *con,
                  struct page **pages, int spos)
{
    struct ceph_msg_data_cursor *cursor = &con->v2.in_cursor;
    int ret;

    for (;;) {
        char *buf = NULL;

        ret = con->ops->sparse_read(con, cursor, &buf);
        if (ret <= 0)
            return ret;

        dout("%s: sparse_read return %x buf %p\n", __func__, ret, buf);

        do {
            int idx = spos >> PAGE_SHIFT;
            int soff = offset_in_page(spos);
            struct page *spage = con->v2.in_enc_pages[idx];
            int len = min_t(int, ret, PAGE_SIZE - soff);

            if (buf) {
                memcpy_from_page(buf, spage, soff, len);
                buf += len;
            } else {
                struct bio_vec bv;

                get_bvec_at(cursor, &bv);
                len = min_t(int, len, bv.bv_len);
                memcpy_page(bv.bv_page, bv.bv_offset,
                        spage, soff, len);
                ceph_msg_data_advance(cursor, len);
            }
            spos += len;
            ret -= len;
        } while (ret);
    }
}

So, it's clear that osd_sparse_read() returned ret 1000 and buf 1000. This is why we are going to call get_bvec_at().

However, prep_next_sparse_read() has been prepared the read for 0x0~0x1000:

Sep 24 03:37:20 node003 kernel: libceph: osd_client.c:5819 : prep_next_sparse_read: [0] new sparse read op at idx 0 0x0~0x1000

And we had three process_v2_sparse_read() before the failed one:

Sep 24 03:37:20 node003 kernel: libceph:  messenger_v2:1104 : process_v2_sparse_read: sparse_read return 4 buf 000000009ad0bb93
Sep 24 03:37:20 node003 kernel: libceph:  osd_client.c:5878 : [0] got 1 extents
Sep 24 03:37:20 node003 kernel: libceph:  messenger_v2:1104 : process_v2_sparse_read: sparse_read return 10 buf 000000008b76aea3
Sep 24 03:37:20 node003 kernel: libceph:  messenger_v2:1104 : process_v2_sparse_read: sparse_read return 4 buf 00000000f62fddb2
Sep 24 03:37:20 node003 kernel: libceph:  osd_client.c:5923 : [0] ext 0 off 0x0 len 0x1000
Sep 24 03:37:20 node003 kernel: libceph:  messenger_v2:1104 : process_v2_sparse_read: sparse_read return 1000 buf 0000000000000000

Potentially, other process_v2_sparse_read() could be called in other threads. Otherwise, the whole prepared sparsed read has been processed in these three process_v2_sparse_read() before problematic one. Also, osd_sparse_read() could did something wrong in preparing sparse read.

[1] https://elixir.bootlin.com/linux/v6.17-rc7/source/net/ceph/osd_client.c#L5871
[2] https://elixir.bootlin.com/linux/v6.17-rc7/source/net/ceph/messenger_v2.c#L1100

Actions

Copy link

#17

Updated by Viacheslav Dubeyko 4 months ago

I think I need to spend some time for debugging the situation on my side. I need to build more clear understanding how code is working before making any conclusions about the issue nature.

Actions

Copy link

#18

Updated by Viacheslav Dubeyko 4 months ago

Ater Fax wrote in #note-13:

Please me know if anything should be redacted as I do see some info leakage in terms of paths / file names.

If you believe that some output in the log represent a security leakage, then, I think you need to delete the attachment, correct it, and upload corrected file again.

It's hard for me to make the conclusion what could represent such security leakage. File names look pretty neutral to me.

Actions

Copy link

#19

Updated by Ater Fax 4 months ago

Aye, I had a quick skim through last night and there wasn't anything that I saw that worried me.

Please let me know if you want me to do some further debugging / prodding of the affected files. Happy to help!

For now it sounds like my easiest path around the issue is likely to be mirroring the FSCrypted (decrypted) area to a different filesystem, then deleting and re-staging everything back into the CephFS FScrypted area on the newer kernel/s. I'll start the work for that, but retain at least a subset of files untouched so we have something to keep triggering the bug.

Thanks for the effort!

Actions

Copy link

#20

Updated by Ater Fax 4 months ago

Quick update on my attempt to re-stage all files onto a completely new separate CephFS filesystem this weekend:

- Using Ubuntu 22.04 with 6.8.0-84-generic (node003).
- A totally new Ceph FS file system.
- Initializing a new FSCrypt area inside the new CephFS area with defaults.
- rsync copying the existing data across from a running node (Ubuntu 22.04 - 6.5.0-41-generic) with the existing data area

I am seeing the same error occurring when attempting to read certain files. e.g. one was an AdGuard "querylog.json".

I am none the wiser what about my setup might be causing this.

Actions

Copy link

#21

Updated by Ater Fax 4 months ago

File area_decrypted.tar area_decrypted.tar added

As far as an MVP goes for reproducing this in the new area and a new FSCrypt within it goes, the attached hidden file I use just to indicate the area is decrypted was able to consistently reproduce the error (at least on my nodes).

tl;dr on making it happen

- Have Ubuntu 22.04 with 6.8.0-84-generic (or anything newer than 6.5.0-41-generic I bet).
- Initialize a new FSCrypt area inside the new CephFS area with default settings.
- Copy the file into the unlocked FSCrypt directory.
- Lock the FSCrypt directory.
- Unlock the FSCrypt directory.
- 'cat' the file while observing 'journalctl -k -f'

Same Ceph versions on the servers as detailed in the first report post.

Name: .area-decrypted
Size: 44 bytes
MD5: 6b5a13afab6ee4dc081a456c3f67ed0f

Name: area_decrypted.tar
Size: 10240 bytes : 10 KiB
MD5: 8bd563f222b844c44c3ae832a704907b

Actions

Copy link

#22

Updated by Ater Fax 4 months ago

Forgot to mention that before you lock the FSCrypt directory , you can actually still read the .area-decrypted without error. But if you lock, unlock and 'cat' again it will explode.

Actions

Copy link

#23

Updated by Viacheslav Dubeyko 4 months ago

Thanks a lot for sharing the file. I will try to reproduce the issue with it. Sorry, I was busy last two business days this testing snapshot related issue patchset. I should have time and Ceph cluster available today/tomorrow for debugging the issue.

Actions

Copy link

#24

Updated by Ater Fax 4 months ago

No worries! To be frank, I am a bit gobsmacked that the issue was replicated after mirroring the data store, let alone with that particular file. I'm really surprised the issue was preserved and communicable given I was rsyncing via SSH between nodes.

More so given it was created with a "touch" and "echo 'If you can read this the area is decrypted.' > .area-decrypted" originally. It is far from being special as far as I know and files go, yet crash things it still does.

Actions

Copy link

#25

Updated by Viacheslav Dubeyko 4 months ago

Status changed from New to In Progress
% Done changed from 0 to 20

I has been reproduced the issue. I missed the really important mount option initially (ms_mode=secure) and this is why I wasn't been able to reproduce the issue. The client on my side tried to use messenger v1 protocol instead of v2. Finally, I was able to reproduce the issue with such mount options:

sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs,ms_mode=secure

The reproducing path:
(1) mkdir /mnt/cephfs/fscrypt-test-3
(2) cp area_decrypted.tar /mnt/cephfs/fscrypt-test-3
(3) fscrypt encrypt --source=raw_key --key=./my.key /mnt/cephfs/fscrypt-test-3
(4) fscrypt lock /mnt/cephfs/fscrypt-test-3
(5) fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test-3
(6) cat /mnt/cephfs/fscrypt-test-3/area_decrypted.tar
(7) Issue has been triggered

And I have such (pretty interesting) call trace:

[  408.072247] ------------[ cut here ]------------
[  408.072251] WARNING: CPU: 1 PID: 392 at net/ceph/messenger_v2.c:865 ceph_con_v2_try_read+0x4b39/0x72f0
[  408.072267] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[  408.072304] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Not tainted 6.17.0-rc7+ #1 PREEMPT(voluntary) 
[  408.072307] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[  408.072310] Workqueue: ceph-msgr ceph_con_workfn
[  408.072314] RIP: 0010:ceph_con_v2_try_read+0x4b39/0x72f0
[  408.072317] Code: c7 c1 20 f0 d4 ae 50 31 d2 48 c7 c6 60 27 d5 ae 48 c7 c7 f8 8e 6f b0 68 60 38 d5 ae e8 00 47 61 fe 48 83 c4 18 e9 ac fc ff ff <0f> 0b e9 06 fe ff ff 4c 8b 9d 98 fd ff ff 0f 84 64 e7 ff ff 89 85
[  408.072319] RSP: 0018:ffff88811c3e7a30 EFLAGS: 00010246
[  408.072322] RAX: ffffed1024874c6f RBX: ffffea00042c2b40 RCX: 0000000000000f38
[  408.072324] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  408.072325] RBP: ffff88811c3e7ca8 R08: 0000000000000000 R09: 00000000000000c8
[  408.072326] R10: 00000000000000c8 R11: 0000000000000000 R12: 00000000000000c8
[  408.072327] R13: dffffc0000000000 R14: ffff8881243a6030 R15: 0000000000003000
[  408.072329] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000
[  408.072331] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.072332] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.072336] PKRU: 55555554
[  408.072337] Call Trace:
[  408.072338]  <TASK>
[  408.072340]  ? sched_clock_noinstr+0x9/0x10
[  408.072344]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
[  408.072347]  ? _raw_spin_unlock+0xe/0x40
[  408.072349]  ? finish_task_switch.isra.0+0x15d/0x830
[  408.072353]  ? __kasan_check_write+0x14/0x30
[  408.072357]  ? mutex_lock+0x84/0xe0
[  408.072359]  ? __pfx_mutex_lock+0x10/0x10
[  408.072361]  ceph_con_workfn+0x27e/0x10e0
[  408.072364]  ? metric_delayed_work+0x311/0x2c50
[  408.072367]  process_one_work+0x611/0xe20
[  408.072371]  ? __kasan_check_write+0x14/0x30
[  408.072373]  worker_thread+0x7e3/0x1580
[  408.072375]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  408.072378]  ? __pfx_worker_thread+0x10/0x10
[  408.072381]  kthread+0x381/0x7a0
[  408.072383]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  408.072385]  ? __pfx_kthread+0x10/0x10
[  408.072387]  ? __kasan_check_write+0x14/0x30
[  408.072389]  ? recalc_sigpending+0x160/0x220
[  408.072392]  ? _raw_spin_unlock_irq+0xe/0x50
[  408.072394]  ? calculate_sigpending+0x78/0xb0
[  408.072395]  ? __pfx_kthread+0x10/0x10
[  408.072397]  ret_from_fork+0x2b6/0x380
[  408.072400]  ? __pfx_kthread+0x10/0x10
[  408.072402]  ret_from_fork_asm+0x1a/0x30
[  408.072406]  </TASK>
[  408.072407] ---[ end trace 0000000000000000 ]---
[  408.072418] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
[  408.072984] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[  408.073350] CPU: 1 UID: 0 PID: 392 Comm: kworker/1:3 Tainted: G        W           6.17.0-rc7+ #1 PREEMPT(voluntary) 
[  408.073886] Tainted: [W]=WARN
[  408.074042] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[  408.074468] Workqueue: ceph-msgr ceph_con_workfn
[  408.074694] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80
[  408.074976] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03
[  408.075884] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246
[  408.076305] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000
[  408.076909] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378
[  408.077466] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8
[  408.078034] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71
[  408.078575] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378
[  408.079159] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000
[  408.079736] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.080039] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.080376] PKRU: 55555554
[  408.080513] Call Trace:
[  408.080630]  <TASK>
[  408.080729]  ceph_con_v2_try_read+0x49b9/0x72f0
[  408.081115]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
[  408.081348]  ? _raw_spin_unlock+0xe/0x40
[  408.081538]  ? finish_task_switch.isra.0+0x15d/0x830
[  408.081768]  ? __kasan_check_write+0x14/0x30
[  408.081986]  ? mutex_lock+0x84/0xe0
[  408.082160]  ? __pfx_mutex_lock+0x10/0x10
[  408.082343]  ceph_con_workfn+0x27e/0x10e0
[  408.082529]  ? metric_delayed_work+0x311/0x2c50
[  408.082737]  process_one_work+0x611/0xe20
[  408.082948]  ? __kasan_check_write+0x14/0x30
[  408.083156]  worker_thread+0x7e3/0x1580
[  408.083331]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  408.083557]  ? __pfx_worker_thread+0x10/0x10
[  408.083751]  kthread+0x381/0x7a0
[  408.083922]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  408.084139]  ? __pfx_kthread+0x10/0x10
[  408.084310]  ? __kasan_check_write+0x14/0x30
[  408.084510]  ? recalc_sigpending+0x160/0x220
[  408.084708]  ? _raw_spin_unlock_irq+0xe/0x50
[  408.084917]  ? calculate_sigpending+0x78/0xb0
[  408.085138]  ? __pfx_kthread+0x10/0x10
[  408.085335]  ret_from_fork+0x2b6/0x380
[  408.085525]  ? __pfx_kthread+0x10/0x10
[  408.085720]  ret_from_fork_asm+0x1a/0x30
[  408.085922]  </TASK>
[  408.086036] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds psmouse serio_raw i2c_piix4 vga16fb bochs vgastate i2c_smbus floppy mac_hid qemu_fw_cfg pata_acpi sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
[  408.087778] ---[ end trace 0000000000000000 ]---
[  408.088007] RIP: 0010:ceph_msg_data_advance+0x79/0x1a80
[  408.088260] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 07 11 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e 0f 0e 00 00 8b 13 83 fa 03
[  408.089118] RSP: 0018:ffff88811c3e7990 EFLAGS: 00010246
[  408.089357] RAX: ffff8881243a6388 RBX: 0000000000000000 RCX: 0000000000000000
[  408.089678] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff8881243a6378
[  408.090020] RBP: ffff88811c3e7a20 R08: 0000000000000000 R09: 00000000000000c8
[  408.090360] R10: ffff8881243a6388 R11: 0000000000000000 R12: ffffed1024874c71
[  408.090687] R13: dffffc0000000000 R14: ffff8881243a6030 R15: ffff8881243a6378
[  408.091035] FS:  0000000000000000(0000) GS:ffff88823eadf000(0000) knlGS:0000000000000000
[  408.091452] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  408.092015] CR2: 000000c0003c6000 CR3: 000000010c106005 CR4: 0000000000772ef0
[  408.092530] PKRU: 55555554
[  417.112915] ==================================================================
[  417.113491] BUG: KASAN: slab-use-after-free in __mutex_lock.constprop.0+0x1522/0x1610
[  417.114014] Read of size 4 at addr ffff888124870034 by task kworker/2:0/4951

[  417.114587] CPU: 2 UID: 0 PID: 4951 Comm: kworker/2:0 Tainted: G      D W           6.17.0-rc7+ #1 PREEMPT(voluntary) 
[  417.114592] Tainted: [D]=DIE, [W]=WARN
[  417.114593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
[  417.114596] Workqueue: events handle_timeout
[  417.114601] Call Trace:
[  417.114602]  <TASK>
[  417.114604]  dump_stack_lvl+0x5c/0x90
[  417.114610]  print_report+0x171/0x4dc
[  417.114613]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  417.114617]  ? kasan_complete_mode_report_info+0x80/0x220
[  417.114621]  kasan_report+0xbd/0x100
[  417.114625]  ? __mutex_lock.constprop.0+0x1522/0x1610
[  417.114628]  ? __mutex_lock.constprop.0+0x1522/0x1610
[  417.114630]  __asan_report_load4_noabort+0x14/0x30
[  417.114633]  __mutex_lock.constprop.0+0x1522/0x1610
[  417.114635]  ? queue_con_delay+0x8d/0x200
[  417.114638]  ? __pfx___mutex_lock.constprop.0+0x10/0x10
[  417.114641]  ? __send_subscribe+0x529/0xb20
[  417.114644]  __mutex_lock_slowpath+0x13/0x20
[  417.114646]  mutex_lock+0xd4/0xe0
[  417.114649]  ? __pfx_mutex_lock+0x10/0x10
[  417.114652]  ? ceph_monc_renew_subs+0x2a/0x40
[  417.114654]  ceph_con_keepalive+0x22/0x110
[  417.114656]  handle_timeout+0x6b3/0x11d0
[  417.114659]  ? _raw_spin_unlock_irq+0xe/0x50
[  417.114662]  ? __pfx_handle_timeout+0x10/0x10
[  417.114664]  ? queue_delayed_work_on+0x8e/0xa0
[  417.114669]  process_one_work+0x611/0xe20
[  417.114672]  ? __kasan_check_write+0x14/0x30
[  417.114676]  worker_thread+0x7e3/0x1580
[  417.114678]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  417.114682]  ? __pfx_sched_setscheduler_nocheck+0x10/0x10
[  417.114687]  ? __pfx_worker_thread+0x10/0x10
[  417.114689]  kthread+0x381/0x7a0
[  417.114692]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  417.114694]  ? __pfx_kthread+0x10/0x10
[  417.114697]  ? __kasan_check_write+0x14/0x30
[  417.114699]  ? recalc_sigpending+0x160/0x220
[  417.114703]  ? _raw_spin_unlock_irq+0xe/0x50
[  417.114705]  ? calculate_sigpending+0x78/0xb0
[  417.114707]  ? __pfx_kthread+0x10/0x10
[  417.114710]  ret_from_fork+0x2b6/0x380
[  417.114713]  ? __pfx_kthread+0x10/0x10
[  417.114715]  ret_from_fork_asm+0x1a/0x30
[  417.114720]  </TASK>

[  417.125171] Allocated by task 2:
[  417.125333]  kasan_save_stack+0x26/0x60
[  417.125522]  kasan_save_track+0x14/0x40
[  417.125742]  kasan_save_alloc_info+0x39/0x60
[  417.125945]  __kasan_slab_alloc+0x8b/0xb0
[  417.126133]  kmem_cache_alloc_node_noprof+0x13b/0x460
[  417.126381]  copy_process+0x320/0x6250
[  417.126595]  kernel_clone+0xb7/0x840
[  417.126792]  kernel_thread+0xd6/0x120
[  417.126995]  kthreadd+0x85c/0xbe0
[  417.127176]  ret_from_fork+0x2b6/0x380
[  417.127378]  ret_from_fork_asm+0x1a/0x30

[  417.127692] Freed by task 0:
[  417.127851]  kasan_save_stack+0x26/0x60
[  417.128057]  kasan_save_track+0x14/0x40
[  417.128267]  kasan_save_free_info+0x3b/0x60
[  417.128491]  __kasan_slab_free+0x6c/0xa0
[  417.128708]  kmem_cache_free+0x182/0x550
[  417.128906]  free_task+0xeb/0x140
[  417.129070]  __put_task_struct+0x1d2/0x4f0
[  417.129259]  __put_task_struct_rcu_cb+0x15/0x20
[  417.129480]  rcu_do_batch+0x3d3/0xe70
[  417.129681]  rcu_core+0x549/0xb30
[  417.129839]  rcu_core_si+0xe/0x20
[  417.130005]  handle_softirqs+0x160/0x570
[  417.130190]  __irq_exit_rcu+0x189/0x1e0
[  417.130369]  irq_exit_rcu+0xe/0x20
[  417.130531]  sysvec_apic_timer_interrupt+0x9f/0xd0
[  417.130768]  asm_sysvec_apic_timer_interrupt+0x1b/0x20

[  417.131082] Last potentially related work creation:
[  417.131305]  kasan_save_stack+0x26/0x60
[  417.131484]  kasan_record_aux_stack+0xae/0xd0
[  417.131695]  __call_rcu_common+0xcd/0x14b0
[  417.131909]  call_rcu+0x31/0x50
[  417.132071]  delayed_put_task_struct+0x128/0x190
[  417.132295]  rcu_do_batch+0x3d3/0xe70
[  417.132478]  rcu_core+0x549/0xb30
[  417.132658]  rcu_core_si+0xe/0x20
[  417.132808]  handle_softirqs+0x160/0x570
[  417.132993]  __irq_exit_rcu+0x189/0x1e0
[  417.133181]  irq_exit_rcu+0xe/0x20
[  417.133353]  sysvec_apic_timer_interrupt+0x9f/0xd0
[  417.133584]  asm_sysvec_apic_timer_interrupt+0x1b/0x20

[  417.133921] Second to last potentially related work creation:
[  417.134183]  kasan_save_stack+0x26/0x60
[  417.134362]  kasan_record_aux_stack+0xae/0xd0
[  417.134566]  __call_rcu_common+0xcd/0x14b0
[  417.134782]  call_rcu+0x31/0x50
[  417.134929]  put_task_struct_rcu_user+0x58/0xb0
[  417.135143]  finish_task_switch.isra.0+0x5d3/0x830
[  417.135366]  __schedule+0xd30/0x5100
[  417.135534]  schedule_idle+0x5a/0x90
[  417.135712]  do_idle+0x25f/0x410
[  417.135871]  cpu_startup_entry+0x53/0x70
[  417.136053]  start_secondary+0x216/0x2c0
[  417.136233]  common_startup_64+0x13e/0x141

[  417.136894] The buggy address belongs to the object at ffff888124870000
                which belongs to the cache task_struct of size 10504
[  417.138122] The buggy address is located 52 bytes inside of
                freed 10504-byte region [ffff888124870000, ffff888124872908)

[  417.139465] The buggy address belongs to the physical page:
[  417.140016] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x124870
[  417.140789] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[  417.141519] memcg:ffff88811aa20e01
[  417.141874] anon flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[  417.142600] page_type: f5(slab)
[  417.142922] raw: 0017ffffc0000040 ffff88810094f040 0000000000000000 dead000000000001
[  417.143554] raw: 0000000000000000 0000000000030003 00000000f5000000 ffff88811aa20e01
[  417.143954] head: 0017ffffc0000040 ffff88810094f040 0000000000000000 dead000000000001
[  417.144329] head: 0000000000000000 0000000000030003 00000000f5000000 ffff88811aa20e01
[  417.144710] head: 0017ffffc0000003 ffffea0004921c01 00000000ffffffff 00000000ffffffff
[  417.145106] head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
[  417.145485] page dumped because: kasan: bad access detected

[  417.145859] Memory state around the buggy address:
[  417.146094]  ffff88812486ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  417.146439]  ffff88812486ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  417.146791] >ffff888124870000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  417.147145]                                      ^
[  417.147387]  ffff888124870080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  417.147751]  ffff888124870100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  417.148123] ==================================================================

Actions

Copy link

#26

Updated by Viacheslav Dubeyko 4 months ago

First of all, we have warning:

Oct  1 13:07:57 ceph-0006 kernel: [   67.092868] ------------[ cut here ]------------
Oct  1 13:07:57 ceph-0006 kernel: [   67.092871] WARNING: CPU: 3 PID: 234 at net/ceph/messenger_v2.c:865 get_bvec_at+0x1ca/0x230
Oct  1 13:07:57 ceph-0006 kernel: [   67.092886] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds floppy psmouse mac_hid i2c_piix4 vga16fb i2c_smbus pata_acpi serio_raw qemu_fw_cfg bochs vgastate sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
Oct  1 13:07:57 ceph-0006 kernel: [   67.092943] CPU: 3 UID: 0 PID: 234 Comm: kworker/3:2 Not tainted 6.17.0-rc7+ #6 PREEMPT(voluntary) 
Oct  1 13:07:57 ceph-0006 kernel: [   67.092948] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
Oct  1 13:07:57 ceph-0006 kernel: [   67.092951] Workqueue: ceph-msgr ceph_con_workfn
Oct  1 13:07:57 ceph-0006 kernel: [   67.092958] RIP: 0010:get_bvec_at+0x1ca/0x230
Oct  1 13:07:57 ceph-0006 kernel: [   67.092961] Code: 65 48 2b 05 d0 a9 bc 02 75 77 48 81 c4 88 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b e9 c6 fe ff ff e8 6a e9 2d fd eb a4 e8 63 e9 2d fd e9 6c ff
Oct  1 13:07:57 ceph-0006 kernel: [   67.092965] RSP: 0018:ffff888102f67920 EFLAGS: 00010246
Oct  1 13:07:57 ceph-0006 kernel: [   67.092968] RAX: 1ffff11029f3b86f RBX: ffff88814f9dc378 RCX: 0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.092971] RDX: dffffc0000000000 RSI: ffff888102f67ba0 RDI: ffff88814f9dc378
Oct  1 13:07:57 ceph-0006 kernel: [   67.092973] RBP: ffff888102f679d0 R08: 0000000000000000 R09: 0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.092975] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000003000
Oct  1 13:07:57 ceph-0006 kernel: [   67.092978] R13: 1ffff110205ecf25 R14: ffff888102f67ba0 R15: ffff88814f9dc030
Oct  1 13:07:57 ceph-0006 kernel: [   67.092980] FS:  0000000000000000(0000) GS:ffff888248bde000(0000) knlGS:0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.092983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  1 13:07:57 ceph-0006 kernel: [   67.092985] CR2: 00007ea02c000020 CR3: 000000013015d005 CR4: 0000000000772ef0
Oct  1 13:07:57 ceph-0006 kernel: [   67.092990] PKRU: 55555554
Oct  1 13:07:57 ceph-0006 kernel: [   67.092992] Call Trace:
Oct  1 13:07:57 ceph-0006 kernel: [   67.092994]  <TASK>
Oct  1 13:07:57 ceph-0006 kernel: [   67.092995]  ? __pfx__printk+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093002]  ? __pfx_get_bvec_at+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093006]  ceph_con_v2_try_read.cold+0x19d5/0x1ac7
Oct  1 13:07:57 ceph-0006 kernel: [   67.093011]  ? update_entity_lag+0x49/0x190
Oct  1 13:07:57 ceph-0006 kernel: [   67.093017]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093021]  ? finish_task_switch.isra.0+0x15d/0x830
Oct  1 13:07:57 ceph-0006 kernel: [   67.093025]  ? __kasan_check_write+0x14/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.093030]  ? mutex_lock+0x84/0xe0
Oct  1 13:07:57 ceph-0006 kernel: [   67.093034]  ? __pfx_mutex_lock+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093037]  ceph_con_workfn+0x27e/0x10e0
Oct  1 13:07:57 ceph-0006 kernel: [   67.093041]  process_one_work+0x611/0xe20
Oct  1 13:07:57 ceph-0006 kernel: [   67.093045]  ? __kasan_check_write+0x14/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.093049]  worker_thread+0x7e3/0x1580
Oct  1 13:07:57 ceph-0006 kernel: [   67.093053]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093058]  ? __pfx_worker_thread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093061]  kthread+0x381/0x7a0
Oct  1 13:07:57 ceph-0006 kernel: [   67.093065]  ? __pfx__raw_spin_lock_irq+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093067]  ? __pfx_kthread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093070]  ? __kasan_check_write+0x14/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.093073]  ? recalc_sigpending+0x160/0x220
Oct  1 13:07:57 ceph-0006 kernel: [   67.093077]  ? _raw_spin_unlock_irq+0xe/0x50
Oct  1 13:07:57 ceph-0006 kernel: [   67.093080]  ? calculate_sigpending+0x78/0xb0
Oct  1 13:07:57 ceph-0006 kernel: [   67.093082]  ? __pfx_kthread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093085]  ret_from_fork+0x2b6/0x380
Oct  1 13:07:57 ceph-0006 kernel: [   67.093089]  ? __pfx_kthread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093105]  ret_from_fork_asm+0x1a/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.093111]  </TASK>
Oct  1 13:07:57 ceph-0006 kernel: [   67.093112] ---[ end trace 0000000000000000 ]---

static void get_bvec_at(struct ceph_msg_data_cursor *cursor,
            struct bio_vec *bv)
{
    struct page *page;
    size_t off, len;

    WARN_ON(!cursor->total_resid); <-- We have warning here!

    /* skip zero-length data items */
    while (!cursor->resid)
        ceph_msg_data_advance(cursor, 0);

    /* get a piece of data, cursor isn't advanced */
    page = ceph_msg_data_next(cursor, &off, &len);
    bvec_set_page(bv, page, len, off);
}

And the warning has been triggered because: cursor->total_resid has not been initialized. As far as I can see, cursor->total_resid should be initialized in ceph_msg_data_cursor_init():

void ceph_msg_data_cursor_init(struct ceph_msg_data_cursor *cursor,
                   struct ceph_msg *msg, size_t length)
{
    BUG_ON(!length);
    BUG_ON(length > msg->data_length);
    BUG_ON(!msg->num_data_items);

    cursor->total_resid = length;
    cursor->data = msg->data;
    cursor->sr_resid = 0;

    __ceph_msg_data_cursor_init(cursor);
}

And it is possible to see that it is usually called:

Oct  1 13:07:52 ceph-0006 kernel: [   61.313466] libceph: pid 234:net/ceph/messenger_v2.c:1222 decrypt_tail(): data_length 0, data 0000000000000000, num_data_items 0, max_data_items 0, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 0, front_alloc_len 805
Oct  1 13:07:52 ceph-0006 kernel: [   61.321089] libceph: pid 234:net/ceph/messenger_v2.c:1013 setup_message_sgs(): starting...
Oct  1 13:07:52 ceph-0006 kernel: [   61.322694] libceph: pid 234:net/ceph/messenger_v2.c:1013 setup_message_sgs(): starting...
Oct  1 13:07:52 ceph-0006 kernel: [   61.323200] libceph: pid 234:net/ceph/messenger_v2.c:1033 setup_message_sgs(): dlen 124
Oct  1 13:07:52 ceph-0006 kernel: [   61.323691] libceph: pid 234:net/ceph/messenger.c:1079 ceph_msg_data_cursor_init(): length 124, msg->data_length 124, msg->num_data_items 1
Oct  1 13:07:52 ceph-0006 kernel: [   61.324500] libceph: pid 234:net/ceph/messenger.c:1040 __ceph_msg_data_cursor_init(): cursor->total_resid 124, cursor->data->type 0x5

However, we haven't such call for the case of issue:

Oct  1 13:07:57 ceph-0006 kernel: [   67.056014] libceph: pid 234:net/ceph/messenger_v2.c:1013 setup_message_sgs(): starting...
Oct  1 13:07:57 ceph-0006 kernel: [   67.058569] libceph: pid 234:net/ceph/messenger_v2.c:1013 setup_message_sgs(): starting...
Oct  1 13:07:57 ceph-0006 kernel: [   67.060054] libceph: pid 234:net/ceph/messenger_v2.c:1222 decrypt_tail(): data_length 12288, data 00000000ff24bbcb, num_data_items 1, max_data_items 1, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 12288, front_alloc_len 570
Oct  1 13:07:57 ceph-0006 kernel: [   67.062329] libceph: pid 234:net/ceph/messenger_v2.c:1109 process_v2_sparse_read(): BEFORE sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0
Oct  1 13:07:57 ceph-0006 kernel: [   67.063731] libceph: pid 234:net/ceph/osd_client.c:5805 osd_sparse_read(): sr->sr_state 0x0
Oct  1 13:07:57 ceph-0006 kernel: [   67.064254] libceph: pid 234:net/ceph/osd_client.c:5813 osd_sparse_read(): CEPH_SPARSE_READ_HDR
Oct  1 13:07:57 ceph-0006 kernel: [   67.065655] libceph: pid 234:net/ceph/messenger_v2.c:1126 process_v2_sparse_read(): AFTER sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0
Oct  1 13:07:57 ceph-0006 kernel: [   67.068180] libceph: pid 234:net/ceph/messenger_v2.c:1143 process_v2_sparse_read(): data_length 12288, data 00000000ff24bbcb, num_data_items 1, max_data_items 1, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 12288, front_alloc_len 570
Oct  1 13:07:57 ceph-0006 kernel: [   67.071938] libceph: pid 234:net/ceph/messenger_v2.c:1157 process_v2_sparse_read(): sparse_read return 4 buf 000000004fc26be8
Oct  1 13:07:57 ceph-0006 kernel: [   67.072806] libceph: pid 234:net/ceph/messenger_v2.c:1109 process_v2_sparse_read(): BEFORE sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0
Oct  1 13:07:57 ceph-0006 kernel: [   67.074821] libceph: pid 234:net/ceph/osd_client.c:5805 osd_sparse_read(): sr->sr_state 0x1
Oct  1 13:07:57 ceph-0006 kernel: [   67.075553] libceph: pid 234:net/ceph/osd_client.c:5831 osd_sparse_read(): CEPH_SPARSE_READ_EXTENTS: count 1
Oct  1 13:07:57 ceph-0006 kernel: [   67.076282] libceph: pid 234:net/ceph/messenger_v2.c:1126 process_v2_sparse_read(): AFTER sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0
Oct  1 13:07:57 ceph-0006 kernel: [   67.077478] libceph: pid 234:net/ceph/messenger_v2.c:1143 process_v2_sparse_read(): data_length 12288, data 00000000ff24bbcb, num_data_items 1, max_data_items 1, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 12288, front_alloc_len 570
Oct  1 13:07:57 ceph-0006 kernel: [   67.079144] libceph: pid 234:net/ceph/messenger_v2.c:1157 process_v2_sparse_read(): sparse_read return 10 buf 0000000086bcece0
Oct  1 13:07:57 ceph-0006 kernel: [   67.080020] libceph: pid 234:net/ceph/messenger_v2.c:1109 process_v2_sparse_read(): BEFORE sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0
Oct  1 13:07:57 ceph-0006 kernel: [   67.081020] libceph: pid 234:net/ceph/osd_client.c:5805 osd_sparse_read(): sr->sr_state 0x2
Oct  1 13:07:57 ceph-0006 kernel: [   67.081733] libceph: pid 234:net/ceph/osd_client.c:5862 osd_sparse_read(): CEPH_SPARSE_READ_DATA_LEN: sr->sr_datalen 0
Oct  1 13:07:57 ceph-0006 kernel: [   67.082605] libceph: pid 234:net/ceph/messenger_v2.c:1126 process_v2_sparse_read(): AFTER sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0
Oct  1 13:07:57 ceph-0006 kernel: [   67.083612] libceph: pid 234:net/ceph/messenger_v2.c:1143 process_v2_sparse_read(): data_length 12288, data 00000000ff24bbcb, num_data_items 1, max_data_items 1, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 12288, front_alloc_len 570
Oct  1 13:07:57 ceph-0006 kernel: [   67.085178] libceph: pid 234:net/ceph/messenger_v2.c:1157 process_v2_sparse_read(): sparse_read return 4 buf 000000005d398447
Oct  1 13:07:57 ceph-0006 kernel: [   67.086028] libceph: pid 234:net/ceph/messenger_v2.c:1109 process_v2_sparse_read(): BEFORE sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 0
Oct  1 13:07:57 ceph-0006 kernel: [   67.087009] libceph: pid 234:net/ceph/osd_client.c:5805 osd_sparse_read(): sr->sr_state 0x3
Oct  1 13:07:57 ceph-0006 kernel: [   67.087662] libceph: pid 234:net/ceph/osd_client.c:5871 osd_sparse_read(): CEPH_SPARSE_READ_DATA_PRE: sr->sr_datalen 12288
Oct  1 13:07:57 ceph-0006 kernel: [   67.088625] libceph: pid 234:net/ceph/osd_client.c:5893 osd_sparse_read(): CEPH_SPARSE_READ_DATA: ext 0 off 0x0 len 0x3000
Oct  1 13:07:57 ceph-0006 kernel: [   67.089516] libceph: pid 234:net/ceph/messenger_v2.c:1126 process_v2_sparse_read(): AFTER sparse_read: total_resid 0, data 0000000000000000, resid 0, sr_resid 12288
Oct  1 13:07:57 ceph-0006 kernel: [   67.090606] libceph: pid 234:net/ceph/messenger_v2.c:1143 process_v2_sparse_read(): data_length 12288, data 00000000ff24bbcb, num_data_items 1, max_data_items 1, more_to_follow 0x0, needs_out_seq 0x0, sparse_read_total 12288, front_alloc_len 570
Oct  1 13:07:57 ceph-0006 kernel: [   67.091797] libceph: pid 234:net/ceph/messenger_v2.c:1157 process_v2_sparse_read(): sparse_read return 3000 buf 0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.092868] ------------[ cut here ]------------
Oct  1 13:07:57 ceph-0006 kernel: [   67.092871] WARNING: CPU: 3 PID: 234 at net/ceph/messenger_v2.c:865 get_bvec_at+0x1ca/0x230
Oct  1 13:07:57 ceph-0006 kernel: [   67.092886] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds floppy psmouse mac_hid i2c_piix4 vga16fb i2c_smbus pata_acpi serio_raw qemu_fw_cfg bochs vgastate sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
Oct  1 13:07:57 ceph-0006 kernel: [   67.092943] CPU: 3 UID: 0 PID: 234 Comm: kworker/3:2 Not tainted 6.17.0-rc7+ #6 PREEMPT(voluntary) 
Oct  1 13:07:57 ceph-0006 kernel: [   67.092948] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
Oct  1 13:07:57 ceph-0006 kernel: [   67.092951] Workqueue: ceph-msgr ceph_con_workfn
Oct  1 13:07:57 ceph-0006 kernel: [   67.092958] RIP: 0010:get_bvec_at+0x1ca/0x230
Oct  1 13:07:57 ceph-0006 kernel: [   67.092961] Code: 65 48 2b 05 d0 a9 bc 02 75 77 48 81 c4 88 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc <0f> 0b e9 c6 fe ff ff e8 6a e9 2d fd eb a4 e8 63 e9 2d fd e9 6c ff
Oct  1 13:07:57 ceph-0006 kernel: [   67.092965] RSP: 0018:ffff888102f67920 EFLAGS: 00010246
Oct  1 13:07:57 ceph-0006 kernel: [   67.092968] RAX: 1ffff11029f3b86f RBX: ffff88814f9dc378 RCX: 0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.092971] RDX: dffffc0000000000 RSI: ffff888102f67ba0 RDI: ffff88814f9dc378
Oct  1 13:07:57 ceph-0006 kernel: [   67.092973] RBP: ffff888102f679d0 R08: 0000000000000000 R09: 0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.092975] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000003000
Oct  1 13:07:57 ceph-0006 kernel: [   67.092978] R13: 1ffff110205ecf25 R14: ffff888102f67ba0 R15: ffff88814f9dc030
Oct  1 13:07:57 ceph-0006 kernel: [   67.092980] FS:  0000000000000000(0000) GS:ffff888248bde000(0000) knlGS:0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.092983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  1 13:07:57 ceph-0006 kernel: [   67.092985] CR2: 00007ea02c000020 CR3: 000000013015d005 CR4: 0000000000772ef0
Oct  1 13:07:57 ceph-0006 kernel: [   67.092990] PKRU: 55555554
Oct  1 13:07:57 ceph-0006 kernel: [   67.092992] Call Trace:
Oct  1 13:07:57 ceph-0006 kernel: [   67.092994]  <TASK>
Oct  1 13:07:57 ceph-0006 kernel: [   67.092995]  ? __pfx__printk+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093002]  ? __pfx_get_bvec_at+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093006]  ceph_con_v2_try_read.cold+0x19d5/0x1ac7
Oct  1 13:07:57 ceph-0006 kernel: [   67.093011]  ? update_entity_lag+0x49/0x190
Oct  1 13:07:57 ceph-0006 kernel: [   67.093017]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093021]  ? finish_task_switch.isra.0+0x15d/0x830
Oct  1 13:07:57 ceph-0006 kernel: [   67.093025]  ? __kasan_check_write+0x14/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.093030]  ? mutex_lock+0x84/0xe0
Oct  1 13:07:57 ceph-0006 kernel: [   67.093034]  ? __pfx_mutex_lock+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093037]  ceph_con_workfn+0x27e/0x10e0
Oct  1 13:07:57 ceph-0006 kernel: [   67.093041]  process_one_work+0x611/0xe20
Oct  1 13:07:57 ceph-0006 kernel: [   67.093045]  ? __kasan_check_write+0x14/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.093049]  worker_thread+0x7e3/0x1580
Oct  1 13:07:57 ceph-0006 kernel: [   67.093053]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093058]  ? __pfx_worker_thread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093061]  kthread+0x381/0x7a0
Oct  1 13:07:57 ceph-0006 kernel: [   67.093065]  ? __pfx__raw_spin_lock_irq+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093067]  ? __pfx_kthread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093070]  ? __kasan_check_write+0x14/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.093073]  ? recalc_sigpending+0x160/0x220
Oct  1 13:07:57 ceph-0006 kernel: [   67.093077]  ? _raw_spin_unlock_irq+0xe/0x50
Oct  1 13:07:57 ceph-0006 kernel: [   67.093080]  ? calculate_sigpending+0x78/0xb0
Oct  1 13:07:57 ceph-0006 kernel: [   67.093082]  ? __pfx_kthread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093085]  ret_from_fork+0x2b6/0x380
Oct  1 13:07:57 ceph-0006 kernel: [   67.093089]  ? __pfx_kthread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.093105]  ret_from_fork_asm+0x1a/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.093111]  </TASK>
Oct  1 13:07:57 ceph-0006 kernel: [   67.093112] ---[ end trace 0000000000000000 ]---
Oct  1 13:07:57 ceph-0006 kernel: [   67.093128] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
Oct  1 13:07:57 ceph-0006 kernel: [   67.094240] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
Oct  1 13:07:57 ceph-0006 kernel: [   67.094677] CPU: 3 UID: 0 PID: 234 Comm: kworker/3:2 Tainted: G        W           6.17.0-rc7+ #6 PREEMPT(voluntary) 
Oct  1 13:07:57 ceph-0006 kernel: [   67.095460] Tainted: [W]=WARN
Oct  1 13:07:57 ceph-0006 kernel: [   67.095664] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
Oct  1 13:07:57 ceph-0006 kernel: [   67.096625] Workqueue: ceph-msgr ceph_con_workfn
Oct  1 13:07:57 ceph-0006 kernel: [   67.097354] RIP: 0010:ceph_msg_data_advance+0x79/0x1a40
Oct  1 13:07:57 ceph-0006 kernel: [   67.098337] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 d6 10 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e de 0d 00 00 8b 13 83 fa 03
Oct  1 13:07:57 ceph-0006 kernel: [   67.099217] RSP: 0018:ffff888102f67880 EFLAGS: 00010246
Oct  1 13:07:57 ceph-0006 kernel: [   67.099488] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.099821] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88814f9dc378
Oct  1 13:07:57 ceph-0006 kernel: [   67.100145] RBP: ffff888102f67910 R08: 0000000000000000 R09: 0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.100517] R10: ffff88814f9dc388 R11: 0000000000000000 R12: ffffed1029f3b871
Oct  1 13:07:57 ceph-0006 kernel: [   67.100846] R13: 1ffff110205ecf25 R14: ffff888102f67ba0 R15: ffff88814f9dc378
Oct  1 13:07:57 ceph-0006 kernel: [   67.101175] FS:  0000000000000000(0000) GS:ffff888248bde000(0000) knlGS:0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.101575] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  1 13:07:57 ceph-0006 kernel: [   67.101840] CR2: 00007ea02c000020 CR3: 000000013015d005 CR4: 0000000000772ef0
Oct  1 13:07:57 ceph-0006 kernel: [   67.102160] PKRU: 55555554
Oct  1 13:07:57 ceph-0006 kernel: [   67.102292] Call Trace:
Oct  1 13:07:57 ceph-0006 kernel: [   67.102411]  <TASK>
Oct  1 13:07:57 ceph-0006 kernel: [   67.102536]  get_bvec_at+0xdb/0x230
Oct  1 13:07:57 ceph-0006 kernel: [   67.102705]  ? __pfx__printk+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.102883]  ? __pfx_get_bvec_at+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.103068]  ceph_con_v2_try_read.cold+0x19d5/0x1ac7
Oct  1 13:07:57 ceph-0006 kernel: [   67.103299]  ? update_entity_lag+0x49/0x190
Oct  1 13:07:57 ceph-0006 kernel: [   67.103533]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.103761]  ? finish_task_switch.isra.0+0x15d/0x830
Oct  1 13:07:57 ceph-0006 kernel: [   67.104003]  ? __kasan_check_write+0x14/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.104207]  ? mutex_lock+0x84/0xe0
Oct  1 13:07:57 ceph-0006 kernel: [   67.104371]  ? __pfx_mutex_lock+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.104570]  ceph_con_workfn+0x27e/0x10e0
Oct  1 13:07:57 ceph-0006 kernel: [   67.104765]  process_one_work+0x611/0xe20
Oct  1 13:07:57 ceph-0006 kernel: [   67.105001]  ? __kasan_check_write+0x14/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.105246]  worker_thread+0x7e3/0x1580
Oct  1 13:07:57 ceph-0006 kernel: [   67.105654]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.105934]  ? __pfx_worker_thread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.106501]  kthread+0x381/0x7a0
Oct  1 13:07:57 ceph-0006 kernel: [   67.106865]  ? __pfx__raw_spin_lock_irq+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.107725]  ? __pfx_kthread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.108415]  ? __kasan_check_write+0x14/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.109047]  ? recalc_sigpending+0x160/0x220
Oct  1 13:07:57 ceph-0006 kernel: [   67.109536]  ? _raw_spin_unlock_irq+0xe/0x50
Oct  1 13:07:57 ceph-0006 kernel: [   67.109895]  ? calculate_sigpending+0x78/0xb0
Oct  1 13:07:57 ceph-0006 kernel: [   67.110389]  ? __pfx_kthread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.110626]  ret_from_fork+0x2b6/0x380
Oct  1 13:07:57 ceph-0006 kernel: [   67.111006]  ? __pfx_kthread+0x10/0x10
Oct  1 13:07:57 ceph-0006 kernel: [   67.111227]  ret_from_fork_asm+0x1a/0x30
Oct  1 13:07:57 ceph-0006 kernel: [   67.111465]  </TASK>
Oct  1 13:07:57 ceph-0006 kernel: [   67.111768] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel joydev kvm irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl input_leds floppy psmouse mac_hid i2c_piix4 vga16fb i2c_smbus pata_acpi serio_raw qemu_fw_cfg bochs vgastate sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
Oct  1 13:07:57 ceph-0006 kernel: [   67.114216] ---[ end trace 0000000000000000 ]---
Oct  1 13:07:57 ceph-0006 kernel: [   67.114523] RIP: 0010:ceph_msg_data_advance+0x79/0x1a40
Oct  1 13:07:57 ceph-0006 kernel: [   67.114951] Code: fc ff df 49 8d 77 08 48 c1 ee 03 80 3c 16 00 0f 85 d6 10 00 00 48 ba 00 00 00 00 00 fc ff df 49 8b 5f 08 48 89 de 48 c1 ee 03 <0f> b6 14 16 84 d2 74 09 80 fa 03 0f 8e de 0d 00 00 8b 13 83 fa 03
Oct  1 13:07:57 ceph-0006 kernel: [   67.116118] RSP: 0018:ffff888102f67880 EFLAGS: 00010246
Oct  1 13:07:57 ceph-0006 kernel: [   67.116544] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.116973] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: ffff88814f9dc378
Oct  1 13:07:57 ceph-0006 kernel: [   67.117370] RBP: ffff888102f67910 R08: 0000000000000000 R09: 0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.117895] R10: ffff88814f9dc388 R11: 0000000000000000 R12: ffffed1029f3b871
Oct  1 13:07:57 ceph-0006 kernel: [   67.118276] R13: 1ffff110205ecf25 R14: ffff888102f67ba0 R15: ffff88814f9dc378
Oct  1 13:07:57 ceph-0006 kernel: [   67.118767] FS:  0000000000000000(0000) GS:ffff888248bde000(0000) knlGS:0000000000000000
Oct  1 13:07:57 ceph-0006 kernel: [   67.119211] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct  1 13:07:57 ceph-0006 kernel: [   67.119591] CR2: 00007ea02c000020 CR3: 000000013015d005 CR4: 0000000000772ef0
Oct  1 13:07:57 ceph-0006 kernel: [   67.119983] PKRU: 55555554

Potentially, it could be the reason of the issue.

Actions

Copy link

#27

Updated by Viacheslav Dubeyko 4 months ago

@aterfax I am having a vacation for next 3 weeks. So, it will be no progress on the issue for the next 3 weeks. I need some time for investigating the issue yet. And I don't expect to fix it today or tomorrow.

Thanks,
Slava.

Actions

Copy link

#28

Updated by Ater Fax 4 months ago

No worries, enjoy your holiday!

Many thanks for letting me know!

Actions

Copy link

#29

Updated by Viacheslav Dubeyko 2 months ago

My initial assumption was that setup_message_sgs() requires to initialize cursor->total_resid and cursor->data. However, it is wrong assumption because process_v2_sparse_read() reset cursor->total_resid and cursor->data. And we still have crash:

Nov  4 14:36:53 ceph-0006 kernel: [   73.192827] ------------[ cut here ]------------
Nov  4 14:36:53 ceph-0006 kernel: [   73.192829] kernel BUG at net/ceph/messenger.c:1143!
Nov  4 14:36:53 ceph-0006 kernel: [   73.193158] Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
Nov  4 14:36:53 ceph-0006 kernel: [   73.193577] CPU: 1 UID: 0 PID: 239 Comm: kworker/1:2 Tainted: G        W           6.17.0-rc7+ #10 PREEMPT(voluntary) 
Nov  4 14:36:53 ceph-0006 kernel: [   73.194540] Tainted: [W]=WARN
Nov  4 14:36:53 ceph-0006 kernel: [   73.194768] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
Nov  4 14:36:53 ceph-0006 kernel: [   73.195353] Workqueue: ceph-msgr ceph_con_workfn
Nov  4 14:36:53 ceph-0006 kernel: [   73.195660] RIP: 0010:ceph_msg_data_advance+0xa7c/0x1a50
Nov  4 14:36:53 ceph-0006 kernel: [   73.196006] Code: ff df 4c 89 d6 48 c1 ee 03 80 3c 16 00 0f 84 6c fc ff ff 4c 89 d7 48 89 4d d0 e8 9f 4c 36 fd 48 8b 4d d0 e9 57 fc ff ff 0f 0b <0f> 0b 41 39 c9 0f 82 02 01 00 00 41 29 c9 41 01 cd 49 8b 73 78 48
Nov  4 14:36:53 ceph-0006 kernel: [   73.200589] RSP: 0018:ffff8881257a7880 EFLAGS: 00010246
Nov  4 14:36:53 ceph-0006 kernel: [   73.201775] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
Nov  4 14:36:53 ceph-0006 kernel: [   73.203406] RDX: dffffc0000000000 RSI: 1ffff11022195870 RDI: ffff888110cac378
Nov  4 14:36:53 ceph-0006 kernel: [   73.203857] RBP: ffff8881257a7910 R08: 0000000000000000 R09: 0000000000000000
Nov  4 14:36:53 ceph-0006 kernel: [   73.204684] R10: ffff888110cac388 R11: 0000000000000000 R12: ffffed1022195871
Nov  4 14:36:53 ceph-0006 kernel: [   73.205100] R13: 1ffff11024af4f25 R14: ffff8881257a7ba0 R15: ffff888110cac378
Nov  4 14:36:53 ceph-0006 kernel: [   73.206839] FS:  0000000000000000(0000) GS:ffff888258ade000(0000) knlGS:0000000000000000
Nov  4 14:36:53 ceph-0006 kernel: [   73.208571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  4 14:36:53 ceph-0006 kernel: [   73.209865] CR2: 0000624cfb53ef50 CR3: 000000010bf4f006 CR4: 0000000000772ef0
Nov  4 14:36:53 ceph-0006 kernel: [   73.210886] PKRU: 55555554
Nov  4 14:36:53 ceph-0006 kernel: [   73.211113] Call Trace:
Nov  4 14:36:53 ceph-0006 kernel: [   73.211355]  <TASK>
Nov  4 14:36:53 ceph-0006 kernel: [   73.211530]  get_bvec_at+0xdb/0x230
Nov  4 14:36:53 ceph-0006 kernel: [   73.211789]  ? __pfx__printk+0x10/0x10
Nov  4 14:36:53 ceph-0006 kernel: [   73.212103]  ? __pfx_get_bvec_at+0x10/0x10
Nov  4 14:36:53 ceph-0006 kernel: [   73.212410]  ceph_con_v2_try_read.cold+0x19d5/0x1ac7
Nov  4 14:36:53 ceph-0006 kernel: [   73.212785]  ? update_entity_lag+0x49/0x190
Nov  4 14:36:53 ceph-0006 kernel: [   73.212947] libceph: pid 889:net/ceph/messenger_v2.c:1013 setup_message_sgs(): starting...
Nov  4 14:36:53 ceph-0006 kernel: [   73.213073]  ? __pfx_ceph_con_v2_try_read+0x10/0x10
Nov  4 14:36:53 ceph-0006 kernel: [   73.213731] libceph: pid 889:net/ceph/messenger_v2.c:1016 setup_message_sgs(): front_len(msg) 328, middle_len(msg) 0, data_len(msg) 0, pages 0000000000000000
Nov  4 14:36:53 ceph-0006 kernel: [   73.214078]  ? finish_task_switch.isra.0+0x15d/0x830
Nov  4 14:36:53 ceph-0006 kernel: [   73.214083]  ? __kasan_check_write+0x14/0x30
Nov  4 14:36:53 ceph-0006 kernel: [   73.214087]  ? mutex_lock+0x84/0xe0
Nov  4 14:36:53 ceph-0006 kernel: [   73.214091]  ? __pfx_mutex_lock+0x10/0x10
Nov  4 14:36:53 ceph-0006 kernel: [   73.214095]  ceph_con_workfn+0x27e/0x10e0
Nov  4 14:36:53 ceph-0006 kernel: [   73.216918]  process_one_work+0x611/0xe20
Nov  4 14:36:53 ceph-0006 kernel: [   73.217215]  ? __kasan_check_write+0x14/0x30
Nov  4 14:36:53 ceph-0006 kernel: [   73.217541]  worker_thread+0x7e3/0x1580
Nov  4 14:36:53 ceph-0006 kernel: [   73.217775]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
Nov  4 14:36:53 ceph-0006 kernel: [   73.218063]  ? __pfx_worker_thread+0x10/0x10
Nov  4 14:36:53 ceph-0006 kernel: [   73.218353]  kthread+0x381/0x7a0
Nov  4 14:36:53 ceph-0006 kernel: [   73.218593]  ? __pfx__raw_spin_lock_irq+0x10/0x10
Nov  4 14:36:53 ceph-0006 kernel: [   73.218880]  ? __pfx_kthread+0x10/0x10
Nov  4 14:36:53 ceph-0006 kernel: [   73.219128]  ? __kasan_check_write+0x14/0x30
Nov  4 14:36:53 ceph-0006 kernel: [   73.219452]  ? recalc_sigpending+0x160/0x220
Nov  4 14:36:53 ceph-0006 kernel: [   73.219751]  ? _raw_spin_unlock_irq+0xe/0x50
Nov  4 14:36:53 ceph-0006 kernel: [   73.220044]  ? calculate_sigpending+0x78/0xb0
Nov  4 14:36:53 ceph-0006 kernel: [   73.220350]  ? __pfx_kthread+0x10/0x10
Nov  4 14:36:53 ceph-0006 kernel: [   73.220602]  ret_from_fork+0x2b6/0x380
Nov  4 14:36:53 ceph-0006 kernel: [   73.220869]  ? __pfx_kthread+0x10/0x10
Nov  4 14:36:53 ceph-0006 kernel: [   73.221131]  ret_from_fork_asm+0x1a/0x30
Nov  4 14:36:53 ceph-0006 kernel: [   73.221431]  </TASK>
Nov  4 14:36:53 ceph-0006 kernel: [   73.221590] Modules linked in: intel_rapl_msr intel_rapl_common intel_uncore_frequency_common intel_pmc_core pmt_telemetry pmt_discovery pmt_class intel_pmc_ssram_telemetry intel_vsec kvm_intel kvm joydev irqbypass polyval_clmulni ghash_clmulni_intel aesni_intel rapl floppy input_leds mac_hid i2c_piix4 psmouse vga16fb serio_raw vgastate i2c_smbus pata_acpi bochs qemu_fw_cfg sch_fq_codel rbd msr parport_pc ppdev lp parport efi_pstore
Nov  4 14:36:53 ceph-0006 kernel: [   73.224209] ---[ end trace 0000000000000000 ]---

Currently, I assume that it needs to initialize cursor->total_resid and cursor->data exactly before CEPH_SPARSE_READ_DATA.

Actions

Copy link

#30

Updated by Viacheslav Dubeyko 2 months ago

% Done changed from 20 to 50

I am testing this fix now:

git diff
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 9f6d860411cb..e8370e6806fb 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1128,6 +1128,7 @@ void ceph_msg_data_advance(struct ceph_msg_data_cursor *cursor, size_t bytes)
        bool new_piece;

        BUG_ON(bytes > cursor->resid);
+       BUG_ON(!cursor->data);
        switch (cursor->data->type) {
        case CEPH_MSG_DATA_PAGELIST:
                new_piece = ceph_msg_data_pagelist_advance(cursor, bytes);
diff --git a/net/ceph/messenger_v2.c b/net/ceph/messenger_v2.c
index 5483b4eed94e..a5fc4576f51f 100644
--- a/net/ceph/messenger_v2.c
+++ b/net/ceph/messenger_v2.c
@@ -1090,6 +1090,8 @@ static int process_v2_sparse_read(struct ceph_connection *con,
        struct ceph_msg_data_cursor *cursor = &con->v2.in_cursor;
        int ret;

+       ceph_msg_data_cursor_init(cursor, con->in_msg, con->in_msg->data_length);
+
        for (;;) {
                char *buf = NULL;

Actions

Copy link

#31

Updated by Viacheslav Dubeyko 2 months ago

As far as I can see, this fix works:

sudo mount -t ceph :/ /mnt/cephfs/ -o name=admin,fs=cephfs,ms_mode=secure
cd /mnt/cephfs/

ls -lah ./fscrypt-test-3
total 10K
drwx------  2 root   root   1 Sep 29 14:47 .
drwxr-xr-x 16 root   root  31 Nov  5 14:37 ..
-rw-r--r--  1 slavad 1001 10K Sep 29 14:34 KsaCwBA0zyUFdVQX5badasKvt9tuxJ7LgvKMvBYBXRA

fscrypt unlock --key=my.key /mnt/cephfs/fscrypt-test-3
"/mnt/cephfs/fscrypt-test-3" is now unlocked and ready for use.

cat ./fscrypt-test-3/area_decrypted.tar
.area-decrypted0000644000000000000000000000005414640301620012441 0ustar  rootrootIf you can read this the area is decrypted.

Now it needs to run xfstests to check that the fix doesn't break other test-cases.

Actions

Copy link

#32

Updated by Ater Fax 2 months ago

Many thanks for the effort and fix, much appreciated!

Actions

Copy link

#33

Updated by Viacheslav Dubeyko 2 months ago

Currently, I can see that several xfstsests have failed:

Failures: generic/198 generic/452 generic/580 generic/639
Failed 4 of 627 tests

However, I never had run xfstsest for messenger protocol v2 before. So, potentially, the reason of the failures could be not the fix but another issues. So, I need to double check the current situation.

Actions

Copy link

#34

Updated by Viacheslav Dubeyko 2 months ago

I see the same issues with generic/198, generic/452, generic/580, generic/639 without applying the patch with fix. So, we have another several issues for the case of messenger protocol v2. I need to create another tickets for these issues investigation.

I am preparing the patch with fix for sending to open-source community for the review.

Actions

Copy link

#35

Updated by Viacheslav Dubeyko 2 months ago

Status changed from In Progress to Fix Under Review
% Done changed from 50 to 90

Fix has been sent to open-source community for review:

https://lore.kernel.org/ceph-devel/20251111205627.475128-2-slava@dubeyko.com/T/#u

Actions

Copy link

#36

Updated by Viacheslav Dubeyko 2 months ago

Second version of the patch has been sent to community for review:

https://lore.kernel.org/ceph-devel/20251112195246.495313-2-slava@dubeyko.com/T/#u

Actions

Copy link

#37

Updated by Viacheslav Dubeyko 2 months ago

I created several tickets:
(1) generic/198 issue: https://tracker.ceph.com/issues/73827
(2) generic/452 issue: https://tracker.ceph.com/issues/73828
(3) generic/639 issue: https://tracker.ceph.com/issues/73829

As far as I can see, I cannot reproduce the generic/580 issue. I have checked that generic/580 test-case triggers the kernel crash without the patch. But there is no crash with patch applied.

Actions

Copy link

#38

Updated by Viacheslav Dubeyko 2 months ago

Third version of the patch has been sent to community for review:

https://lore.kernel.org/ceph-devel/20251113223623.508000-2-slava@dubeyko.com/T/#u

Actions

Copy link

#39

Updated by Viacheslav Dubeyko 2 months ago

% Done changed from 90 to 100

Patch has been applied on testing branch [1]:

commit b33c978f260a00c749f865a7d682f7ea4603cc58 (HEAD -> testing, origin/testing)
Author: Viacheslav Dubeyko <Slava.Dubeyko@ibm.com>
Date: Thu Nov 13 14:36:24 2025 -0800

[1] https://github.com/ceph/ceph-client.git

Actions

Copy link

#40

Updated by Viacheslav Dubeyko about 2 months ago

Status changed from Fix Under Review to Resolved

Patch has been sent to upstream.

https://lore.kernel.org/ceph-devel/176427376575.20489.18398247980926572316.pr-tracker-bot@kernel.org/T/#t

Actions

Copy link

#41

Updated by Ater Fax about 2 months ago

Thanks for the update on progress and for the patch. Happy Thanksgiving if you're US based!

Actions

Copy link

#42

Updated by Viacheslav Dubeyko about 2 months ago

Ater Fax wrote in #note-41:

Thanks for the update on progress and for the patch. Happy Thanksgiving if you're US based!

Thanks. I am enjoying by Thanksgiving. :) Happy Thanksgiving to you too.

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Ceph » Linux kernel client

Custom queries

Bug #73152

CephFS client side, stops working causing machine partial lockup when using fscrypt - multiple OSes / kernels affected

Updated by Ater Fax 4 months ago

Updated by Venky Shankar 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Ater Fax 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Ater Fax 4 months ago

Updated by Ater Fax 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Ater Fax 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Ater Fax 4 months ago

Updated by Ater Fax 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Ater Fax 4 months ago

Updated by Ater Fax 4 months ago

Updated by Ater Fax 4 months ago

Updated by Ater Fax 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Ater Fax 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Viacheslav Dubeyko 4 months ago

Updated by Ater Fax 4 months ago

Updated by Viacheslav Dubeyko 2 months ago

Updated by Viacheslav Dubeyko 2 months ago

Updated by Viacheslav Dubeyko 2 months ago

Updated by Ater Fax 2 months ago

Updated by Viacheslav Dubeyko 2 months ago

Updated by Viacheslav Dubeyko 2 months ago

Updated by Viacheslav Dubeyko 2 months ago

Updated by Viacheslav Dubeyko 2 months ago

Updated by Viacheslav Dubeyko 2 months ago

Updated by Viacheslav Dubeyko 2 months ago

Updated by Viacheslav Dubeyko 2 months ago

Updated by Viacheslav Dubeyko about 2 months ago

Updated by Ater Fax about 2 months ago

Updated by Viacheslav Dubeyko about 2 months ago