Bug #74156
CephFS in kernel client appears to be "leaking" folios
Description
Hello,
I am running a server that has a heavy read/write workload to a cephfs
file system. It is a VM.
Over time it appears that the non-cache usage of kernel dynamic memory
increases. The kernel seems to think the pages are reclaimable however
nothing appears to trigger the reclaim. This leads to workloads getting
killed via oomkiller.
smem -wp output:
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 88.21% 36.25% 51.96%
userspace memory 9.49% 0.15% 9.34%
free memory 2.30% 2.30% 0.00%
free -h output:
total used free shared buff/cache available
Mem: 31Gi 3.6Gi 500Mi 4.0Mi 11Gi 27Gi
Swap: 4.0Gi 179Mi 3.8Gi
Unmounting the file system has no effect on the used kernel dynamic memory.
Nor does dropping caches.
I have enabled allocation tracking and got the following while the issue was happening.
- smem -pw
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 80.46% 65.80% 14.66%
userspace memory 0.35% 0.16% 0.19%
free memory 19.19% 19.19% 0.00%
- sort -g /proc/allocinfo|tail|numfmt --to=iec
22M 5609 mm/memory.c:1190 func:folio_prealloc
23M 1932 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem
24M 24135 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
27M 6693 mm/memory.c:1192 func:folio_prealloc
58M 14784 mm/page_ext.c:271 func:alloc_page_ext
258M 129 mm/khugepaged.c:1069 func:alloc_charge_folio
430M 770788 lib/xarray.c:378 func:xas_alloc
545M 36444 mm/slub.c:3059 func:alloc_slab_page
9.8G 2563617 mm/readahead.c:189 func:ractl_alloc_folio
20G 5164004 mm/filemap.c:2012 func:__filemap_get_folio
So I stopped the workload and dropped caches to confirm.
- echo 3 > /proc/sys/vm/drop_caches
- smem -pw
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 33.45% 0.09% 33.36%
userspace memory 0.36% 0.16% 0.19%
free memory 66.20% 66.20% 0.00%
- sort -g /proc/allocinfo|tail|numfmt --to=iec
12M 2987 mm/execmem.c:41 func:execmem_vmalloc
12M 3 kernel/dma/pool.c:96 func:atomic_pool_expand
13M 751 mm/slub.c:3061 func:alloc_slab_page
16M 8 mm/khugepaged.c:1069 func:alloc_charge_folio
18M 4355 mm/memory.c:1190 func:folio_prealloc
24M 6119 mm/memory.c:1192 func:folio_prealloc
58M 14784 mm/page_ext.c:271 func:alloc_page_ext
61M 15448 mm/readahead.c:189 func:ractl_alloc_folio
79M 6726 mm/slub.c:3059 func:alloc_slab_page
11G 2674488 mm/filemap.c:2012 func:__filemap_get_folio
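Since allocation profiling reports raw byte counts, the per-call-site totals before and after a drop_caches cycle can be compared mechanically. A minimal sketch; `allocinfo_bytes` is a hypothetical helper that assumes the raw `<bytes> <calls> <file:line> func:<name>` layout of /proc/allocinfo (before numfmt is applied):

```shell
#!/bin/sh
# Sum the bytes attributed to call sites matching a pattern in a saved
# /proc/allocinfo snapshot. Hypothetical helper; assumes the raw
# "<bytes> <calls> <file:line> func:<name>" layout (no numfmt applied).
allocinfo_bytes() {
    awk -v site="$2" '$0 ~ site { sum += $1 } END { print sum + 0 }' "$1"
}

# Demo on a captured sample:
cat > /tmp/allocinfo.sample <<'EOF'
450560 110 mm/readahead.c:189 func:ractl_alloc_folio
1048576 256 mm/filemap.c:2012 func:__filemap_get_folio
2097152 512 fs/netfs/buffered_read.c:635 [netfs] func:netfs_write_begin
EOF
allocinfo_bytes /tmp/allocinfo.sample 'netfs_write_begin'   # prints 2097152
```

Snapshotting /proc/allocinfo before and after `echo 3 > /proc/sys/vm/drop_caches` and diffing the sums per site shows which callers actually shrink.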
Reverting to the previous LTS (6.12) fixes the issue. After 24hrs of operation:
smem -wp output:
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 80.22% 79.32% 0.90%
userspace memory 10.48% 0.20% 10.28%
free memory 9.30% 9.30% 0.00%
I have tested 6.18 and 6.17, and am in the process of testing the 6.16 kernel; it appears to be affected also.
The reproducer is simple.
I have one VM. 32GB of ram and 16 cores. It has a cephfs filesystem mounted.
I have two rsync copies (rsync -a --progress ./source ./dest/) with the source and destination being different for both copies (four different folders), but all being on the same filesystem.
(I am moving two 5TB data sets from EC pools onto replicated pools as I am currently affected by #70390.)
But I have also replicated this with other large write-only workloads (downloading datasets from online sources, and unpacking large datasets out of archives). This was before I discovered the issue with squid-created OSDs.
The leak appears to be quite slow. I usually find I can confirm the issue is present after 6-9hrs of continuous data migration (it's running at an average of around 120MB/s)
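To catch growth this slow without babysitting the box, periodic snapshots can be logged and graphed afterwards. A minimal sketch; the log path and the selection of meminfo fields are arbitrary choices, and the allocinfo part only applies when CONFIG_MEM_ALLOC_PROFILING is enabled:

```shell
#!/bin/sh
# Append one timestamped memory snapshot to a log; run it from cron or
# a loop so the slow noncache growth can be plotted over hours.
LOG=${1:-/tmp/memwatch.log}
{
    date '+--- %F %T'
    grep -E 'MemFree|MemAvailable|SReclaimable|SUnreclaim' /proc/meminfo
    # Top allocation sites, when allocation profiling is available:
    if [ -r /proc/allocinfo ]; then
        sort -g /proc/allocinfo | tail -5
    fi
} >> "$LOG"
```

Running it every few minutes for a few hours makes the SUnreclaim/noncache trend obvious without waiting for the OOM.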
I originally emailed the kernel mailing list:
https://lkml.org/lkml/2025/11/10/309
And was referred here after being referred to the memory allocation tracking and getting a result there.
The ceph cluster has been upgraded multiple times during my attempts to find the issue. It started at 19.2.1 and was upgraded through 19.2.2, 19.2.3, and 20.2.1.
But I believe the issue is in the kernel client, so the cluster might not be important.
Updated by Malcolm Haak 3 months ago
I have a new faster reproducer.
I realized it leaks an amount per file. The initial workload I encountered this issue with was downloading datasets, which are lots of ~50MB files, in parallel.
I created a small VM. 2GB of ram, 16 cores.
I have two bash files.
The first has a loop that creates 32 50MB files with dd in parallel and waits for all files to finish.
The second calls the first script hundreds of times.
This crashes a vm in about 5 mins.
Updated by Malcolm Haak 3 months ago
repro_run.sh
#!/bin/bash
mkdir -p /mnt/ceph/repro
for i in $(seq 1 100);
do
./ddloop.sh $i
done
ddloop.sh
#!/bin/bash
for i in $(seq 1 32);
do
dd if=/dev/zero of=/mnt/ceph/repro/$i.$1 bs=1M count=60 &
done
wait
echo $1 complete
This is the reproducer. It assumes cephfs is mounted at /mnt/ceph
It does a decent job of replicating the workload I was running.
Thanks
Updated by Viacheslav Dubeyko 3 months ago
- Status changed from New to In Progress
I cannot reproduce the issue. The script has already been running for several hours and I don't see any memory leaks in the system. It looks like one important piece of the puzzle is missing. Which mount options do you have on your side? How have you mounted your CephFS instance?
Updated by Malcolm Haak 3 months ago
Mount options from /etc/fstab:
192.168.0.244:/ /mnt/ceph ceph rw,relatime,_netdev
Resulting mount line:
192.168.0.244:/ on /mnt/ceph type ceph (rw,relatime,secret=<hidden>,fsid=969a4eab-2826-4766-87e1-ecb18a7b5a13,acl,_netdev)
Kernels tested:
All Arch linux kernels from 6.12 - 6.18 as well as mainline kernels from 6.14 - 6.19-rc1
Other cluster details:
4 servers in the cluster
47 OSDs split somewhat evenly between the 4 hosts
10GbE on all hosts.
Ceph 20.2.0 currently used on all servers.
3 Nodes used as MON. 4 running MGR. 3 running MDS, only 1 active MDS.
7 pools. Mix of EC and replicated. Issue happens regardless of pool type.
Auto-scale enabled
Auto-balance also enabled.
Cluster was freshly created on 19.2.x with bluestore_elastic_shared_blobs = false
Just trying to get out ahead of any other questions you might have.
Updated by Malcolm Haak 3 months ago
Also just checking:
One of the files used when reproducing:
getfattr -n ceph.file.layout 145.12
# file: 145.12
ceph.file.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=cephfs_data"
cephfs_data is a replicated pool.
How else can we instrument the kernel to figure out what's going on? It happens without fail. I have no NVMe/flash, so these are all slow disks.
I have set up a test VM with kdump (not a full memory dump, but I can do that) and enabled panic_on_oom. I'm happy to upload the kernel dumps to my Nextcloud for you to access.
If it helps I can build a test VM with any distro/kernel you would like and configure it the same.
Updated by Malcolm Haak 3 months ago
Just in case it's important:
VMs are running on Proxmox 9.0.11.
VMs have OVMF BIOS and the x86-64-v2-AES CPU type. VirtIO is used for network and 'local' disk.
Updated by Malcolm Haak 3 months ago
The crash after 5 minutes on the 2GB VM was not directly due to the bug. My apologies.
I forgot to update the ticket. The most recent run (which just finished) on a 2GB VM took 6hrs. It seems the amount of ram doesn't have a large impact, as once it starts it snowballs quickly. But getting it to start seems to take some time.
Also, I replicated the issue on a physical machine. It took 9hrs, but it has 32GB of ram and was only connected to the cluster via 1GbE.
I'm currently running the reproducer again, with kdump enabled. I will make the dump available once it crashes, in (I assume) 6-7hrs.
Updated by Malcolm Haak 3 months ago
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 87.45% 34.89% 52.56%
userspace memory 7.15% 1.08% 6.08%
free memory 5.40% 5.40% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 1.2G 491.2M 760.7M
userspace memory 103.1M 15.5M 87.5M
free memory 84.0M 84.0M 0
total used free shared buff/cache available
Mem: 1.4Gi 429Mi 111Mi 3.9Mi 483Mi 1.0Gi
Swap: 718Mi 22Mi 696Mi
#sort -g /proc/allocinfo|tail|numfmt --to=iec
8.4M 2660 kernel/fork.c:311 func:alloc_thread_stack_node
8.9M 9033 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
9.1M 573 mm/slub.c:3061 func:alloc_slab_page
12M 3001 mm/execmem.c:41 func:execmem_vmalloc
16M 3937 mm/memory.c:1192 func:folio_prealloc
22M 38876 lib/xarray.c:378 func:xas_alloc
35M 8775 mm/readahead.c:189 func:ractl_alloc_folio
61M 15420 mm/memory.c:1190 func:folio_prealloc
108M 8333 mm/slub.c:3059 func:alloc_slab_page
970M 248277 mm/filemap.c:2012 func:__filemap_get_folio
My ceph cluster is busy doing quite a bit of remapping due to OSDs being re-created. It has slowed down the reproducer considerably.
This is after 12hrs of running. I'm going to wait for it to oom and collect the crash dump, as by that time most of the ram should be claimed by folios.
As you can see, a considerable amount of memory is being consumed by the noncache part. That value was around 100-120MB 11hrs ago. Swapping has started, so large amounts of it are already failing to reclaim.
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 86.68% 32.35% 54.33%
userspace memory 7.30% 1.25% 6.05%
free memory 6.02% 6.02% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 1.2G 461.3M 782.3M
userspace memory 105.2M 18.0M 87.2M
free memory 90.1M 90.1M 0
total used free shared buff/cache available
Mem: 1.4Gi 434Mi 95Mi 3.9Mi 481Mi 1.0Gi
Swap: 718Mi 23Mi 695Mi
Updated by Malcolm Haak 3 months ago
Sorry I prematurely hit send.
The second output was collected immediately after issuing a
sync;echo 3 >/proc/sys/vm/drop_caches
In the past, this is when I would have attempted to get some of that memory back by unmounting the filesystem, as my monitoring would be going nuts. As I mentioned above, this has no effect.
I've also, to try to figure out where the memory was being used, gone on an rmmod rampage; unloading the ceph/cephfs/netfs modules has no effect on the memory usage.
Anyway, I'll leave it running overnight and hopefully wake up to a 2GB crash dump.
Updated by Viacheslav Dubeyko 3 months ago
Malcolm Haak wrote in #note-6:
Also just checking:
One of the files used when reproducing:
getfattr -n ceph.file.layout 145.12
- file: 145.12
ceph.file.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=cephfs_data"
cephfs_data is a replicated pool.
How else can we instrument the kernel to figure out what's going on as it happens without fail? I have no NVME/Flash so all slow disks.
I have setup at test VM with kdump. (not full memory dump. But I can do that) and enabled crash_on_oom. I'm happy to upload the kernel dumps to my nextcloud for you to access.
If it helps I can build a test VM with any distro/kernel you would like and configure it the same.
A ready-made VM with the correct environment for reproducing the issue will help a lot. Thanks in advance.
Updated by Viacheslav Dubeyko 3 months ago · Edited
#sort -g /proc/allocinfo|tail|numfmt --to=iec
8.4M 2660 kernel/fork.c:311 func:alloc_thread_stack_node
8.9M 9033 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
9.1M 573 mm/slub.c:3061 func:alloc_slab_page
12M 3001 mm/execmem.c:41 func:execmem_vmalloc
16M 3937 mm/memory.c:1192 func:folio_prealloc
22M 38876 lib/xarray.c:378 func:xas_alloc
35M 8775 mm/readahead.c:189 func:ractl_alloc_folio
61M 15420 mm/memory.c:1190 func:folio_prealloc
108M 8333 mm/slub.c:3059 func:alloc_slab_page
970M 248277 mm/filemap.c:2012 func:__filemap_get_folio <-- This point looks interesting!!!
We have __filemap_get_folio() in the fill_readdir_cache() method [1]. Potentially, it could be the source of the issue. But, currently, I don't see how the issue could happen, because ceph_readdir_cache_release() [2] includes folio_release_kmap(), which includes folio_put() [3]. Potentially, somehow, the folio's reference counter can be increased unreasonably. But I cannot see right now how that could happen, and maybe my hypothesis is wrong here. Let me sleep on this and dive deeper into the code.
[1] https://elixir.bootlin.com/linux/v6.18/source/fs/ceph/inode.c#L1940
[2] https://elixir.bootlin.com/linux/v6.18/source/fs/ceph/inode.c#L1915
[3] https://elixir.bootlin.com/linux/v6.18/source/include/linux/highmem.h#L682
Updated by Malcolm Haak 3 months ago
Ok, I have a VM running my full kernel/client setup. I can make the drive for the VM available.
It did crash early this morning and I have the crash dump; however, I realized you'll need my kernel and debug symbols. Also, makedumpfile was called with -d 31, not -d 2. That's my fault, I should have checked the defaults on the dump.
I can make a VM available with a new dump file; I'll re-run everything and get a full dump of the memory. Did you want ssh access, or should I pack the whole thing up, VM with crash and all, and make it available?
I'd prefer not to post the URL for the download in the ticket, and I probably can't upload a several-GB file to the bug tracker, so how would you like me to get it to you?
The VM is expecting kvm, but otherwise nothing special. I'll reset all the passwords to something simple like ceph.
Otherwise, send me a pubkey and I can get that added and give you details of how to ssh in.
Updated by Viacheslav Dubeyko 3 months ago
Malcolm Haak wrote in #note-13:
Ok I have a VM running my full kernel/client setup. I can make the drive for the vm available?
It did crash early this morning and I have the crash dump however I realized you'll need my kernel and debug symbols. Also makedumpfile was called with -d 31 not -d 2. That's my fault I should have checked the defaults on the dump.
I can make a VM available with a new dump file, I'll re-run everything and get a full dump of the memory. Did you want ssh access or I can pack the whole thing up, vm with crash and all, and make it available.?
I'd prefer not to post the url for the download in the ticket. And I probably can't upload a several GB file to the bug tracker so how would you like me to get it to you?
The VM is expecting kvm, but otherwise nothing special. I'll reset all the passwords to something simple like ceph.
Otherwise, send me a pubkey and I can get that added and give you details of how to ssh in.
Let me spend some time on reproducing the issue on my side. I have some ideas for how to investigate the issue on my own. If that doesn't work, then I will ask you to provide some artifacts. Thanks.
Updated by Malcolm Haak 3 months ago
- File vmcore-dmesg.log vmcore-dmesg.log added
- File patch.patch patch.patch added
Oh also I forgot. I did a run with the patch suggested by David on the kernel mailing list.
It added tracing to __filemap_get_folio
[64793.828030] [ T382379] Memory allocations (profiling is currently turned on):
[64793.828047] [ T382379] 1.18 GiB 308269 fs/netfs/buffered_read.c:635 [netfs] func:netfs_write_begin
[64793.828060] [ T382379] 76.2 MiB 6697 mm/slub.c:3059 func:alloc_slab_page
[64793.828070] [ T382379] 11.2 MiB 3001 mm/execmem.c:41 func:execmem_vmalloc
[64793.828078] [ T382379] 9.16 MiB 579 mm/slub.c:3061 func:alloc_slab_page
[64793.828082] [ T382379] 8.92 MiB 2851 kernel/fork.c:311 func:alloc_thread_stack_node
[64793.828091] [ T382379] 8.65 MiB 15533 lib/xarray.c:378 func:xas_alloc
[64793.828095] [ T382379] 7.95 MiB 2034 mm/readahead.c:189 func:ractl_alloc_folio
[64793.828099] [ T382379] 7.43 MiB 1901 mm/zsmalloc.c:237 func:alloc_zpdesc
[64793.828114] [ T382379] 7.07 MiB 1811 arch/x86/mm/pgtable.c:18 func:pte_alloc_one
[64793.828124] [ T382379] 4.23 MiB 1083 drivers/block/zram/zram_drv.c:1597 [zram] func:zram_meta_alloc
It seems every call to __filemap_get_folio that is slowly accumulating is coming from func:netfs_write_begin. I'm not sure if that confirms your theory or throws a spanner in the works.
I've included the dmesg from that run and the patch.
Hopefully that helps!
Updated by Viacheslav Dubeyko 3 months ago
Could you please share the kernel .config file that you used to compile the kernel? Potentially, my kernel does not have the necessary features to trigger the issue. Thanks.
Updated by Malcolm Haak 3 months ago
All my nodes are running an Arch kernel. Even the one I compiled is based on the Arch config available here:
https://aur.archlinux.org/cgit/aur.git/tree/config?h=linux-mainline
The only differences between this and my kernel are the addition of:
CONFIG_MEM_ALLOC_PROFILING=y
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
It is incredibly slow. I didn't see true signs of it happening with the dd workload until around hour 6-7, and as you can see above it wasn't until 12hrs that it really started to show through. It was failing faster with fewer CPU cores... I might test that, as I increased the cores to 32 to rebuild the kernel faster, and the host that replicates it the fastest (with a different workload, not the test one) only has 4 cpus. Weird timing issue under CPU load, perhaps?
Updated by Malcolm Haak 3 months ago
Ok two vCPUs and 2GB of ram. 3hrs of running the dd reproducer:
# sort -g /proc/allocinfo|tail|numfmt --to=iec
4.0M 2 mm/khugepaged.c:1069 func:alloc_charge_folio
4.1M 1049 mm/percpu.c:512 func:pcpu_mem_zalloc
4.3M 1087 drivers/block/zram/zram_drv.c:1597 [zram] func:zram_meta_alloc
4.6M 1157 mm/shmem.c:1870 func:shmem_alloc_folio
8.3M 2117 mm/memory.c:1190 func:folio_prealloc
12M 3001 mm/execmem.c:41 func:execmem_vmalloc
20M 4896 mm/memory.c:1192 func:folio_prealloc
29M 4020 mm/slub.c:3059 func:alloc_slab_page
56M 14205 mm/readahead.c:189 func:ractl_alloc_folio
279M 71404 fs/netfs/buffered_read.c:635 [netfs] func:netfs_write_begin
# sync; echo 3 > /proc/sys/vm/drop_caches;smem -wp;echo;smem -wk; echo; free -h
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 28.24% 1.12% 27.12%
userspace memory 5.94% 3.26% 2.67%
free memory 65.82% 65.82% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 407.6M 16.2M 391.4M
userspace memory 85.8M 47.1M 38.7M
free memory 950.1M 950.1M 0
total used free shared buff/cache available
Mem: 1.4Gi 284Mi 950Mi 4.5Mi 63Mi 1.1Gi
Swap: 721Mi 8.0Ki 721Mi
It's already showing up at this point. Most of that 391M will be unreclaimable (I'm guessing something close to 279MB). So I'm leaking ~90MB every hour.
It had just completed loop 101.
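The rate estimate above is just the delta over elapsed time; a throwaway sketch (`leak_rate_mb_per_hr` is a hypothetical helper taking two byte counts and the seconds between them):

```shell
#!/bin/sh
# Leak growth rate in MiB/hour from two byte-count samples.
# Hypothetical helper: leak_rate_mb_per_hr <bytes_t0> <bytes_t1> <seconds>
leak_rate_mb_per_hr() {
    awk -v b0="$1" -v b1="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", (b1 - b0) / 1048576 / (s / 3600) }'
}

# ~279 MiB accumulated over the 3-hour run:
leak_rate_mb_per_hr 0 292552704 10800   # prints 93.0
```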
This is using an updated ddloop.sh:
#!/bin/bash
for i in $(seq 1 256);
do
dd if=/dev/zero of=/mnt/ceph/repro/$i.$1 bs=163864 count=410 &
done
wait
sync
for i in $(seq 1 256);
do
rm /mnt/ceph/repro/$i.$1 &
done
wait
echo $1 complete
I was aiming for misaligned writes and was also trying to replicate the "reassemble and remove source" behavior from the dataset tool (well, just the "remove source" part, anyway). Also, that tool does a force sync in between stages.
Anyway, it's well on its way to crashing and getting a much more complete crash dump. It's doing a -d 2, not -d 31.
So hopefully that should allow a full diagnosis of the issue.
Updated by Viacheslav Dubeyko 3 months ago
Frankly speaking, I don't quite follow what the definition of the issue is. What should I detect as the symptoms of the issue? How do you define them?
As far as I can see, currently I cannot detect any memory leaks. If I do these steps:
sync; echo 3 > /proc/sys/vm/drop_caches;smem -wp;echo;smem -wk; echo; free -h
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 28.70% 4.06% 24.64%
userspace memory 36.42% 18.15% 18.26%
free memory 34.88% 34.88% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 469.4M 71.2M 398.2M
userspace memory 588.5M 293.4M 295.1M
free memory 557.8M 557.8M 0
total used free shared buff/cache available
Mem: 1.6Gi 691Mi 557Mi 24Mi 366Mi 758Mi
Swap: 2.0Gi 21Mi 2.0Gi
cat /proc/meminfo
MemTotal:        1654452 kB
MemFree:          498560 kB
MemAvailable:     803040 kB
Buffers:            6820 kB
Cached:           434764 kB
SwapCached:         4032 kB
Active:           532060 kB
Inactive:         201756 kB
Active(anon):     275648 kB
Inactive(anon):    37784 kB
Active(file):     256412 kB
Inactive(file):   163972 kB
Unevictable:        7656 kB
Mlocked:               0 kB
SwapTotal:       2097148 kB
SwapFree:        2075640 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:               432 kB
Writeback:             0 kB
AnonPages:        298424 kB
Mapped:           320312 kB
Shmem:             21320 kB
KReclaimable:      29016 kB
Slab:             288240 kB
SReclaimable:      29016 kB
SUnreclaim:       259224 kB
KernelStack:       11392 kB
PageTables:        13944 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2924372 kB
Committed_AS:    2371600 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       21380 kB
VmallocChunk:          0 kB
Percpu:             1512 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
Balloon:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      178036 kB
DirectMap2M:     1918976 kB
DirectMap1G:           0 kB
/mnt/cephfs/repro1# dd if=/dev/urandom of=./test.0001 bs=1048576 count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB, 1000 MiB) copied, 788.002 s, 1.3 MB/s
cat /proc/meminfo
MemTotal:        1654452 kB
MemFree:           71956 kB
MemAvailable:     936076 kB
Buffers:            1520 kB
Cached:           981652 kB
SwapCached:        13400 kB
Active:           246580 kB
Inactive:         868384 kB
Active(anon):      29952 kB
Inactive(anon):   114964 kB
Active(file):     216628 kB
Inactive(file):   753420 kB
Unevictable:        8176 kB
Mlocked:               0 kB
SwapTotal:       2097148 kB
SwapFree:        1893800 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:               852 kB
Writeback:             0 kB
AnonPages:        136544 kB
Mapped:           132464 kB
Shmem:             13236 kB
KReclaimable:      48968 kB
Slab:             344432 kB
SReclaimable:      48968 kB
SUnreclaim:       295464 kB
KernelStack:       11360 kB
PageTables:        13860 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2924372 kB
Committed_AS:    2371344 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       21376 kB
VmallocChunk:          0 kB
Percpu:             1728 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
Balloon:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      178036 kB
DirectMap2M:     1918976 kB
DirectMap1G:           0 kB
sync; echo 3 > /proc/sys/vm/drop_caches;smem -wp;echo;smem -wk; echo; free -h
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 27.57% 4.27% 23.31%
userspace memory 16.86% 8.20% 8.66%
free memory 55.57% 55.57% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 447.4M 68.8M 378.6M
userspace memory 272.4M 132.5M 139.8M
free memory 895.9M 895.9M 0
total used free shared buff/cache available
Mem: 1.6Gi 519Mi 893Mi 12Mi 203Mi 933Mi
Swap: 2.0Gi 198Mi 1.8Gi
cat /proc/meminfo
MemTotal:        1654452 kB
MemFree:          814832 kB
MemAvailable:     937028 kB
Buffers:            6308 kB
Cached:           244872 kB
SwapCached:        13420 kB
Active:           169368 kB
Inactive:         213988 kB
Active(anon):      29732 kB
Inactive(anon):   115048 kB
Active(file):     139636 kB
Inactive(file):    98940 kB
Unevictable:        8176 kB
Mlocked:               0 kB
SwapTotal:       2097148 kB
SwapFree:        1893712 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:               440 kB
Writeback:             0 kB
AnonPages:        136412 kB
Mapped:           132528 kB
Shmem:             13236 kB
KReclaimable:      28060 kB
Slab:             309844 kB
SReclaimable:      28060 kB
SUnreclaim:       281784 kB
KernelStack:       11360 kB
PageTables:        13856 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2924372 kB
Committed_AS:    2371344 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       21376 kB
VmallocChunk:          0 kB
Percpu:             1728 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
Unaccepted:            0 kB
Balloon:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:      178036 kB
DirectMap2M:     1918976 kB
DirectMap1G:           0 kB
I can see that memory has been returned to the free state.
If we suspect netfs_write_begin(), then I have checked the folios' reference counters. And, currently, I don't see any anomalies in this code. Probably, I could be missing something.
So, what is your definition of the problem/issue? What am I missing?
Thanks,
Slava.
Updated by Malcolm Haak 3 months ago
The issue is that folios accumulate and cannot be freed at all, by anyone or anything. They are stuck.
They are marked as available, but nothing can free them. When the memory usage has expanded to consume 90% of ram, you can unmount the filesystem, call sync, drop caches, and remove every module from the kernel, and the memory usage of said folios will remain at 90%. The machine will be swapping like crazy to have enough ram to function in; it will claim it has heaps of 'available' memory, but it can never free these pages to actually use them. Memory pressure can't get the "available" pages back.
Your single random one-shot dd does not replicate the issue in a way that is observable. That's one file. That is not the replication workload, which is why I provided the scripts. I've been very specific that whatever is happening is either not every file, or a very, very small amount per file. The replication workload creates thousands/millions of files for a reason.
Also, your statement suggests you fundamentally misunderstand the issue. On that box, right now:
kernel dynamic memory 447.4M 68.8M 378.6M
Try to get that 378MB back down to 100MB or even 200MB. Unmount the ceph filesystem and see how it doesn't change. It never changes; that ram is un-freeable by any mechanism in the kernel.
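The reclaim attempts being described can be batched into one command for anyone wanting to check their own box. A minimal sketch (needs root; drop_caches and compact_memory are the standard vm sysctls, and per the report none of them move the noncache number here):

```shell
#!/bin/sh
# Best-effort reclaim battery: flush dirty data, drop the
# page/dentry/inode caches, request compaction, then report what
# remains unreclaimed.
sync
{ echo 3 > /proc/sys/vm/drop_caches; }   2>/dev/null || true
{ echo 1 > /proc/sys/vm/compact_memory; } 2>/dev/null || true
grep -E 'MemFree|MemAvailable|SReclaimable|SUnreclaim' /proc/meminfo
```

On an affected client, SUnreclaim and the smem noncache figure stay put after this; on a healthy one they drop.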
I have uploaded another dmesg from the machine that crashed yesterday. All it was running was lots of streams of dd, then rm'ing the files, on loop. Why would that workload cause the computer to run out of available memory and crash? I can do that exact workload on ANY other filesystem, be it local, NFS, SMB, Lustre, BeeGFS, or MooseFS, and it will run forever, as it rm's all the files it creates. The "non-cache kernel dynamic memory" doesn't climb over time to 100% of system ram on any of those other filesystems. It does with cephfs, and has since kernel 6.15.
Look. I'm going to run my reproducer, which I will upload as attachments again today. I will disable panic_on_oom and then run it until the box hits 80% ram usage by "non-cache kernel dynamic memory", and then I will run any and all commands you want to see the output of, as well as provide any and all logs from it. Hell, I'll give you remote access to it so you can see the issue in full effect. Perhaps this is a language barrier, or perhaps you've not been waiting long enough for it to reproduce; I don't know, I don't care, I just want to give you all the information you want/need to see what I am seeing. Please understand I am not mad/frustrated with you if I come across that way. Any/all frustration is at my inability to effectively communicate the issue.
Updated by Malcolm Haak 3 months ago
- File ddloop.sh ddloop.sh added
- File repro_run.sh repro_run.sh added
Apologies, here are the scripts as I have been using.
Updated by Malcolm Haak 3 months ago
- File latest_dmesg.log latest_dmesg.log added
Ok, I ran the VM for ~11hrs straight, collecting output from time to time.
Pre-workload start
[root@kerneltest ~]# smem -wp;echo;smem -wk; echo; free -h
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 28.76% 20.14% 8.63%
userspace memory 5.71% 3.21% 2.50%
free memory 65.53% 65.53% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 415.7M 290.7M 125.0M
userspace memory 82.4M 46.3M 36.0M
free memory 945.4M 945.4M 0
total used free shared buff/cache available
Mem: 1.4Gi 309Mi 945Mi 4.5Mi 337Mi 1.1Gi
Swap: 721Mi 8.0Ki 721Mi
[root@kerneltest ~]# sort -g /proc/allocinfo|tail|numfmt --to=iec
4.3M 1087 drivers/block/zram/zram_drv.c:1597 [zram] func:zram_meta_alloc
4.6M 1157 mm/shmem.c:1870 func:shmem_alloc_folio
5.6M 30066 fs/dcache.c:1690 func:__d_alloc
9.7M 2458 mm/memory.c:1190 func:folio_prealloc
12M 3001 mm/execmem.c:41 func:execmem_vmalloc
20M 5014 mm/memory.c:1192 func:folio_prealloc
23M 1920 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem
23M 22840 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
55M 6728 mm/slub.c:3059 func:alloc_slab_page
295M 54777 mm/readahead.c:189 func:ractl_alloc_folio
uptime, 11 hours at start
Every 2.0s: smem -wp;echo;smem -wk; echo; free -h; echo; uptime        kerneltest: 11:43:34
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 83.53% 73.86% 9.67%
userspace memory 10.80% 3.23% 7.58%
free memory 5.67% 5.67% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 1.2G 1.0G 138.5M
userspace memory 155.9M 46.5M 109.3M
free memory 78.5M 78.5M 0
total used free shared buff/cache available
Mem: 1.4Gi 393Mi 75Mi 4.5Mi 1.1Gi 1.0Gi
Swap: 721Mi 8.0Ki 721Mi
11:43:35 up 12:11, 1 user, load average: 256.43, 227.67, 130.74
Every 2.0s: smem -wp;echo;smem -wk; echo; free -h; echo; uptime        kerneltest: 12:33:17
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 84.22% 70.27% 13.95%
userspace memory 10.81% 3.23% 7.58%
free memory 4.97% 4.97% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 1.2G 1011.5M 200.9M
userspace memory 156.0M 46.6M 109.4M
free memory 75.1M 75.1M 0
total used free shared buff/cache available
Mem: 1.4Gi 388Mi 73Mi 4.5Mi 1.0Gi 1.0Gi
Swap: 721Mi 8.0Ki 721Mi
12:33:18 up 13:00, 1 user, load average: 254.04, 253.54, 247.31
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 84.00% 63.02% 20.98%
userspace memory 10.74% 3.23% 7.51%
free memory 5.26% 5.26% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 1.2G 879.3M 303.8M
userspace memory 154.8M 46.6M 108.2M
free memory 105.6M 105.6M 0
total used free shared buff/cache available
Mem: 1.4Gi 390Mi 101Mi 4.5Mi 930Mi 1.0Gi
Swap: 721Mi 8.0Ki 721Mi
13:43:36 up 14:11, 1 user, load average: 256.11, 250.50, 249.63
Every 2.0s: smem -wp;echo;smem -wk; echo; free -h; echo; sort -g /proc/allocinfo|tail|numfmt --to=iec; echo; uptime        kerneltest: 15:47:27
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 83.43% 49.86% 33.57%
userspace memory 10.78% 3.23% 7.55%
free memory 5.79% 5.79% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 1.2G 728.1M 488.3M
userspace memory 155.6M 46.6M 109.0M
free memory 71.5M 71.5M 0
total used free shared buff/cache available
Mem: 1.4Gi 385Mi 109Mi 4.5Mi 747Mi 1.0Gi
Swap: 721Mi 8.0Ki 721Mi
7.5M 1903 arch/x86/mm/pgtable.c:18 func:pte_alloc_one
12M 3001 mm/execmem.c:41 func:execmem_vmalloc
22M 38907 lib/xarray.c:378 func:xas_alloc
23M 1920 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem
23M 22847 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
32M 8119 mm/memory.c:1192 func:folio_prealloc
71M 18040 mm/memory.c:1190 func:folio_prealloc
88M 10810 mm/slub.c:3059 func:alloc_slab_page
194M 46893 mm/readahead.c:189 func:ractl_alloc_folio
848M 216999 fs/netfs/buffered_read.c:635 [netfs] func:netfs_write_begin
15:47:27 up 16:14, 1 user, load average: 255.76, 252.11, 250.47
smem -wp;echo;smem -wk; echo; free -h; echo; sort -g /proc/allocinfo|tail|numfmt --to=iec; echo; uptime
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 89.89% 14.32% 75.57%
userspace memory 3.77% 2.49% 1.28%
free memory 6.34% 6.34% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 1.3G 206.8M 1.1G
userspace memory 54.4M 36.0M 18.4M
free memory 91.5M 91.5M 0
total used free shared buff/cache available
Mem: 1.4Gi 262Mi 91Mi 4.1Mi 242Mi 1.2Gi
Swap: 721Mi 18Mi 702Mi
3.8M 960 mm/page_ext.c:271 func:alloc_page_ext
4.1M 1049 mm/percpu.c:512 func:pcpu_mem_zalloc
4.2M 1059 mm/shmem.c:1870 func:shmem_alloc_folio
4.3M 1087 drivers/block/zram/zram_drv.c:1597 [zram] func:zram_meta_alloc
5.9M 10485 lib/xarray.c:378 func:xas_alloc
6.2M 1566 mm/memory.c:4414 func:__alloc_swap_folio
12M 3001 mm/execmem.c:41 func:execmem_vmalloc
43M 5821 mm/slub.c:3059 func:alloc_slab_page
96M 24360 mm/readahead.c:189 func:ractl_alloc_folio
1.1G 284870 fs/netfs/buffered_read.c:635 [netfs] func:netfs_write_begin
22:03:22 up 22:30, 2 users, load average: 89.54, 185.80, 221.69
[root@kerneltest ~]# sync; echo 3 >/proc/sys/mem/drop_caches;smem -wp;echo;smem -wk; echo; free -h; echo; sort -g /proc/allocinfo|tail|numfmt --to=iec; echo; uptime
-bash: /proc/sys/mem/drop_caches: No such file or directory
Area Used Cache Noncache
firmware/hardware 0.00% 0.00% 0.00%
kernel image 0.00% 0.00% 0.00%
kernel dynamic memory 79.18% 4.88% 74.29%
userspace memory 3.78% 2.49% 1.28%
free memory 17.04% 17.04% 0.00%
Area Used Cache Noncache
firmware/hardware 0 0 0
kernel image 0 0 0
kernel dynamic memory 1.1G 70.5M 1.0G
userspace memory 54.5M 36.0M 18.5M
free memory 246.0M 246.0M 0
total used free shared buff/cache available
Mem: 1.4Gi 237Mi 246Mi 4.1Mi 106Mi 1.2Gi
Swap: 721Mi 18Mi 702Mi
3.7M 522 mm/slub.c:3061 func:alloc_slab_page
3.8M 960 mm/page_ext.c:271 func:alloc_page_ext
4.1M 1049 mm/percpu.c:512 func:pcpu_mem_zalloc
4.2M 1059 mm/shmem.c:1870 func:shmem_alloc_folio
4.3M 1087 drivers/block/zram/zram_drv.c:1597 [zram] func:zram_meta_alloc
6.3M 1592 mm/memory.c:4414 func:__alloc_swap_folio
12M 3001 mm/execmem.c:41 func:execmem_vmalloc
33M 4578 mm/slub.c:3059 func:alloc_slab_page
96M 24386 mm/readahead.c:189 func:ractl_alloc_folio
988M 252826 fs/netfs/buffered_read.c:635 [netfs] func:netfs_write_begin
22:04:17 up 22:31, 2 users, load average: 35.75, 154.55, 208.94
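Note that the drop_caches path in the command above was mistyped (`/proc/sys/mem` rather than `/proc/sys/vm`), which is why the shell reported "No such file or directory", so the explicit cache drop never ran in that session. The correct invocation, as used earlier in this report, is roughly:

```shell
# Correct sysctl path: /proc/sys/vm/drop_caches (requires root).
# Writing 3 drops both the page cache and reclaimable slab objects.
sync
if [ -w /proc/sys/vm/drop_caches ]; then
    echo 3 > /proc/sys/vm/drop_caches
    echo "caches dropped"
else
    echo "skipped: /proc/sys/vm/drop_caches not writable (need root)"
fi
```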
[root@kerneltest ~]# cat /proc/meminfo
MemTotal: 1478080 kB
MemFree: 250216 kB
MemAvailable: 1233092 kB
Buffers: 0 kB
Cached: 101824 kB
SwapCached: 828 kB
Active: 653032 kB
Inactive: 468360 kB
Active(anon): 11276 kB
Inactive(anon): 1164 kB
Active(file): 641756 kB
Inactive(file): 467196 kB
Unevictable: 4000 kB
Mlocked: 0 kB
SwapTotal: 738812 kB
SwapFree: 719612 kB
Zswap: 0 kB
Zswapped: 0 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 11620 kB
Mapped: 32840 kB
Shmem: 4236 kB
KReclaimable: 7292 kB
Slab: 43388 kB
SReclaimable: 7292 kB
SUnreclaim: 36096 kB
KernelStack: 2096 kB
PageTables: 2636 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1477852 kB
Committed_AS: 129916 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 23948 kB
VmallocChunk: 0 kB
Percpu: 1008 kB
HardwareCorrupted: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
CmaTotal: 0 kB
CmaFree: 0 kB
Unaccepted: 0 kB
Balloon: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 78412 kB
DirectMap2M: 2013184 kB
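As a cross-check on the meminfo dump above, the file LRU counters can be compared with Cached; the gap is close to the ~988M attributed to netfs_write_begin. A minimal sketch, with the kB values copied from the dump:

```python
# kB values copied from the /proc/meminfo dump above
active_file = 641756    # Active(file)
inactive_file = 467196  # Inactive(file)
cached = 101824         # Cached

file_lru = active_file + inactive_file  # total file-backed LRU pages
gap = file_lru - cached                 # file pages not reported as Cached

print(f"file LRU total: {file_lru} kB (~{file_lru / 2**20:.2f} GiB)")
print(f"gap vs Cached:  {gap} kB (~{gap / 1024:.0f} MiB)")
```

The roughly 984 MiB gap lines up with the 988M netfs_write_begin allocation still present in /proc/allocinfo after the cache drop.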
As you can see, each time the Non-Cache portion of "kernel dynamic memory" has grown, both as a percentage and in absolute size.
You can also see that after issuing a sync and drop_caches, roughly 1 GB of that memory is still in use. It's still technically "available", but nothing can actually free it.
There is also still 988M allocated at mm/filemap.c:2012 func:__filemap_get_folio, called from fs/netfs/buffered_read.c:635 [netfs] func:netfs_write_begin, which I would normally expect to become freeable once the pages/folios were all marked clean after being flushed out to ceph.
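For reference, the allocation-site rankings above come from the pipeline `sort -g /proc/allocinfo | tail | numfmt --to=iec`. A minimal Python sketch of the same ranking, assuming the `<bytes> <calls> <site>` line format of /proc/allocinfo (the sample values are taken from the dump above):

```python
def top_alloc_sites(lines, n=3):
    """Return the n largest allocation sites as (bytes, calls, site)."""
    entries = []
    for line in lines:
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit():
            continue  # skip the header line
        entries.append((int(parts[0]), int(parts[1]), parts[2]))
    entries.sort()
    return entries[-n:]

def iec(n):
    # IEC-style size formatting, floored to whole units
    # (numfmt --to=iec rounds instead of flooring)
    for unit in ("", "K", "M", "G", "T"):
        if n < 1024:
            return f"{n}{unit}"
        n //= 1024
    return f"{n}P"

sample = [
    "allocinfo - version: 1.0",
    "100663296 24360 mm/readahead.c:189 func:ractl_alloc_folio",
    "45088768 5821 mm/slub.c:3059 func:alloc_slab_page",
    "1181116006 284870 fs/netfs/buffered_read.c:635 [netfs] func:netfs_write_begin",
]
for size, calls, site in top_alloc_sites(sample):
    print(iec(size), calls, site)
```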
I have not yet attempted to reclaim that memory by unmounting the filesystem, removing kernel modules, or anything else. I can run any test, dump the current kernel memory, or whatever else you would like.
I have gdb installed and a full set of kernel debug symbols. I'm ready to do anything you would like done. I can even give you access to the incredibly slow VM if it would help.
As you will be able to see from dmesg, memory is not being reclaimed and memory pressure is through the roof, even though all that has been running is my reproducer, which just runs dd and then deletes the resulting file full of zeros: a net change of nothing.
Hopefully something in all this output is helpful, or at least something I can get out of this stuck VM will be.
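The reproducer described above can be sketched as a small loop. TARGET, SIZE_MB, and ITERATIONS are placeholder assumptions, scaled down so the sketch is harmless to run as-is; point TARGET at the cephfs mount and raise SIZE_MB to create real memory pressure:

```shell
# Hypothetical reproducer sketch: write a file full of zeros to the
# filesystem under test, sync, delete it, and repeat.
TARGET="${TARGET:-.}"            # e.g. TARGET=/mnt/cephfs
SIZE_MB="${SIZE_MB:-8}"          # use something much larger on the real VM
ITERATIONS="${ITERATIONS:-3}"

i=0
while [ "$i" -lt "$ITERATIONS" ]; do
    dd if=/dev/zero of="$TARGET/leaktest.tmp" bs=1M count="$SIZE_MB" 2>/dev/null
    sync
    rm -f "$TARGET/leaktest.tmp"
    i=$((i + 1))
done
echo "completed $i iterations"
```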
Updated by Viacheslav Dubeyko 3 months ago
I have been running your scripts for 24 hours already. The system continues to work, and I don't see any sign of memory leaks.
I think I need to have:
(1) the same kernel source code that you are using
(2) the same kernel configuration
(3) the same virtual machine + QEMU configuration
If you can prepare a simple VM image with a simple password, then I can download everything. You can share the link to the VM, the login, and the password in a private email. Would that work for you?
Thanks,
Slava.
Updated by Malcolm Haak 2 months ago
Sorry, I've been on holidays.
I can do that, but I'm not using anything unusual. It's literally stock Arch Linux installed with archinstall.
For the VM I'm using Proxmox with default settings. The only change is the UEFI BIOS; otherwise it's Proxmox defaults. I can dump the VM definition if that will help.
I still have the VM, and you are more than welcome to log into it. I'll just clone it; it has nothing of value in it, so I'll set the root password to root.
I'll have a downloadable image available shortly.
Updated by Viacheslav Dubeyko 2 months ago
Malcolm Haak wrote in #note-24:
Sorry, I've been on holidays.
I can do that, but I'm not using anything unusual. It's literally stock Arch Linux installed with archinstall.
For the VM I'm using Proxmox with default settings. The only change is the UEFI BIOS; otherwise it's Proxmox defaults. I can dump the VM definition if that will help.
I still have the VM, and you are more than welcome to log into it. I'll just clone it; it has nothing of value in it, so I'll set the root password to root.
I'll have a downloadable image available shortly.
It's OK. I am trying to relax too. :) Thanks a lot.
Updated by Viacheslav Dubeyko about 2 months ago
Ping... any chance of getting the VM so I can reproduce the issue?
Thanks,
Slava.
Updated by Malcolm Haak about 2 months ago
Apologies. I will get it packed up today and send you a download link directly.
Updated by Viacheslav Dubeyko 23 days ago
Should I close the ticket? I cannot reproduce the issue, and I haven't received any other means of reproducing it.
Updated by Malcolm Haak 20 days ago
Sorry, personal stuff happened. I needed to remove personal information from the VM.
I can still box up the VM, but it's literally the stock Arch Linux kernel.
Also, it's the Arch Linux "mainline" build: https://aur.archlinux.org/packages/linux-mainline
Updated by Viacheslav Dubeyko 20 days ago
Malcolm Haak wrote in #note-29:
Sorry, personal stuff happened. I needed to remove personal information from the VM.
I can still box up the VM, but it's literally the stock Arch Linux kernel.
Also, it's the Arch Linux "mainline" build: https://aur.archlinux.org/packages/linux-mainline
I need exactly the same VM in which you were able to reproduce the issue. You can simply create the VM from scratch without any personal details, and once you can reproduce the issue in it, share that VM with me.