Issue Description
As part of this PR to fix issue podman-container-tools/podman#12702, the memory usage reported for cgroup v2 was changed from the value in memory.current to only the anon value in memory.stat.
This causes podman stats to under-report memory consumption for containers, especially .NET containers, which load a lot of DLLs and therefore use a lot of file-backed memory.
As a result, the MEM USAGE reported by podman stats cannot be used to choose a suitable mem_limit in the compose file. A limit derived from it will probably be too low, which results in OOM kills.
The OOM kill is based on the value in memory.current.
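To make the difference between the two sources concrete, here is a minimal Python sketch that reads both values from a cgroup v2 directory. The function names are made up for illustration, and the directory you pass in is an assumption: the location of a container's cgroup depends on the cgroup manager and on rootless vs. rootful mode. This is not how podman itself reads the values.

```python
from pathlib import Path


def read_memory_stat(cgroup_dir):
    """Parse memory.stat (one "key value" pair per line) into a dict of ints."""
    stats = {}
    for line in (Path(cgroup_dir) / "memory.stat").read_text().splitlines():
        key, _, value = line.partition(" ")
        if value:
            stats[key] = int(value)
    return stats


def compare_memory_metrics(cgroup_dir):
    """Return memory.current, the anon counter, and their difference in bytes."""
    current = int((Path(cgroup_dir) / "memory.current").read_text().strip())
    anon = read_memory_stat(cgroup_dir).get("anon", 0)
    # "not_anon" is everything charged to the cgroup that the anon-only
    # figure leaves out (file cache, kernel memory, and so on).
    return {"current": current, "anon": anon, "not_anon": current - anon}
```

Called with the container's cgroup directory, the not_anon entry shows how many bytes are charged to the cgroup but missing from an anon-only report.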
Steps to reproduce the issue
- Start a .NET container (or any container that uses a lot of file-backed memory) and get the PID of the container process
- Execute top -p 1427 (1427 being the PID from the previous step) and notice that the used memory (RES) is ~150MB
top - 09:03:20 up 58 min, 1 user, load average: 5.88, 7.69, 7.28
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7924.4 total, 3146.9 free, 2940.6 used, 2622.8 buff/cache
MiB Swap: 512.0 total, 512.0 free, 0.0 used. 4983.8 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1427 myuser 20 0 262.4g 153096 95476 S 0.0 1.9 0:29.54 dotnet
- Go to the container's cgroup directory and run:
cat memory.current and notice that the used memory is ~200MB
cat memory.stat and notice that most of the used memory is file-backed and not anon
anon 59613184
file 147480576
kernel 2732032
kernel_stack 344064
pagetables 737280
sec_pagetables 0
percpu 288
sock 0
vmalloc 12288
shmem 31166464
zswap 0
zswapped 0
file_mapped 89444352
file_dirty 12288
file_writeback 0
swapcached 0
anon_thp 35651584
file_thp 0
shmem_thp 0
inactive_anon 86704128
active_anon 4075520
inactive_file 112803840
active_file 3510272
unevictable 0
slab_reclaimable 1045744
slab_unreclaimable 564688
slab 1610432
workingset_refault_anon 0
workingset_refault_file 0
workingset_activate_anon 0
workingset_activate_file 0
workingset_restore_anon 0
workingset_restore_file 0
workingset_nodereclaim 0
pgscan 0
pgsteal 0
pgscan_kswapd 0
pgscan_direct 0
pgsteal_kswapd 0
pgsteal_direct 0
pgfault 27689
pgmajfault 146
pgrefill 0
pgactivate 2481
pgdeactivate 0
pglazyfree 0
pglazyfreed 0
zswpin 0
zswpout 0
thp_fault_alloc 6
thp_collapse_alloc 14
- Run podman stats mycontainer and notice that only ~60MB of MEM USAGE is reported
ID NAME CPU % MEM USAGE / LIMIT MEM % NET IO BLOCK IO PIDS CPU TIME AVG CPU %
53f7d03b389c mycontainer 0.83% 59.61MB / 8.309GB 0.72% 906.7kB / 3.174MB 0B / 0B 23 31.703577s 0.83%
- If mem_limit in the compose file had been set to 200M, the OOM kill would soon terminate the container
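The last step can be sketched as a headroom calculation in Python. Two assumptions are made here: "200M" is taken to mean 200 MiB, and memory.current is approximated by summing the anon, file, and kernel counters from the memory.stat output above, because the exact memory.current value is only given as ~200MB in this report. The helper name is hypothetical.

```python
MIB = 1024 ** 2

# Counters copied from the memory.stat output in the steps above.
ANON = 59_613_184
FILE = 147_480_576
KERNEL = 2_732_032


def headroom(limit_bytes, usage_bytes):
    """Bytes left before usage reaches the limit (negative means over it)."""
    return limit_bytes - usage_bytes


limit = 200 * MIB  # assumption: "200M" is interpreted as 200 MiB
# Rough stand-in for memory.current, not the real value.
charged_estimate = ANON + FILE + KERNEL

print("headroom vs. anon only:        ", headroom(limit, ANON))
print("headroom vs. charged estimate: ", headroom(limit, charged_estimate))
```

Measured against the anon figure, the limit appears to leave plenty of room. Measured against the estimate of everything charged to the cgroup, there is none left. The kernel may first try to reclaim file pages when the limit is reached, so a negative value indicates memory pressure at that limit rather than a guaranteed kill.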
Describe the results you received
The MEM USAGE reported by podman stats is misleading, as it only contains the anon memory and does not represent the real memory usage of the container.
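As a cross-check, the following Python sketch parses a few lines of the memory.stat output from the reproduction steps and formats them in decimal megabytes. Using decimal units is an assumption, based on the 8.309GB limit in the podman stats output matching the memTotal of 8309284864 bytes from podman info. The helper names are illustrative only.

```python
def parse_memory_stat(text):
    """Parse memory.stat content ("key value" per line) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats


def to_mb(num_bytes):
    """Format a byte count as decimal megabytes (1 MB = 1,000,000 bytes)."""
    return f"{num_bytes / 1_000_000:.2f}MB"


# Excerpt of the memory.stat output shown in the reproduction steps.
sample = """\
anon 59613184
file 147480576
kernel 2732032
"""

stats = parse_memory_stat(sample)
for key in ("anon", "file", "kernel"):
    print(key, to_mb(stats[key]))
```

The anon line comes out as 59.61MB, which is exactly the MEM USAGE shown by podman stats, while the much larger file counter is not reflected in it.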
Describe the results you expected
podman stats should report the real memory used by the container, i.e. the same value that is used for the OOM kill.
podman info output
host:
arch: amd64
buildahVersion: 1.28.2
cgroupControllers:
- cpuset
- cpu
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon_2.1.6+ds1-1_amd64
path: /usr/bin/conmon
version: 'conmon version 2.1.6, commit: unknown'
cpuUtilization:
idlePercent: 75.92
systemPercent: 21.81
userPercent: 2.27
cpus: 6
distribution:
codename: bookworm
distribution: debian
version: "12"
eventLogger: journald
hostname: myhost
idMappings:
gidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
uidmap:
- container_id: 0
host_id: 1000
size: 1
- container_id: 1
host_id: 100000
size: 65536
kernel: 6.1.0-18-amd64
linkmode: dynamic
logDriver: journald
memFree: 3156512768
memTotal: 8309284864
networkBackend: cni
ociRuntime:
name: crun
package: crun_1.8.1-1+deb12u1_amd64
path: /usr/bin/crun
version: |-
crun version 1.8.1
commit: f8a096be060b22ccd3d5f3ebe44108517fbf6c30
rundir: /run/user/1000/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
os: linux
remoteSocket:
exists: true
path: /run/user/1000/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: slirp4netns_1.2.0-1_amd64
version: |-
slirp4netns version 1.2.0
commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
libslirp: 4.7.0
SLIRP_CONFIG_VERSION_MAX: 4
libseccomp: 2.5.4
swapFree: 536866816
swapTotal: 536866816
uptime: 1h 11m 37.00s (Approximately 0.04 days)
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries: {}
store:
configFile: /opt/roche/home/.config/containers/storage.conf
containerStore:
number: 24
paused: 0
running: 24
stopped: 0
graphDriverName: overlay
graphOptions: {}
graphRoot: /opt/roche/home/.local/share/containers/storage
graphRootAllocated: 256623226880
graphRootUsed: 22553309184
graphStatus:
Backing Filesystem: btrfs
Native Overlay Diff: "true"
Supports d_type: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
number: 24
runRoot: /run/user/1000/containers
volumePath: /opt/roche/home/.local/share/containers/storage/volumes
version:
APIVersion: 4.3.1
Built: 0
BuiltTime: Thu Jan 1 00:00:00 1970
GitCommit: ""
GoVersion: go1.19.8
Os: linux
OsArch: linux/amd64
Version: 4.3.1
Podman in a container
No
Privileged Or Rootless
Rootless
Upstream Latest Release
No