[CVE-2019-5736]: Runc uses more memory during start up after the fix #1980

@Random-Liu

We observed higher memory usage (likely during container startup) after the fix for CVE-2019-5736 (commit 0a8e411).

We have a test that specifies a 10m container cgroup memory limit. It never failed before, but now the container gets OOM-killed frequently. For example: https://gubernator.k8s.io/build/kubernetes-jenkins/logs/ci-containerd-node-e2e-1-2/2500.

kernel: runc:[2:INIT] invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=998
kernel: runc:[2:INIT] cpuset=80e651c417ebd71d83e5023ee59b281e585497468bd71ee7c7b3ae6730d9ec8f mems_allowed=0
kernel: CPU: 0 PID: 333 Comm: runc:[2:INIT] Not tainted 4.4.64+ #1
kernel: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
kernel:  0000000000000000 ffff880003e87ca8 ffffffff9f317334 ffff880003e87d88
kernel:  ffff8800bb3e8000 ffff880003e87d18 ffffffff9f1a8fb4 ffff880003e87ce0
kernel:  ffffffff9f13e780 ffff8800bb3eb500 0000000000000206 ffff880003e87cf0
kernel: Call Trace:
kernel:  [<ffffffff9f317334>] dump_stack+0x63/0x8f
kernel:  [<ffffffff9f1a8fb4>] dump_header+0x65/0x1d4
kernel:  [<ffffffff9f13e780>] ? find_lock_task_mm+0x20/0xb0
kernel:  [<ffffffff9f13ef1d>] oom_kill_process+0x28d/0x430
kernel:  [<ffffffff9f1a3e6b>] ? mem_cgroup_iter+0x1db/0x390
kernel:  [<ffffffff9f1a6374>] mem_cgroup_out_of_memory+0x284/0x2d0
kernel:  [<ffffffff9f1a6de9>] mem_cgroup_oom_synchronize+0x2f9/0x310
kernel:  [<ffffffff9f1a1ab0>] ? memory_high_write+0xc0/0xc0
kernel:  [<ffffffff9f13f5f8>] pagefault_out_of_memory+0x38/0xa0
kernel:  [<ffffffff9f045a27>] mm_fault_error+0x77/0x150
kernel:  [<ffffffff9f046264>] __do_page_fault+0x414/0x420
kernel:  [<ffffffff9f046292>] do_page_fault+0x22/0x30
kernel:  [<ffffffff9f5b1f98>] page_fault+0x28/0x30

This seems to be caused by the memory spike introduced by the binary copy. Should we always enforce a minimum memory limit for runc containers in the future?

My runc binary is statically linked:

$ ls -alh usr/local/sbin/runc 
-rwxr-xr-x 1 lantaol primarygroup 7.8M Feb 12 13:46 usr/local/sbin/runc
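As a rough back-of-the-envelope check of the numbers above, here is a minimal sketch. It assumes that "10m" means 10 MiB, that the 7.8M reported by `ls -h` is 7.8 MiB, and that a full in-memory copy of the binary is charged to the container's memory cgroup. None of these is confirmed here; the helper name `headroom_after_copy` is made up for illustration.

```python
MIB = 1024 * 1024


def headroom_after_copy(limit_bytes: int, binary_bytes: int) -> int:
    """Bytes left under the cgroup limit if one full copy of the binary
    is charged to that cgroup (an assumption, not a measured value)."""
    return limit_bytes - binary_bytes


# Assumed values taken from the report: 10m limit, 7.8M runc binary.
limit = 10 * MIB
runc_size = int(7.8 * MIB)

left = headroom_after_copy(limit, runc_size)
print(f"headroom: {left / MIB:.1f} MiB")  # prints "headroom: 2.2 MiB"
```

Under these assumptions only about 2.2 MiB would remain for everything else charged to the cgroup during startup, which would be consistent with the OOM kills seen in the test.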
