
dockerd eating too much memory and keeps growing #45842

Description

We have deployed Kubernetes (RKE) on a single node along with some applications. Sometimes the memory usage of the dockerd process keeps growing while its CPU usage is also very high. Eventually it consumes all of the memory on the machine and we have to reboot.

Reproduce

I don't think this can be reproduced easily; it may depend on the applications. Roughly, the steps would be: deploy an RKE Kubernetes cluster, run Grafana's Loki stack, and run applications that produce a large volume of logs.

Expected behavior

No response

docker version

Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:52:22 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:52:22 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client: Docker Engine - Community
 Version:    24.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 486
  Running: 302
  Paused: 0
  Stopped: 184
 Images: 264
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 5.15.0-73-generic
 Operating System: Ubuntu 20.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 48
 Total Memory: 502.6GiB
 Name: server
 ID: R3MZ:NUMU:RKFE:VSG2:V5PB:VOFO:UB3U:7NUS:F45X:73TR:RAXZ:KUSY
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 5270
  Goroutines: 2397
  System Time: 2023-06-29T10:56:13.096261094+08:00
  EventsListeners: 0
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  https://hub-mirror.c.163.com/
 Live Restore Enabled: true

Additional Info

I ran go pprof to capture CPU and memory usage information while dockerd's memory was growing fast. You can click the two images below to see them in full.

[image: cpu profile]
[image: profile002]

The original pprof files are also attached:
dockerd_prof.zip
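For reference, since Debug Mode is enabled here (per `docker info`), dockerd exposes Go's net/http/pprof endpoints on its API socket, so profiles like the ones attached can be captured along these lines (socket path assumes the default `/var/run/docker.sock`):

```shell
# Heap profile (in-use memory) from the dockerd debug endpoint
curl --unix-socket /var/run/docker.sock \
     -o heap.pprof http://localhost/debug/pprof/heap

# 30-second CPU profile
curl --unix-socket /var/run/docker.sock \
     -o cpu.pprof "http://localhost/debug/pprof/profile?seconds=30"

# Inspect a captured profile (top consumers)
go tool pprof -top heap.pprof
```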

The applications running on this Kubernetes cluster are one that creates many Argo Workflows for a kind of experiment, and a log scraper (Grafana's Loki + Promtail). The log volume is quite high (sometimes more than 10k lines/s); I don't know whether that could be responsible for this.
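One thing that may be worth ruling out: with the `json-file` logging driver (in use here per `docker info`) and no rotation configured, log files grow without bound and any log readers force dockerd to work through them. Docker's documented `max-size`/`max-file` options in `/etc/docker/daemon.json` cap that; the values below are only an example:

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}
```

Note this only applies to containers created after the daemon is reloaded with the new configuration.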
