
cgroups: invalid group path leads to a segmentation violation #11001

@mbana

Description

I cannot disclose the full details of the setup I am running as it is proprietary. That said, I am running k3d (which uses k3s) as shown below; in particular, note the --volume /sys/fs/cgroup:/sys/fs/cgroup:rw mount:

In any event, the error message invalid group path should not lead to a SIGSEGV, unless of course you have other opinions.

Steps to reproduce the issue

$ k3d cluster create \
    --no-lb \
    --no-rollback \
    --agents 6 \
    --image="${IMAGE}" \
    --gpus=all \
    --k3s-arg "--disable=traefik,servicelb,metrics-server@server:*" \
    --k3s-arg "-v=6@server:*" \
    --k3s-arg "--debug@server:*" \
    --k3s-arg "--alsologtostderr@server:*" \
    --volume /sys/fs/cgroup:/sys/fs/cgroup:rw \
    --trace --verbose
# Install something
$ helm upgrade cert-manager cert-manager \
    --repo=https://charts.jetstack.io \
    --namespace cert-manager \
    --create-namespace \
    --install \
    --version=v1.10.2 \
    --set=installCRDs=true \
    --wait \
    --wait-for-jobs

I've tried all combinations of K3D_FIX_CGROUPV2=0|1 and K3D_FIX_MOUNTS=0|1, but none of them made a difference.

Additional Information

$ docker exec -it k3d-k3s-default-server-0 bash -c 'mount -v | grep -i cgroup'
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
freezer on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer)
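
Note that /sys/fs/cgroup itself is mounted as cgroup2 here, with a lone v1 freezer controller mounted beneath it. A minimal sketch (assuming the github.com/containerd/cgroups/v3 module that the shim vendors) to print which mode the library detects inside the container:

package main

import (
	"fmt"

	"github.com/containerd/cgroups/v3"
)

func main() {
	// cgroups.Mode() inspects /sys/fs/cgroup to classify the hierarchy.
	switch cgroups.Mode() {
	case cgroups.Unified:
		fmt.Println("unified (pure cgroup v2)")
	case cgroups.Hybrid:
		fmt.Println("hybrid (v1 controllers alongside v2)")
	case cgroups.Legacy:
		fmt.Println("legacy (cgroup v1)")
	default:
		fmt.Println("unavailable")
	}
}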

Describe the results you received and expected

Logs

containerd

time="2024-11-12T22:21:27.911145320Z" level=debug msg="shim bootstrap parameters" address="unix:///run/containerd/s/92c72650133423e0998fdd4bd84988d1f197650fc55d21e9468a9e22218d1f80" namespace=k8s.io protocol=ttrpc
time="2024-11-12T22:21:27.916349466Z" level=info msg="loading plugin ½"io.containerd.event.v1.publisher½"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2024-11-12T22:21:27.916430504Z" level=info msg="loading plugin ½"io.containerd.internal.v1.shutdown½"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2024-11-12T22:21:27.916445573Z" level=info msg="loading plugin ½"io.containerd.ttrpc.v1.task½"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-11-12T22:21:27.916551908Z" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.task
time="2024-11-12T22:21:27.916570129Z" level=info msg="loading plugin ½"io.containerd.ttrpc.v1.pause½"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-11-12T22:21:27.916581130Z" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.pause
time="2024-11-12T22:21:27.916714737Z" level=debug msg="serving api on socket" socket="ÿinherited from parent¦"
time="2024-11-12T22:21:27.916751507Z" level=debug msg="starting signal loop" namespace=k8s.io path=/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8 pid=2225 runtime=io.containerd.runc.v2
time="2024-11-12T22:21:27.975199118Z" level=error msg="loading cgroup2 for 2249" error="cgroups: invalid group path"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6f835a]

goroutine 27 [running]:
github.com/containerd/cgroups/v3/cgroup2.(*Manager).RootControllers(0xc0000c4660?)
	/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/cgroups/v3/cgroup2/manager.go:270 +0x1a
github.com/containerd/containerd/runtime/v2/runc/task.(*service).Start(0xc000164d80, {0xb96838, 0xc000382700}, 0xc0002594f0)
	/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/runtime/v2/runc/task/service.go:314 +0x305
github.com/containerd/containerd/api/runtime/task/v2.RegisterTaskService.func3({0xb96838, 0xc000382700}, 0xc00004e2e0)
	/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/containerd/api/runtime/task/v2/shim_ttrpc.pb.go:53 +0x8c
github.com/containerd/ttrpc.defaultServerInterceptor({0xb96838?, 0xc000382700?}, 0x7fffb14c0e68?, 0x10?, 0x7ffff7fb95b8?)
	/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/interceptor.go:52 +0x22
github.com/containerd/ttrpc.(*serviceSet).unaryCall(0xc0000b2330, {0xb96838, 0xc000382700}, 0xc0000b2378, 0xc0002c0740, {0xc0001b4140, 0x42, 0x50})
	/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:75 +0xe3
github.com/containerd/ttrpc.(*serviceSet).handle.func1()
	/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:118 +0x158
created by github.com/containerd/ttrpc.(*serviceSet).handle in goroutine 41
	/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:111 +0x14c
time="2024-11-12T22:21:27.989682727Z" level=info msg="shim disconnected" id=ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8 namespace=k8s.io
time="2024-11-12T22:21:27.989743198Z" level=warning msg="cleaning up after shim disconnected" id=ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8 namespace=k8s.io
time="2024-11-12T22:21:27.989753741Z" level=info msg="cleaning up dead shim" namespace=k8s.io
time="2024-11-12T22:21:27.989902787Z" level=error msg="Failed to delete sandbox container ½"ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8½"" error="ttrpc: closed: unknown"
time="2024-11-12T22:21:27.990347663Z" level=error msg="encountered an error cleaning up failed sandbox ½"ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8½", marking sandbox state as SANDBOX_UNKNOWN" error="ttrpc: closed: unknown"
time="2024-11-12T22:21:27.990394639Z" level=error msg="RunPodSandbox for &PodSandboxMetadata¨Name:k8s-device-plugin-daemonset-qj8x8,Uid:c3d6073e-395d-4cca-a27d-57a30c29be6c,Namespace:nvidia,Attempt:6,¼ failed, error" error="failed to start sandbox container task ½"ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8½": ttrpc: closed: unknown"

kubectl

$ kubectl describe pod -n cert-manager cert-manager-5dfb9c94b5-k6hj2
Events:
  Type     Reason                  Age                      From     Message
  ----     ------                  ----                     ----     -------
  Warning  FailedCreatePodSandBox  8m46s (x13192 over 11h)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "15867aa6216f302ec59607cbfa619dbc766b5432109731711406bb82c293d1a5": ttrpc: closed: unknown
  Normal   SandboxChanged          3m45s (x15307 over 11h)  kubelet  Pod sandbox changed, it will be killed and re-created.

Really, this should surface something like level=error msg="loading cgroup2 for 2249" error="cgroups: invalid group path"; the generic warning (rather than the underlying error) leaves one clueless.
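
For reference, cgroups: invalid group path looks like ErrInvalidGroupPath from the cgroup2 package, which rejects any group that is not a clean absolute path (the same shape as the third field of /proc/<pid>/cgroup). A quick sketch of how Load reacts, assuming github.com/containerd/cgroups/v3/cgroup2 and that Load is the call site behind the log line:

package main

import (
	"fmt"

	"github.com/containerd/cgroups/v3/cgroup2"
)

func main() {
	// Group paths must be absolute and clean, e.g. "/kubepods.slice/...".
	for _, g := range []string{"relative/path", "/kubepods.slice/ok"} {
		_, err := cgroup2.Load(g)
		fmt.Printf("Load(%q): %v\n", g, err)
	}
}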

What version of containerd are you using?

v1.7.22-k3s1.28

Any other relevant information

$ docker exec -it k3d-k3s-default-server-0 bash -c 'containerd --version'
containerd github.com/k3s-io/containerd v1.7.22-k3s1.28
$ docker exec -it k3d-k3s-default-server-0 bash -c 'runc --version'
runc version 1.1.14
commit: 12de61f
spec: 1.0.2-dev
go: go1.22.8
libseccomp: 2.5.5
$ docker info
Client:
 Version:    24.0.7
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx

Server:
 Containers: 8
  Running: 7
  Paused: 0
  Stopped: 1
 Images: 28
 Server Version: 24.0.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: nvidia
 Init Binary: docker-init
 containerd version: 83031836b2cf55637d7abf847b17134c51b38e53
 runc version: v1.1.12-0-g51d5e946
 init version:
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-48-generic
 Operating System: Ubuntu 22.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 62.79GiB
 Name: mbana-2
 ID: 6e707ea5-5476-44c3-82ee-616d9f97a99a
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 80
  Goroutines: 78
  System Time: 2024-11-13T09:50:55.884211506Z
  EventsListeners: 0
 Username: mohamedbana
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Show configuration if it is related to CRI plugin.

$ docker exec -it k3d-k3s-default-server-0 bash -c 'cat /etc/containerd/config.toml' 
version = 2

[plugins]

  [plugins."io.containerd.grpc.v1.cri"]

    [plugins."io.containerd.grpc.v1.cri".containerd]
      default_runtime_name = "nvidia"

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
