Description
I cannot disclose the full details of the setup I am running as it is proprietary. That said, I am running k3d (which uses k3s) as shown below; in particular, note the --volume /sys/fs/cgroup:/sys/fs/cgroup:rw mount:
In any event, the error message invalid group path should not lead to a SIGSEGV, unless of course you have other opinions.
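For context on where that message likely comes from: in the cgroup2 model, group paths are expected to be absolute within the unified hierarchy, and a path failing a basic validity check is one way to get "cgroups: invalid group path". A rough sketch of that kind of check (the function name and exact rule here are illustrative assumptions, not the containerd/cgroups source):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Illustrative re-creation of the kind of check that yields
// "cgroups: invalid group path" -- NOT the actual containerd/cgroups source.
var errInvalidGroupPath = errors.New("cgroups: invalid group path")

// verifyGroupPath is a hypothetical helper: cgroup2 group paths are
// expected to be absolute within the unified hierarchy.
func verifyGroupPath(g string) error {
	if !strings.HasPrefix(g, "/") {
		return errInvalidGroupPath
	}
	return nil
}

func main() {
	fmt.Println(verifyGroupPath("kubepods/pod123"))  // relative path: invalid
	fmt.Println(verifyGroupPath("/kubepods/pod123")) // absolute path: ok
}
```

Whatever the exact check, a validation failure like this should surface as an error to the caller, not end in a nil-pointer panic.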
Steps to reproduce the issue
$ k3d cluster create \
--no-lb \
--no-rollback \
--agents 6 \
--image="${IMAGE}" \
--gpus=all \
--k3s-arg "--disable=traefik,servicelb,metrics-server@server:*" \
--k3s-arg "-v=6@server:*" \
--k3s-arg "--debug@server:*" \
--k3s-arg "--alsologtostderr@server:*" \
--volume /sys/fs/cgroup:/sys/fs/cgroup:rw \
--trace --verbose
# Install something
$ helm upgrade cert-manager cert-manager \
--repo=https://charts.jetstack.io \
--namespace cert-manager \
--create-namespace \
--install \
--version=v1.10.2 \
--set=installCRDs=true \
--wait \
--wait-for-jobs
I've tried all combinations of K3D_FIX_CGROUPV2=0|1 and K3D_FIX_MOUNTS=0|1 but it did not make a difference.
Additional Information
$ docker exec -it k3d-k3s-default-server-0 bash -c 'mount -v | grep -i cgroup'
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
freezer on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer)
Describe the results you received and expected
Logs
containerd
time="2024-11-12T22:21:27.911145320Z" level=debug msg="shim bootstrap parameters" address="unix:///run/containerd/s/92c72650133423e0998fdd4bd84988d1f197650fc55d21e9468a9e22218d1f80" namespace=k8s.io protocol=ttrpc
time="2024-11-12T22:21:27.916349466Z" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
time="2024-11-12T22:21:27.916430504Z" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
time="2024-11-12T22:21:27.916445573Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-11-12T22:21:27.916551908Z" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.task
time="2024-11-12T22:21:27.916570129Z" level=info msg="loading plugin \"io.containerd.ttrpc.v1.pause\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
time="2024-11-12T22:21:27.916581130Z" level=debug msg="registering ttrpc service" id=io.containerd.ttrpc.v1.pause
time="2024-11-12T22:21:27.916714737Z" level=debug msg="serving api on socket" socket="[inherited from parent]"
time="2024-11-12T22:21:27.916751507Z" level=debug msg="starting signal loop" namespace=k8s.io path=/run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8 pid=2225 runtime=io.containerd.runc.v2
time="2024-11-12T22:21:27.975199118Z" level=error msg="loading cgroup2 for 2249" error="cgroups: invalid group path"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x6f835a]
goroutine 27 [running]:
github.com/containerd/cgroups/v3/cgroup2.(*Manager).RootControllers(0xc0000c4660?)
/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/cgroups/v3/cgroup2/manager.go:270 +0x1a
github.com/containerd/containerd/runtime/v2/runc/task.(*service).Start(0xc000164d80, {0xb96838, 0xc000382700}, 0xc0002594f0)
/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/runtime/v2/runc/task/service.go:314 +0x305
github.com/containerd/containerd/api/runtime/task/v2.RegisterTaskService.func3({0xb96838, 0xc000382700}, 0xc00004e2e0)
/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/containerd/api/runtime/task/v2/shim_ttrpc.pb.go:53 +0x8c
github.com/containerd/ttrpc.defaultServerInterceptor({0xb96838?, 0xc000382700?}, 0x7fffb14c0e68?, 0x10?, 0x7ffff7fb95b8?)
/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/interceptor.go:52 +0x22
github.com/containerd/ttrpc.(*serviceSet).unaryCall(0xc0000b2330, {0xb96838, 0xc000382700}, 0xc0000b2378, 0xc0002c0740, {0xc0001b4140, 0x42, 0x50})
/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:75 +0xe3
github.com/containerd/ttrpc.(*serviceSet).handle.func1()
/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:118 +0x158
created by github.com/containerd/ttrpc.(*serviceSet).handle in goroutine 41
/go/src/github.com/k3s-io/k3s/build/src/github.com/containerd/containerd/vendor/github.com/containerd/ttrpc/services.go:111 +0x14c
time="2024-11-12T22:21:27.989682727Z" level=info msg="shim disconnected" id=ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8 namespace=k8s.io
time="2024-11-12T22:21:27.989743198Z" level=warning msg="cleaning up after shim disconnected" id=ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8 namespace=k8s.io
time="2024-11-12T22:21:27.989753741Z" level=info msg="cleaning up dead shim" namespace=k8s.io
time="2024-11-12T22:21:27.989902787Z" level=error msg="Failed to delete sandbox container \"ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8\"" error="ttrpc: closed: unknown"
time="2024-11-12T22:21:27.990347663Z" level=error msg="encountered an error cleaning up failed sandbox \"ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8\", marking sandbox state as SANDBOX_UNKNOWN" error="ttrpc: closed: unknown"
time="2024-11-12T22:21:27.990394639Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:k8s-device-plugin-daemonset-qj8x8,Uid:c3d6073e-395d-4cca-a27d-57a30c29be6c,Namespace:nvidia,Attempt:6,} failed, error" error="failed to start sandbox container task \"ad5170f629e608538b87774a2fc5c74fca871396c2b49d705b38f168283b8cd8\": ttrpc: closed: unknown"
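Reading the stack trace together with the preceding log line, the shim appears to log the "loading cgroup2" error but then continue and call RootControllers on a nil *Manager, which is what dereferences a nil pointer. A minimal sketch of that failure pattern and the obvious guard (the types and function names here are stand-ins, not the real containerd API):

```go
package main

import (
	"errors"
	"fmt"
)

// manager is a minimal stand-in for cgroup2.Manager; illustrative only.
type manager struct{ path string }

// RootControllers dereferences the receiver, so calling it on a nil
// *manager is exactly the kind of access that SIGSEGVs in the shim.
func (m *manager) RootControllers() ([]string, error) {
	return []string{m.path}, nil
}

// loadManager mimics a Load that returns (nil, err) on a bad group path.
func loadManager(path string) (*manager, error) {
	if path == "" {
		return nil, errors.New("cgroups: invalid group path")
	}
	return &manager{path: path}, nil
}

// start shows the guarded version: return on the load error instead of
// falling through to a method call on a nil manager.
func start() error {
	m, err := loadManager("") // fails, as in the log line above
	if err != nil {
		return fmt.Errorf("loading cgroup2: %w", err)
	}
	_, err = m.RootControllers()
	return err
}

func main() {
	fmt.Println(start())
}
```

With the guard in place the caller gets a wrapped "invalid group path" error instead of a panic, which is the behavior the report is asking for.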
kubectl
$ kubectl describe pod -n cert-manager cert-manager-5dfb9c94b5-k6hj2
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 8m46s (x13192 over 11h) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container task "15867aa6216f302ec59607cbfa619dbc766b5432109731711406bb82c293d1a5": ttrpc: closed: unknown
Normal SandboxChanged 3m45s (x15307 over 11h) kubelet Pod sandbox changed, it will be killed and re-created.
Really, the pod events should surface something like level=error msg="loading cgroup2 for 2249" error="cgroups: invalid group path"; the generic warning (not the actual error) leaves one clueless.
What version of containerd are you using?
v1.7.22-k3s1.28
Any other relevant information
$ docker exec -it k3d-k3s-default-server-0 bash -c 'containerd --version'
containerd github.com/k3s-io/containerd v1.7.22-k3s1.28
$ docker exec -it k3d-k3s-default-server-0 bash -c 'runc --version'
runc version 1.1.14
commit: 12de61f
spec: 1.0.2-dev
go: go1.22.8
libseccomp: 2.5.5
$ docker info
Client:
Version: 24.0.7
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: 0.12.1
Path: /usr/libexec/docker/cli-plugins/docker-buildx
Server:
Containers: 8
Running: 7
Paused: 0
Stopped: 1
Images: 28
Server Version: 24.0.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: nvidia
Init Binary: docker-init
containerd version: 83031836b2cf55637d7abf847b17134c51b38e53
runc version: v1.1.12-0-g51d5e946
init version:
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 6.8.0-48-generic
Operating System: Ubuntu 22.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 16
Total Memory: 62.79GiB
Name: mbana-2
ID: 6e707ea5-5476-44c3-82ee-616d9f97a99a
Docker Root Dir: /var/lib/docker
Debug Mode: true
File Descriptors: 80
Goroutines: 78
System Time: 2024-11-13T09:50:55.884211506Z
EventsListeners: 0
Username: mohamedbana
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Show configuration if it is related to CRI plugin.
$ docker exec -it k3d-k3s-default-server-0 bash -c 'cat /etc/containerd/config.toml'
version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
privileged_without_host_devices = false
runtime_engine = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
BinaryName = "/usr/bin/nvidia-container-runtime"