The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.
1. Quick Debug Checklist
1. Issue or feature description
nov 02 18:00:58 beck containerd[10237]: time="2022-11-02T18:00:58.738797825+02:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:gpu-feature-discovery-qfjgk,Uid:02c7d4ad-db02-4145-846b-616a94416008,Namespace:gpu-operator,Attempt:2,} failed, error" error="failed to get sandbox runtime: no runtime for \"nvidia\" is configured"
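The error above says containerd has no runtime entry named "nvidia". A minimal sketch of a check for such an entry follows. It assumes the config file is at the default path /etc/containerd/config.toml and that runtimes are declared in a table whose name ends in containerd.runtimes.nvidia. The function name has_nvidia_runtime is hypothetical and not part of any NVIDIA or containerd tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given containerd config declares a
# runtime table named "nvidia".
# Assumption: the config lives at /etc/containerd/config.toml unless a
# different path is passed as the first argument.
has_nvidia_runtime() {
  local config="${1:-/etc/containerd/config.toml}"
  # Match the end of a TOML table header such as
  # [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  grep -Eq 'containerd\.runtimes\.nvidia\]' "$config" 2>/dev/null
}
```

For example, `has_nvidia_runtime && echo configured || echo missing` prints "missing" when the entry is absent or the file cannot be read. Note that containerd has to be restarted before it picks up a changed config file.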
2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
(base) beck@beck:/$ ls -la /run/nvidia/
total 4
drwxr-xr-x 4 root root 100 nov 2 18:48 .
drwxr-xr-x 39 root root 1140 nov 2 18:47 ..
drwxr-xr-x 2 root root 40 nov 2 17:59 driver
-rw-r--r-- 1 root root 7 nov 2 18:48 toolkit.pid
drwxr-xr-x 2 root root 80 nov 2 18:48 validations
Driver folder is empty:
(base) beck@beck:/$ ls -la /run/nvidia/driver/
total 0
drwxr-xr-x 2 root root 40 nov 2 17:59 .
drwxr-xr-x 4 root root 80 nov 2 18:48 ..
1. Quick Debug Checklist
i2c_core and ipmi_msghandler loaded on the nodes?
(kubectl describe clusterpolicies --all-namespaces)
1. Issue or feature description
2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
kubernetes pods status:
kubectl get pods --all-namespaces
kubernetes daemonset status:
kubectl get ds --all-namespaces
If a pod/ds is in an error state or pending state
kubectl describe pod -n NAMESPACE POD_NAME
If a pod/ds is in an error state or pending state
kubectl logs -n NAMESPACE POD_NAME
Output of running a container on the GPU machine:
docker run -it alpine echo foo
Docker configuration file:
cat /etc/docker/daemon.json
Docker runtime configuration:
docker info | grep runtime
NVIDIA shared directory:
ls -la /run/nvidia
NVIDIA packages directory:
ls -la /usr/local/nvidia/toolkit
NVIDIA driver directory:
ls -la /run/nvidia/driver
kubelet logs
journalctl -u kubelet > kubelet.logs
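Several of the commands listed above can be gathered into one file for attaching to a report. The following is only a sketch. The helper name collect and the report path are hypothetical, and the piped and pod-specific commands are left out because they need a shell pipeline or real NAMESPACE and POD_NAME values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one diagnostic command and append a labelled
# section to a report file. A failing or missing command is recorded in
# the report instead of aborting the script.
collect() {
  local report="$1" label="$2"
  shift 2
  {
    printf '== %s ==\n' "$label"
    printf '$ %s\n' "$*"
    "$@" 2>&1 || printf '(command failed or is not installed)\n'
    printf '\n'
  } >> "$report"
}

# Assumption: the report path is arbitrary; override it with REPORT=...
report="${REPORT:-${TMPDIR:-/tmp}/gpu-operator-debug.txt}"
: > "$report"   # start with an empty report

collect "$report" "kubernetes pods status" kubectl get pods --all-namespaces
collect "$report" "kubernetes daemonset status" kubectl get ds --all-namespaces
collect "$report" "Docker configuration file" cat /etc/docker/daemon.json
collect "$report" "NVIDIA shared directory" ls -la /run/nvidia
collect "$report" "NVIDIA packages directory" ls -la /usr/local/nvidia/toolkit
collect "$report" "NVIDIA driver directory" ls -la /run/nvidia/driver
```

Each section records the command line next to its output, so an empty directory listing such as /run/nvidia/driver still shows up in the report.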