1. Issue or feature description
When installing the operator on nodes that already have a driver installed, the CUDA version labels attached to the node do not match the installed CUDA version.
I have driver 495.46 and CUDA 11.5 installed on my workstation, but the labels show CUDA 11.6.
2. Steps to reproduce the issue
Create a cluster with driver 495.46 installed on the nodes. (I use kind on an Ubuntu workstation for k8s development.)
Install the operator with driver.enabled=false:
$ helm install --wait --generate-name \
-n gpu-operator --create-namespace \
nvidia/gpu-operator \
--set driver.enabled=false
Check nvidia-smi on node (shows driver 495.46 and CUDA 11.5):
$ nvidia-smi
Fri Aug 12 11:25:23 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46 Driver Version: 495.46 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
...
+-----------------------------------------------------------------------------+
Check nvidia-smi in a pod (shows driver 495.46 and CUDA 11.5):
$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-version-check
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-version-check
    image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: "1"
EOF
pod/nvidia-version-check created
$ kubectl logs nvidia-version-check
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46 Driver Version: 495.46 CUDA Version: 11.5 |
|-------------------------------+----------------------+----------------------+
...
+-----------------------------------------------------------------------------+
$ kubectl delete pod nvidia-version-check
pod "nvidia-version-check" deleted
Check node labels created by the operator (shows driver 495.46 and CUDA 11.6):
$ kubectl describe node -A | grep nvidia.com/cuda
nvidia.com/cuda.driver.major=495
nvidia.com/cuda.driver.minor=46
nvidia.com/cuda.driver.rev=
nvidia.com/cuda.runtime.major=11
nvidia.com/cuda.runtime.minor=6
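To make the mismatch easier to check on other setups, here is a minimal sketch that compares the two values mechanically. The function names (smi_cuda_version, label_cuda_version, compare_cuda_versions) are hypothetical helpers I made up for illustration; they are not part of the operator or of nvidia-smi. The sketch assumes the nvidia-smi header and the label format shown above, and with more than one node it only reports the labels of the last node in the output.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read nvidia-smi output on stdin and print the
# "major.minor" CUDA version from the header line.
smi_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: read `kubectl describe node` output on stdin and
# print "major.minor" built from the nvidia.com/cuda.runtime.* labels.
# If several nodes are present, the last node's labels win.
label_cuda_version() {
  awk -F= '
    /nvidia\.com\/cuda\.runtime\.major=/ { major = $2 }
    /nvidia\.com\/cuda\.runtime\.minor=/ { minor = $2 }
    END { if (major != "" && minor != "") print major "." minor }
  '
}

# Hypothetical helper: compare the two versions and report the result.
# Returns non-zero on a mismatch so it can be used in scripts.
compare_cuda_versions() {
  if [ "$1" = "$2" ]; then
    echo "match: $1"
  else
    echo "mismatch: nvidia-smi=$1 labels=$2"
    return 1
  fi
}
```

With the functions loaded, running compare_cuda_versions "$(nvidia-smi | smi_cuda_version)" "$(kubectl describe node | label_cuda_version)" on my setup should report the 11.5 vs 11.6 mismatch described in this issue.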