1. Quick Debug Information
- OS/Version(e.g. RHEL8.6, Ubuntu22.04): Ubuntu20.04
- Kernel Version: 5.15.x
- Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): CRI-O
- K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): K8s
- GPU Operator Version: v23.9.0
2. Issue or feature description
From the code, the repoConfig is not mounted into the GDS container, so the apt repository cannot be pointed at an on-premise repository, which leaves the container in the CrashLoopBackOff state.
The nvidia-fs-ctr container in the manifest below should include the following volume mounts block:
https://github.com/NVIDIA/gpu-operator/blob/master/manifests/state-driver/0500_daemonset.yaml
{{- if and .AdditionalConfigs .AdditionalConfigs.VolumeMounts }}
{{- range .AdditionalConfigs.VolumeMounts }}
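To illustrate what that template block would produce once it is present in the nvidia-fs-ctr container, here is a minimal sketch using Go's text/template. The types `volumeMount`, `additionalConfigs`, and `renderData`, the mount name `repo-config`, and the mount path are all hypothetical stand-ins chosen for illustration; they are not the operator's actual render data, and the body of the `range` loop is an assumption about how a mount would be written out.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical stand-ins for the data passed to the manifest template.
type volumeMount struct {
	Name      string
	MountPath string
	ReadOnly  bool
}

type additionalConfigs struct {
	VolumeMounts []volumeMount
}

type renderData struct {
	AdditionalConfigs *additionalConfigs
}

// The if/range lines mirror the ones quoted in this issue; the loop body
// is an assumed rendering of each mount.
const mountsTmpl = `volumeMounts:
{{- if and .AdditionalConfigs .AdditionalConfigs.VolumeMounts }}
{{- range .AdditionalConfigs.VolumeMounts }}
  - name: {{ .Name }}
    mountPath: {{ .MountPath }}
    readOnly: {{ .ReadOnly }}
{{- end }}
{{- end }}
`

// renderMounts executes the template and returns the rendered YAML fragment.
func renderMounts(data renderData) (string, error) {
	tmpl, err := template.New("mounts").Parse(mountsTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	data := renderData{AdditionalConfigs: &additionalConfigs{
		VolumeMounts: []volumeMount{
			// Illustrative repoConfig mount for an apt-based image.
			{Name: "repo-config", MountPath: "/etc/apt/sources.list.d", ReadOnly: true},
		},
	}}
	out, err := renderMounts(data)
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Without this block in the GDS container spec, a repoConfig defined for the driver would never reach nvidia-fs-ctr, which matches the behavior described above.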
In addition, the GDS image tag should have the OS information appended, as is done for the NVIDIA driver pod.
The default values.yaml causes an ImagePullBackOff because the image tag is incorrect: the OS suffix is missing, and the tag should be 2.16.1-ubuntu20.04.
gds:
  version: "2.16.1"
From the code, the OS is not used to construct imagePath in getGDSSpec (gpu-operator/internal/state/driver.go, line 533 in 79fe1cc):
func getGDSSpec(spec *nvidiav1alpha1.NVIDIADriverSpec) (*gdsDriverSpec, error) {
The driver image path, by contrast, does reference the OS:
https://github.com/NVIDIA/gpu-operator/blob/master/internal/state/driver.go#L472
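The requested behavior could be sketched roughly as follows. This is not the operator's code: `gdsImagePath` is a hypothetical helper, and the repository and image values in `main` are illustrative. It only shows the idea of appending the OS tag to the version so that "2.16.1" becomes "2.16.1-ubuntu20.04".

```go
package main

import "fmt"

// gdsImagePath is a hypothetical helper that builds an image reference of
// the form <repository>/<image>:<version>-<osTag>, so the tag carries the
// OS suffix in the way this issue asks for.
func gdsImagePath(repository, image, version, osTag string) (string, error) {
	if osTag == "" {
		// Without an OS tag the reference would point at a tag that,
		// per this issue, does not exist in the registry.
		return "", fmt.Errorf("os tag is required to build the image path for %s", image)
	}
	return fmt.Sprintf("%s/%s:%s-%s", repository, image, version, osTag), nil
}

func main() {
	// Illustrative values only; the version and OS come from this report.
	path, err := gdsImagePath("nvcr.io/nvidia/cloud-native", "nvidia-fs", "2.16.1", "ubuntu20.04")
	if err != nil {
		panic(err)
	}
	fmt.Println(path)
}
```

With a helper like this, the default `version: "2.16.1"` in values.yaml could stay as it is, and the OS suffix would be derived from the node rather than typed into the tag by hand.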
3. Steps to reproduce the issue
Enable GDS; the issue then reproduces.
@shivamerla Please help resolve these issues so that GDS can be used properly.