Cannot enable GDRcopy using Nvidia driver CRD due to wrong indentation in 0500_daemonset.yaml #713

@age9990

Description

1. Quick Debug Information

  • OS/Version(e.g. RHEL8.6, Ubuntu22.04): Ubuntu22.04
  • Kernel Version: 6.2.0-39
  • Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): CRI-O
  • K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): K8s
  • GPU Operator Version: v24.3.0

2. Issue or feature description

When GDRcopy is enabled in the NVIDIA driver CR, the driver daemonset is not updated and the GPU Operator pod logs the following error:
{"level":"error","ts":"2024-05-03T06:29:33.398Z","msg":"Error while syncing state","controller":"nvidia-driver-controller","object":{"name":"default"},"namespace":"","name":"default","reconcileID":"a902d530-65d4-480e-8157-0e0c21d0a332","error":"failed to create k8s objects from manifests: failed to render kubernetes manifests: error rendering file /opt/gpu-operator/manifests/state-driver/0500_daemonset.yaml: failed to unmarshal manifest /opt/gpu-operator/manifests/state-driver/0500_daemonset.yaml: error converting YAML to JSON: yaml: line 195: did not find expected key"}

Looking at this file, the indentation is incorrect: lines L493 to L496 are missing two spaces.

volumeMounts:
  - name: run-nvidia
    mountPath: /run/nvidia
    mountPropagation: HostToContainer
  - name: var-log
    mountPath: /var/log
  - name: dev-log
    mountPath: /dev/log
    readOnly: true
  {{- if and (.Openshift) (.Runtime.OpenshiftDriverToolkitEnabled) }}
  - name: shared-nvidia-driver-toolkit
    mountPath: /mnt/shared-nvidia-driver-toolkit
  {{- end}}
{{- if and .AdditionalConfigs .AdditionalConfigs.VolumeMounts }}
{{- range .AdditionalConfigs.VolumeMounts }}
- name: {{ .Name }}
  mountPath: {{ .MountPath }}
  subPath: {{ .SubPath }}
  readOnly: {{ .ReadOnly }}
{{- end }}
{{- end }}

After I fixed the indentation and rebuilt the image, GDRcopy could be enabled without errors.
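
For reference, here is a sketch of what the fix described above might look like: the four lines inside the `range` block (`name`, `mountPath`, `subPath`, `readOnly`) gain two spaces so that they line up with the other `volumeMounts` entries. Indentation here is relative to `volumeMounts:`, which is an assumption; the absolute columns in the real 0500_daemonset.yaml are not reproduced, and this is not necessarily the exact change that should land upstream.

```yaml
# Sketch only: indentation is relative to volumeMounts (assumed),
# not copied from the real manifest.
volumeMounts:
  - name: run-nvidia
    mountPath: /run/nvidia
    mountPropagation: HostToContainer
  - name: var-log
    mountPath: /var/log
  - name: dev-log
    mountPath: /dev/log
    readOnly: true
  {{- if and (.Openshift) (.Runtime.OpenshiftDriverToolkitEnabled) }}
  - name: shared-nvidia-driver-toolkit
    mountPath: /mnt/shared-nvidia-driver-toolkit
  {{- end}}
# The four lines inside the range below are indented by two more spaces.
{{- if and .AdditionalConfigs .AdditionalConfigs.VolumeMounts }}
{{- range .AdditionalConfigs.VolumeMounts }}
  - name: {{ .Name }}
    mountPath: {{ .MountPath }}
    subPath: {{ .SubPath }}
    readOnly: {{ .ReadOnly }}
{{- end }}
{{- end }}
```

Because `{{-` trims the whitespace before a template action, the indentation of the `if`, `range`, and `end` lines should not affect the rendered YAML. Only the list items themselves need to align with the existing entries.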

Labels: bug
