GPU Operator crash loop due to missing CRDs

### 1. Quick Debug Information
* OS/Version(e.g. RHEL8.6, Ubuntu22.04): Ubuntu 22.04.3 LTS
* Kernel Version: 5.15.0-87-generic x86_64
* Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): Containerd
* K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): K3S
* GPU Operator Version: v23.9.0


### 2. Issue or feature description
After upgrading the gpu-operator to v23.9.0, the gpu-operator pod is stuck in a crash loop; the upgrade should leave the pod running normally. The error `failed to get API group resources: unable to retrieve the complete list of server APIs: nvidia.com/v1alpha1: the server could not find the requested resource` repeats many times in the container logs before the container stops with the error `failed to wait for nvidia-driver-controller caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.NVIDIADriver`. This looks like a regression introduced with the new GPU Driver Custom Resource Definition: when that CRD is not deployed, the operator cannot function properly.

### 3. Steps to reproduce the issue
Install gpu-operator v23.6.1, upgrade the Helm chart to v23.9.0 and observe the gpu-operator pod in a crash loop.
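
For convenience, the steps above can be sketched as a small script. This is only an approximation of the setup: the release name, namespace, and `nvidia` repo alias are assumptions (the report does not state them), and the CRD name `nvidiadrivers.nvidia.com` is inferred from the `NVIDIADriver` kind in the error message. The script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
set -eu

# Assumed values, not stated in this report; override via the environment.
RELEASE="${RELEASE:-gpu-operator}"
NAMESPACE="${NAMESPACE:-gpu-operator}"
CHART="nvidia/gpu-operator"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Add the NVIDIA Helm repository (alias "nvidia" is an assumption).
run helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
run helm repo update

# Step 1: install the older chart version.
run helm install "$RELEASE" "$CHART" \
  --namespace "$NAMESPACE" --create-namespace --version v23.6.1 --wait

# Step 2: upgrade the same release to v23.9.0.
run helm upgrade "$RELEASE" "$CHART" \
  --namespace "$NAMESPACE" --version v23.9.0

# Step 3: check whether the CRD behind the NVIDIADriver kind exists
# (inferred name), then look at the operator pod status.
run kubectl get crd nvidiadrivers.nvidia.com || echo "NVIDIADriver CRD not found"
run kubectl get pods --namespace "$NAMESPACE"
```

Running it as `DRY_RUN=0 sh repro.sh` (the file name is arbitrary) against a test cluster executes the commands for real; the default dry run is there so the script can be reviewed before it touches anything.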

[gpu-operator-6fdbc66bd4-k82lb_gpu-operator.log](https://github.com/NVIDIA/gpu-operator/files/13097491/gpu-operator-6fdbc66bd4-k82lb_gpu-operator.log)


