1. Quick Debug Information
- OS/Version(e.g. RHEL8.6, Ubuntu22.04): RHEL 9.2
- Kernel Version: 5.14.0-284.71.1.el9_2.x86_64
- Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): CRI-O
- K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): OCP 4.15.20
- GPU Operator Version: 24.6.0
2. Issue or feature description
After upgrading from 24.3.0 to 24.6.0, the operator appears to be missing RBAC permissions. We're using OLM to deploy the operator on OpenShift.
Specifically, the gpu-operator service account cannot list and watch ConfigMaps at the cluster scope:
{"level":"info","ts":"2024-08-01T16:39:37Z","logger":"controllers.Upgrade","msg":"ProcessUncordonRequiredNodes"}
{"level":"info","ts":"2024-08-01T16:39:37Z","logger":"controllers.Upgrade","msg":"State Manager, finished processing"}
W0801 16:40:09.232591 1 reflector.go:547] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0801 16:40:09.232618 1 reflector.go:150] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0801 16:40:57.614741 1 reflector.go:547] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0801 16:40:57.614766 1 reflector.go:150] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0801 16:41:28.767493 1 reflector.go:547] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0801 16:41:28.767520 1 reflector.go:150] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
{"level":"info","ts":"2024-08-01T16:41:37Z","logger":"controllers.Upgrade","msg":"Reconciling Upgrade","upgrade":{"name":"gpu-cluster-policy"}}
{"level":"info","ts":"2024-08-01T16:41:37Z","logger":"controllers.Upgrade","msg":"Using label selector","upgrade":{"name":"gpu-cluster-policy"},"key":"openshift.driver-toolkit","value":"true"}
3. Steps to reproduce the issue
Upgrade the GPU Operator from 24.3.0 to 24.6.0 via OLM on OpenShift.
Let me know if you need anything else.