After upgrading from 24.3.0 to 24.6.0 via OLM, the operator appears to be missing expected permissions on configmaps #883

@benhwebster

Description

1. Quick Debug Information

  • OS/Version(e.g. RHEL8.6, Ubuntu22.04): RHEL 9.2
  • Kernel Version: 5.14.0-284.71.1.el9_2.x86_64
  • Container Runtime Type/Version(e.g. Containerd, CRI-O, Docker): CRI-O
  • K8s Flavor/Version(e.g. K8s, OCP, Rancher, GKE, EKS): OCP 4.15.20
  • GPU Operator Version: 24.6.0

2. Issue or feature description

After upgrading from 24.3.0 to 24.6.0, the operator appears to be missing permissions. We deploy the operator on OpenShift via OLM.

The operator log shows that list (and therefore watch) on configmaps is forbidden at the cluster scope:

{"level":"info","ts":"2024-08-01T16:39:37Z","logger":"controllers.Upgrade","msg":"ProcessUncordonRequiredNodes"}
{"level":"info","ts":"2024-08-01T16:39:37Z","logger":"controllers.Upgrade","msg":"State Manager, finished processing"}
W0801 16:40:09.232591 1 reflector.go:547] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0801 16:40:09.232618 1 reflector.go:150] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0801 16:40:57.614741 1 reflector.go:547] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0801 16:40:57.614766 1 reflector.go:150] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0801 16:41:28.767493 1 reflector.go:547] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0801 16:41:28.767520 1 reflector.go:150] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:106: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:nvidia-gpu-operator:gpu-operator" cannot list resource "configmaps" in API group "" at the cluster scope
{"level":"info","ts":"2024-08-01T16:41:37Z","logger":"controllers.Upgrade","msg":"Reconciling Upgrade","upgrade":{"name":"gpu-cluster-policy"}}
{"level":"info","ts":"2024-08-01T16:41:37Z","logger":"controllers.Upgrade","msg":"Using label selector","upgrade":{"name":"gpu-cluster-policy"},"key":"openshift.driver-toolkit","value":"true"}

3. Steps to reproduce the issue

Upgrade from 24.3.0 to 24.6.0 via OLM in OpenShift

Let me know if you need anything else.

Labels: bug
