Openshift: NVIDIA GPU Operator: nvidia-container-toolkit-daemonset: InvalidImageName #513

@tormig-softronic

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

1. Quick Debug Checklist

  • Are you running on an Ubuntu 18.04 node?
  • Are you running Kubernetes v1.13+?
  • Are you running Docker (>= 18.06) or CRIO (>= 1.13+)?
  • Do you have i2c_core and ipmi_msghandler loaded on the nodes?
  • Did you apply the CRD (kubectl describe clusterpolicies --all-namespaces)?

Running on OpenShift 4.11.25.

1. Issue or feature description

After upgrading the GPU Operator to 23.3.0, the gpu-cluster-policy fails.

The nvidia-container-toolkit-daemonset appears to have an erroneous image reference:

Failed to apply default image tag "nvcr.io/nvidia/k8s/container-toolkit@sha256:489125ceae5864280e4d6a9ab52ab0f650b3179349a7298c4a204feb60b661a": couldn't parse image reference "nvcr.io/nvidia/k8s/container-toolkit@sha256:489125ceae5864280e4d6a9ab52ab0f650b3179349a7298c4a204feb60b661a": invalid checksum digest length
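
The "invalid checksum digest length" message suggests the digest itself is malformed rather than the image being missing. A SHA-256 digest is 32 bytes, so its hex form should be exactly 64 characters; the digest quoted above has 63. Below is a minimal sketch for checking this on any image reference. The helper names `digest_hex` and `is_valid_sha256_digest` are illustrative only and are not part of the operator or any NVIDIA tooling.

```python
import re

# Image reference copied verbatim from the error message in this report.
IMAGE_REF = (
    "nvcr.io/nvidia/k8s/container-toolkit@sha256:"
    "489125ceae5864280e4d6a9ab52ab0f650b3179349a7298c4a204feb60b661a"
)


def digest_hex(image_ref: str) -> str:
    """Return the hex part of an '@sha256:<hex>' image reference (hypothetical helper)."""
    _, sep, digest = image_ref.partition("@sha256:")
    if not sep:
        raise ValueError("reference has no @sha256: digest")
    return digest


def is_valid_sha256_digest(hex_part: str) -> bool:
    """True only for exactly 64 lowercase hex characters (hypothetical helper)."""
    return re.fullmatch(r"[0-9a-f]{64}", hex_part) is not None


if __name__ == "__main__":
    hex_part = digest_hex(IMAGE_REF)
    # Print the digest length and whether it is a well-formed SHA-256 digest.
    print(len(hex_part), is_valid_sha256_digest(hex_part))
```

This only checks the format of the reference. It does not say what the correct digest should be; that has to come from the registry or the operator's published bundle.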

2. Steps to reproduce the issue

Upgrade the operator from 22.9.2 to 23.3.0; the ClusterPolicy fails to install. Removing the policy and applying a new one does not help: it still fails.
