Upgrading gpu-operator on Rancher RKE2 results in nvidia-container-toolkit-daemonset failing to initialize #1099

@nikito

Description

After upgrading to the latest gpu-operator release, v24.9.0, the nvidia-container-toolkit-daemonset fails to initialize with the following error:
level=error msg="error running nvidia-toolkit: unable to determine runtime options: unable to load containerd config: failed to load config: failed to run command chroot [/host containerd config dump]: exit status 127"

If I roll back to v24.6.2, everything initializes correctly.
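
For triage, the status code in that log line carries useful information. GNU coreutils documents that `chroot` exits with 125 if `chroot` itself fails, 126 if the command is found but cannot be invoked, and 127 if the command cannot be found. Under that convention, status 127 most likely means no `containerd` binary was found on the PATH inside `/host`, so `containerd config dump` never ran. The sketch below only interprets the status code and assumes GNU chroot semantics; `explain_chroot_status` and `CHROOT_EXIT_MEANINGS` are hypothetical names for illustration, not part of gpu-operator or the container toolkit.

```python
# A minimal sketch, assuming GNU coreutils chroot exit-status semantics.
# It only interprets a status code; it does not run chroot itself.

CHROOT_EXIT_MEANINGS = {
    125: "chroot itself failed",
    126: "command found but could not be invoked",
    127: "command could not be found",
}


def explain_chroot_status(status: int) -> str:
    """Map a chroot exit status to its GNU coreutils meaning.

    Any other non-zero value is passed through from the command that
    chroot ran (here, `containerd config dump`).
    """
    if status == 0:
        return "success"
    return CHROOT_EXIT_MEANINGS.get(
        status, f"command exited with status {status}"
    )


# The status reported in the toolkit error:
print(explain_chroot_status(127))  # → command could not be found
```

Note that a command can also exit with 125, 126, or 127 on its own, so the mapping is a strong hint rather than proof.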