GPU-enabled enroot containers fail after updating to latest nvidia-container-toolkit #4144

@tpdownes

Description

Release 1.17.7 of the nvidia-container-toolkit contains a regression that impacts Slurm clusters that update to this version. See the following reports:

We advise customers not to upgrade running systems to this package. To block the upgrade on every compute node with GPUs, create a file at /etc/apt/preferences.d/block-broken-nvidia-container with the following contents:

Package: nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Pin: version 1.17.7-1
Pin-Priority: 100
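
As a rough sketch of how the file above could be created from a script, the snippet below emits the same three-line stanza. The helper names pin_stanza and write_pin are hypothetical and are not part of any package; the stanza itself is copied unchanged from this issue.

```shell
#!/bin/sh
# Hypothetical helper: print the pin stanza exactly as given in this issue.
pin_stanza() {
    cat <<'EOF'
Package: nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Pin: version 1.17.7-1
Pin-Priority: 100
EOF
}

# Hypothetical helper: write the stanza to the file path given as $1.
# The caller is responsible for having write permission on that path.
write_pin() {
    pin_stanza > "$1"
}
```

On a node, one way to use it would be to run pin_stanza | sudo tee /etc/apt/preferences.d/block-broken-nvidia-container >/dev/null, and then inspect the result with apt-cache policy nvidia-container-toolkit to see which version apt now selects as the candidate.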

We will integrate this workaround robustly into our solutions while NVIDIA works on a fix to its packaging and solution.

Additionally, we recommend disabling unattended-upgrades.service if it is not already disabled:

sudo systemctl disable unattended-upgrades.service
sudo systemctl stop unattended-upgrades.service
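
To confirm the two commands above took effect, something like the following check could be used. This is only a sketch: unit_is_off is a hypothetical helper, and it assumes it is handed the state strings printed by systemctl is-enabled and systemctl is-active.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a unit is neither enabled nor running.
#   $1 = output of `systemctl is-enabled UNIT` (e.g. enabled, disabled, masked)
#   $2 = output of `systemctl is-active UNIT`  (e.g. active, inactive, failed)
unit_is_off() {
    case "$1" in
        enabled|enabled-runtime) return 1 ;;   # still starts at boot
    esac
    case "$2" in
        active|activating) return 1 ;;         # still running or starting
    esac
    return 0
}
```

For example, unit_is_off "$(systemctl is-enabled unattended-upgrades.service)" "$(systemctl is-active unattended-upgrades.service)" && echo "unattended-upgrades is off" would print the message only when the service is both disabled and stopped.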
