Release 1.17.7 of the nvidia-container-toolkit contains a regression that impacts Slurm clusters that update to this version. See the following reports:
We advise customers not to upgrade running systems to this package version. To block the upgrade on every GPU compute node, create a file at /etc/apt/preferences.d/block-broken-nvidia-container with the following contents:
Package: nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Pin: version 1.17.7-1
Pin-Priority: 100
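For rolling the pin out to many nodes, the file can be written by a small script. The following is a minimal sketch, not part of any official tooling: the function name write_nvidia_pin and its optional directory argument are illustrative assumptions, while the file name and contents are taken verbatim from this notice.

```shell
# Write the APT pin from this notice into a preferences directory.
# write_nvidia_pin and its optional directory argument are hypothetical
# helpers for illustration; the default is the path given in the notice.
write_nvidia_pin() {
  pref_dir="${1:-/etc/apt/preferences.d}"
  pref_file="$pref_dir/block-broken-nvidia-container"
  # Quoted heredoc delimiter: the contents are written literally.
  cat > "$pref_file" <<'EOF'
Package: nvidia-container-toolkit nvidia-container-toolkit-base libnvidia-container-tools libnvidia-container1
Pin: version 1.17.7-1
Pin-Priority: 100
EOF
  # Print the path that was written so callers can log or inspect it.
  echo "$pref_file"
}
```

Run as root, calling write_nvidia_pin with no argument writes to /etc/apt/preferences.d. Afterwards, apt-cache policy nvidia-container-toolkit shows which version APT currently treats as the installation candidate, which is one way to confirm the pin is being read.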
We will integrate this workaround robustly into our solutions while NVIDIA works on a fix for its packaging and solution.
Additionally, we recommend disabling unattended-upgrades.service if it is not already disabled:
sudo systemctl disable unattended-upgrades.service
sudo systemctl stop unattended-upgrades.service
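To confirm that the service is neither enabled nor running on a node, a check along these lines may help. This is a sketch under assumptions: the function name check_unattended_upgrades is hypothetical, and it relies only on the standard systemctl is-enabled and systemctl is-active subcommands.

```shell
# Report the state of unattended-upgrades.service.
# Returns 0 if the unit is neither enabled nor active, 1 otherwise.
# check_unattended_upgrades is a hypothetical helper name.
check_unattended_upgrades() {
  unit="unattended-upgrades.service"
  # Both subcommands exit non-zero for "disabled"/"inactive", so the
  # exit status is ignored and only the printed state is used.
  enabled="$(systemctl is-enabled "$unit" 2>/dev/null || true)"
  active="$(systemctl is-active "$unit" 2>/dev/null || true)"
  echo "$unit: enabled=${enabled:-unknown} active=${active:-unknown}"
  if [ "$enabled" = "enabled" ] || [ "$active" = "active" ]; then
    return 1
  fi
  return 0
}
```

A non-zero return from check_unattended_upgrades indicates that the disable and stop commands above still need to be run on that node.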