Requires: libnvidia-container-tools >= %{libnvidia_container_tools_version}, libnvidia-container-tools < 2.0.0
Requires: nvidia-container-toolkit-base == %{version}-%{release}
The latest release of nvidia-container-toolkit bricked a lot of jobs on PyTorch's CUDA CI (see example log) because it mistakenly upgraded the following packages:
- libnvidia-container-tools
- libnvidia-container1
- nvidia-container-toolkit-base
The issue manifested as containers being unable to access GPU resources, so we silently stopped running CUDA CI altogether (this will be remedied by pytorch/test-infra#6638).
I'm creating this issue mainly as a discussion point, to check in and see whether these dependencies can be pinned.
If they can be pinned, I'll happily submit a PR, but I wanted to get context on why they were not pinned before, and on whether we should reasonably expect mismatched versions of these packages to work.
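Until the pinning question is settled, a CI host could at least fail loudly when the installed versions drift apart. The following is a minimal sketch, not something taken from the toolkit or from test-infra: the helper names (`installed_version`, `find_mismatches`) are hypothetical, and it assumes that all four packages are expected to share the same version string, which is exactly the expectation this issue is asking about.

```python
import subprocess

# Packages discussed in this issue. Treating them as a lockstep set is an
# assumption of this sketch, not documented behavior.
PACKAGES = [
    "nvidia-container-toolkit",
    "nvidia-container-toolkit-base",
    "libnvidia-container-tools",
    "libnvidia-container1",
]


def installed_version(package):
    """Return the installed RPM version of `package`, or None if it is absent."""
    result = subprocess.run(
        ["rpm", "-q", "--queryformat", "%{VERSION}", package],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else None


def find_mismatches(versions, reference="nvidia-container-toolkit"):
    """Return {package: version} for every package that differs from the reference."""
    expected = versions.get(reference)
    return {pkg: ver for pkg, ver in versions.items() if ver != expected}


if __name__ == "__main__":
    versions = {pkg: installed_version(pkg) for pkg in PACKAGES}
    mismatched = find_mismatches(versions)
    if mismatched:
        # Exit non-zero so the CI job fails visibly instead of silently skipping GPU work.
        raise SystemExit(f"container toolkit version mismatch: {mismatched}")
```

Run as a pre-flight step on the runner, this exits non-zero on any mismatch, so a partial upgrade would surface as a failed job rather than as containers that quietly lose GPU access.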
nvidia-container-toolkit/packaging/rpm/SPECS/nvidia-container-toolkit.spec
Lines 24 to 25 in ac8f190