[rpm] libnvidia-container-tools should pin to nvidia-container-toolkit version #1091

@seemethere

Requires: libnvidia-container-tools >= %{libnvidia_container_tools_version}, libnvidia-container-tools < 2.0.0
Requires: nvidia-container-toolkit-base == %{version}-%{release}

The latest release of nvidia-container-toolkit bricked a lot of jobs on pytorch's CUDA CI (see example log) because it mistakenly upgraded the following packages:

  • libnvidia-container-tools
  • libnvidia-container1
  • nvidia-container-toolkit-base

The issue manifested as containers being unable to access GPU resources, so we silently stopped running CUDA CI altogether (this will be remedied by pytorch/test-infra#6638).

I'm creating this issue mainly as a discussion point, to check in and see whether these dependencies can be pinned.

If they can be pinned, I'll happily submit a PR, but I wanted to get context on why they were not pinned before, and on whether we should reasonably expect mismatched versions of these packages to work.
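
For the sake of discussion, here is a rough sketch of what a pinned dependency might look like in the spec file. It is only an illustration, not a tested change: it assumes that `%{libnvidia_container_tools_version}` expands to the exact version the toolkit release was built and tested against, which may not be how that macro is used today.

```spec
# Hypothetical pinned variant of the first Requires line quoted above:
# an exact version match instead of the current ">= ... , < 2.0.0" range.
# Assumption: the macro holds the exact matching libnvidia-container-tools version.
# With no release component given, any release of that version would satisfy it.
Requires: libnvidia-container-tools = %{libnvidia_container_tools_version}
```

With a pin like this, upgrading nvidia-container-toolkit on its own would either pull in the matching libnvidia-container-tools or fail dependency resolution, rather than leaving a mismatched set installed.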
