
Update NVIDIA --gpu requests using CDI to match AMD logic #52145

Open

elezar wants to merge 3 commits into moby:master from elezar:cdi-vendor-detection



@elezar (Contributor) commented Mar 5, 2026

- What I did

Refactored the handling of --gpus requests for NVIDIA devices to align with AMD device requests. See #52048.

- How I did it

Introduced a cdiCacheInjector type and updated the AMD-specific code to use it. I then migrated the NVIDIA implementation to the same type.
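To make the idea concrete, here is a minimal sketch of what a vendor-parameterized injector could look like. It is not the code in this PR: `deviceRequest` is a simplified, hypothetical stand-in for the daemon's device request type, the `cdiDevices` method name is invented, and the assumption that the vendor's CDI spec defines an `all` device is labeled in the code. The only thing taken as given is the CDI convention of fully-qualified device names of the form `vendor/class=name`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// deviceRequest is a hypothetical, simplified stand-in for the daemon's
// device request type; only the fields used below are included.
type deviceRequest struct {
	Count     int      // -1 means "all devices", as with --gpus all
	DeviceIDs []string // explicit IDs, as with --gpus device=0,1
}

// cdiCacheInjector (hypothetical sketch) translates a --gpus request into
// fully-qualified CDI device names for a single vendor, so AMD and NVIDIA
// can share one code path that differs only in the vendor string.
type cdiCacheInjector struct {
	vendor string // e.g. "nvidia.com" or "amd.com"
	class  string // e.g. "gpu"
}

// cdiDevices maps the request onto "vendor/class=name" CDI device names.
func (i cdiCacheInjector) cdiDevices(req deviceRequest) ([]string, error) {
	prefix := i.vendor + "/" + i.class + "="
	switch {
	case len(req.DeviceIDs) > 0:
		names := make([]string, 0, len(req.DeviceIDs))
		for _, id := range req.DeviceIDs {
			names = append(names, prefix+id)
		}
		return names, nil
	case req.Count == -1:
		// Assumption: the vendor's CDI spec defines an "all" device.
		return []string{prefix + "all"}, nil
	default:
		return nil, errors.New("count-based requests are not handled in this sketch")
	}
}

func main() {
	nvidia := cdiCacheInjector{vendor: "nvidia.com", class: "gpu"}
	names, err := nvidia.cdiDevices(deviceRequest{DeviceIDs: []string{"0", "1"}})
	fmt.Println(strings.Join(names, ","), err) // nvidia.com/gpu=0,nvidia.com/gpu=1 <nil>

	amd := cdiCacheInjector{vendor: "amd.com", class: "gpu"}
	names, err = amd.cdiDevices(deviceRequest{Count: -1})
	fmt.Println(strings.Join(names, ","), err) // amd.com/gpu=all <nil>
}
```

Keeping the vendor as data rather than as separate per-vendor types is what lets the NVIDIA path be migrated onto the AMD logic without duplicating the translation code.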

- How to verify it

- Human readable description for the release notes

- A picture of a cute animal (not mandatory but encouraged)

elezar added 3 commits March 5, 2026 15:02
Signed-off-by: Evan Lezar <elezar@nvidia.com>

This change adds a cdiCacheInjector type for handling --gpus requests
as vendor-specific CDI requests. The handling of --gpus requests for
AMD devices is updated to use this type.

Signed-off-by: Evan Lezar <elezar@nvidia.com>

This change aligns the handling of --gpus requests using CDI
for NVIDIA devices with that for AMD devices. Instead of checking
for the existence of the nvidia-cdi-hook executable, the available
CDI vendors are checked. This makes the implementation more robust
against changes to the implementation details of NVIDIA CDI specs.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
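The difference in detection strategy can be sketched as follows. Previously, NVIDIA support hinged on finding the `nvidia-cdi-hook` binary (for example via an `exec.LookPath` style check); the new approach asks which vendors have CDI specs loaded. In this sketch, `vendorLister`, `staticCache`, and `hasCDIVendor` are hypothetical names: the interface is only assumed to resemble what a CDI cache can report, and `staticCache` is a fixed list used in place of specs read from disk. It uses the standard `slices` package (Go 1.21 or later).

```go
package main

import (
	"fmt"
	"slices"
)

// vendorLister is a hypothetical interface standing in for the CDI cache.
// It is assumed to report the vendors of all currently loaded CDI specs.
type vendorLister interface {
	ListVendors() []string
}

// staticCache is a fixed vendor list used here instead of real CDI specs.
type staticCache []string

func (c staticCache) ListVendors() []string { return c }

// hasCDIVendor reports whether any loaded CDI spec belongs to vendor.
// A nil cache means CDI is unavailable, so no vendor is detected.
func hasCDIVendor(cache vendorLister, vendor string) bool {
	if cache == nil {
		return false
	}
	return slices.Contains(cache.ListVendors(), vendor)
}

func main() {
	fmt.Println(hasCDIVendor(staticCache{"amd.com", "nvidia.com"}, "nvidia.com")) // true
	fmt.Println(hasCDIVendor(staticCache{"amd.com"}, "nvidia.com"))               // false
	fmt.Println(hasCDIVendor(nil, "nvidia.com"))                                  // false
}
```

The vendor check depends only on what the CDI specs declare, not on which helper binaries a particular spec happens to reference, which is the robustness argument made in the commit message.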
github-actions bot added the area/daemon (Core Engine) label Mar 5, 2026
elezar changed the title from "Cdi vendor detection" to "Update NVIDIA --gpu requests using CDI to match AMD logic" Mar 5, 2026
vvoland added the kind/refactor (PR's that refactor, or clean-up code) label Mar 5, 2026
vvoland added this to the 29.3.1 milestone Mar 5, 2026
Comment on lines 10 to 12
// RegisterGPUDeviceDrivers registers GPU device drivers.
// If the cdiCache is provided, it is used to detect presence of CDI specs for AMD GPUs.
// For NVIDIA GPUs, presence of CDI specs is detected by checking for the nvidia-cdi-hook binary.
Contributor

Suggested change
- // RegisterGPUDeviceDrivers registers GPU device drivers.
- // If the cdiCache is provided, it is used to detect presence of CDI specs for AMD GPUs.
- // For NVIDIA GPUs, presence of CDI specs is detected by checking for the nvidia-cdi-hook binary.
+ // RegisterGPUDeviceDrivers registers GPU device drivers.
+ // If the cdiCache is provided, it is used to discover the vendor via available CDI specs
+ // and translate the GPU requests to CDI device requests.
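The behaviour described by the suggested doc comment (no cache: nothing CDI-based; cache present: discover vendors from the loaded specs) could look roughly like this. Everything here is hypothetical and for illustration only: `registerGPUDrivers` returns the names it would register rather than touching a real registry, `knownGPUVendors` is an invented lookup table, and `vendorLister` is the same assumed interface over the CDI cache.

```go
package main

import (
	"fmt"
	"sort"
)

// vendorLister is a hypothetical interface over the CDI cache.
type vendorLister interface {
	ListVendors() []string
}

// staticCache is a fixed vendor list used in place of real CDI specs.
type staticCache []string

func (c staticCache) ListVendors() []string { return c }

// knownGPUVendors is a hypothetical table mapping CDI vendors that
// --gpus requests may target to the driver name registered for each.
var knownGPUVendors = map[string]string{
	"nvidia.com": "nvidia",
	"amd.com":    "amd",
}

// registerGPUDrivers returns the (sorted) driver names that would be
// registered. Without a CDI cache, no CDI-based GPU driver is registered;
// with one, only vendors that actually have CDI specs are considered.
func registerGPUDrivers(cache vendorLister) []string {
	if cache == nil {
		return nil
	}
	var drivers []string
	for _, vendor := range cache.ListVendors() {
		if name, ok := knownGPUVendors[vendor]; ok {
			drivers = append(drivers, name)
		}
	}
	sort.Strings(drivers)
	return drivers
}

func main() {
	fmt.Println(registerGPUDrivers(staticCache{"nvidia.com", "intel.com", "amd.com"})) // [amd nvidia]
	fmt.Println(registerGPUDrivers(nil))                                              // []
}
```

Vendors with CDI specs but no entry in the table (such as `intel.com` above) are simply skipped, so adding a GPU vendor in this sketch is a one-line table change.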



3 participants