Use cdi device driver to handle nvidia --gpus requests #50228

thaJeztah merged 2 commits into moby:master
Conversation
Force-pushed from 78a7b0c to 7d75268
Related to our discussion in docker/buildx#3320, I wonder if it should be the other way around and use first.
Could you elaborate on what you mean by "use first"?

Update: Sorry, looking at the diff now, I think I understand the way it is implemented. Yes, this should change.
Force-pushed from 6ea74f7 to 7aeb6d3
	}
	return incompatibleDeviceRequest{dev.req.Driver, dev.req.Capabilities}
}
One question would be whether we wanted to check whether the requested devices are known at this stage and only then invoke the cdiDeviceDriver. Alternatively, we could simplify the fallback logic to always fall back to the setNvidiaGPUs handler for all errors.
This is only for the fallback? I don't have a strong opinion on that; any preference on your side? pros/cons?
I think we should check for known devices. I don't want to get into a state where a user can't run a container because there is a problem with the generation of the CDI specs -- or they are using a version of the NVIDIA Container Toolkit that doesn't yet generate the CDI specs. Once these are more common, we can consider changing this behaviour, in which case we would probably remove the hook-based injection entirely.
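The behaviour discussed here -- only fall back to hook-based injection when the device is unknown to CDI, rather than on any error -- could look roughly like the following sketch. The function names and the sentinel error are hypothetical stand-ins, not the actual moby implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownCDIDevice is a hypothetical sentinel for the case where a
// requested device has no matching CDI spec on the system.
var errUnknownCDIDevice = errors.New("unknown CDI device")

// tryCDIInjection stands in for the CDI-based injection path; here we
// pretend only "nvidia.com/gpu=0" has a generated CDI spec.
func tryCDIInjection(device string) error {
	if device != "nvidia.com/gpu=0" {
		return fmt.Errorf("injecting %q: %w", device, errUnknownCDIDevice)
	}
	return nil
}

// legacyHookInjection stands in for the nvidia-container-runtime-hook path.
func legacyHookInjection(device string) error { return nil }

// injectDevice tries CDI first and only falls back to hook-based injection
// when the device is not known to CDI; other CDI errors still surface.
func injectDevice(device string) error {
	err := tryCDIInjection(device)
	if err == nil {
		return nil
	}
	if errors.Is(err, errUnknownCDIDevice) {
		return legacyHookInjection(device)
	}
	return err
}

func main() {
	fmt.Println(injectDevice("nvidia.com/gpu=0"))   // <nil>
	fmt.Println(injectDevice("nvidia.com/gpu=all")) // <nil> (via fallback)
}
```

With this shape, a broken CDI spec generation would still surface as an error, while a missing spec quietly falls back to the hook.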
Force-pushed from 7aeb6d3 to 3386805
/cc
Force-pushed from 3386805 to c28f616
@thaJeztah I think I have spent enough time playing with the first pass of this. One question I would have is what is required to allow users to opt out of this behaviour? I think the detection should be robust enough to handle various combinations, but it may be good to make things more explicit.
Force-pushed from c28f616 to da71cc2
Could you do a rebase to have a fresh run of CI?
Force-pushed from fff0e36 to 109cff0
done.
Probably unrelated; looks like we may have a new flaky test? Added in 10c0fc4 / #51675
daemon/devices_nvidia_linux.go
Outdated
	hasNVIDIAExecutables := make(map[string]struct{})
	for _, e := range []string{nvidiaContainerRuntimeHookExecutableName, nvidiaCDIHookExecutableName} {
		if _, err := exec.LookPath(e); err != nil {
			continue
		}
		hasNVIDIAExecutables[e] = struct{}{}
	}
	if len(hasNVIDIAExecutables) == 0 {
		return nil
	}
	nvidiaDrivers := make(map[string]*deviceDriver)
	// Register a driver specific to the nvidia-container-runtime-hook if present.
	// This has no capabilities associated with it so as to not inadvertently
	// match requests.
	if _, ok := hasNVIDIAExecutables[nvidiaContainerRuntimeHookExecutableName]; ok {
		nvidiaDrivers["nvidia.runtime-hook"] = &deviceDriver{
			capset:     nil,
			updateSpec: setNvidiaGPUs,
		}
	}
	// Register a driver specific to CDI if present.
	// This has no capabilities associated with it so as to not inadvertently
	// match requests.
	if _, ok := hasNVIDIAExecutables[nvidiaCDIHookExecutableName]; ok {
		nvidiaDrivers["nvidia.cdi"] = &deviceDriver{
			capset: nil,
			updateSpec: (&cdiDeviceInjector{
				defaultCDIDeviceKind: "nvidia.com/gpu",
			}).injectDevices,
		}
	}
Minor nit; I guess we can skip the hasNVIDIAExecutables intermediate map here, and directly use the nvidiaDrivers map?
nvidiaDrivers := make(map[string]*deviceDriver)
if _, err := exec.LookPath(nvidiaCDIHookExecutableName); err == nil {
// Register a driver specific to CDI if present.
// This has no capabilities associated to not inadvertently match requests.
nvidiaDrivers["nvidia.cdi"] = &deviceDriver{
updateSpec: (&cdiDeviceInjector{
defaultCDIDeviceKind: "nvidia.com/gpu",
}).injectDevices,
}
}
if _, err := exec.LookPath(nvidiaContainerRuntimeHookExecutableName); err == nil {
// Register a driver specific to the nvidia-container-runtime-hook if present.
// This has no capabilities associated to not inadvertently match requests.
nvidiaDrivers["nvidia.runtime-hook"] = &deviceDriver{
updateSpec: setNvidiaGPUs,
}
}
if len(nvidiaDrivers) == 0 {
return nil
}

Perhaps even the composite could be rolled up into the above, but it might be clearer to construct that below to express the "order is important here".
Thanks for the proposal. I reworked this a bit and I believe it's cleaner now.
	// If the cdi device driver is not available then we return an error.
	cdiDeviceDriver := deviceDrivers["cdi"]
	if cdiDeviceDriver == nil {
		return fmt.Errorf("no CDI device driver registered: %w", incompatibleDeviceRequest{dev.req.Driver, dev.req.Capabilities})
	}
Looks like this check is more lightweight than the code above; perhaps move this to the start of the method to allow an early return?
I suppose the question is whether we want to raise an error if no CDI driver is registered for empty requests. I think we can meet in the middle, where we quick-return if the requested IDs are nil and then only construct cdiDeviceIDs if the driver is registered.
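The middle ground described here -- quick-return on empty requests, then only require the `cdi` driver when there is actually work to do -- could be sketched roughly as follows. The types and the registry are simplified stand-ins, not the daemon's actual internals:

```go
package main

import "fmt"

// deviceRequest is a simplified stand-in for container.DeviceRequest.
type deviceRequest struct {
	Driver    string
	DeviceIDs []string
}

// registeredDrivers stands in for the daemon's device-driver registry.
var registeredDrivers = map[string]bool{}

// handleAsCDI quick-returns when no device IDs were requested, and only
// errors about a missing "cdi" driver when there is work for it to do.
func handleAsCDI(req deviceRequest) error {
	if len(req.DeviceIDs) == 0 {
		return nil // nothing to inject; don't require the cdi driver
	}
	if !registeredDrivers["cdi"] {
		return fmt.Errorf("no CDI device driver registered for %q", req.Driver)
	}
	// ... normalize to fully-qualified CDI names and forward the request ...
	return nil
}

func main() {
	// Empty request: no error even though no cdi driver is registered.
	fmt.Println(handleAsCDI(deviceRequest{Driver: "nvidia"}))
	// Non-empty request without a cdi driver: error.
	fmt.Println(handleAsCDI(deviceRequest{Driver: "nvidia", DeviceIDs: []string{"all"}}))
}
```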
daemon/devices_nvidia_linux.go
Outdated
	cdiRequest := dev.req
	cdiRequest.Count = 0
	cdiRequest.Options = nil
	cdiRequest.DeviceIDs = cdiDeviceIDs

	cdiDeviceInstance := deviceInstance{
		req:          cdiRequest,
		selectedCaps: nil,
	}

	return cdiDeviceDriver.updateSpec(s, &cdiDeviceInstance)
Looking at the container.DeviceRequest struct, I think the only thing we're preserving from the dev.req is the Driver field and the Capabilities; Wondering if it'd be more clear to construct a fresh request here (but not sure if that's correct; just from glancing over);
return cdiDeviceDriver.updateSpec(s, &deviceInstance{
req: container.DeviceRequest{
Driver: dev.req.Driver,
DeviceIDs: cdiDeviceIDs,
Capabilities: dev.req.Capabilities,
},
selectedCaps: nil,
})

I don't think there's a need to de-reference the Capabilities? If there is, then perhaps a slices.Clone (but not sure that really helps as it's a slice of slices);
return cdiDeviceDriver.updateSpec(s, &deviceInstance{
req: container.DeviceRequest{
Driver: dev.req.Driver,
DeviceIDs: cdiDeviceIDs,
Capabilities: slices.Clone(dev.req.Capabilities),
},
selectedCaps: nil,
})
daemon/devices_nvidia_linux.go
Outdated
	// normalizeDeviceID ensures that the specified deviceID is a fully-qualified
	// CDI device name.
	// If the deviceID is already a fully-qualified CDI device name it is returned
	// as-is, otherwise, the default CDI device kind (vendor/class) is used to
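Pieced together from the truncated doc comment above and the strings.Cut suggestion later in the thread, a self-contained sketch of such a helper might look like this (not the exact moby code):

```go
package main

import (
	"fmt"
	"strings"
)

// defaultCDIDeviceKind is the vendor/class used to qualify bare device IDs;
// "nvidia.com/gpu" matches the kind used elsewhere in this PR.
const defaultCDIDeviceKind = "nvidia.com/gpu"

// normalizeDeviceID returns deviceID unchanged when it is already a
// fully-qualified CDI device name (contains "="); otherwise it prefixes
// the default CDI device kind.
func normalizeDeviceID(deviceID string) string {
	if _, _, ok := strings.Cut(deviceID, "="); ok {
		return deviceID
	}
	return defaultCDIDeviceKind + "=" + deviceID
}

func main() {
	fmt.Println(normalizeDeviceID("all"))              // nvidia.com/gpu=all
	fmt.Println(normalizeDeviceID("nvidia.com/gpu=0")) // nvidia.com/gpu=0
}
```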
daemon/devices_nvidia_linux.go
Outdated
	parts := strings.SplitN(deviceID, "=", 2)
	if len(parts) == 2 {
Perhaps strings.Cut so that we don't allocate a new slice just for checking;
_, _, ok := strings.Cut(deviceID, "=")
if ok {
	return deviceID
}

// NvidiaCLI is the path to the Nvidia helper binary
const NvidiaCLI = "nvidia-container-cli"
Interesting; looks like this was using a different binary than we have in the Moby code; is that correct?
Yes, the ctr implementation was invoking the nvidia-container-cli directly in a hook instead of using the nvidia-container-runtime-hook. See https://github.com/containerd/containerd/pull/12537/changes#diff-59afc480d0fe54b9edf0bca811470767f059431f5e65cd89c297ffcdbb27ef62L90 where I also removed this from the ctr code.
This inconsistency is one of the motivations for switching to CDI -- which should be consistent across all runtimes.
Force-pushed from 4c30d65 to daa1213
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Force-pushed from daa1213 to f918651
Thanks for the review @vvoland. Let me know if I need to rebase again to apply the fixup commit.
This change updates the nvidia device driver to handle --gpus requests as CDI device requests. This is only done if the nvidia-cdi-hook binary is present, falling back to the nvidia-container-runtime-hook if available. If a cdi deviceDriver is installed at the point where a request should be handled, the incoming device request is updated to refer to fully-qualified CDI device names and forwarded to the cdi deviceDriver.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
thaJeztah left a comment:
LGTM
I'll do a quick squash of the "fixup" commit
Force-pushed from f918651 to 86f122a
- What I did

Updated the `nvidia` device driver to handle GPU requests (e.g. the `--gpus` flag) using the `cdi` device driver if available, falling back to the existing prestart hook-based approach.

- How I did it

I renamed the existing `nvidia` device driver to `nvidia.runtime-hook` and added a new device driver `nvidia.cdi`, registered if the `nvidia-container-runtime-hook` and `nvidia-cdi-hook` binaries are present on the system, respectively. For the `nvidia.cdi` device driver, the update spec handler converts the incoming device request to CDI device names using a predefined device kind (e.g. `--gpus all` will map to `nvidia.com/gpu=all`) and forwards this request to the `cdi` handler if defined.

I then construct an `nvidia` driver with an update spec handler that attempts to apply the update for the available drivers in the order: [`nvidia.cdi`, `nvidia.runtime-hook`].

Note: When the changes from #50717 are merged, this would also allow users to explicitly select the `nvidia.runtime-hook` driver.

- How to verify it
Ensure that the `--gpus` flag continues to work on systems where the `nvidia-container-runtime-hook` is not available:

- Install the `nvidia-container-toolkit-base` package to install the `nvidia-ctk` and `nvidia-cdi-hook` binaries.
- Ensure that `nvidia-container-runtime-hook` and `amd-container-runtime` are not available on the `PATH`.
- Request `nvidia` devices by running:
- Confirm that CDI injection fails if no CDI spec is found:
- Generate `nvidia.com/gpu` CDI specs:

Check that we fall back to `nvidia-container-runtime-hook`-based injection if CDI injection fails as above:

- Install the `nvidia-container-toolkit` package to ensure that the `nvidia-container-runtime-hook` executable is present.
- Run `nvidia-smi` in a container requesting a single GPU:

Confirm that CDI injection takes precedence over hook-based injection.