fix: skip runtimeClassName injection when gpuPodRuntimeClassName is empty #1035
…mpty

When gpuPodRuntimeClassName is set to an empty string, the admission webhook should not inject runtimeClassName into GPU pods. This allows environments where the nvidia runtime is already the default containerd runtime (e.g., GPU Operator v25.10.0+) to avoid triggering the management.nvidia.com CDI management path, which fails with "unresolvable CDI devices" on nodes without UUID-based CDI specs.

The --gpu-pod-runtime-class-name flag help text already documents "Set to empty string to disable", but the Mutate function did not check for this case.

Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
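The flag semantics described in the commit message can be sketched with Go's standard flag package. This is only an illustration: the function name parseRuntimeClassName, the flag set name, and the "nvidia" default are assumptions, and the real webhook may wire its flags differently.

```go
package main

import (
	"flag"
	"fmt"
)

// parseRuntimeClassName is a hypothetical helper, not KAI scheduler code.
// It shows that an explicitly empty flag value is distinct from the default.
func parseRuntimeClassName(args []string) (string, error) {
	fs := flag.NewFlagSet("webhook", flag.ContinueOnError)
	// The default value "nvidia" is an assumption made for this sketch.
	name := fs.String("gpu-pod-runtime-class-name", "nvidia",
		"Runtime class for GPU pods. Set to empty string to disable")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return *name, nil
}

func main() {
	unset, _ := parseRuntimeClassName(nil)
	disabled, _ := parseRuntimeClassName([]string{"--gpu-pod-runtime-class-name="})
	fmt.Printf("flag omitted: %q\n", unset)
	fmt.Printf("flag empty:   %q\n", disabled)
}
```

Passing `--gpu-pod-runtime-class-name=` yields an empty string rather than the default, which is the value the Mutate function has to treat as "disabled".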
0d995ee to
4991337
Thanks for the review. The empty string is preserved through the operator pipeline because SetDefault falls back to the default only when the target pointer is nil:

    // pkg/apis/kai/v1/common/set_default.go
    func SetDefault[T any](target *T, value *T) *T {
    	if target == nil {
    		return value
    	}
    	return target // *string("") is not nil, so it's kept
    }

Full flow when user sets
Without this fix, step 5 would still proceed to evaluate the pod and call The |
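To make the nil versus empty-string distinction concrete, here is a small runnable sketch built around the SetDefault helper quoted above. The wrapper resolveRuntimeClass and the "nvidia" default are hypothetical and exist only for illustration.

```go
package main

import "fmt"

// SetDefault is copied from the snippet quoted in this comment:
// the default is used only when target is nil.
func SetDefault[T any](target *T, value *T) *T {
	if target == nil {
		return value
	}
	return target
}

// resolveRuntimeClass is a hypothetical wrapper for this sketch.
// The default "nvidia" is an assumption, not taken from the operator code.
func resolveRuntimeClass(userValue *string) string {
	def := "nvidia"
	return *SetDefault(userValue, &def)
}

func main() {
	empty := ""
	// A nil pointer means "not set", so the default applies.
	fmt.Printf("unset: %q\n", resolveRuntimeClass(nil))
	// A pointer to "" is non-nil, so the empty string survives.
	fmt.Printf("empty: %q\n", resolveRuntimeClass(&empty))
}
```

An unset field resolves to the assumed default, while an explicit empty string passes through unchanged, so the webhook is the component that must handle it.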
Merging this branch will increase overall coverage
9675f37
|
Backport failed for v0.9. Please cherry-pick the changes locally and resolve any conflicts.

    git fetch origin v0.9
    git worktree add -d .worktree/backport-1035-to-v0.9 origin/v0.9
    cd .worktree/backport-1035-to-v0.9
    git switch --create backport-1035-to-v0.9
    git cherry-pick -x 9675f37d32d202b7eeb96485c85fdd399d37b12b

Backport failed for v0.10. Please cherry-pick the changes locally and resolve any conflicts.

    git fetch origin v0.10
    git worktree add -d .worktree/backport-1035-to-v0.10 origin/v0.10
    cd .worktree/backport-1035-to-v0.10
    git switch --create backport-1035-to-v0.10
    git cherry-pick -x 9675f37d32d202b7eeb96485c85fdd399d37b12b

Backport failed for v0.12. Please cherry-pick the changes locally and resolve any conflicts.

    git fetch origin v0.12
    git worktree add -d .worktree/backport-1035-to-v0.12 origin/v0.12
    cd .worktree/backport-1035-to-v0.12
    git switch --create backport-1035-to-v0.12
    git cherry-pick -x 9675f37d32d202b7eeb96485c85fdd399d37b12b

…mpty (#1035) Signed-off-by: Yuan Chen <yuanchen97@gmail.com>
Summary

When gpuPodRuntimeClassName is set to an empty string (""), the admission webhook should not inject runtimeClassName into GPU pods. Currently, even with an empty value, the Mutate function still proceeds to evaluate the pod and may set runtimeClassName to an empty string.

This fix adds an early return in RuntimeEnforcement.Mutate() when gpuPodRuntimeClassName is empty, completely skipping the runtimeClassName injection.

Problem

With GPU Operator v25.10.0+, nvidia is configured as the default containerd runtime. KAI scheduler's admission webhook injecting runtimeClassName: nvidia triggers the management.nvidia.com CDI management path, causing pod startup failures with "unresolvable CDI devices" on nodes without UUID-based CDI specs.

The --gpu-pod-runtime-class-name flag help text already documents "Set to empty string to disable", but the Mutate function did not check for this case.

Changes

- runtime_enforcement.go: Add early return when gpuPodRuntimeClassName is empty
- runtime_enforcement_test.go: Add test case for empty string behavior

Test plan

- … gpuPodRuntimeClassName is empty

🤖 Generated with Claude Code
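The early return described in the summary can be sketched roughly as follows. This is not the actual runtime_enforcement.go code: the Pod struct, the RequestsGPU field, and the shape of RuntimeEnforcement are simplified, hypothetical stand-ins, and the real Mutate works on Kubernetes pod objects.

```go
package main

import "fmt"

// Pod is a hypothetical, stripped-down stand-in for a Kubernetes pod.
type Pod struct {
	Name             string
	RequestsGPU      bool    // assumption: GPU detection is reduced to a flag here
	RuntimeClassName *string // nil means the field is not set
}

// RuntimeEnforcement is a hypothetical stand-in for the webhook plugin.
type RuntimeEnforcement struct {
	gpuPodRuntimeClassName string
}

// Mutate injects the runtime class into GPU pods, unless injection is disabled.
func (r *RuntimeEnforcement) Mutate(pod *Pod) {
	// Early return: an empty name disables injection entirely,
	// so the pod is not evaluated and runtimeClassName is never touched.
	if r.gpuPodRuntimeClassName == "" {
		return
	}
	if !pod.RequestsGPU {
		return
	}
	name := r.gpuPodRuntimeClassName
	pod.RuntimeClassName = &name
}

func main() {
	disabled := &RuntimeEnforcement{gpuPodRuntimeClassName: ""}
	enabled := &RuntimeEnforcement{gpuPodRuntimeClassName: "nvidia"}

	a := &Pod{Name: "gpu-pod-a", RequestsGPU: true}
	disabled.Mutate(a)
	fmt.Println("disabled, runtimeClassName set:", a.RuntimeClassName != nil)

	b := &Pod{Name: "gpu-pod-b", RequestsGPU: true}
	enabled.Mutate(b)
	fmt.Println("enabled, runtimeClassName set:", b.RuntimeClassName != nil)
}
```

Placing the check first means the disabled case leaves runtimeClassName as nil instead of a pointer to an empty string, which matches the behavior the summary asks for.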