What happened?
In a PodCliqueSet YAML, users set spec.template.cliques[].spec.podSpec.schedulerName to choose the scheduler backend. Valid values are "default-scheduler" and "kai-scheduler".
This field is plain corev1.PodSpec.SchedulerName (a string) — it has no CRD-level enum constraint, so the API server does not reject unknown values like "volcano" at the schema layer.
If a user mistakenly sets this to an unsupported value or a typo, the validating webhook runs validateSchedulerNames, which detects the mismatch and records a validation error. However, validateSchedulerNames only accumulates errors into a field.ErrorList; it does not return early or prevent the handler from continuing. validatePodCliqueSetWithBackend is called unconditionally immediately after.
Inside that function, GetOrDefault("kaikai-scheduler") or GetOrDefault("volcano") returns nil because the name is non-empty but not in the registry, and the subsequent backend.ValidatePodCliqueSet() dereferences nil, panicking the webhook process before it can return the recorded validation error to the user.
In short: the webhook does validate the name, but that validation does not stop the code from reaching the nil-deref site.
Suggested fix: see PR #613.
What did you expect to happen?
No response
Environment
- Kubernetes version
- Grove version
- Scheduler details
- Cloud provider or hardware configuration
- Tools that you are using Grove together with
- Anything else that is relevant