webhook crashes on unknown scheduler name #619

@weizhoublue

What happened?

In a PodCliqueSet YAML, users set spec.template.cliques[].spec.podSpec.schedulerName to choose the scheduler backend. Valid values are "default-scheduler" and "kai-scheduler".
This field is the plain corev1.PodSpec.SchedulerName string. It has no CRD-level enum constraint, so the API server does not reject unknown values such as "volcano" at the schema layer.

If a user mistakenly sets this field to an unsupported value or a typo, the validating webhook runs validateSchedulerNames, which detects the mismatch and records a validation error. However, validateSchedulerNames only accumulates errors into a field.ErrorList; it does not return early or prevent the handler from continuing. validatePodCliqueSetWithBackend is then called unconditionally, immediately afterwards.
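
To illustrate the control-flow problem, here is a minimal sketch of returning early once name validation has recorded an error. It is not the Grove code: it uses a plain []error instead of field.ErrorList, the validate and backendValidate names are hypothetical, and treating an empty name as "use the default" is an assumption inferred from the GetOrDefault name.

```go
package main

import "fmt"

// supported holds the two scheduler names this issue lists as valid.
var supported = map[string]bool{"default-scheduler": true, "kai-scheduler": true}

// validateSchedulerNames is a simplified stand-in for the real function:
// it only accumulates errors and never stops the caller by itself.
// Assumption: an empty name means "use the default" and is accepted.
func validateSchedulerNames(names []string) []error {
	var errs []error
	for _, n := range names {
		if n != "" && !supported[n] {
			errs = append(errs, fmt.Errorf("unsupported schedulerName %q", n))
		}
	}
	return errs
}

// validate (hypothetical) runs name validation first and returns early when
// it failed, so backend-specific validation never sees an unknown name.
func validate(names []string, backendValidate func(name string) []error) []error {
	errs := validateSchedulerNames(names)
	if len(errs) > 0 {
		return errs // short-circuit before any backend lookup
	}
	for _, n := range names {
		errs = append(errs, backendValidate(n)...)
	}
	return errs
}

func main() {
	called := false
	errs := validate([]string{"volcano"}, func(string) []error {
		called = true
		return nil
	})
	// One validation error is reported and the backend step is skipped.
	fmt.Println(len(errs), called) // prints: 1 false
}
```

With this shape the user gets the recorded validation error back, and the backend step only runs for names that passed the first check.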

Inside that function, GetOrDefault("kaikai-scheduler") or GetOrDefault("volcano") returns nil because the name is non-empty but not in the registry, and the subsequent backend.ValidatePodCliqueSet() dereferences nil, panicking the webhook process before it can return the recorded validation error to the user.
In short: the webhook does validate the name, but that validation does not stop the code from reaching the nil-deref site.
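
A defensive nil check at the call site would also avoid the panic regardless of what ran before it. The sketch below is an assumption-laden illustration, not the actual Grove registry: the Backend interface, the backends map, and the lower-case getOrDefault and validateWithBackend functions are hypothetical stand-ins that only mirror the behavior described above (empty name falls back to the default, unknown non-empty name yields nil).

```go
package main

import "fmt"

// Backend is a hypothetical stand-in for the scheduler backend interface.
type Backend interface {
	ValidatePodCliqueSet() error
}

// noopBackend accepts everything; it exists only to populate the registry.
type noopBackend struct{}

func (noopBackend) ValidatePodCliqueSet() error { return nil }

const defaultName = "default-scheduler"

// backends is a hypothetical registry keyed by scheduler name.
var backends = map[string]Backend{
	defaultName:     noopBackend{},
	"kai-scheduler": noopBackend{},
}

// getOrDefault mirrors the described behavior: an empty name yields the
// default backend, an unknown non-empty name yields a nil interface.
func getOrDefault(name string) Backend {
	if name == "" {
		return backends[defaultName]
	}
	return backends[name] // missing key returns the zero value, i.e. nil
}

// validateWithBackend checks the lookup result before calling a method on
// it, turning the would-be nil dereference into an ordinary error.
func validateWithBackend(name string) error {
	b := getOrDefault(name)
	if b == nil {
		return fmt.Errorf("unsupported schedulerName %q", name)
	}
	return b.ValidatePodCliqueSet()
}

func main() {
	for _, n := range []string{"", "kai-scheduler", "kaikai-scheduler", "volcano"} {
		fmt.Printf("%q -> %v\n", n, validateWithBackend(n))
	}
}
```

Without the nil check, calling ValidatePodCliqueSet on the nil interface returned for "kaikai-scheduler" or "volcano" panics at runtime, which matches the crash reported here.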

Suggested fix: PR #613.

What did you expect to happen?

No response

Environment

  • Kubernetes version
  • Grove version
  • Scheduler details
  • Cloud provider or hardware configuration
  • Tools that you are using Grove together with
  • Anything else that is relevant
