
Validation webhook prevents finalizer removal for PodCliqueSets created before auto-mnnvl feature #416

@shayasoolin

What happened?

Describe the bug
A PodCliqueSet (PCS) created before the auto-mnnvl feature was introduced cannot be deleted: the resource stays stuck on its finalizer. Attempts to remove the finalizer manually are blocked by the validation webhook, which incorrectly treats the missing grove.io/auto-mnnvl annotation as an "addition" of an immutable field during the patch request.

To Reproduce

  1. Have a PodCliqueSet created on an older version of Grove (pre-auto-mnnvl).
  2. Delete all child resources (Pods, Services, etc.) manually.
  3. Attempt to delete the PCS: kubectl delete pcs <name>. The command hangs because of the grove.io/podcliqueset.grove.io finalizer.
  4. Attempt to manually remove the finalizer via kubectl patch or kubectl edit.
  5. Error: The validation webhook denies the request:

admission webhook "pcs.validating.webhooks.grove.io" denied the request: metadata.annotations.grove.io/auto-mnnvl: Forbidden: annotation grove.io/auto-mnnvl cannot be added after PodCliqueSet creation

  6. Attempt to scale the operator to 0 replicas to bypass the webhook logic.
  7. Error: The defaulting webhook now blocks the patch because the grove-operator service has no available endpoints:

Internal error occurred: failed calling webhook "pcs.defaulting.webhooks.grove.io": ... no endpoints available for service "grove-operator"

Expected behavior
The validation webhook should allow metadata/finalizer updates for existing resources, especially during deletion, even if the auto-mnnvl annotation is missing or being defaulted. It should not block the removal of finalizers on legacy resources.

Actual behavior
The resource is "deadlocked." The validation webhook blocks the manual fix while the operator is running, and the defaulting webhook blocks the manual fix when the operator is stopped.

Workaround
The only way to recover was to manually delete the ValidatingWebhookConfiguration for Grove, remove the finalizer from the PCS, and then restore/reinstall the webhook.

Environment:

  • Grove Version: v0.1.0-alpha.6
  • Installation Method: Helm
