What happened?
A PodClique delete reconcile can fail when the target cluster does not serve resource.k8s.io/v1 ResourceClaim, even if the workload is only trying to clean up.
Observed operator log:
{"level":"error","ts":"2026-05-12T22:13:19.622Z","msg":"Reconciler error","controller":"podclique-controller","controllerGroup":"grove.io","controllerKind":"PodClique","PodClique":{"name":"myllm-0-frontend","namespace":"sr-48d5ee24-b975-4a09-b46e-e7f8834f210b"},"namespace":"sr-48d5ee24-b975-4a09-b46e-e7f8834f210b","name":"myllm-0-frontend","reconcileID":"e6f82fe8-bf91-4c3d-8730-81dfbcbee889","error":"[Operation: Delete, Code: ERR_DELETE_PCLQ_RESOURCE_CLAIM] message: Error deleting PCLQ-level ResourceClaims for sr-48d5ee24-b975-4a09-b46e-e7f8834f210b/myllm-0-frontend, cause: no matches for kind \"ResourceClaim\" in version \"resource.k8s.io/v1\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.4/pkg/internal/controller/controller.go:474\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.4/pkg/internal/controller/controller.go:421\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func1.1\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.22.4/pkg/internal/controller/controller.go:296"}
The failing operation is:
Operation: Delete
Code: ERR_DELETE_PCLQ_RESOURCE_CLAIM
cause: no matches for kind "ResourceClaim" in version "resource.k8s.io/v1"
This is related to #543, but that issue was closed by upgrading the local development kind cluster. The underlying operator behavior still exists: Grove attempts to reconcile/delete ResourceClaim objects even when the resource.k8s.io/v1 API is not present in the apiserver.
Expected behavior
PodClique cleanup should not get stuck solely because the cluster does not serve resource.k8s.io/v1 ResourceClaim.
If DRA support is unavailable, Grove should either:
- detect that the API is absent and skip ResourceClaim cleanup as already gone/not applicable, or
- surface a clear prerequisite error before accepting/enabling DRA-backed resource sharing.
At minimum, delete cleanup should probably treat NoKindMatchError and resource-not-found style errors as success during ResourceClaim cleanup, since there cannot be any ResourceClaim objects to delete if the API is absent.
Notes
Grove currently uses the stable Kubernetes DRA API (k8s.io/api/resource/v1). That implies Kubernetes 1.34+ for the ResourceClaim sharing path, or equivalent clusters serving resource.k8s.io/v1.
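The prerequisite could also be checked up front. The sketch below is only an illustration under stated assumptions: `kindLister`, `staticLister`, `errGroupVersionNotServed`, and `draAvailable` are hypothetical names, with `kindLister` standing in for a real discovery client (for example, a wrapper around client-go's `ServerResourcesForGroupVersion`).

```go
package main

import (
	"errors"
	"fmt"
)

// errGroupVersionNotServed is a hypothetical sentinel meaning the apiserver
// does not serve the requested group/version at all.
var errGroupVersionNotServed = errors.New("group/version not served")

// kindLister is a hypothetical, minimal view of API discovery: it returns the
// kinds served under a group/version such as "resource.k8s.io/v1".
type kindLister interface {
	KindsFor(groupVersion string) ([]string, error)
}

// draAvailable reports whether ResourceClaim is served under resource.k8s.io/v1.
// An unserved group/version is a normal "false" answer, not an error.
func draAvailable(l kindLister) (bool, error) {
	kinds, err := l.KindsFor("resource.k8s.io/v1")
	if errors.Is(err, errGroupVersionNotServed) {
		return false, nil
	}
	if err != nil {
		return false, fmt.Errorf("discovering resource.k8s.io/v1: %w", err)
	}
	for _, k := range kinds {
		if k == "ResourceClaim" {
			return true, nil
		}
	}
	return false, nil
}

// staticLister is a fake kindLister for illustration and tests.
type staticLister map[string][]string

func (s staticLister) KindsFor(groupVersion string) ([]string, error) {
	kinds, ok := s[groupVersion]
	if !ok {
		return nil, errGroupVersionNotServed
	}
	return kinds, nil
}
```

A result of `false` could then either disable the ResourceClaim sharing path or be reported as a clear prerequisite error, matching the two options under Expected behavior.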