
Pruning deletes child resources created by other controllers that copy parent labels #1153

@lucpas

Description


Observed Behavior:

When a kro-managed ExternalSecret creates a child Secret, and the same RGD also directly manages at least one Secret (putting the Secret GroupKind in the applyset prune scope), kro's prune logic deletes the ExternalSecret-created Secret on every reconcile. The ExternalSecret operator recreates it, then kro deletes it again, creating an infinite loop that prevents the Secret from ever being usable.

This happens because:

  1. kro applies an ExternalSecret via SSA, injecting the applyset.kubernetes.io/part-of label
  2. The external-secrets operator creates a Secret and copies all labels from the ExternalSecret to the Secret. This is documented, intentional behavior with no opt-out mechanism
  3. Because the RGD also directly manages a Secret, the Secret GroupKind is in the applyset's contains-group-kinds annotation
  4. On the next reconcile, kro's prune lists all Secrets with the applyset label
  5. The ExternalSecret-created Secret's UID is not in keepUIDs (only the ExternalSecret's UID is), so kro deletes it
  6. The external-secrets operator recreates the Secret, and the cycle repeats

This affects any controller that propagates labels from a parent resource to its child resources.

Expected Behavior:

kro should not prune resources whose ownerReferences point to a resource that kro manages (i.e., whose owner UID is in keepUIDs).

You could argue that this is not really a bug in kro itself, but in the External Secrets Operator, which copies all labels from the parent to the resources it creates with no way to opt out.

Reproduction Steps:

Prerequisites:

  • A cluster with external-secrets operator installed
  • A working ClusterSecretStore named my-store
  • A secret in the backing store at key test/my-secret with at least one key-value pair
  1. Apply the RGD:
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: prune-bug-repro
spec:
  schema:
    apiVersion: v1alpha1
    kind: PruneBugRepro
    spec:
      secretStoreName: string
      secretKey: string
    status:
      ready: ${directSecret.metadata.name != ""}
  resources:
    # This ExternalSecret creates a child Secret that inherits the applyset label
    - id: externalSecret
      readyWhen:
        - ${externalSecret.status.?conditions.orValue([]).exists(c, c.type == "Ready" && c.status == "True")}
      template:
        apiVersion: external-secrets.io/v1
        kind: ExternalSecret
        metadata:
          name: ${schema.metadata.name}-es-secret
          namespace: ${schema.metadata.namespace}
        spec:
          refreshInterval: 1h
          secretStoreRef:
            kind: ClusterSecretStore
            name: ${schema.spec.secretStoreName}
          target:
            name: ${schema.metadata.name}-es-secret
            creationPolicy: Owner
          dataFrom:
            - extract:
                key: ${schema.spec.secretKey}
    # This directly-managed Secret puts the Secret GroupKind in the applyset
    # prune scope; without it, kro never lists Secrets during prune and the
    # bug does not manifest.
    - id: directSecret
      template:
        apiVersion: v1
        kind: Secret
        metadata:
          name: ${schema.metadata.name}-direct
          namespace: ${schema.metadata.namespace}
        stringData:
          note: managed-by-kro
  2. Wait for the RGD to become Active:
kubectl get rgd prune-bug-repro
  3. Create an instance:
apiVersion: kro.run/v1alpha1
kind: PruneBugRepro
metadata:
  name: test-prune-bug
  namespace: default
spec:
  secretStoreName: my-store
  secretKey: test/my-secret
  4. Observe the ExternalSecret-created Secret being deleted in a loop:
# The secret is almost never visible; it gets deleted faster than it's recreated
for i in $(seq 1 20); do
  kubectl get secret test-prune-bug-es-secret --no-headers 2>&1
  sleep 1
done
  5. Verify the directly-managed Secret is fine:
# This one is stable because its UID IS in keepUIDs
kubectl get secret test-prune-bug-direct
  6. Check that the applyset annotation confirms Secret is in prune scope:
kubectl get prunebugreproes.kro.run test-prune-bug -o \
  jsonpath='{.metadata.annotations.applyset\.kubernetes\.io/contains-group-kinds}'
# Output: ExternalSecret.external-secrets.io,Secret

Versions:

  • kro version: v0.8.5
  • Kubernetes Version: v1.33.8-eks

Involved Controllers:

Error Logs:

# kro controller logs show continuous cluster mutated requeue
dynamic-controller  Child triggered parent reconciliation  {"child": "/v1/secrets", "eventType": "delete", "childName": "test-prune-bug-es-secret"}
dynamic-controller  Requeue needed after delay  {"error": "cluster mutated", "delay": "3s"}
dynamic-controller  Child triggered parent reconciliation  {"child": "/v1/secrets", "eventType": "add", "childName": "test-prune-bug-es-secret"}
dynamic-controller  Requeue needed after delay  {"error": "cluster mutated", "delay": "3s"}

Root Cause:

In pkg/controller/instance/applyset/applyset.go, the prune() function filters candidates solely by checking keepUIDs.Has(obj.GetUID()). It does not account for resources that are owned (via ownerReferences) by a kro-managed resource. When a controller like external-secrets copies labels from its parent resource to the child, the child inherits the applyset.kubernetes.io/part-of label, making it visible to kro's prune, but its UID is not tracked by kro.
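The filtering described above can be sketched as a standalone simulation. The `uid` and `object` types and the sample names below are simplified stand-ins for the real `types.UID` and `*unstructured.Unstructured` values, not kro's actual code:

```go
package main

import "fmt"

// Stand-ins for the real Kubernetes types; kro's prune() works on
// *unstructured.Unstructured and sets.Set[types.UID].
type uid string

type object struct {
	name string
	uid  uid
}

// pruneCandidates mirrors the current filter in prune(): any listed
// object whose own UID is not in keepUIDs becomes a prune candidate.
func pruneCandidates(listed []object, keepUIDs map[uid]bool) []string {
	var names []string
	for _, obj := range listed {
		if !keepUIDs[obj.uid] {
			names = append(names, obj.name)
		}
	}
	return names
}

func main() {
	// UIDs kro tracked when applying this reconcile's resources.
	keepUIDs := map[uid]bool{
		"uid-external-secret": true,
		"uid-direct-secret":   true,
	}
	// Listing Secrets by the applyset label returns both the directly
	// managed Secret and the child created by external-secrets, because
	// the child inherited the applyset.kubernetes.io/part-of label.
	listed := []object{
		{name: "test-prune-bug-direct", uid: "uid-direct-secret"},
		{name: "test-prune-bug-es-secret", uid: "uid-child-secret"},
	}
	fmt.Println(pruneCandidates(listed, keepUIDs))
}
```

Only the child Secret ends up in the candidate list: it carries the applyset label but its UID was never applied by kro, so the UID-only check cannot distinguish it from a genuinely orphaned resource.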

Suggested Fix:

Before adding a resource to the prune candidate list, also check if any of its ownerReferences point to a UID in keepUIDs:

if !keepUIDs.Has(obj.GetUID()) && !ownedByKeepUID(obj, keepUIDs) {
    local = append(local, pruneCandidate{obj: obj, gvr: task.gvr})
}

// ownedByKeepUID reports whether any ownerReference on obj points at a
// UID that kro already tracks for this applyset.
func ownedByKeepUID(obj *unstructured.Unstructured, keepUIDs sets.Set[types.UID]) bool {
    for _, ref := range obj.GetOwnerReferences() {
        if keepUIDs.Has(ref.UID) {
            return true
        }
    }
    return false
}
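A self-contained sketch of how that check would behave, again with minimal stand-ins for the Kubernetes types (the names and UIDs are illustrative, not taken from kro):

```go
package main

import "fmt"

// Minimal stand-ins for types.UID, metav1.OwnerReference, and
// sets.Set[types.UID]; the real fix would operate on
// *unstructured.Unstructured inside applyset.go.
type uid string

type ownerRef struct{ UID uid }

type object struct {
	name   string
	uid    uid
	owners []ownerRef
}

// ownedByKeepUID reports whether any ownerReference on obj points at a
// UID in keepUIDs.
func ownedByKeepUID(obj object, keepUIDs map[uid]bool) bool {
	for _, ref := range obj.owners {
		if keepUIDs[ref.UID] {
			return true
		}
	}
	return false
}

// shouldPrune applies the proposed filter: prune only objects that are
// neither tracked directly nor owned by a tracked resource.
func shouldPrune(obj object, keepUIDs map[uid]bool) bool {
	return !keepUIDs[obj.uid] && !ownedByKeepUID(obj, keepUIDs)
}

func main() {
	keepUIDs := map[uid]bool{"uid-external-secret": true}

	// Child Secret created by external-secrets with creationPolicy: Owner,
	// so its ownerReference points at the ExternalSecret.
	child := object{name: "test-prune-bug-es-secret", uid: "uid-child-secret",
		owners: []ownerRef{{UID: "uid-external-secret"}}}

	// A genuinely orphaned object carrying the applyset label but with
	// no tracked owner.
	orphan := object{name: "stale-object", uid: "uid-stale"}

	fmt.Println("prune child?", shouldPrune(child, keepUIDs))
	fmt.Println("prune orphan?", shouldPrune(orphan, keepUIDs))
}
```

With this filter, the ExternalSecret-created Secret is kept (its owner's UID is tracked), while truly orphaned labeled objects are still pruned, so the change should not weaken normal applyset garbage collection.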

Related Issues:


  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Which option best describes your issue?

Instance (Create, Update, Deletion)

Metadata

Assignees

No one assigned

Labels

kind/bug (Categorizes issue or PR as related to a bug.), needs-triage (Indicates an issue or PR lacks a `triage/foo` label and requires one.)
