XR status is not updated if XR is patched from composition functions #4968

@mproffitt

Description

When running compositions in pipeline mode, an issue arises from the patch applied to the XR as part of the composition function runner here:

if err := c.client.Status().Patch(ctx, xr, client.Apply, client.ForceOwnership, client.FieldOwner(FieldOwnerXR)); err != nil {
	return CompositionResult{}, errors.Wrap(err, errApplyXRStatus)
}

When the patch is applied, the XR's resourceVersion field is updated on the managed resource, but the new value is never reflected back into the XR that is handed back to the main reconcile loop.

As a result, when Crossplane tries to update the conditions inside the reconciler here

return reconcile.Result{RequeueAfter: r.pollInterval}, errors.Wrap(r.client.Status().Update(ctx, xr), errUpdateStatus)

it is unable to proceed, and the error is masked inside the reconcile.Result.

The error returned from the client here is "the object has been modified; please apply your changes to the latest version and try again", which seems to get lost in the client logs, making this difficult to trace.

This manifests inside the XR as "Stuck" status fields, similar to the behaviour reported in Slack (https://crossplane.slack.com/archives/CEG3T90A1/p1699029022000269), despite the underlying resources being marked ready. For example, when I run my pipeline, most of the resources become ready except two of them:

  • eks-cluster-vpc (vpcs.ec2.aws.upbound.io)
  • awsmanagedcontrolplane (objects.kubernetes.crossplane.io)

In the crossplane beta trace tree, these objects show as ready themselves but are reported as unready by the XR:

$ crossplane beta trace xi sample-customer-zkrp6
NAME                                                 SYNCED   READY   STATUS
CompositeImport/sample-customer-zkrp6                True     False   Unready resources: awsmanagedcontrolplane, eks-cluster-vpc
...
├─ VPC/sample-customer                               True     True    Available
├─ ClusterAuth/sample-customer-eks-cluster-auth      True     True    Available
...
├─ Object/sample-customer-awsmanagedcontrolplane     True     True    Available
...

Another side effect I've seen is a composition function error from a transient timeout being reflected in the status and never clearing. For example, if a running function temporarily becomes unreachable, e.g. through a pod restart, this presents as an rpc error even after the pod is running correctly again.

  conditions:
  - lastTransitionTime: "2023-11-04T14:34:22Z"
    message: 'cannot compose resources: cannot run Composition pipeline step "generate-subnets":
      cannot run Function "function-generate-subnets": rpc error: code = DeadlineExceeded
      desc = context deadline exceeded'
    reason: ReconcileError
    status: "False"
    type: Synced
  - lastTransitionTime: "2023-11-03T07:16:09Z"
    message: 'Unready resources: awsmanagedcontrolplane, eks-cluster-vpc'
    reason: Creating
    status: "False"
    type: Ready

Examining both the function pod logs and the crossplane pod logs shows the function working and behaving normally.

To verify the behaviour I was seeing, I modified the reconciler at the end of the Reconcile function to retrieve the last applied resourceVersion before updating the status, and this seems to have a positive effect:

	// Re-fetch the XR so we can see the resourceVersion left by the status patch.
	nxr := composite.New(composite.WithGroupVersionKind(r.gvk))
	if err := r.client.Get(ctx, req.NamespacedName, nxr); err != nil {
		log.Debug(errGet, "error", err)
		return reconcile.Result{}, errors.Wrap(resource.IgnoreNotFound(err), errGet)
	}

	// Copy the fresh resourceVersion onto the local XR so the update is not rejected as stale.
	xr.SetResourceVersion(nxr.GetResourceVersion())
	if err = r.client.Status().Update(ctx, xr); err != nil {
		log.Debug(errUpdateStatus, "error", err)
	}
	return reconcile.Result{RequeueAfter: r.pollInterval}, errors.Wrap(err, errUpdateStatus)

Whilst re-fetching the XR is neither a clean nor an elegant solution, I wasn't able to determine what the new resourceVersion would be without doing so. With the change in place, the trace shows the XR as ready:

$ crossplane beta trace xi sample-customer-zkrp6
NAME                                                 SYNCED   READY   STATUS
CompositeImport/sample-customer-zkrp6                True     True    Available
...
├─ VPC/sample-customer                               True     True    Available
...
├─ Object/sample-customer-awsmanagedcontrolplane     True     True    Available
...
