XR status is not updated if XR is patched from composition functions #4968
Description
When running compositions in pipeline mode, an issue arises as a result of the patching carried out on the XR as part of the composition function runner here:
crossplane/internal/controller/apiextensions/composite/composition_functions.go
Lines 391 to 393 in 603e4ca
if err := c.client.Status().Patch(ctx, xr, client.Apply, client.ForceOwnership, client.FieldOwner(FieldOwnerXR)); err != nil {
	return CompositionResult{}, errors.Wrap(err, errApplyXRStatus)
}
When the patch is applied, the XR's resourceVersion is updated on the stored object, but the new value is never reflected back into the XR that is handed back to the main reconcile loop.
The outcome of this is that when crossplane tries to update the conditions inside the reconciler here
return reconcile.Result{RequeueAfter: r.pollInterval}, errors.Wrap(r.client.Status().Update(ctx, xr), errUpdateStatus)
it is unable to proceed, with the error being masked inside the reconcile.Result.
The error returned from the client here is "the object has been modified; please apply your changes to the latest version and try again", which seems to get lost in the client logs, making this difficult to trace.
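The failure mode described above can be reproduced in a toy model. This is only a sketch using the standard library: `store`, `object`, `patch`, `update` and `reconcile` are hypothetical stand-ins, not Crossplane or controller-runtime APIs, and the model assumes (as the report describes) that the reconciler's in-memory copy is not refreshed after the patch.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's optimistic-concurrency error.
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// object is a hypothetical, minimal stand-in for an XR.
type object struct {
	resourceVersion int
	status          string
}

// store is a hypothetical in-memory stand-in for the API server.
type store struct {
	current object
}

// patch mimics the status patch: it succeeds and bumps the stored
// resourceVersion, but leaves the caller's in-memory copy untouched.
func (s *store) patch(status string) {
	s.current.status = status
	s.current.resourceVersion++
}

// update mimics a status update: it is rejected unless the caller's
// resourceVersion matches the stored one.
func (s *store) update(o *object) error {
	if o.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	s.current.status = o.status
	s.current.resourceVersion++
	o.resourceVersion = s.current.resourceVersion
	return nil
}

// reconcile reads the object, lets a patch happen behind its back,
// then tries to update from the now-stale copy.
func reconcile(s *store) error {
	xr := s.current // local copy taken at the start of the reconcile
	s.patch("composed")
	xr.status = "ready"
	return s.update(&xr)
}

func main() {
	s := &store{current: object{resourceVersion: 1}}
	err := reconcile(s)
	fmt.Println("conflict:", errors.Is(err, errConflict))
}
```

In this model the update is always rejected, because the copy held by `reconcile` still carries the resourceVersion from before the patch.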
This manifests inside the XR as "Stuck" status fields, similar to those reported in Slack here: https://crossplane.slack.com/archives/CEG3T90A1/p1699029022000269, despite the underlying resources being marked ready. For example, when I run my pipeline, most of the resources become ready except two of them:
- eks-cluster-vpc (vpcs.ec2.aws.upbound.io)
- awsmanagedcontrolplane (objects.kubernetes.crossplane.io)
In the crossplane beta trace tree, these objects show as ready, but the XR reports them as unready:
$ crossplane beta trace xi sample-customer-zkrp6
NAME SYNCED READY STATUS
CompositeImport/sample-customer-zkrp6 True False Unready resources: awsmanagedcontrolplane, eks-cluster-vpc
...
├─ VPC/sample-customer True True Available
├─ ClusterAuth/sample-customer-eks-cluster-auth True True Available
...
├─ Object/sample-customer-awsmanagedcontrolplane True True Available
...
Another side effect I've seen is composition function error statuses from transient timeouts that never seem to clear. For example, if a running function temporarily becomes unreachable (e.g. through a pod restart), this presents as an rpc error even once the pod is running correctly again.
conditions:
- lastTransitionTime: "2023-11-04T14:34:22Z"
  message: 'cannot compose resources: cannot run Composition pipeline step "generate-subnets":
    cannot run Function "function-generate-subnets": rpc error: code = DeadlineExceeded
    desc = context deadline exceeded'
  reason: ReconcileError
  status: "False"
  type: Synced
- lastTransitionTime: "2023-11-03T07:16:09Z"
  message: 'Unready resources: awsmanagedcontrolplane, eks-cluster-vpc'
  reason: Creating
  status: "False"
  type: Ready
Examining both the function pod logs and the crossplane pod logs shows the function working and behaving normally.
To verify the behaviour I was seeing, I looked at modifying the behaviour of the reconciler at the end of the Reconcile function to retrieve the last applied resourceVersion and this seems to have a positive effect:
// Re-fetch the XR to learn the resourceVersion produced by the earlier status patch.
nxr := composite.New(composite.WithGroupVersionKind(r.gvk))
if err := r.client.Get(ctx, req.NamespacedName, nxr); err != nil {
	log.Debug(errGet, "error", err)
	return reconcile.Result{}, errors.Wrap(resource.IgnoreNotFound(err), errGet)
}
// Copy only the fresh resourceVersion onto the in-memory XR before updating its status.
xr.SetResourceVersion(nxr.GetResourceVersion())
if err = r.client.Status().Update(ctx, xr); err != nil {
	log.Debug(errUpdateStatus, "error", err)
}
return reconcile.Result{RequeueAfter: r.pollInterval}, errors.Wrap(err, errUpdateStatus)
Whilst re-fetching the XR is neither a clean nor an elegant solution, I wasn't able to determine what the new resourceVersion would be without doing so.
$ crossplane beta trace xi sample-customer-zkrp6
NAME SYNCED READY STATUS
CompositeImport/sample-customer-zkrp6 True True Available
...
├─ VPC/sample-customer True True Available
...
├─ Object/sample-customer-awsmanagedcontrolplane True True Available
...
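The effect of the workaround can be illustrated with a small standard-library model. This is a sketch under assumptions, not Crossplane code: `store`, `object` and `updateWithRefreshedVersion` are hypothetical names, and the in-memory `store` only imitates the resourceVersion check performed on updates.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's optimistic-concurrency error.
var errConflict = errors.New("the object has been modified")

// object is a hypothetical, minimal stand-in for an XR.
type object struct {
	resourceVersion int
	status          string
}

// store is a hypothetical in-memory stand-in for the API server.
type store struct {
	current object
}

// get returns a copy of the stored object, like a fresh read.
func (s *store) get() object { return s.current }

// patch bumps the stored resourceVersion without touching any caller's copy.
func (s *store) patch(status string) {
	s.current.status = status
	s.current.resourceVersion++
}

// update succeeds only if the caller's resourceVersion matches the stored one.
func (s *store) update(o *object) error {
	if o.resourceVersion != s.current.resourceVersion {
		return errConflict
	}
	s.current.status = o.status
	s.current.resourceVersion++
	o.resourceVersion = s.current.resourceVersion
	return nil
}

// updateWithRefreshedVersion mirrors the workaround: re-read the object,
// copy only its resourceVersion onto the stale copy, then update.
func updateWithRefreshedVersion(s *store, xr *object) error {
	latest := s.get()
	xr.resourceVersion = latest.resourceVersion
	return s.update(xr)
}

func main() {
	s := &store{current: object{resourceVersion: 1}}
	xr := s.get()       // copy held by the reconciler
	s.patch("composed") // patch applied behind the reconciler's back
	xr.status = "ready"

	fmt.Println("stale update conflicts:", errors.Is(s.update(&xr), errConflict))
	fmt.Println("refreshed update error:", updateWithRefreshedVersion(s, &xr))
	fmt.Println("stored status:", s.get().status)
}
```

In this model, copying only the resourceVersion means the content of the stale copy replaces whatever the intervening patch wrote. Whether that matters for the real reconciler depends on which status fields each write owns, which this sketch does not attempt to model.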