Skip to content

Proposal: New Healthy condition for claims and XRs #5643

@negz

Description

@negz

Executive summary

I propose we add a new Healthy status condition to claims and XRs. This condition would indicate whether the resource and all of its child resources were healthy - i.e. weren't currently encountering any errors.

This would make it possible to quickly determine whether a claim isn't ready because it's hitting an error, or because it just needs more time to become ready.

Importantly, the new condition wouldn't include the specific error(s) child resources were encountering. It would only indicate that there were or weren't errors.

What problem are you facing?

Claims, XRs, and MRs have the following status conditions:

  • Ready - Indicates that Crossplane believes the resource is ready to use
  • Synced - Indicates that Crossplane successfully synced the resource on the last reconcile

Ready doesn't mean "Crossplane is successfully reconciling this resource". It means "the thing this resource represents appears to be ready to use". For example if a claim represents a database, the Ready condition indicates that Crossplane believes the database is ready to use. It's ready for some application to connect to it and store data.

Ready is cumulative. A claim is only Ready when its underlying XR is Ready. The XR is only Ready when all its composed resources are Ready.

Synced does mean "Crossplane is successfully reconciling this resource". It isn't cumulative, though. It only means "this controller reconciled its most recent desired state successfully". For example an XR could reconcile its desired state by creating three misconfigured composed resources that would never become Synced or Ready. The XR is still Synced though, because its controller did its job. It successfully synced its desired state by creating composed resources.

I think these two conditions make sense. For example:

  • A resource could be Synced because Crossplane has successfully applied its desired state, but not yet Ready to use.
  • A resource could be Ready to use (as far as Crossplane knows), but no longer Synced because Crossplane is hitting an error.

There's an information gap though. Today if you look at a newly created claim you'll likely see:

  • Synced: True - The claim successfully created and is syncing with an XR
  • Ready: False - The conceptual resource the claim represents (e.g. a database) isn't ready to use yet

It's possible the XR controller, and all composed resource controllers are working fine but some MR isn't ready to use yet - e.g. the RDS API is reporting that a composed RDS instance isn't ready to use. It's also possible that one of the controllers is stuck due to an error.

Today there's no way to distinguish between these two cases without inspecting the next layer down.

Folks have been feeling the pain of this gap for some time, but I've been reluctant to address it because the proposed solution has always been "take the errors from lower level controllers and propagate them to the higher level controllers" - for example #3262.

@vfarcic explains why that isn't a great solution in https://www.youtube.com/watch?v=xAl3TAfFE_M. Claims, XRs, and MRs are all different layers of abstraction, with different intended audiences. An error that makes sense at one layer probably won't make sense at a higher layer of abstraction.

Consider for example an XR error due to an invalid Composition. I don't think it makes sense to copy that error verbatim to the claim. Assuming the claim author isn't writing their own Composition, the error isn't actionable for them.

One way to address this is to build a presentation layer that allows Composition authors to choose which errors were surfaced to the claim and how. That's what we're working on in #5402.

The downside of the presentation layer approach is that the information gap still exists by default. If the Composition author doesn't actively "present" errors to the claim, there's still no way to distinguish between a claim that's not ready due to an error, or a claim that's not ready because the cloud provider is still working on creating one of the composed resources.

How could Crossplane help solve your problem?

I propose we add a new Healthy status condition to claims and XRs.

A claim would be Healthy only when:

  • The claim controller was Synced
  • The underlying XR controller was Synced and Healthy

An XR would be Healthy only when:

  • The XR controller was Synced
  • All composed resources were Synced, and if applicable (e.g. the composed resource is an XR) Healthy

The Healthy condition would in effect be a cumulative version of the Synced condition. I'm proposing a new condition instead of proposing that we change the meaning of the Synced condition because changing the meaning would be a significant behavior change.

Importantly, I'm not proposing that we copy error messages from child to parent resources. For example if a claim wasn't Healthy I'd expect the status condition message to say something generic like "N/M composed resources are not synced" or just "Some resources encountered errors".

Under this proposal a claim status might look like this:

status:
  conditions:
  - type: "Ready"
    status: "False"
    reason: "BindCompositeResource"
    message: "The composite resource is not yet ready"
  - type: "Healthy"
    status: "False"
    reason: "UnhealthyCompositeResource"
    message: "The composite resource is not healthy"
  - type: "Synced"
    status: "True"
    reason: "ReconcileSuccess"

An XR status might look like this:

status:
  conditions:
  - type: "Ready"
    status: "False"
    reason: "Creating"
    message: "Unready resources: some-composed-resource, another-composed-resource"
  - type: "Healthy"
    status: "False"
    reason: "UnhealthyComposedResources"
    message: "Unhealthy resources: some-composed-resource"
  - type: "Synced"
    status: "True"
    reason: "ReconcileSuccess"

Metadata

Metadata

Assignees

No one assigned

    Labels

    compositionenhancementNew feature or requestproposalroadmapIssues that have priority and are included in the roadmap, or are candidates to add to the roadmapuser experience

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions