Skip to content

feat: add posibility for filtering applied crds when installing provider#3987

Closed
akesser wants to merge 1 commit intocrossplane:masterfrom
akesser:feature/crd-filtering
Closed

feat: add posibility for filtering applied crds when installing provider#3987
akesser wants to merge 1 commit intocrossplane:masterfrom
akesser:feature/crd-filtering

Conversation

@akesser
Copy link
Copy Markdown

@akesser akesser commented Apr 14, 2023

Description of your changes

As a prove of concept for filtering crds when installing providers we updated Provider spec to include two new array fields includeCrds and excludeCrds that can contain regex values:

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: crossplane-aws
spec:
  controllerConfigRef:
    name: crossplane-aws
  ignoreCrossplaneConstraints: false
  package: #see pr in crossplane-contrib/aws https://github.com/crossplane-contrib/provider-aws/pull/1727
  packagePullPolicy: IfNotPresent
  revisionActivationPolicy: Automatic
  revisionHistoryLimit: 1
  skipDependencyResolution: false
  excludeCrds:
    - \.aws\.crossplane\.io
  includeCrds:
    - securitygroup\.ec2\.aws\.crossplane\.io
    - securitygrouprules\.ec2\.aws\.crossplane\.io
    - \.eks\.aws\.crossplane\.io

This poc avoids the proposal of breaking larges providers by service/group/type and shows the possibility to filter applied crds.

When applying a provider with these optional fields set, the list of applied crd gets filtered. If a crd matches a value in excludeCrds, it is ignored, unless it matches a value in includeCrds. Crds that are not matched by values in excludeCrds get applied. If excludeCrds is empty, all crds will be applied.

The deployment for the provider is created with an additional env variable that is used by the provider to filter the activated controllers.

This is described and can be seen in the PR in crosplane-contrib/provider-aws: crossplane-contrib/provider-aws#1727

The poc does not cover the following:

  • The deletion of crds when a provider is reconfigured is not part of this poc - perhaps this should be done by a user to not destroy resources by accident
  • Applying the same provider with different names and not overlapping groups of crds could be applied at the same time (switching single resource versions indepently) -> needs to be validated

#3939 #3754

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

Signed-off-by: André Kesser <andre.kesser@dkb.de>
@negz
Copy link
Copy Markdown
Member

negz commented Apr 14, 2023

Cross-linking #2646, which was a similar POC to this.

The poc does not cover the following

Unfortunately I don't think this POC is very useful unless it does cover those problems. A large part of the problem with filtering is that while folks believe on the surface that it's a simple change to make, the devil is very much in the details.

Any filtering implementation will also need to consider:

  • What is the default behavior when filtering is not configured?
    • It cannot be installing all CRDs - that would make the default user experience of installing upbound/provider-aws "your cluster breaks".
    • It also cannot be installing no CRDs - that would uninstall all your CRDs if you upgraded to a provider that supported filtering and didn't know to set the (optional) filtering field(s).
  • What happens if I configure filtering in Crossplane, but the provider has not been updated to be aware of filtering. Does it just not filter despite me asking it to? Is it okay that the default behavior of a provider that doesn't support filtering (installing all CRDs) differs from one that does?
  • What happens with dependencies? For example platform-ref-aws depends on provider-aws today because it wants to use RDS DatabaseInstance, EKS Clusters, and various EC2 networking types. Should that dependency be satisfied if provider-aws is installed, but not configured with these types enabled?

@nabuskey
Copy link
Copy Markdown

I tend to agree with Nic here around upgrading and updates to configuration. It is something we need to think about carefully and can get hairy very quickly. I think breaking up providers would be easier in that regard because it will likely be one-time significant change that users have to make. I still worry about potential impact to perception of Crossplane though.

@akesser
Copy link
Copy Markdown
Author

akesser commented Apr 15, 2023

What is the default behavior when filtering is not configured?

To solve this you can simply add the following to the documentation of crossplane and the in a similar form to the documentation of the provider:

Some bigger providers allow to reduce the number of applied crd when installing them. Look at the documentation of the provider you want to install if supports this feature.

To install only the subset of crds you need, filter them using the two properties excludeCrds and includeCrds. Leaving these properties empty will install all CRDs. You can limit the number of managed CRDs later on, but you have to remove them manually, if you decide to do so. We recommend to install providers that support filtering of crds without doing so only for testing. If you are not sure which resources you need, start with excluding all crds and add those you need later on by updating the applied provider yaml. Crossplane will install the now matching CRDs automatically and will update the deployment for the provider to setup the corresponding reconcilers correctly.

What happens if I configure filtering in Crossplane, but the provider has not been updated to be aware of filtering. Does it just not filter despite me asking it to? Is it okay that the default behavior of a provider that doesn't support filtering (installing all CRDs) differs from one that does?

The answer is simple: The provider is not able to start because the crds are not applied. the Errormessages you get will explain this quite good. This behavior is not different from other misconfigurations. It is the same if you miss to install one ore many of splitter providers to need to manage a resource, it will just not work because you misconfigured it.

To help a user to identify crds he can safely delete, because crossplane is no longer managing them due to to an updated provider yaml, the cli tool could be updated to apply the filter to the currently available crds and list them, perhaps by also give a warning if there are resources for the now no longer managed resources.

I think breaking up providers would be easier in that regard because it will likely be one-time significant change that users have to make.

Implementing filtering would be mean no change for users, the installed crossplane instance would work like before, only possibilities for the users are added.

@negz
Copy link
Copy Markdown
Member

negz commented Apr 17, 2023

To solve this you can simply add the following to the documentation of crossplane and the in a similar form to the documentation of the provider

I'm sorry, but "the most popular providers breaks by default unless you've read the documentation and know to supply optional configuration" seems like a really bad user experience to me. I know there are downsides to all options being considered here but this just seems like a non-starter level of bad UX to me.

The answer is simple: The provider is not able to start because the crds are not applied. the Errormessages you get will explain this quite good.

Can you show me what these error messages look like today? I'm pretty sure they're going to be something pretty obscure and non-obvious like:

main: error: Cannot start controller manager: failed to wait for managed/user.iam.aws.crossplane.io caches to sync: timed out waiting for cache to be synced

@jbw976
Copy link
Copy Markdown
Member

jbw976 commented Apr 18, 2023

"the most popular providers breaks by default unless you've read the documentation and know to supply optional configuration"

Even though filtering would be a more powerful and granular approach, I also support @negz perspective here that essentially "broken by default" is not a user experience we want to enable. That would unfortunately not be taking a step forward experience wise for a large portion of adopters IMHO 😿

I do of course though appreciate the initiative here to share your thoughts around the implementation and engaging with us in discussion, thank you for taking the time to do that! 🤩 🙇‍♂️

@negz
Copy link
Copy Markdown
Member

negz commented Apr 21, 2023

@akesser @haarchri I recently pushed e2227b6, which expands on all the things I believe a filtering design would need to address to be successful.

Please take a look and let me know what you'd like to do with this POC.

@dr460neye
Copy link
Copy Markdown

dr460neye commented Apr 21, 2023

Filtering should not be any precondition to configure and be an optional step.
This then would leave the possibility for users to choose if they rollout the full provider or only needed parts.
Anyhow, the current situation is, that this incredible nice concept is not usable by thousands of people, as providers are to heavy and enforce always to install everything.

e.g. i would suggest to split providers when they reach a specific size or provide other best practices in the provider development hints

@negz
Copy link
Copy Markdown
Member

negz commented Apr 21, 2023

Filtering should not be any precondition to configure and be an optional step.

@dr460neye please refer to #3939, where I discuss the constraints any filtering design would have to operate under at some length.

I don't think anyone is suggesting filtering be required. It must be optional. Making it required would be a breaking API change. One of the biggest challenges is what to do when a filter is not specified. "Just install all the CRDs" is not on the table as a solution. The default behaviour of our software cannot be breaking clusters due to installing too many CRDs.

@akesser
Copy link
Copy Markdown
Author

akesser commented Apr 22, 2023

I'm sorry, but "the most popular providers breaks by default unless you've read the documentation and know to supply optional configuration" seems like a really bad user experience to me. I know there are downsides to all options being considered here but this just seems like a non-starter level of bad UX to me.

This statement is simply not true, and I don't know where you have taken it from. Without the attributes for filtering simply all CRDs would be applied leading to the current behavior and state crossplane has now, that I definitely don't describe as broken. Applying no filtering to providers that support filtering would in my opinion speed up the setup of a provider for a test system, but should be considered when moving toward more mature usage of crossplane, and the documentation should reflect this. The "Getting Started" section should never be the only documentation read when running a software in production.

If the this POC is used with a provider that does not support filtering via the used environment variable, the error message shown in the provider is the default error message, kubernetes prints if a CRD is unknown when registering a reconciler and contains all the information to inform the user that a CRD is missing the provider needs.

ERROR	controller-runtime.source	if kind is a CRD, it should be installed before calling Start	{"kind": "Instance.ec2.aws.crossplane.io", "error": "no matches for kind \"Instance\" in version \"ec2.aws.crossplane.io/v1alpha1\""}

The problem of filtering out CRDs for providers that do not support this can easily circumvented, if the Provider yaml that is shipped alongside the CRDS of the provider inside the gz file that again is downloaded when installing a provider. A default annotation could be used to signal crossplane that the provider supports filtering, something like crossplane.io/crdfiltering: supported, and would enable crossplane to only print a warning and install all CRDs if the property is set for a provider that does not support filtering.

So "breaking by default" is clearly not part of this POC.

@negz
Copy link
Copy Markdown
Member

negz commented Apr 22, 2023

This statement is simply not true, and I don't know where you have taken it from. Without the attributes for filtering simply all CRDs would be applied leading to the current behavior and state crossplane has now, that I definitely don't describe as broken.

Today some very popular and widely used providers like upbound/provider-aws have up to 900 CRDs. These providers are often the first ones people try when they use Crossplane. Over time we expect that (unless we break them up) more and more providers will deliver a similar number of CRDs.

We know that installing upbound/provider-aws alone breaks some clusters. For example we've seen it break newly provisioned GKE clusters, making them completely unresponsive until they repair (a process that takes tens of minutes). I believe this is explained in the Background section of #3939.

We've also been hearing this from a lot of folks at Kubecon. "I installed Crossplane, I installed a provider, it degraded my cluster, so I decided Crossplane wasn't for me."

Under this proposal, all of the above would be true unless you configured a filter before you ever installed your provider. This is what I mean by "broken by default".

@negz
Copy link
Copy Markdown
Member

negz commented Jun 13, 2023

I'm going to close this for now since I don't think it's being actively worked on.

We're open to this feature being added if we can land on a design that satisfies the constraints we've identified (it's likely that may be easier once we don't have to worry about any one provider installing too many CRDs by default).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants