Proposal: Break Up Large Providers by Service #3939
Conversation
* `upbound/provider-aws` - Becomes ~150 smaller providers.
* `upbound/provider-azure` - Becomes ~100 smaller providers.
* `upbound/provider-gcp` - Becomes ~75 smaller providers.
It's outside the scope of this design, but for the record we have commitment from folks at Upbound to update:
- Any large official providers we maintain (i.e. AWS, GCP, and Azure).
- The `upjet` tooling.
- The Upbound marketplace.
I'm happy to see a solution for the short term. It seems like filtering is the ideal solution to shoot for, but one for the long term (2.0?). It'd be great if it could be targeted for later, though. For a multi-cloud or operator-heavy cluster I could see folks hitting 500 CRDs pretty frequently, and I assume the number of supported resources per provider is only going to keep growing in a lot of cases.
Just to be explicit, filtering is not something I propose we support long term. Or rather, while I don't want to rule anything out long term, I'm not proposing we break providers up as a stop-gap; I'm proposing it because I think it's the best option.
Based on my projections and surveys of the community it doesn't seem that likely folks would hit 500 CRDs. FWIW, the 500 number is also quite conservative (i.e. it's not like you hit 500 CRDs and everything starts to degrade immediately).
Understood. And I agree it's the most pragmatic solution, but we'll have to see how painful the break-up turns out to be later on. Perhaps a way to make it more user-friendly would be to include a "recommended" meta-provider config with the most commonly used set of providers per cloud provider? Or maybe some other way for a user to understand which of the provider configs are going to be required for a given configuration? I'm imagining it being pretty confusing for someone new to Crossplane.
Yeah, we considered something like that. I think the tricky part would be landing on what the most commonly used bits are (given a broad audience) and evolving that over time without it getting too bloated. One similar thing I am really excited about though is the fact that each org or team could put together their own meta-provider that has just the services they use. I could imagine starting small with 1-2 services and adding more over time as you learn what services you need. I'm pretty sure that if you're just doing this for your own org you'll almost certainly end up with more tightly scoped providers (i.e. fewer CRDs) than if we as a community tried to curate "one-size-fits-most" providers.
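As a sketch of the "build your own meta-provider" idea: an org could publish a Configuration package that depends on only the service-scoped providers it actually uses. This is a hypothetical example, assuming the AWS family is split as proposed; the package names, registry paths, and version constraints below are illustrative placeholders, not published packages.

```yaml
# crossplane.yaml for a hypothetical org-specific meta-provider.
# It installs only the AWS services this org uses; the package names
# and version constraints are illustrative placeholders.
apiVersion: meta.pkg.crossplane.io/v1
kind: Configuration
metadata:
  name: acme-aws
spec:
  dependsOn:
    - provider: xpkg.upbound.io/upbound/provider-aws-ec2
      version: ">=v0.1.0"
    - provider: xpkg.upbound.io/upbound/provider-aws-iam
      version: ">=v0.1.0"
    # Add more service providers over time as needs grow, e.g. eks or rds.
```

Installing this one Configuration would pull in exactly the family members listed, which is how starting with 1-2 services and growing over time could look in practice.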
If we take the following, we see that 126 CRDs are directly installed from the 3 split providers (ec2, iam, eks), but we only use 15 of those CRDs. If we used a CRD filter mechanism like @blakebarnett mentioned, we would end up with just those 15. So you can see that by splitting the providers we would still install 111 CRDs that are not needed, so clearly a simple filter would be more effective here.

So let's have a look at our company's Crossplane installation. After the split we would have the following AWS providers installed:

We see that 303 CRDs are directly installed from the 25 split providers (...), but we only use 97 of those CRDs. So you can see that by splitting the providers we would install 206 CRDs that are not needed. At the moment I don't really see that the effort to split the providers will decrease the number of installed CRDs this much in a real-world scenario. Also, the effort needed by the operations teams to manage more providers (in our case 25) just to manage AWS resources, and the ability of Crossplane to run smoothly with this number of providers, is not even part of this discussion. Is anyone from our community running Crossplane installations with more than 10 or 20 providers?
In my opinion the numbers shown by @haarchri clearly support the filtering solution proposed by @blakebarnett. Only this approach would allow decreasing the number of installed CRDs by an order of magnitude. Additionally, there would be no need to update the mechanism for cross-resource references in the short term, as all referenced resources live in the same provider. In the short term the filtering could happen in crossplane-core, by not applying CRDs, as configured by the user when installing a provider. In the long term, if a more generic approach for cross-resource references were established, a provider could be updated to also ignore these resources, even if it knows them. In that way, several pods could also be used to spread the work, by installing a provider several times with different configurations, e.g. one pod only running reconcilers for *.ec2 resources, one only for *.rds resources, and one for all other resources of the provider. I would not count the following in favor of splitting the providers:
The proposed splitting sounds well designed for the clusters one shows in live demos at talks, with perhaps three resources and therefore three (sub-)providers installed. In a real-world cluster, where the goal is to really move to IaC, this only adds the burden of managing more providers and keeping them in sync. I would favor waiting some months to implement a change in a reliable and correct way that eases the use of Crossplane, instead of complicating the path to a functioning installation by breaking providers up into pieces, the majority of which would now be working as standalone providers.
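For illustration only, the "one pod per resource group" idea described above could take roughly the following shape. To be clear: this is not the API proposed in the filtering PRs, and the `reconcileGroups` field is purely hypothetical; no such field exists in Crossplane today.

```yaml
# Purely hypothetical sketch of 'install once, filter per pod': one
# installation of the provider that reconciles only *.ec2 resources.
# A sibling installation could handle *.rds, and a third everything else.
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-ec2-reconcilers
spec:
  package: crossplane-contrib/provider-aws  # version tag omitted
  # HYPOTHETICAL field, shown only to make the idea concrete: limit the
  # controllers this pod runs to the listed API groups.
  reconcileGroups:
    - ec2.aws.crossplane.io
```

The design question the thread raises is exactly whether a filter like this belongs in crossplane-core, in the provider, or nowhere at all.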
@haarchri In the numbers you've crunched it seems to me that we're seeing quite a big improvement in the ratio of installed-to-used CRDs. This is also what I saw when I ran through similar scenarios. Today for example the official provider-aws has 900 CRDs, so:
These numbers seem well within the capabilities of Kubernetes. I would not expect to see the API server struggle with excess compute resource consumption. Is the only acceptable result, as far as you're concerned, to completely optimize the ratio of installed-to-used CRDs? That is, to make sure Crossplane never installs a CRD that you don't intend to use? If so, I would ask why. Also, do you use other CRD-based tools? Do those tools allow you to leave out the CRDs you don't need? What about built-in Kubernetes types? I think based on #2869 you've previously answered my first question:
I'd love to hear more about it. Perhaps we should set up a call? My guess is you have fairly extreme defense-in-depth requirements? My thinking is:
At the end of the day, if it's a hard requirement that Crossplane never installs a CRD that you don't want, then yes, the only tenable option is down-to-the-type-level filtering (or compiling your own providers, as you currently do). I wonder how many folks have this requirement. I believe "just adding filtering" is not as simple as it first sounds to most folks, and in fact increases cognitive complexity quite a lot for folks learning Crossplane, so I think there should be a pretty high bar for doing it.
I had intended to cover this in the design - see, for example, the discussion of ProviderConfigs, compute resources, etc. Are there other areas you'd like me to cover? If so, let me know.
@akesser Can you help me understand why you need the number of installed CRDs to be so dramatically reduced?
It is unfortunately not that simple. I could expand on this in the existing "Alternatives Considered" section if that would help.
The design doc includes a survey of ~40 community members and their real usage. I would not be proposing something if I didn't think it would work in the real world. It would be helpful if you could elaborate on your concerns around management burden.
@negz I talked about decreasing the number of installed CRDs because you explicitly mentioned that most tools are designed to work well with clusters containing 50 to 100 CRDs, and the suggested approach of splitting the providers does not decrease the number of CRDs enough to reach that range.
By having ten to a hundred times more providers you have to keep them updated, you have to keep them running, and you need more effort to get the code onto the cluster, e.g. due to a four-eyes principle for merging into your codebase.
Got it. A lot of work has been done on the ecosystem since then, so per the design the very conservative number is now more like 500 (so probably more like 600 or 700 in reality).
FWIW, by my estimates a hundred times more providers would be quite a rare edge case. Isn't keeping pods running Kubernetes's bread and butter? I'm not arguing it's zero additional operational burden, but I'm not sure it's meaningful.
Putting technical details aside, how does the community feel about the risk? What if this approach does not ultimately solve the problem for one reason or another? If Kubernetes improves its CRD handling drastically and can scale to 100,000 CRDs, do these providers need to stay broken up? Breaking up providers, or going back to the single provider model, is fine in isolation. However, taking recent changes to providers into consideration, I can't say with confidence it won't be a problem. For context, the recent changes I am referring to are:
In my opinion these changes caused fragmentation and confusion in reference docs, blog posts, and examples. I understand providers themselves do not define what Crossplane is, but I worry what kind of impression people get about Crossplane. I'd love to hear others' thoughts here.
As far as I'm aware there are only two feasible solutions to the problem at hand: breaking up providers, or adding support for filtering the types providers install. On either path we should consider a way to ship the feature/fix and get feedback before we fully commit. With filtering it would be the typical alpha feature lifecycle we use for Crossplane (complicated by the fact that the functionality needs to be added at both the core and provider level). So the feature would be behind a feature flag and off by default for some time, to allow folks to try it out before we commit. If we found it didn't work well, anyone using it would be disrupted when we removed it or changed it in a breaking way. If we instead break up providers, I could imagine publishing some 'alpha' service-scoped providers to let folks try them out. I suspect, given how many folks are impacted by the problem this proposal solves, that a lot of folks will opt in to the alpha (whether it be filtering or smaller providers), and thus a lot of folks will be disrupted if we found we didn't get it right up-front. It doesn't seem to me like either option is significantly less disruptive to "roll back". This topic is probably worth a section in the proposal - I'll add one.
This is an interesting thought experiment. I will say that given the multi-faceted nature of the issues and the slow, incremental pace of improvements, I don't expect to see this be the case for at least 24 months. Keep in mind that we have pursued the "fix Kubernetes" approach for well over a year with limited success. I don't see any reason to believe things will change in the foreseeable future.
Our team created two PRs to demonstrate filtering of CRDs in crossplane-core and crossplane-contrib/provider-aws: #3987. Feedback is welcome.
In particular that addressing any security concern is a non-goal. Signed-off-by: Nic Cope <nicc@rk0n.org>
I plan to expand on a few of these. Signed-off-by: Nic Cope <nicc@rk0n.org>
Adds a lot more detail on the constraints and nuances that may not be obvious to folks. I'm also explicit about why I don't recommend it: not because it's hard, but because it makes reasoning about Crossplane harder. Signed-off-by: Nic Cope <nicc@rk0n.org>
We're going to hold off on this at first - it's a further optimization we can make (at the expense of more complexity) if/when needed. Signed-off-by: Nic Cope <nicc@rk0n.org>
- This PR implements the upstream proposal at crossplane/crossplane#3939.
- Subpackages belonging to each API group are produced. An example is provider-gcp-cloudplatform.
- ProviderConfig, ProviderConfigUsage, and StoreConfig are part of a config package named provider-gcp-config.
- The monolith package (containing all the CRDs and associated controllers) is still produced.
- Each produced package except for the monolith package has the `pkg.crossplane.io/provider-family` label in its package metadata.
- Each service package except for the config package declares a dependency on the config package.

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
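Under the scheme described in that commit message, the package metadata for one service package might look roughly like this. The `pkg.crossplane.io/provider-family` label and the dependency on the family's config package come from the commit message above; the registry path and version constraint are illustrative placeholders.

```yaml
# Sketch of package metadata for a service-scoped provider in the
# GCP family. Label key and package names are from this thread; the
# registry path and version constraint are illustrative placeholders.
apiVersion: meta.pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-gcp-cloudplatform
  labels:
    pkg.crossplane.io/provider-family: provider-gcp
spec:
  dependsOn:
    # Every service package depends on the family's config package,
    # which ships ProviderConfig, ProviderConfigUsage, and StoreConfig.
    - provider: xpkg.upbound.io/upbound/provider-gcp-config
      version: ">=v0.1.0"
```

The shared label is what lets tooling identify all members of a family, and the shared config dependency is what lets them all use one ProviderConfig.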
crossplane#3939

The above design proposes we break up larger providers like provider-aws by service. We'd group these providers into a 'family' for two reasons:

1. So they could all share one ProviderConfig (and StoreConfig, etc).
2. So they could cross-resource-reference each other.

We believe 1 will always be true. Hopefully 2 will eventually go away with crossplane#1770. During testing we realised providers in the same family would need RBAC access to read all types in their family, i.e. to reference an MR or read a ProviderConfig.

Signed-off-by: Nic Cope <nicc@rk0n.org> (cherry picked from commit 68a81f7)
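The RBAC requirement mentioned above could be sketched as follows. This is an illustration only, not the actual rules Crossplane's RBAC manager generates; the role name is hypothetical and the API groups are examples of what sibling AWS service packages might serve.

```yaml
# Illustrative sketch: a provider in the AWS family needs read access to
# all types in its family, e.g. to resolve cross-resource references and
# to read the shared ProviderConfig. Group names and the role name are
# examples; the rules Crossplane actually generates may differ.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: provider-aws-eks-family-read  # hypothetical name
rules:
  - apiGroups:
      - ec2.aws.upbound.io  # sibling service package (example)
      - iam.aws.upbound.io  # sibling service package (example)
      - aws.upbound.io      # family config package: ProviderConfig, etc.
    resources: ["*"]
    verbs: ["get", "list", "watch"]
```

Read-only verbs suffice here because a family member only needs to resolve references to its siblings' resources, not manage them.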
Description of your changes
Fixes #3754
This design document proposes that the 6-7 largest Crossplane providers be broken down into smaller, service-scoped ones. This would help folks install fewer CRDs, thus improving the ratio of installed-to-used Crossplane CRDs. Installing fewer CRDs is necessary to work around performance issues in the Kubernetes API server and Kubernetes clients.
I have:
- Run `make reviewable` to ensure this PR is ready for review.
- Added `backport release-x.y` labels to auto-backport this PR if necessary.

How has this code been tested
I proof-read it. 😄