One-pager for CRD scaling discussions in Crossplane #2918

muvaf merged 11 commits into crossplane:master
Conversation
Force-pushed from ad20bcb to e75a7e6.
design/one-pager-crd-scaling.md (outdated)
> * Status: Draft
>
> ## Background
>
> With the release of the [Terrajet](https://github.com/crossplane/terrajet) based providers, the Crossplane community has become more aware of some upstream scaling issues related to custom resource definitions. We did some early analysis such as [[1]] and [[2]] to get a better understanding of these issues and, as we will discuss in more detail in the "Issues" section, the broader K8s community has been aware of the client-side throttling problems in particular for some time. Nor is Crossplane alone: [Azure Service Operator](https://github.com/Azure/azure-service-operator) and [GCP Config Connector](https://github.com/GoogleCloudPlatform/k8s-config-connector) are projects that rely on Kubernetes [custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) as an extension mechanism and have many CRDs representing associated cloud resources.
Could you please wrap this markdown at 80 chars? In part because it's consistent with our other designs, and in part because it allows more 'targeted' comments in GitHub - currently it's only possible to comment on entire paragraphs (rather than a line within a paragraph) because each paragraph is one long line.
There are a few tools that can do this automatically - I use https://marketplace.visualstudio.com/items?itemName=stkb.rewrap
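As an aside, the effect of such a rewrap can be approximated with Python's standard `textwrap` module (a minimal sketch; it is not Markdown-aware, so links, code fences, and tables still need a dedicated tool like the Rewrap extension above):

```python
import textwrap

# Rewrap a long single-line paragraph at 80 columns.
# Caveat: textwrap is not Markdown-aware; it would happily break
# inside inline links or tables, so use it only on plain prose.
paragraph = (
    "With the release of the Terrajet based providers, the Crossplane "
    "community has become more aware of some upstream scaling issues "
    "related to custom resource definitions."
)

wrapped = textwrap.fill(paragraph, width=80)
print(wrapped)
```

Once wrapped, GitHub can anchor review comments to individual lines of a paragraph instead of the whole paragraph.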
FWIW I reached out to the ASO and ACK folks and they haven't seen these scaling issues yet, but I fully expect they will in time. ACK is somewhat insulated since they're bundles of smaller controllers, but I believe ASO and KCC take the same "one controller manager for all of a cloud's resources" approach we do.
Thanks Nic for the tool suggestion! It was very helpful. I have also removed most of the inline references.
Dropping by to say: @negz was correct: Azure/azure-service-operator#2920
design/one-pager-crd-scaling.md (outdated)
> ## Background
>
> Kubernetes is a complex ecosystem with many moving parts, and we need a deeper understanding of the issues around scaling in the dimension of the total number of CRDs per cluster. This dimension is not yet officially considered in the scalability thresholds document [[3]], but it likely will be. So, as the Crossplane community, we would like to have our use cases considered in relevant contexts, and we would like to gain a good understanding so that we can:
> This dimension is not yet officially considered in the scalability thresholds document [[3]] but it will be with good probability.
I think you asked upstream about this right? Is there an issue tracking getting CRDs added to that doc?
Not yet Nic. It would be good to have Crossplane use cases clarified to have them on the table for those discussions, and probably use them as a guideline in those discussions. Also, as you mentioned above, it would be great to involve ASO and KCC folks. There was a previous community survey asking for CRD use cases and I believe its results were incorporated in the GA scalability targets for CRDs. It looks like projects like Crossplane, ASO, KCC have more demanding new use cases which could be the motivation in those discussions.
design/one-pager-crd-scaling.md (outdated)

> ### Client-side Throttling
>
> `kubectl` maintains a discovery cache for the discovered server-side resources under the (default) filesystem path `$HOME/.kube/cache/discovery/<host_port>/`. Here, `<host_port>` is a string derived from the API server host and the port it listens on. Example paths are `$HOME/.kube/cache/discovery/exampleaks_8e092dad.hcp.eastus.azmk8s.io_443` and `$HOME/.kube/cache/discovery/EB788B3B801893B684B4579B2ADF0171.gr7.us_east_1.eks.amazonaws.com`. Under this cache, the `servergroups.json` file is a JSON-serialized [`v1.APIGroupList`](https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1#APIGroupList) object. Thus, the cache file `$HOME/.kube/cache/discovery/<host_port>/servergroups.json` holds all of the discovered API GroupVersions (GVs) together with their preferred versions from that API service. And for each discovered API GroupVersion, a `serverresources.json` caches metadata about the discovered resources under that GroupVersion, JSON-serialized as a [`v1.APIResourceList`](https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1#APIResourceList). This metadata about resources is crucial for various tasks, such as:
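To make the cache layout above concrete, here is a minimal sketch that decodes a hand-constructed sample in the shape of `servergroups.json` (a `v1.APIGroupList`); the group name is illustrative only, and real files live under `$HOME/.kube/cache/discovery/<host_port>/`:

```python
import json

# Hand-constructed sample mimicking the shape of a cached
# servergroups.json (a JSON-serialized v1.APIGroupList).
# The group name below is illustrative only.
sample = json.dumps({
    "kind": "APIGroupList",
    "apiVersion": "v1",
    "groups": [
        {
            "name": "ec2.aws.jet.crossplane.io",
            "versions": [
                {"groupVersion": "ec2.aws.jet.crossplane.io/v1alpha1",
                 "version": "v1alpha1"},
            ],
            "preferredVersion": {
                "groupVersion": "ec2.aws.jet.crossplane.io/v1alpha1",
                "version": "v1alpha1",
            },
        },
    ],
})

group_list = json.loads(sample)

# Every GroupVersion listed here has its own serverresources.json;
# on a cold cache the discovery client issues one request per GV,
# which is where client-side throttling bites at high GV counts.
gvs = [v["groupVersion"]
       for g in group_list["groups"] for v in g["versions"]]
preferred = {g["name"]: g["preferredVersion"]["version"]
             for g in group_list["groups"]}
print(gvs, preferred)
```

The per-GV `serverresources.json` fetch is what multiplies the request count when hundreds of GVs are installed.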
I want to note that while fixing kubectl would be very impactful, we may want to go down a level and try to get this fixed in client-go, so that all (Go-based) clients that do discovery are fixed.
Agreed. Cache busting for the discovery client is under discussion now. Maybe we can extend this one-pager by mentioning some options considered (cache busting, disabling client-side throttling, etc.).
design/one-pager-crd-scaling.md (outdated)

> ## Action Items
>
> - We can consider cherry-picking [[5]] to all active release branches (`v1.23`, `v1.22`, `v1.21`), as the anticipated release date for `v1.24` is April 2022.
> - Open issues regarding API service disruptions for managed control planes (GKE regional, AKS, EKS, etc.), where we expect high availability.
Some folks on today's community meeting (namely @haarchri and Jillian Hill from Guidewire) mentioned that their AWS professional services teams have noticed increased resource usage even when EKS appears to be working well, and reached out to investigate. It sounds like in some cases they're also seeing EKS clusters choke when installing providers with a lot of CRDs - one theory is that some regions have smaller EKS control planes.
Either way it sounds like we might be able to reach out either to our own contacts at AWS or through a company in the Crossplane community to get some insight into how EKS is handling this load.
That would be great Nic. The support issue we opened for GKE regional clusters was not fruitful; the outcome was that it works as expected. What we wanted to learn was basically the metrics GKE autoscalers use when deciding to scale their managed control planes up (CPU/memory usage/utilization of control-plane components, kube-apiserver or other component failures, # of CRDs, or some other SLIs, etc.?). One thing I still wonder is how SLAs come into play as Crossplane users install three big providers in their clusters. @AaronME, do you know if it's possible, in cloud consoles or by some other means (like a support ticket), to inquire about these SLAs?
Force-pushed from 1735f4b to c5986bb.
> we should be careful.
>
> ### Client-side Throttling
is it possible to state very clearly (maybe with a table?) which versions of kubectl (released or targeting upcoming releases) have all the client-side throttling fixes, so we can recommend them to the community?
e.g. it's not super obvious to me (as a casual reader) which versions of kubectl (and also the k8s API server) I would need to have a good experience with jet providers. Maybe we can make that super obvious at the top of the doc, so folks coming to this document just looking for guidance on which versions to use can easily find that info without getting lost in the details. What do you think @ulucinar?
Would an executive summary section about the things we'll do be helpful? Maybe one-liners about the problems and links to the task issues, though that could be duplicate of Action Items
I think it's a good idea @jbw976. Added a kubectl version table with their release dates as a recommendation at the beginning of the Client-side Throttling section.
awesome @ulucinar, that table is really helpful!
do you think we should do the same for server-side issues? e.g. so casual readers know at a quick glance which versions of the k8s api-server have all the fixes and should perform well? That table could also be valuable :)
Thanks @jbw976. I've also added a table of kube-apiserver versions to the beginning of the API Server Resource Consumption section, explicitly mentioning the OpenAPI v2 spec lazy-marshaling change and the related upstream issues.
that is super helpful @ulucinar, thank you for making this document more accessible for a broader audience!! 🙇♂️
Force-pushed from 7377cf0 to 47c0ef0.
design/one-pager-crd-scaling.md (outdated)

> issues in kube-apiserver and possibly in other control-plane components.
> - Initiate further discussions with the Kubernetes [sig-scalability] community
>   regarding CRD scalability and bring agreed-upon Crossplane scenarios to
>   their attention.
As we discussed previously, I would like to explore possibilities like optional lazy serving for CRDs. I would love to discuss its feasibility upstream once we have an upstream issue framing the problem with server-side resource consumption. Until we have that issue where we can comment on this as a possible solution, I'm copying the proposal here just to capture it somewhere public.

What do you think about proposing an upstream Kubernetes change for lazily serving CRDs?

- Introduce an optional `lazyServe` field in the CRD spec next to the existing `served` field.
- Extend the existing notFoundHandler so that it checks whether there is a non-served CRD for the endpoint that was hit; if yes, set `served: true` on that CRD and return 503 (or do some trick: wait a bit and redirect to the same URL).
- On the Crossplane and controller side, we would also need to change how we start the controllers. Instead of starting all of them at once, we would watch CRDs and only start the controller for a type when its CRDs are served.

Assuming most clients already have built-in retries for 503 errors, this could provide the following user experience for Crossplane:

- Install providers with any number of CRDs.
- No significant load on the system, since none of the CRDs are served.
- A CRD starts to be served only after the user creates a resource of its kind.

It is a bit assertive, but it sounds feasible based on my thought experiments so far.
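The "clients retry on 503" assumption in the proposal above can be illustrated with a toy retry loop (a hypothetical sketch: `Server` and `get_with_retry` are made-up names, and no real Kubernetes client or apiserver code is involved):

```python
import time

class Server:
    """Toy stand-in for an apiserver lazily starting to serve a CRD."""
    def __init__(self):
        self.serving = False

    def get(self, path):
        if not self.serving:
            # The extended notFoundHandler would notice a non-served CRD
            # for this endpoint, flip served=true, and ask the client to
            # retry by returning 503.
            self.serving = True
            return 503
        return 200

def get_with_retry(server, path, retries=3, backoff=0.01):
    """Retry on 503 with a small linear backoff."""
    status = server.get(path)
    for attempt in range(retries):
        if status != 503:
            break
        time.sleep(backoff * (attempt + 1))
        status = server.get(path)
    return status

server = Server()
# First request hits 503 (CRD not yet served); the retry succeeds.
print(get_with_retry(server, "/apis/example.org/v1/widgets"))  # 200
```

The sketch only shows the happy path; a real design would also need to bound how long clients wait while the controller for that type spins up.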
> `kubectl` behaves when there is a large number of CRDs installed in the
> cluster and how the discovery cache and the discovery client affect the
> perceived performance of the `kubectl` commands run.
> - Server-side issues: We also have some observations on the control-plane
Would it make sense to have a short section about the OpenAPI aggregation problem that was fixed a few months earlier? Just a summary of the problem and a link to the PR.
Hi @muvaf,
Thank you for the comments.
I have extended the paragraph where we discuss this change with more details.
> improves as the burstiness and the fill rate of the token bucket rate limiter
> are increased in `kubectl@v1.24`.
> 2. Installation of multiple providers, such as `provider-jet-{aws,gcp,azure}`
>    preview editions, into the same cluster results in ~370 GVs being served by
If we bump the current burst limit from 300 to 500, do you think this goal would be achieved? If so, I think we can set this as a goal and track bumping that number in client-go and kubectl as an easy-to-get change.
As discussed in the Client-side Throttling section, experiments done with a custom build of kubectl that allows us to specify the discovery client's tbrl (token bucket rate limiter) parameters reveal that, with tbrl(b=400, r=50.0 qps), client-side throttling is no longer a bottleneck for our current GV counts (despite a delay of ~18 s on a cold cache). This is subject to change if the number of GVs increases because of different combinations of providers, API regroupings, etc. However, the upstream community sees bumping these limits to cover certain use cases as raising the debt ceiling.
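A back-of-the-envelope model of that throttling delay (a simplification: it ignores network latency, the ~18 s cold-cache figure includes overhead this model does not capture, and the burst/qps values simply follow the numbers discussed in this thread):

```python
def throttle_delay(requests, burst, qps):
    """Estimated wait imposed by a token bucket rate limiter: the
    first `burst` requests are free; the rest are paced at `qps`."""
    return max(0.0, (requests - burst) / qps)

# ~370 GVs -> roughly one serverresources.json request per GV on a
# cold discovery cache.
requests = 370

# burst=300 (the current limit mentioned above) vs. the custom-build
# experiment with tbrl(b=400, r=50.0 qps).
print(throttle_delay(requests, burst=300, qps=50.0))  # 1.4
print(throttle_delay(requests, burst=400, qps=50.0))  # 0.0
```

This also shows why the fix is fragile: any growth in GV count past the burst size reintroduces a delay that grows linearly at 1/qps per extra GV.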
> ## Criteria Set for Ideal State
Can we keep this section short and precise with exact goals and then give details in another section? For example, two main goals are listed, so having their technical definition in one sentence or in a table may make it easier to understand at a first glance. Or example use scenarios could be helpful as well.
I've added a table of benchmark provider installation scenarios summarizing the criteria set under the Criteria Set for Ideal State section.
> - Initiate further discussions with the Kubernetes [sig-scalability] community
>   regarding CRD scalability and bring agreed-upon Crossplane scenarios to
>   their attention.
I think we can add client-go change here, too, as an action item.
I've added a new action item for bumping the burst limit of the default discovery client in client-go.
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>

Commit messages (truncated):
- …ree provider clusters
- Remove inline references
- Add OpenAPI v2 spec lazy-marshaling fix versions table
- …the criteria set
- …to 300
Description of your changes
Fixes #2895
With this one-pager proposal, we would like to establish a common understanding in the Crossplane community on the issues around CRD scaling, and the paths to possible solutions to these issues.
I have:
- Run `make reviewable` to ensure this PR is ready for review.
- Added `backport release-x.y` labels to auto-backport this PR if necessary.

How has this code been tested
N.A.
[contribution process]: https://git.io/fj2m9