
One-pager for CRD scaling discussions in Crossplane #2918

Merged
muvaf merged 11 commits into crossplane:master from
ulucinar:fix-2895
Apr 21, 2022

Conversation

@ulucinar
Contributor

@ulucinar ulucinar commented Feb 21, 2022

Description of your changes

Fixes #2895

With this one-pager proposal, we would like to establish a common understanding in the Crossplane community of the issues around CRD scaling and the paths to possible solutions to these issues.

I have:

  • Read and followed Crossplane's [contribution process].
  • Run make reviewable to ensure this PR is ready for review.
  • Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested

N.A.
[contribution process]: https://git.io/fj2m9

@ulucinar ulucinar marked this pull request as draft February 21, 2022 07:59
@ulucinar ulucinar force-pushed the fix-2895 branch 3 times, most recently from ad20bcb to e75a7e6 Compare February 21, 2022 14:25
Member

@negz negz left a comment

Thanks for working on this @ulucinar! Great start.

* Status: Draft

## Background
With the release of the [Terrajet](https://github.com/crossplane/terrajet)-based providers, the Crossplane community has become more aware of some upstream scaling issues related to custom resource definitions. We did some early analysis, such as [[1]] and [[2]], to get a better understanding of these issues, and as we will discuss in more detail in the “Issues” section, the broader K8s community has been aware of the client-side throttling problems in particular for some time. Crossplane is not alone here: [Azure Service Operator](https://github.com/Azure/azure-service-operator) and [GCP Config Connector](https://github.com/GoogleCloudPlatform/k8s-config-connector) also rely on Kubernetes [custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) as an extension mechanism and have many CRDs representing the associated cloud resources.
Member

Could you please wrap this markdown at 80 chars? In part because it's consistent with our other designs, and in part because it allows more 'targeted' comments in GitHub - currently it's only possible to comment on entire paragraphs (rather than a line within a paragraph) because each paragraph is one long line.

There's a few tools that can do this automatically - I use https://marketplace.visualstudio.com/items?itemName=stkb.rewrap

Member

FWIW I reached out to the ASO and ACK folks and they haven't seen these scaling issues yet, but I fully expect they will in time. ACK is somewhat insulated since they're bundles of smaller controllers, but I believe ASO and KCC take the same "one controller manager for all of a cloud's resources" approach we do.

Contributor Author

Thanks Nic for the tool suggestion! It was very helpful. I have also removed most of the inline references.


Dropping by to say: @negz was correct: Azure/azure-service-operator#2920

## Background
With the release of the [Terrajet](https://github.com/crossplane/terrajet)-based providers, the Crossplane community has become more aware of some upstream scaling issues related to custom resource definitions. We did some early analysis, such as [[1]] and [[2]], to get a better understanding of these issues, and as we will discuss in more detail in the “Issues” section, the broader K8s community has been aware of the client-side throttling problems in particular for some time. Crossplane is not alone here: [Azure Service Operator](https://github.com/Azure/azure-service-operator) and [GCP Config Connector](https://github.com/GoogleCloudPlatform/k8s-config-connector) also rely on Kubernetes [custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) as an extension mechanism and have many CRDs representing the associated cloud resources.

Kubernetes is a complex ecosystem with many moving parts, and we need a deeper understanding of the issues around scaling in the dimension of the total number of CRDs per cluster. This dimension is not yet officially considered in the scalability thresholds document [[3]], but it probably will be. So, as the Crossplane community, we would like to have our use cases considered in the relevant contexts, and we would like to gain a good understanding so that we can:
Member

This dimension is not yet officially considered in the scalability thresholds document [[3]] but it will be with good probability.

I think you asked upstream about this right? Is there an issue tracking getting CRDs added to that doc?

Contributor Author

Not yet, Nic. It would be good to have the Crossplane use cases clarified so that they are on the table for those discussions, and probably used as a guideline there. Also, as you mentioned above, it would be great to involve the ASO and KCC folks. There was a previous community survey asking for CRD use cases, and I believe its results were incorporated into the GA scalability targets for CRDs. It looks like projects like Crossplane, ASO, and KCC have more demanding new use cases, which could be the motivation for those discussions.



### Client-side Throttling
`kubectl` maintains a discovery cache for the discovered server-side resources under the (default) filesystem path of `$HOME/.kube/cache/discovery/<host_port>/`. Here, `<host_port>` is a string derived from the API server host and the port number it’s listening on. An example path would be `$HOME/.kube/cache/discovery/exampleaks_8e092dad.hcp.eastus.azmk8s.io_443`, or `$HOME/.kube/cache/discovery/EB788B3B801893B684B4579B2ADF0171.gr7.us_east_1.eks.amazonaws.com`. Under this cache, we have the `servergroups.json` file, which is a JSON-serialized [`v1.APIGroupList`](https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1#APIGroupList) object. Thus, the cache file `$HOME/.kube/cache/discovery/<host_port>/servergroups.json` holds all of the discovered API GroupVersions (GVs) together with their preferred versions from that API service. And for each discovered API GroupVersion, we have a `serverresources.json` that caches metadata about the discovered resources under that GroupVersion, JSON-serialized as a [`v1.APIResourceList`](https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1#APIResourceList). This metadata about resources is crucial for various tasks, such as:
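For illustration, the shape of these cache files can be sketched with a short script. The sample document below is hand-written to mirror only the `v1.APIGroupList` fields used here (group names and preferred versions); it does not come from a real cluster's discovery output.

```python
import json

# A tiny, hand-written sample standing in for the discovery cache file
# $HOME/.kube/cache/discovery/<host_port>/servergroups.json, which is a
# JSON-serialized v1.APIGroupList. Only the fields used below are shown.
SAMPLE = """
{
  "kind": "APIGroupList",
  "groups": [
    {
      "name": "apps",
      "versions": [{"groupVersion": "apps/v1", "version": "v1"}],
      "preferredVersion": {"groupVersion": "apps/v1", "version": "v1"}
    }
  ]
}
"""

def preferred_versions(server_groups_json: str) -> dict:
    """Map each discovered API group to its preferred GroupVersion."""
    group_list = json.loads(server_groups_json)
    return {
        g["name"]: g["preferredVersion"]["groupVersion"]
        for g in group_list["groups"]
    }

print(preferred_versions(SAMPLE))  # {'apps': 'apps/v1'}
```

The per-GroupVersion `serverresources.json` files follow the same pattern with `v1.APIResourceList` instead of `v1.APIGroupList`.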
Member

I want to note that that while fixing kubectl would be very impactful, we may want to go down a level and try get this fixed in client-go so that all (Go based) clients that do discovery are fixed.

Contributor Author

Agreed. Cache busting for the discovery client is under discussion now. Maybe we can extend this one-pager by mentioning some options considered (cache busting, disabling client-side throttling, etc.).


## Action Items
- We can consider cherry-picking [[5]] to all active release branches (`v1.23`, `v1.22`, `v1.21`), as the anticipated release date for the `v1.24` release is April 2022.
- Open issues regarding API service disruptions for managed control-planes (GKE regional, AKS, EKS, etc.), where we expect high-availability.
Member

Some folks on today's community meeting (namely @haarchri and Jillian Hill from Guidewire) mentioned that their AWS professional services teams have noticed increased resource usage even when EKS appears to be working well, and reached out to investigate. It sounds like in some cases they're also seeing EKS clusters choke when installing providers with a lot of CRDs - one theory is that some regions have smaller EKS control planes.

Either way it sounds like we might be able to reach out either to our own contacts at AWS or through a company in the Crossplane community to get some insight into how EKS is handling this load.

Contributor Author

That would be great, Nic. The support issue we opened for GKE regional clusters was not fruitful; the outcome was that it works as expected. What we wanted to learn was basically which metrics the GKE autoscalers use when deciding to scale their managed control-planes up (CPU/memory usage/utilization of control-plane components, kube-apiserver or other component failures, # of CRDs, or some other SLIs, etc.). One thing I still wonder is how the SLAs come into play as Crossplane users install three big providers in their clusters. @AaronME, do you know if it's possible, in the cloud consoles or by some other means (like a support ticket), to inquire about these SLAs?

@ulucinar ulucinar marked this pull request as ready for review February 25, 2022 21:49
@ulucinar
Contributor Author

ulucinar commented Mar 2, 2022

release-1.23 cherry-pick for the kubectl configuration fix is here: kubernetes/kubernetes#108401

we should be careful.


### Client-side Throttling
Member

is it possible to very clearly state (maybe with a table?) which released (or upcoming) versions of kubectl have all the client-side throttling fixes, so that we can recommend them to the community?

Member

e.g. it's not super obvious to me (as a casual reader) which versions of kubectl (and also the k8s API server) I would need to have a good experience with the jet providers. Maybe we can make that super obvious at the top of the doc so folks coming to this document just looking for guidance on what versions to use can easily find that info without getting lost in the details. What do you think @ulucinar?

Member

Would an executive summary section about the things we'll do be helpful? Maybe one-liners about the problems and links to the task issues, though that could be duplicate of Action Items

Contributor Author

I think it's a good idea @jbw976. Added a kubectl version table with their release dates as a recommendation at the beginning of the Client-side Throttling section.

Member

awesome @ulucinar, that table is really helpful!

do you think we should do the same for server side issues? e.g. so casual readers know with a quick glance what versions of k8s api-server have all the fixes and should perform well? that table could also be valuable :)

Contributor Author

Thanks @jbw976. I've also added a table of kube-apiserver versions to the beginning of the API Server Resource Consumption section, explicitly mentioning the OpenAPI v2 spec lazy-marshaling change and the related upstream issues.

Member

that is super helpful @ulucinar, thank you for making this document more accessible for a broader audience!! 🙇‍♂️

@ulucinar ulucinar force-pushed the fix-2895 branch 3 times, most recently from 7377cf0 to 47c0ef0 Compare March 15, 2022 15:01
issues in kube-apiserver and possibly in other control-plane components.
- Initiate further discussions with the Kubernetes [sig-scalability]
community regarding CRD scalability and bring agreed-upon Crossplane
scenarios to their attention.
Member

As we discussed previously, I would like to explore possibilities like optional lazy serving for CRDs. I would love to discuss its feasibility upstream once we have an upstream issue framing the server-side resource consumption problem. Until we have that issue, where we can propose this as a possible solution, I'm copying the proposal here just to capture it somewhere public.

What do you guys think about proposing an upstream Kubernetes change for lazy serving of CRDs?

  1. Introduce an optional lazyServe field to the CRD spec next to the existing served field.
  2. Extend the existing notFoundHandler so that it checks whether there is a non-served CRD for the endpoint that was hit and, if so, sets served to true on that CRD and returns 503 (or does some trick: waits a bit and redirects to the same URL).

On the Crossplane and controller side, we would also need to change how we start the controllers. Instead of starting them all at once, we would need to watch CRDs and only start the controller for a type once its CRD is served.

Assuming most clients already have built-in retries for 503 errors, this could provide the following user experience for Crossplane:

  1. Install providers with any number of CRDs.
  2. No significant load on the system, since none of the CRDs are served.
  3. A CRD starts to be served only after the user creates a resource of that kind.

It is a bit assertive, but it sounds feasible based on my thought experiments so far.
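As a toy model of the handler behavior sketched in the proposal above: the `lazy_serve` field and the 503-then-retry flow are part of the hypothetical upstream change, not an existing Kubernetes API, and the dictionary of CRDs stands in for the API server's real registration machinery.

```python
from dataclasses import dataclass

@dataclass
class CRD:
    lazy_serve: bool = False  # hypothetical new spec field from the proposal
    served: bool = False

def handle(crds: dict, endpoint: str) -> int:
    """Toy not-found handler: if the endpoint maps to a lazy-serve CRD
    that isn't served yet, flip it to served and answer 503 so the
    client retries; otherwise behave as usual."""
    crd = crds.get(endpoint)
    if crd is None:
        return 404
    if crd.served:
        return 200
    if crd.lazy_serve:
        crd.served = True  # start serving on first access
        return 503
    return 404

crds = {"buckets.example.org": CRD(lazy_serve=True)}
print(handle(crds, "buckets.example.org"))  # 503 on first hit
print(handle(crds, "buckets.example.org"))  # 200 on retry
```

Under this model, no controller or served endpoint exists until the first request arrives, which is what keeps the idle cost near zero.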

`kubectl` behaves when there is a large number of CRDs installed in the
cluster and how the discovery cache and the discovery client affect the
perceived performance of the `kubectl` commands run.
- Server-side issues: We also have some observations on the control-plane
Member

Would it make sense to have a short section about the OpenAPI aggregation problem that was fixed a few months earlier? Just a summary of the problem and a link to the PR.

Contributor Author

Hi @muvaf,
Thank you for the comments.
I have extended the paragraph where we discuss this change with more details.


improves as the burstiness and the fill rate of the token bucket rate limiter
are increased in `kubectl@v1.24`.
2. Installation of multiple providers, such as `provider-jet-{aws,gcp,azure}`
preview editions, into the same cluster results in ~370 GVs being served by
Member

If we bump the current burst limit from 300 to 500, do you think this goal would be achieved? If so, I think we can set this as a goal and track bumping that number in client-go and kubectl as an easy-to-get change.

Contributor Author

As discussed in the Client-side Throttling section, experiments with a custom build of kubectl that allows us to specify the discovery client's tbrl parameters reveal that, with tbrl(b=400, r=50.0 qps), client-side throttling is no longer a bottleneck for our current GV counts (despite a delay of ~18 s on a cold cache). This is subject to change if the number of GVs increases because of different combinations of providers, API regroupings, etc. However, the upstream community sees bumping these limits to cover certain use cases as raising the debt ceiling.
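As a back-of-the-envelope model (not client-go's actual flowcontrol code), the extra delay a token-bucket rate limiter adds to a burst of discovery requests can be estimated as follows; the ~370 request count assumes roughly one discovery request per GV, which is an approximation.

```python
def throttle_delay(requests: int, burst: int, qps: float) -> float:
    """Estimate the extra wall-clock delay a token-bucket rate limiter
    adds to a burst of requests: the first `burst` requests pass
    immediately, and the remainder drain at the refill rate `qps`."""
    return max(0.0, (requests - burst) / qps)

# Illustrative numbers: ~370 GVs, assuming roughly one discovery request
# per GV, with a burst of 300 vs. the tbrl(b=400, r=50 qps) settings
# discussed above.
print(throttle_delay(370, burst=300, qps=50.0))  # 1.4
print(throttle_delay(370, burst=400, qps=50.0))  # 0.0
```

This also shows why bumping the burst alone only buys headroom up to the current GV count: once the number of GVs exceeds the burst again, the delay grows linearly with the excess.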



## Criteria Set for Ideal State
Member

Can we keep this section short and precise, with exact goals, and then give the details in another section? For example, two main goals are listed, so having their technical definitions in one sentence or in a table may make them easier to understand at first glance. Example use scenarios could be helpful as well.

Contributor Author

I've added a table of benchmark provider installation scenarios summarizing the criteria set under the Criteria Set for Ideal State section.

- Initiate further discussions with the Kubernetes [sig-scalability]
community regarding CRD scalability and bring agreed-upon Crossplane
scenarios to their attention.

Member

I think we can add client-go change here, too, as an action item.

Contributor Author

I've added a new action item for bumping the burst limit of the default discovery client in client-go.

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
…ree provider clusters

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
- Remove inline references

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
- Add OpenAPI v2 spec lazy-marshaling fix versions table

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
…the criteria set

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
…to 300

Signed-off-by: Alper Rifat Ulucinar <ulucinar@users.noreply.github.com>
Member

@muvaf muvaf left a comment

Looking great! @ulucinar thank you for tackling a problem that spans multiple upstream components and for telling the story in a consumable way. This is really helpful for both the Crossplane and Kubernetes communities!



Development

Successfully merging this pull request may close these issues.

Consolidation of CRD Scaling Issues

6 participants