Write about best-practice high availability and scaling of cert-manager components #1330
Conversation
Signed-off-by: Richard Wall <richard.wall@venafi.com>
```yaml
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: ScheduleAnyway
  labelSelector:
    matchLabels:
      app.kubernetes.io/instance: cert-manager
      app.kubernetes.io/component: webhook
```
I doubt this is actually necessary, because the Kubernetes documentation says:

> the scheduler automatically tries to spread the Pods in a ReplicaSet across nodes in a single-zone cluster (to reduce the impact of node failures, see kubernetes.io/hostname). With multiple-zone clusters, this spreading behavior also applies to zones (to reduce the impact of zone failures). This is achieved via SelectorSpreadPriority.
I'm easy either way: not having it is less YAML; having it makes the behavior explicit.
I think that if we can verify this works by default, we should not tell people to configure it (we want to keep our docs as simple as possible). Instead, we can link to the documentation you shared and note that it works correctly by default.
I agree that according to the docs the desired anti-affinity scheduling should happen by default.
I've updated this paragraph.
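For readers who do prefer to make the spreading explicit anyway, it could be set through the Helm chart's values. This is only a sketch: it assumes the chart version in use supports a `webhook.topologySpreadConstraints` value, and the constraint body is the one discussed in this thread.

```yaml
# Sketch of a values.yaml fragment for the cert-manager Helm chart
# (assumes webhook.topologySpreadConstraints is supported):
# prefer spreading webhook replicas across nodes, but still
# schedule if the constraint cannot be satisfied.
webhook:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/instance: cert-manager
          app.kubernetes io/component: webhook
```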
> so as to reduce the load on the Kubernetes API server.
>
> For example, if the cluster contains very many CertificateRequest resources,
> you will need to increase the memory limit of the controller Pod.
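As a sketch of the kind of tuning the paragraph above describes, the controller's memory limit could be raised via Helm values. The numbers here are illustrative assumptions, not recommendations.

```yaml
# Illustrative values.yaml fragment for the cert-manager Helm chart:
# give the controller more memory headroom on clusters with many
# CertificateRequest resources. Sizes are placeholders.
resources:
  requests:
    cpu: 10m
    memory: 64Mi
  limits:
    memory: 512Mi
```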
I might say something about the memory optimizations for cainjector, e.g.

> Fix cainjector's --namespace flag. Users who want to prevent cainjector from reading all Secrets and Certificates in all namespaces (i.e to prevent excessive memory consumption) can now scope it to a single namespace using the --namespace flag. A cainjector that is only used as part of cert-manager installation only needs access to the cert-manager installation namespace. (#5694, @irbekrm)

-- https://deploy-preview-1330--cert-manager-website.netlify.app/docs/releases/release-notes/release-notes-1.11/
Is it best practice to limit this to only the installation namespace?
I suppose it's not technically "best practice", but I've added it anyway. See what you think.
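For reference, scoping cainjector to the installation namespace with the `--namespace` flag mentioned in the release note quoted above might look like this. This is a sketch: it assumes cert-manager is installed in the `cert-manager` namespace and that the chart passes `cainjector.extraArgs` through to the cainjector Deployment.

```yaml
# Sketch of a values.yaml fragment: restrict cainjector to the
# installation namespace so it does not cache Secrets and
# Certificates cluster-wide (assumes the cert-manager namespace).
cainjector:
  extraArgs:
    - --namespace=cert-manager
```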
hawksight left a comment:
Very well written, just a few minor notes to break up the text a little.

For reference, I just checked the labels on my GKE nodes:

```
kubernetes.io/hostname=gke-demo-istio-5d390798-5v7i,kubernetes.io/os=linux,node.kubernetes.io/instance-type=e2-medium,topology.gke.io/zone=europe-west1-b,topology.kubernetes.io/region=europe-west1,topology.kubernetes.io/zone=europe-west1-b
```

So the labels chosen look appropriate, although I have not checked any other managed offering.
Thanks @jsoref for correcting these mistakes.
Co-authored-by: Josh Soref <2119212+jsoref@users.noreply.github.com>

Thanks @hawksight. I agree.
Co-authored-by: Peter Fiddes <hawksight@users.noreply.github.com>
Link to Google GKE documentation which talks about webhook disruptions.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: hawksight, inteon.

/lgtm
Preview: https://deploy-preview-1330--cert-manager-website.netlify.app/docs/installation/best-practice/#high-availability
We've added various Helm chart values which allow users to configure cert-manager for HA and scalability,
but we've never documented any recommendations explaining how these settings should be used in production.
My original (and ultimate) plan was to change some of the Helm chart defaults to include useful default topology constraints,
as first suggested by @ThatsMrTalbot in a cert-manager dev meeting and discussed further in https://kubernetes.slack.com/archives/CDEQJ0Q8M/p1697561450000799.
But first I want to document what we think the best-practice settings are.
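As a rough sketch of the kind of HA-related Helm chart values this page documents: the value names below are assumed from the cert-manager chart, and the replica counts are illustrative rather than recommendations.

```yaml
# Sketch of a values.yaml fragment: run multiple replicas of each
# component and enable PodDisruptionBudgets so voluntary node
# drains cannot take down all replicas at once. Counts are
# placeholders, not recommendations.
replicaCount: 2
podDisruptionBudget:
  enabled: true
  minAvailable: 1
webhook:
  replicaCount: 3
  podDisruptionBudget:
    enabled: true
cainjector:
  replicaCount: 2
```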