WIP: DRA e2e: instructions for setting up cluster with autoscaler support #123078
pohly wants to merge 1 commit into kubernetes:master
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: pohly. The full list of commands accepted by this bot can be found here. The pull request process is described here.
```diff
+ scheduler:
+   extraArgs:
+     feature-gates: DynamicResourceAllocation=true,ContextualLogging=true
+ # TODO: enable features in kubelet
```
Should be solved by just adding kubeletExtraArgs to the initConfiguration (and joinConfiguration?) below?
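A sketch of what that could look like (the exact nesting inside the patched KubeadmControlPlaneTemplate is an assumption on my part, per kubeadm's v1beta3 API):

```yaml
# Sketch, not verified: mirror the scheduler feature gates on the kubelet
# via kubeadm's nodeRegistration, for both init and join.
initConfiguration:
  nodeRegistration:
    kubeletExtraArgs:
      feature-gates: DynamicResourceAllocation=true
joinConfiguration:
  nodeRegistration:
    kubeletExtraArgs:
      feature-gates: DynamicResourceAllocation=true
```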
```yaml
# Currently ignored and/or overwritten?
# /var/run/kubeadm/kubeadm.yaml on the control plane container doesn't have it.
```
This works (at least for me) if you just patch the quick-start-control-plane KubeadmControlPlaneTemplate. So these comments could be dropped?
Yes, seems to work now. Removed.
```diff
- machinePools:
-   - class: default-worker
-     name: mp-0
-     replicas: 1
```
I think this should be dropped (so that the autoscaler knows that it can control the field)
```diff
- replicas: 1
```
I am removing the entire "machinePools" section here because that is for a pool with an experimental API. What we want is just the "machineDeployments". This is what is left after patching:

```yaml
workers:
  machineDeployments:
  - class: default-worker
    name: md-0
    replicas: 1
```
@pohly sorry, my comment was off by a few lines. I think the replicas field should be removed for the machineDeployments.
As it stands, the cluster comes up with one worker node. Then the autoscaler can scale up or down. I prefer that over not bringing up any worker node initially, because then a problem with the worker nodes would only surface later.
Your comment was "so that the autoscaler knows that it can control the field" - I don't think that setting the initial value prevents that.
IIRC in my testing it did. When the autoscaler sees that it's set, it determines that it's controlled by some other entity and refuses to act on it. It defaults to 1 (if not set), IIRC.
You are right, the instructions are not enough to actually make the autoscaler do anything. It finds no node groups. But did you really get it to work as you suggested above?
What is missing is something else, the annotations on the MachineDeployment:
https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling#enabling-autoscaling
@sbueringer: any suggestion how to get those annotations added automatically to the MachineDeployment?
Do we need --node-group-auto-discovery=clusterapi:clusterName=capi-quickstart or is it enabled by default?
https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling#configuring-node-group-auto-discovery says "you must configure node group auto discovery" but https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling#enabling-autoscaling says "The autoscaler will monitor any MachineSet, MachineDeployment, or MachinePool containing both of these annotations."
When the cluster comes up after following the instructions in this README.md, it has a generated machine deployment with a variable name. I could come up with a kubectl invocation that adds the annotations, but it would be nicer to include that in the cluster configuration.
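One possible invocation (a sketch, untested; it assumes the generated MachineDeployment is the only one in the current namespace):

```shell
# Look up the generated MachineDeployment (its name has a random suffix)
# and add the autoscaler opt-in annotations to it.
MD=$(kubectl get machinedeployment -o jsonpath='{.items[0].metadata.name}')
kubectl annotate machinedeployment "$MD" \
  cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size="1" \
  cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size="3"
```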
Also, does the cloud provider in use here (Docker, from clusterctl init --infrastructure docker) support scale from zero automatically? https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/autoscaling#scale-from-zero-support documents some additional annotations, but it is not clear if they are needed.
> @sbueringer: any suggestion how to get those annotations added automatically to the MachineDeployment?
You can patch:

```yaml
- class: default-worker
  name: md-0
  replicas: 1
```

to be this instead:

```yaml
- class: default-worker
  name: md-0
  metadata:
    annotations:
      cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
      cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "3"
```

The result is:
- the annotations will be set on the MachineDeployment
- the MachineDeployment webhook will pick up the value of min-size as the initial value for MD.spec.replicas

If you set both replicas and the autoscaler annotations here, the CAPI controller and the autoscaler would both continuously try to write the MD.spec.replicas field.
> Do we need --node-group-auto-discovery=clusterapi:clusterName=capi-quickstart or is it enabled by default?
I'm not sure if you have to set --node-group-auto-discovery. In our e2e test we do it, but we also have other tests running in other namespaces. You can give it a try without it.
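For illustration, an explicit configuration could look roughly like this (a sketch; the flag names are from the cluster-autoscaler Cluster API provider docs, the cluster name is the one from this README, and whether the auto-discovery flag is actually required here is exactly the open question above):

```shell
# Sketch: run the autoscaler with the Cluster API provider and
# node group auto-discovery limited to the quickstart cluster.
cluster-autoscaler \
  --cloud-provider=clusterapi \
  --kubeconfig=./capi-quickstart.kubeconfig \
  --node-group-auto-discovery=clusterapi:clusterName=capi-quickstart
```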
> Also, does the cloud provider in use here (Docker, from clusterctl init --infrastructure docker) support scale from zero automatically?
CAPD falls under:
> If your Cluster API provider does not have support for scaling from zero, you may still use this feature through the capacity annotations.
So you'll have to set the annotations. Also, in CAPD you can't really define via the DockerMachine spec which size your "Machine" has, like you can in e.g. AWS. But you can check via Node.status.capacity on the created Nodes what the capacity of a Node is. Not really sure where that is coming from, though.
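So the MachineDeployment metadata would additionally need something like this (the values are made up for illustration; the annotation keys are the ones from the scale-from-zero documentation linked above):

```yaml
metadata:
  annotations:
    # Hypothetical per-node capacity for the Docker "machines".
    capacity.cluster-autoscaler.kubernetes.io/cpu: "4"
    capacity.cluster-autoscaler.kubernetes.io/memory: "8G"
```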
Reminder to self: this still needs to be included in the instructions.
/triage accepted

/assign @SergeyKanzhelev

/assign @bart0sh

@pohly please address review comments, thanks

@pohly still has 3 comments that need a response
These instructions will be useful for developers who want to run Kubernetes with DRA or other experimental features enabled in a cluster that supports autoscaling.
Force-pushed from ec6e2b2 to 7c6c17c

Answered on the thread. Otherwise lgtm as far as I can tell.
```shell
# The control plane won't be Ready until we install a CNI in the next step.
$ kubectl --kubeconfig=./capi-quickstart.kubeconfig \
    apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/calico.yaml
```
You should use kindnetd; it is much more stable (kubernetes-sigs/cluster-api@d0c495a) and consumes fewer resources.
While we're here: thanks for maintaining kind & kindnet. It's a very nice and stable foundation to build our own testing in CAPI on. Saves us so much time & effort.
kind is Ben's baby,
appreciate the compliments ❤️
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/lifecycle rotten
/close
@k8s-triage-robot: Closed this PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
What type of PR is this?
/kind documentation
What this PR does / why we need it:
These instructions will be useful for developers who want to run Kubernetes with DRA or other experimental features enabled in a cluster that supports autoscaling.
Does this PR introduce a user-facing change?
/assign @marquiz
You were already using these instructions. Okay to merge?