Description
What would you like added?
Currently, you can add annotations to every Pod in a RunnerDeployment by adding them to the RunnerDeployment spec under:

```yaml
spec:
  template:
    metadata:
      annotations:
```
I would like the ability to specify annotations to be added to Pods at the time the Pods are assigned jobs, so that idle Pods waiting for jobs do not have the same annotations as Pods running jobs.
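One possible shape for this feature (a sketch only; the `jobAnnotations` field is hypothetical and does not exist in ARC today):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  template:
    metadata:
      annotations:
        # Existing behavior: applied to every Pod, idle or busy
        example.com/team: "ci"
    spec:
      # Hypothetical field: annotations the controller would add only
      # once a job is assigned to the Pod, and (for persistent runners)
      # remove again when the job finishes
      jobAnnotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```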
Why is this needed?
Kubernetes cluster autoscaling solutions generally expect that a Pod runs a service that can be terminated on one Node and restarted on another, with only a short grace period needed to finish any in-flight requests. When the cluster is resized, the Cluster Autoscaler will do just that. However, GitHub Actions Runner jobs do not fit this model: if a Pod is terminated in the middle of a job, the job is lost. The likelihood of this happening is increased by the fact that the Action Runner Controller autoscaler expands and contracts the Runner pool on a regular basis, which causes the Cluster Autoscaler to scale the EKS cluster up and down more frequently and, consequently, to move Pods around.
In order to handle situations like this, cluster autoscalers typically allow Pods to indicate via an annotation that they cannot be safely interrupted. For the Kubernetes Cluster Autoscaler, you can add the annotation

```yaml
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

For Karpenter, you can add the annotation

```yaml
karpenter.sh/do-not-evict: "true"
```

An annotation like this should be added to Pods running jobs, so that the job can finish.
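On a Pod that is actively running a job, the result would look like this (an illustrative Pod manifest excerpt; the Pod name is made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-runner-abc12  # illustrative name
  annotations:
    # Cluster Autoscaler: do not evict this Pod while the job runs
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    # Karpenter equivalent
    karpenter.sh/do-not-evict: "true"
```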
However, we do not want this annotation on idle Pods waiting for jobs. Otherwise, the Cluster Autoscaler would be prevented from removing nodes where the idle Pods are waiting, which is exactly the opposite of what we want.
The obvious solution is to have ARC add the annotation to a Pod once a job is assigned to it. In the case of persistent runners, the annotation should be removed once the job finishes.
Additional context
It is practically impossible to run very long jobs on a Runner that the Cluster Autoscaler can terminate and evict, unless the cluster's capacity is very stable. Currently the only acceptable workarounds are:
- Set `minReplicas = 0` and add the annotation to all Pods, solving the problem by never leaving idle Pods deployed
- Set up an ARC Autoscaler scheduled override to regularly drop `minReplicas` to zero, allowing the Cluster Autoscaler to reclaim the Node(s) the idle Pod(s) are on
These solutions are less desirable because of (1) the lack of idle Runners to pick up jobs quickly and (2) long periods of time where the Cluster Autoscaler is prevented from scaling down the cluster.
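For reference, the scheduled-override workaround can be expressed with ARC's `HorizontalRunnerAutoscaler` roughly like this (a sketch; names and times are illustrative, and the exact fields may vary by ARC version):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-autoscaler
spec:
  scaleTargetRef:
    name: example-runnerdeploy
  minReplicas: 3
  maxReplicas: 10
  scheduledOverrides:
    # Drop to zero runners overnight so the Cluster Autoscaler can
    # reclaim the Nodes that idle Pods would otherwise pin in place
    - startTime: "2023-01-01T22:00:00Z"
      endTime: "2023-01-02T05:00:00Z"
      recurrenceRule:
        frequency: Daily
      minReplicas: 0
```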