Description
What would you like added?
Currently, you can add annotations to every Pod in a RunnerDeployment by adding them to the RunnerDeployment spec under:

```yaml
spec:
  template:
    metadata:
      annotations:
```
I would like the ability to specify annotations to be added to Pods at the time the Pods are assigned jobs, so that idle Pods waiting for jobs do not have the same annotations as Pods running jobs.
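One possible shape for this feature (a sketch only; the `jobAnnotations` field is hypothetical and does not exist in ARC today):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runnerdeploy
spec:
  template:
    metadata:
      annotations:
        # Existing behavior: applied to every Pod, idle or busy
        example.com/team: "ci"
    spec:
      # Hypothetical field: annotations the controller would add only
      # once a job is assigned to the Pod, and (for persistent runners)
      # remove again when the job finishes
      jobAnnotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```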
Why is this needed?
Kubernetes cluster autoscaling solutions generally expect that a Pod runs a service that can be terminated on one Node and restarted on another, with only a short grace period needed to finish any in-flight requests. When the cluster is resized, the Cluster Autoscaler will do just that. However, GitHub Actions Runner jobs do not fit this model: if a Pod is terminated in the middle of a job, the job is lost. The likelihood of this happening is increased by the fact that the Action Runner Controller autoscaler expands and contracts the Runner pool on a regular basis, which causes the Cluster Autoscaler to scale the EKS cluster up and down more frequently and, consequently, to move Pods around.
In order to handle situations like this, cluster autoscalers typically allow Pods to indicate via an annotation that they cannot be safely interrupted. For the Kubernetes Cluster Autoscaler, you can add the annotation

```yaml
cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
```

For Karpenter, you can add the annotation

```yaml
karpenter.sh/do-not-evict: "true"
```

An annotation like this should be added to Pods running jobs, so that the job can finish.
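On a Pod that is actively running a job, the result would look like this (an illustrative Pod manifest excerpt; the Pod name is made up):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-runner-abc12  # illustrative name
  annotations:
    # Cluster Autoscaler: do not evict this Pod while the job runs
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    # Karpenter equivalent
    karpenter.sh/do-not-evict: "true"
```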
However, we do not want this annotation on idle Pods waiting for jobs. Otherwise, the Cluster Autoscaler would be prevented from removing nodes where the idle Pods are waiting, which is exactly the opposite of what we want.
The obvious solution is to have ARC add the annotation to a Pod once a job is assigned to it. In the case of persistent runners, the annotation should be removed once the job finishes.
Additional context
It is practically impossible to run very long jobs on a Runner that the Cluster Autoscaler can terminate and evict, unless the cluster's capacity is very stable. Currently the only acceptable workarounds are:
- Set `minReplicas = 0` and add the annotation to all Pods, solving the problem by never leaving idle Pods deployed
- Set up an ARC Autoscaler scheduled override to regularly drop `minReplicas` to zero, allowing the Cluster Autoscaler to reclaim the Node(s) the idle Pod(s) are on
These solutions are less desirable because of (1) the lack of idle Runners to pick up jobs quickly and (2) long periods of time where the Cluster Autoscaler is prevented from scaling down the cluster.
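For reference, the scheduled-override workaround can be expressed with ARC's `HorizontalRunnerAutoscaler` roughly like this (a sketch; names and times are illustrative, and the exact fields may vary by ARC version):

```yaml
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-autoscaler
spec:
  scaleTargetRef:
    name: example-runnerdeploy
  minReplicas: 3
  maxReplicas: 10
  scheduledOverrides:
    # Drop to zero runners overnight so the Cluster Autoscaler can
    # reclaim the Nodes that idle Pods would otherwise pin in place
    - startTime: "2023-01-01T22:00:00Z"
      endTime: "2023-01-02T05:00:00Z"
      recurrenceRule:
        frequency: Daily
      minReplicas: 0
```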