Add option to disable pod affinity #235
Conversation
Thanks! fwiw, I had the exact same implementation in my fork :)
This would help us a lot in our setup as well: without it, our GPU runners break whenever no GPUs are available on the node the job pod runs on, even though GPUs are available on other nodes.
This is exactly what we need! :D Currently the runner pods are created as expected, but we always have to wait for the workflow pods. Since the introduction of node affinity, our nodes do not have enough resources to process a workflow pod for every runner pod. The change with node affinity makes using the kube scheduler pointless for us. We deliberately chose this path so that we could use smaller nodes and scale the number of nodes to save costs.

However, since this PR has been open for more than a month, I wonder if it is realistic to expect this change to be considered in the near future. I don't know who is responsible for this, maybe @nikola-jokic, but it would be nice to see some feedback on this PR.
Hey everyone, we are currently working on a PR that will disable the affinity and volume mounts completely.
@nikola-jokic, thank you for the update. I didn't know you were working on something like this, and I'm very excited to see how it develops. I think it's a nice idea. |
@nikola-jokic, thanks for working on this. Do you have a rough timeline for the fix?
Hey @jennyluciav, the target date is Oct 13th, but it might be sooner. Most of the work has been done; I'd like to test it a bit more to make sure everything works for at least most cases. I'm a bit worried about permissions: with volume mounts we could easily apply them to any directory, while with copy there are certain folders where it might result in an error. That mostly affects user volume mounts, which are likely not frequent.
@nikola-jokic @Wielewout Why was this pull request closed? Is the work being moved to another PR? I also have this problem: I'm using RWX for the work volume and I need the runner and workflow pods to be able to schedule on different nodes to match the resource requests they are given.
This PR became obsolete because of #244. Instead of using a volume in both the runner and workflow pod, the runner will copy files to the workflow pod. This also removes the requirement to keep both pods on the same node. |
Did anyone get #244 in a working state?
I still think the possibility to disable nodeAffinity should be added. The solution in #244 does not support all use cases. For example:

Any chance we can get this PR reopened?
Hey @vvanouytsel, we can't keep two parallel versions of the hook at the same time. Since the implementation in 0.8.0 is not heavily tested on every environment, we included both versions on the runner, so you can fall back to the 0.7.0 version of the hook.
In #212, pod affinity was added when the kube scheduler is enabled. While a much better default, it makes less optimal use of resources in the cluster (or even breaks some setups, see #201 (comment)).
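For reference, the kind of rule #212 introduces looks roughly like this (a sketch only; the `runner-pod` label selector here is hypothetical, and the hook's actual rule may differ):

```yaml
# Sketch: pin the workflow pod to the same node as the runner pod.
# The `runner-pod: my-runner` label is a made-up example, not the hook's real label.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            runner-pod: my-runner
        topologyKey: kubernetes.io/hostname
```

With `topologyKey: kubernetes.io/hostname`, the scheduler will only place the pod on the node that already runs a pod matching the selector, which is what forces both pods onto the same node.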
By default pod affinity remains set, but by setting `ACTIONS_RUNNER_USE_POD_AFFINITY=false` the pod affinity rules will be skipped. When disabled, the runner and workflow pod can be scheduled on different nodes again. It is then up to the user to support RWX volumes in the cluster, a node selector for architecture if using a multi-arch cluster (on both the runner and workflow pod so they match), and so on.
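With this change, skipping the rule is a matter of setting the environment variable on the runner container. A minimal sketch (only `ACTIONS_RUNNER_USE_POD_AFFINITY` comes from this PR; the surrounding pod template, container name, and image are illustrative):

```yaml
# Illustrative runner pod template; container name and image are examples only.
template:
  spec:
    containers:
      - name: runner
        image: ghcr.io/actions/actions-runner:latest
        env:
          - name: ACTIONS_RUNNER_USE_POD_AFFINITY
            value: "false"  # skip pod affinity; pods may schedule on different nodes
```

Keep in mind that with affinity disabled you need an RWX work volume (and matching node selectors on a multi-arch cluster) so both pods can still share the workspace.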