Pass worker resources to pod args #398
jacobtomlinson merged 2 commits into dask:main from jhamman:feature/worker-resources
Conversation

jacobtomlinson left a comment

Neat, thanks @jhamman.
It would be awesome to sync up some time and chat about what you're working on. Feel free to drop something into my calendar!
ddelange left a comment

I was surprised to see this kwarg in the docs (btw, the docs say 2021.03 🤔) but not available on my machine. I think it is needed for dask to understand our GPU setup. This would be our setup to schedule on a GPU machine managed by kOps:

```python
dask_kubernetes.make_pod_spec(
    image="rapidsai/rapidsai:cuda11.5-runtime-ubuntu20.04-py3.8",
    # memory_limit="15Gi",  # not needed, dask-worker will auto-detect available RAM
    # using cpu requests to reserve a whole g4dn.xlarge:
    cpu_request="3800m",  # 200m reservation by kube-system daemonsets
    resources="GPU=1",  # http://distributed.dask.org/en/stable/resources.html
    threads_per_worker=1,  # could try to increase, with risk of OOM when multiple threads try to load different models into the GPU
    extra_pod_config={
        "tolerations": [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ],
        "nodeSelector": {"kops.k8s.io/gpu": "1"},  # kOps only: https://github.com/kubernetes/kops/blob/v1.22.4/docs/gpu.md
    },
)
```

For non-kOps clusters, the nodeSelector is probably missing and you'd have to hack it in via extra_container_config:

```python
dask_kubernetes.make_pod_spec(
    image="rapidsai/rapidsai:cuda11.5-runtime-ubuntu20.04-py3.8",
    # memory_limit="15Gi",  # not needed, dask-worker will auto-detect available RAM
    # cpu_request="3800m",  # 200m reservation by kube-system daemonsets
    extra_container_config={
        "resources": {"limits": {"nvidia.com/gpu": "1"}, "requests": {"cpu": "3800m"}},
    },  # https://v1-22.docs.kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
    resources="GPU=1",  # http://distributed.dask.org/en/stable/resources.html
    threads_per_worker=1,  # could try to increase, with risk of OOM when multiple threads try to load different models into the GPU
    extra_pod_config={
        "tolerations": [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ],
    },
)
```
jacobtomlinson left a comment

Thanks @ddelange. Looks like the version number on the docs is broken; it's technically correct because we are 38 commits ahead of the last release tag. Thanks for spotting the docs issues. Please feel free to raise a PR to correct it.

ddelange left a comment

Yeah, it must be a versioneer issue in RTD. We intentionally chose to point

This adds a new keyword argument to make_pod_spec, facilitating passing worker resources through to the dask-worker --resources option. This avoids the pattern of manipulating the pod spec after creating it.
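Conceptually, the kwarg just folds extra flags into the worker container's command line. A minimal sketch of the idea — a hypothetical helper, not the actual make_pod_spec internals:

```python
def build_worker_args(resources=None):
    """Sketch of how a ``resources`` kwarg could be appended to the
    ``dask-worker`` command line (hypothetical helper for illustration)."""
    args = ["dask-worker", "--nthreads", "1"]
    if resources is not None:
        # dask-worker accepts e.g. --resources "GPU=1"
        # (see http://distributed.dask.org/en/stable/resources.html)
        args += ["--resources", resources]
    return args

print(build_worker_args(resources="GPU=1"))
# ['dask-worker', '--nthreads', '1', '--resources', 'GPU=1']
```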
Before:
Now:
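The original Before/Now code blocks did not survive extraction. A hedged sketch of the pattern change, using a plain-dict stand-in for the pod spec object so the example is self-contained (the stub is illustrative, not the real dask_kubernetes API):

```python
def make_pod_spec_stub(image, resources=None):
    """Dict-based stand-in for dask_kubernetes.make_pod_spec (illustration only)."""
    args = ["dask-worker", "--nthreads", "1"]
    if resources is not None:
        args += ["--resources", resources]
    return {"spec": {"containers": [{"image": image, "args": args}]}}

# Before: create the spec, then reach into it and append the flag by hand.
pod_before = make_pod_spec_stub(image="daskdev/dask:latest")
pod_before["spec"]["containers"][0]["args"] += ["--resources", "GPU=1"]

# Now: pass the resources straight through the keyword argument.
pod_now = make_pod_spec_stub(image="daskdev/dask:latest", resources="GPU=1")

# Both approaches yield the same worker args.
assert (pod_before["spec"]["containers"][0]["args"]
        == pod_now["spec"]["containers"][0]["args"])
```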