Pass worker resources to pod args#398

Merged
jacobtomlinson merged 2 commits into dask:main from jhamman:feature/worker-resources
Feb 3, 2022

Conversation

@jhamman (Member) commented Feb 2, 2022

This adds a new keyword argument to make_pod_spec that passes worker resources through to the dask-worker --resources option.

This avoids the pattern of manipulating the pod spec after creating it.

Before:

pod_spec = make_pod_spec(...)
pod_spec.spec.containers[0].args.extend(["--resources", "FOO=1"])

Now:

pod_spec = make_pod_spec(..., resources="FOO=1")
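To make the mapping concrete, here is a minimal sketch of how a resources kwarg can be folded into the worker's command-line args. `build_worker_args` is a hypothetical stand-in, not the actual dask-kubernetes internals:

```python
# Hypothetical sketch: fold a `resources` string into the worker container's
# args, the way make_pod_spec now does internally (illustrative only).
def build_worker_args(resources=None):
    args = ["dask-worker"]
    if resources is not None:
        # dask-worker exposes abstract resources via its --resources flag
        args.extend(["--resources", resources])
    return args

print(build_worker_args(resources="FOO=1"))
# → ['dask-worker', '--resources', 'FOO=1']
```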

@jacobtomlinson jacobtomlinson left a comment


Neat, thanks @jhamman.

It would be awesome to sync up some time and chat about what you're working on. Feel free to drop something into my calendar!

@jacobtomlinson jacobtomlinson merged commit 431c84d into dask:main Feb 3, 2022
@ddelange (Contributor) commented Mar 23, 2022

I was surprised to see this kwarg in the docs (btw, it says 2021.03 🤔) but find it not available on my machine (2022.1.0). Are there plans for a release this month?

[screenshot]

I think it's needed for Dask to understand our GPU setup. This is what we would use to schedule on a GPU machine managed by kOps:

dask_kubernetes.make_pod_spec(
    image="rapidsai/rapidsai:cuda11.5-runtime-ubuntu20.04-py3.8",
    # memory_limit="15Gi",  # not needed, dask-worker will auto-detect available RAM. using cpu requests to reserve a whole g4dn.xlarge
    cpu_request="3800m",  # 200m reservation by kube-system daemonsets
    resources="GPU=1",  # http://distributed.dask.org/en/stable/resources.html
    threads_per_worker=1,  # could try to increase with risk of OOM when multiple threads try to load different models into GPU
    extra_pod_config={
        "tolerations": [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ],
        "nodeSelector": {"kops.k8s.io/gpu": "1"},  # kOps only https://github.com/kubernetes/kops/blob/v1.22.4/docs/gpu.md
    },
)

For non-kOps clusters, the nodeSelector is probably missing and you'd have to add nvidia.com/gpu: 1 to resources.limits instead:

dask_kubernetes.make_pod_spec(
    image="rapidsai/rapidsai:cuda11.5-runtime-ubuntu20.04-py3.8",
    # memory_limit="15Gi",  # not needed, dask-worker will auto-detect available RAM. using cpu requests to reserve a whole g4dn.xlarge
    # cpu_request="3800m",  # 200m reservation by kube-system daemonsets
    extra_container_config={
        "resources": {"limits": {"nvidia.com/gpu": "1"}, "requests": {"cpu": "3800m"}},
    },  # https://v1-22.docs.kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/
    resources="GPU=1",  # http://distributed.dask.org/en/stable/resources.html
    threads_per_worker=1,  # could try to increase with risk of OOM when multiple threads try to load different models into GPU
    extra_pod_config={
        "tolerations": [
            {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}
        ],
    },
)
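For intuition, an extra_container_config-style overlay amounts to a recursive dict merge onto the generated container spec. Here is a minimal sketch of that pattern (`deep_merge` is illustrative, not dask-kubernetes' own merge code):

```python
def deep_merge(base, extra):
    """Recursively overlay `extra` onto `base`, returning a new dict.
    Illustrative of how an extra_container_config-style overlay could be
    applied to a generated container spec; not the library's actual code."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

container = {"resources": {"requests": {"cpu": "100m"}}}
extra = {"resources": {"limits": {"nvidia.com/gpu": "1"}, "requests": {"cpu": "3800m"}}}
print(deep_merge(container, extra))
# → {'resources': {'requests': {'cpu': '3800m'}, 'limits': {'nvidia.com/gpu': '1'}}}
```

Nested keys like resources.requests are merged rather than replaced wholesale, which is why the GPU limit and the cpu request can coexist in one overlay.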

@jacobtomlinson (Member)

Thanks @ddelange. Looks like the version number in the docs is broken; it's technically correct, because we are 38 commits ahead of 2021.03.0. We are also a few commits ahead of 2022.1.0. We tend to release ad hoc here whenever there is something of note to publish.

Thanks for spotting the docs issue. Please feel free to raise a PR to correct it.

@ddelange (Contributor)

Maybe versioneer doesn't find the commit of the latest tag on the branch the docs are built from, and falls back to the latest tag that is present on that branch?
[screenshot]

@ddelange (Contributor) commented Mar 23, 2022

Maybe it would be good to point Read the Docs by default to the latest stable release (instead of to HEAD, which contains unreleased features like this one)?
[screenshot]

Under the Versions tab you can then activate all previous tags:
[screenshot]

And if I'm not mistaken, all new tags will from then on automatically be marked as Active and become available in the docs version dropdown:
[screenshot]

@jacobtomlinson (Member)

Yeah, it must be a versioneer issue on RTD. We intentionally chose to point dask and distributed at stable recently, but decided to leave other projects on latest as they are released less frequently.
