[Docker] Automagically add "runtime=nvidia" #11125

Merged

ericl merged 8 commits into ray-project:master from ijrsvt:defult-runtime-nvidia

Oct 2, 2020

Contributor

ijrsvt commented Sep 29, 2020

Why are these changes needed?

Users normally need to add "- --runtime=nvidia" to run_options to enable GPUs inside their Docker container. This PR makes that unnecessary by checking whether the nvidia runtime is available and, if it is, using it.

An alternate solution is to just always add the following to run_options:

--runtime=`docker info -f '{{.Runtimes.nvidia}}' | grep "nvidia-container-runtime" > /dev/null  && echo "nvidia" || docker info -f  '{{.DefaultRuntime}}'`
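For readers who prefer Python to the shell one-liner, here is a rough sketch of the same check. It is not the code in this PR: `_docker_info` and `pick_runtime` are hypothetical names, and the only external commands assumed are the two `docker info -f` calls already shown above.

```python
import subprocess
from typing import Callable


def _docker_info(fmt: str) -> str:
    """Run `docker info -f <fmt>` and return its stdout ("" if docker is missing)."""
    try:
        result = subprocess.run(
            ["docker", "info", "-f", fmt],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # The docker binary is not installed or not executable.
        return ""
    return result.stdout.strip()


def pick_runtime(docker_info: Callable[[str], str] = _docker_info) -> str:
    """Return "nvidia" if that runtime is registered, else Docker's default runtime.

    Hypothetical helper mirroring the shell one-liner; `docker_info` is
    injectable so the logic can be exercised without a Docker daemon.
    """
    nvidia_entry = docker_info("{{.Runtimes.nvidia}}")
    if "nvidia-container-runtime" in nvidia_entry:
        return "nvidia"
    return docker_info("{{.DefaultRuntime}}")
```

The result could then be passed as `--runtime=<value>`, as in the one-liner. Injecting `docker_info` is only a testing convenience, not something the PR requires.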

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(


          first pass

3763ba1

ijrsvt assigned wuisawesome and ericl

Contributor Author

ijrsvt commented Sep 29, 2020

cc @ray-project/ray-autoscaler

ericl approved these changes

ericl added the @author-action-required label

wuisawesome reviewed

Contributor

wuisawesome left a comment

Can you also use the flag in example-full.yaml and explain that this sets the --runtime flag?


          explicit default in json

ee528f2

Contributor Author

ijrsvt commented Sep 29, 2020

Sounds good @wuisawesome !

ijrsvt added 2 commits

September 29, 2020 17:01


          Change in examples

f55bb09


          change in docs

231e2f6

ijrsvt mentioned this pull request

Docker on by default in example-full.yaml; use new defaults.yaml for defaults #11121

Closed

6 tasks

wuisawesome approved these changes

Contributor

wuisawesome left a comment

lgtm

ijrsvt added 2 commits

September 30, 2020 10:19


          fix example full

a658496


          Merge branch 'master' into defult-runtime-nvidia

1c8dd42

ijrsvt added tests-ok and removed @author-action-required labels

ericl reviewed

python/ray/autoscaler/gcp/example-gpu-docker.yaml Outdated

                   container_name: "ray-nvidia-docker-test" # e.g. ray_docker
-                  run_options:
-                    - --runtime=nvidia
+                  disable_automatic_runtime_detection: False

Contributor

ericl Oct 1, 2020

Can you leave this out of the examples? It should be an internal flag that users don't need to use.
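To make the role of the flag concrete, here is a minimal sketch of how an internal option like disable_automatic_runtime_detection could gate the automatic --runtime=nvidia. This is an assumption about the behavior, not the code merged in this PR: `build_run_options` and its `nvidia_available` argument are hypothetical.

```python
from typing import Any, Dict, List


def build_run_options(docker_config: Dict[str, Any],
                      nvidia_available: bool) -> List[str]:
    """Return the `docker run` options, adding --runtime=nvidia when appropriate.

    Hypothetical helper: `docker_config` stands in for the `docker:` section
    of a cluster YAML, and `nvidia_available` for the result of a runtime check.
    """
    run_options = list(docker_config.get("run_options", []))

    # Internal escape hatch: leave the user's options untouched.
    if docker_config.get("disable_automatic_runtime_detection", False):
        return run_options

    # Respect an explicit --runtime choice made by the user.
    already_set = any(opt.startswith("--runtime") for opt in run_options)
    if nvidia_available and not already_set:
        run_options.append("--runtime=nvidia")
    return run_options
```

Under this sketch, omitting the flag from the example YAMLs behaves the same as setting it to False, which is why users would not need to see it.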

ericl reviewed

doc/source/cluster/autoscaling.rst Outdated


          Update doc/source/cluster/autoscaling.rst

af981f1

ericl requested changes

Contributor

ericl left a comment

Just one comment to remove the internal flag.

ericl added the @author-action-required label

Contributor Author

ijrsvt commented Oct 1, 2020

Ahh, looks like you merged it :) Thanks @ericl

ijrsvt removed the @author-action-required label

ericl reviewed

python/ray/autoscaler/aws/example-full.yaml

    
                  # head_image: "rayproject/ray:0.8.7-gpu"

                  # head_run_options:

                  #     - --runtime=nvidia

                  # Allow Ray to automatically detect GPUs

Contributor

ericl Oct 1, 2020

Suggested change

      
                # Allow Ray to automatically detect GPUs
          
                # Allow Ray to automatically detect GPUs

python/ray/autoscaler/aws/example-full.yaml Outdated

    
                  # head_run_options:

                  #     - --runtime=nvidia

                  # Allow Ray to automatically detect GPUs

                  # disable_automatic_runtime_detection: False

Contributor

ericl Oct 1, 2020

Suggested change

      
                # disable_automatic_runtime_detection: False
          
                # disable_automatic_runtime_detection: False

python/ray/autoscaler/aws/example-gpu-docker.yaml Outdated

                   container_name: "ray-nvidia-docker-test" # e.g. ray_docker
-                  run_options:
-                    - --runtime=nvidia
+                  disable_automatic_runtime_detection: False

Contributor

ericl Oct 1, 2020

Suggested change

      
                disable_automatic_runtime_detection: False
          
                disable_automatic_runtime_detection: False

python/ray/autoscaler/aws/example-ml.yaml Outdated

                   # if no cached version is present.
                   pull_before_run: True
                   run_options: []  # Extra options to pass into "docker run"
+                  disable_automatic_runtime_detection: False

Contributor

ericl Oct 1, 2020

Suggested change

      
                disable_automatic_runtime_detection: False
          
                disable_automatic_runtime_detection: False

python/ray/autoscaler/azure/example-gpu-docker.yaml Outdated

                   container_name: "ray-nvidia-docker-test" # e.g. ray_docker
-                  run_options:
-                    - --runtime=nvidia
+                  disable_automatic_runtime_detection: False

Contributor

ericl Oct 1, 2020

Suggested change

      
                disable_automatic_runtime_detection: False
          
                disable_automatic_runtime_detection: False

python/ray/autoscaler/azure/example-gpu.yaml Outdated

                   # if no cached version is present.
                   pull_before_run: True
                   run_options: []  # Extra options to pass into "docker run"
+                  disable_automatic_runtime_detection: False

Contributor

ericl Oct 1, 2020

Suggested change

      
                disable_automatic_runtime_detection: False
          
                disable_automatic_runtime_detection: False

python/ray/autoscaler/gcp/example-gpu-docker.yaml Outdated

                   container_name: "ray-nvidia-docker-test" # e.g. ray_docker
-                  run_options:
-                    - --runtime=nvidia
+                  disable_automatic_runtime_detection: False

Contributor

ericl Oct 1, 2020

Suggested change

      
                disable_automatic_runtime_detection: False
          
                disable_automatic_runtime_detection: False

ericl requested changes

Contributor

ericl left a comment

Can you make the changes?

ericl added the @author-action-required label

Contributor Author

ijrsvt commented Oct 1, 2020

Oh, I didn't realize you wanted it removed from the YAMLs.


          remove from yamls

a7b1fcb

ijrsvt removed the @author-action-required label

ericl requested changes

Contributor

ericl left a comment

Did you commit the changes? I left them as comments only.

ericl added the @author-action-required label

Contributor Author

ijrsvt commented Oct 1, 2020

@ericl 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦
Yes, I did, but I forgot to push.

ijrsvt removed the @author-action-required label

ericl approved these changes

ericl merged commit 0d5b09f into ray-project:master

ijrsvt mentioned this pull request

[Docker] Set Docker as the Default #11416

Merged

10 tasks
