Skip to content

[Docker] Automagically add "runtime=nvidia"#11125

Merged
ericl merged 8 commits intoray-project:masterfrom
ijrsvt:defult-runtime-nvidia
Oct 2, 2020
Merged

[Docker] Automagically add "runtime=nvidia"#11125
ericl merged 8 commits intoray-project:masterfrom
ijrsvt:defult-runtime-nvidia

Conversation

@ijrsvt
Copy link
Copy Markdown
Contributor

@ijrsvt ijrsvt commented Sep 29, 2020

Why are these changes needed?

Users normally need to add - --runtime=nvidia to enable GPUs inside of their docker container. This PR makes that obsolete by checking if the nvidia runtime is available and opting to use that.

An alternate solution is to just always add the following to run_options:

--runtime=`docker info -f '{{.Runtimes.nvidia}}' | grep "nvidia-container-runtime" > /dev/null  && echo "nvidia" || docker info -f  '{{.DefaultRuntime}}'`

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@ijrsvt
Copy link
Copy Markdown
Contributor Author

ijrsvt commented Sep 29, 2020

cc @ray-project/ray-autoscaler

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Sep 29, 2020
Copy link
Copy Markdown
Contributor

@wuisawesome wuisawesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also use the flag in example-full.yaml and explain that this sets the --runtime flag?

@ijrsvt
Copy link
Copy Markdown
Contributor Author

ijrsvt commented Sep 29, 2020

Sounds good @wuisawesome !

Copy link
Copy Markdown
Contributor

@wuisawesome wuisawesome left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@ijrsvt ijrsvt added tests-ok The tagger certifies test failures are unrelated and assumes personal liability. and removed @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. labels Oct 1, 2020
container_name: "ray-nvidia-docker-test" # e.g. ray_docker
run_options:
- --runtime=nvidia
disable_automatic_runtime_detection: False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you leave this out of the examples? It should be an internal flag that users don't need to use.

Copy link
Copy Markdown
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment to remove the internal flag.

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 1, 2020
@ijrsvt
Copy link
Copy Markdown
Contributor Author

ijrsvt commented Oct 1, 2020

Ahh, looks like you merged it :) Thanks @ericl

@ijrsvt ijrsvt removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 1, 2020
# head_image: "rayproject/ray:0.8.7-gpu"
# head_run_options:
# - --runtime=nvidia
# Allow Ray to automatically detect GPUs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Allow Ray to automatically detect GPUs
# Allow Ray to automatically detect GPUs

# head_run_options:
# - --runtime=nvidia
# Allow Ray to automatically detect GPUs
# disable_automatic_runtime_detection: False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# disable_automatic_runtime_detection: False
# disable_automatic_runtime_detection: False

container_name: "ray-nvidia-docker-test" # e.g. ray_docker
run_options:
- --runtime=nvidia
disable_automatic_runtime_detection: False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
disable_automatic_runtime_detection: False
disable_automatic_runtime_detection: False

# if no cached version is present.
pull_before_run: True
run_options: [] # Extra options to pass into "docker run"
disable_automatic_runtime_detection: False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
disable_automatic_runtime_detection: False
disable_automatic_runtime_detection: False

container_name: "ray-nvidia-docker-test" # e.g. ray_docker
run_options:
- --runtime=nvidia
disable_automatic_runtime_detection: False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
disable_automatic_runtime_detection: False
disable_automatic_runtime_detection: False

# if no cached version is present.
pull_before_run: True
run_options: [] # Extra options to pass into "docker run"
disable_automatic_runtime_detection: False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
disable_automatic_runtime_detection: False
disable_automatic_runtime_detection: False

container_name: "ray-nvidia-docker-test" # e.g. ray_docker
run_options:
- --runtime=nvidia
disable_automatic_runtime_detection: False
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
disable_automatic_runtime_detection: False
disable_automatic_runtime_detection: False

Copy link
Copy Markdown
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the changes?

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 1, 2020
@ijrsvt
Copy link
Copy Markdown
Contributor Author

ijrsvt commented Oct 1, 2020

Oh, I didn't realize you wanted it changed from the YAMLs

@ijrsvt ijrsvt removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 1, 2020
Copy link
Copy Markdown
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you commit the changes? I left the as comments only.

@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 1, 2020
@ijrsvt
Copy link
Copy Markdown
Contributor Author

ijrsvt commented Oct 1, 2020

@ericl 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦 🤦‍♂️ 🤦
Yes i did, but I forgot to push.

@ijrsvt ijrsvt removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Oct 1, 2020
@ericl ericl merged commit 0d5b09f into ray-project:master Oct 2, 2020
@ijrsvt ijrsvt mentioned this pull request Oct 16, 2020
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

tests-ok The tagger certifies test failures are unrelated and assumes personal liability.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants