Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  Head node: OSX Mojave
  Workers: Ubuntu 16.04
- Ray installed from (source or binary): Source
- Ray version: 0.7.3
- Python version: 3.7.4
- Exact command to reproduce:
ray up tune.yaml
ray submit --docker tune.yaml tune_script.py
tune.yaml: https://github.com/ethanabrooks/ppo/blob/tune-task/tune.yaml
Dockerfile: https://github.com/lobachevzky/ppo/blob/tune-task/Dockerfile
tune_script.py: https://github.com/lobachevzky/ppo/blob/tune-task/tune_script.py
Describe the problem
I am trying to use Ray to launch runs on multiple remote machines with Docker. Per the docs, I use `ray up CONFIG_YAML` to set up my cluster and `ray submit [OPTIONS] CLUSTER_CONFIG_FILE SCRIPT` to run a script on it. The problem is that the process/container launches only on the head node, and nothing runs on the workers.
From reading the source, `ray up CONFIG_YAML` calls the function `create_or_update_cluster`, and `ray submit [OPTIONS] CLUSTER_CONFIG_FILE SCRIPT` calls `submit`. Neither of these appears to interact with any node except the head.
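One way to check which machines actually execute work is to run many short tasks and record the hostname each one ran on. The following is only a minimal sketch, not part of the scripts linked above: the names `count_hosts` and `probe_cluster` are hypothetical, and the Redis address `localhost:6379` is an assumption that depends on how the head node starts Ray in the cluster YAML.

```python
import socket
import time
from collections import Counter


def count_hosts(hostnames):
    """Tally how many tasks ran on each host (plain dict for easy printing)."""
    return dict(Counter(hostnames))


def probe_cluster(redis_address="localhost:6379", num_tasks=200):
    """Run short tasks on an existing cluster and report where they executed.

    Assumption: `redis_address` points at the head node's Redis server.
    Ray 0.7.x accepts `ray.init(redis_address=...)`; newer releases use
    `address=` instead.
    """
    import ray  # imported lazily so count_hosts works without Ray installed

    ray.init(redis_address=redis_address)

    @ray.remote
    def where():
        # Sleep briefly so the scheduler has a reason to spread tasks out.
        time.sleep(0.05)
        return socket.gethostname()

    hostnames = ray.get([where.remote() for _ in range(num_tasks)])
    return count_hosts(hostnames)
```

If a script submitted with `ray submit` calls `probe_cluster()` and prints the result, a single key in the returned dict would suggest that only one machine (or container) is executing tasks, which would match the behavior described above.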