Local cluster YAML no longer working in 0.9.0.dev0 #7632

@arsedler9

Description

What is the problem?

With my previous version of Ray (0.7.7), I had a cluster.yaml file that worked well, but it stopped working after I upgraded to 0.9.0.dev0 to pick up a recent Tune bug fix for PAUSED trials. When I run a test script after ray up cluster.yaml, only the head node is visible and I get this warning:
2020-03-16 19:48:44,344 WARNING worker.py:802 -- When connecting to an existing cluster, _internal_config must match the cluster's _internal_config.
There is a firewall between my machines, so I previously had to open specific ports and force Ray to use them in my cluster YAML file. Could there be new port changes in 0.9.0.dev0 that are blocking communication?
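I'm not sure whether the warning itself is the cause or just a symptom. If it matters, one thing I could try is passing a matching _internal_config when connecting; assuming ray.init still accepts _internal_config as a JSON string (I haven't verified this against 0.9.0.dev0), a rough sketch would be:

import json
import ray

# Guess: connect with an explicit (empty) _internal_config so it matches a head
# node that was started without any overrides; the address is my head node.
ray.init(
    address="neuron.bme.emory.edu:6379",
    _internal_config=json.dumps({}),
)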

Ray version and other system information (Python version, TensorFlow version, OS):
Ray: 0.9.0.dev0
OS: Centos 7

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

My cluster.yaml is:

cluster_name: asedler_nesu

## NOTE: Typically for local clusters, min_workers == initial_workers == max_workers.

# The minimum number of worker nodes to launch in addition to the head
# node. This number should be >= 0.
# Typically, min_workers == initial_workers == max_workers.
min_workers: 1

# The initial number of worker nodes to launch in addition to the head node.
# Typically, min_workers == initial_workers == max_workers.
initial_workers: 1

# The maximum number of worker nodes to launch in addition to the head node.
# This takes precedence over min_workers.
# Typically, min_workers == initial_workers == max_workers.
max_workers: 1

# Autoscaling parameters.
# Ignore this if min_workers == initial_workers == max_workers.
autoscaling_mode: default
target_utilization_fraction: 0.8
idle_timeout_minutes: 5

# This executes all commands on all nodes in the docker container,
# and opens all the necessary ports to support the Ray cluster.
# Empty string means disabled. Assumes Docker is installed.
docker:
    image: "" # e.g., tensorflow/tensorflow:1.5.0-py3
    container_name: "" # e.g. ray_docker
    # If true, pulls the latest version of the image. Otherwise, `docker run` will only pull the image
    # if no cached version is present.
    pull_before_run: True
    run_options: []  # Extra options to pass into "docker run"

# Local specific configuration.
provider:
    type: local
    head_ip: neuron.bme.emory.edu
    worker_ips:
        - sulcus.bme.emory.edu

# How Ray will authenticate with newly launched nodes.
auth:
    ssh_user: asedler
    ssh_private_key: ~/.ssh/id_rsa

# Leave this empty.
head_node: {}

# Leave this empty.
worker_nodes: {}

# Files or directories to copy to the head and worker nodes. The format is a
# dictionary from REMOTE_PATH: LOCAL_PATH, e.g.
file_mounts: {
#    "/path1/on/remote/machine": "/path1/on/local/machine",
#    "/path2/on/remote/machine": "/path2/on/local/machine",
}

# List of commands that will be run before `setup_commands`. If docker is
# enabled, these commands will run outside the container and before docker
# is set up.
initialization_commands: []

# List of shell commands to run to set up each node.
setup_commands: []

# Custom commands that will be run on the head node after common setup.
head_setup_commands: []

# Custom commands that will be run on worker nodes after common setup.
worker_setup_commands: []

# NOTE: Modified the following commands to use the tf2-gpu environment
# and to use specific ports that have been opened for this purpose
# by Andrew Sedler (asedler3@gatech.edu)

# Command to start ray on the head node. You don't need to change this.
head_start_ray_commands:
    - conda activate tf2-gpu && ray stop
    - conda activate tf2-gpu && ulimit -c unlimited && ray start --head --redis-port=6379 --redis-shard-ports=59519 --node-manager-port=19580 --object-manager-port=39066 --autoscaling-config=~/ray_bootstrap_config.yaml

# Command to start ray on worker nodes. You don't need to change this.
worker_start_ray_commands:
    - conda activate tf2-gpu && ray stop
    - conda activate tf2-gpu && ray start --redis-address=$RAY_HEAD_IP:6379 --node-manager-port=19580 --object-manager-port=39066
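For reference, the ports above are the ones I opened in the firewall. A quick way to check that they are actually reachable (run from the machine on the other side of each connection, while the cluster is up), using only the standard library, would be something like the sketch below; it only verifies that something is listening on each port, nothing Ray-specific:

import socket

# Ports pinned in the YAML above: redis and redis-shard live on the head node,
# node-manager and object-manager are pinned on both the head and the worker.
checks = [
    ("neuron.bme.emory.edu", 6379),   # --redis-port (head)
    ("neuron.bme.emory.edu", 59519),  # --redis-shard-ports (head)
    ("sulcus.bme.emory.edu", 19580),  # --node-manager-port (worker)
    ("sulcus.bme.emory.edu", 39066),  # --object-manager-port (worker)
]
for host, port in checks:
    try:
        socket.create_connection((host, port), timeout=5).close()
        print(host, port, "reachable")
    except OSError as err:
        print(host, port, "NOT reachable:", err)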

The test script is:

import time
from pprint import pprint

import ray

ray.init(address="localhost:6379")

@ray.remote
def f():
    time.sleep(0.01)
    return ray.services.get_node_ip_address()

# Get a list of the IP addresses of the nodes that have joined the cluster.
pprint(set(ray.get([f.remote() for _ in range(1000)])))
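To see which nodes have actually registered with the cluster (rather than which nodes tasks happen to land on), I would also print the node table. Assuming ray.nodes() is still available in 0.9.0.dev0, something like:

import ray

ray.init(address="localhost:6379")

# One entry per node that has registered with the head; with the worker joined
# I would expect two entries here.
for node in ray.nodes():
    print(node.get("NodeManagerAddress"), node.get("Resources"))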

If we cannot run your script, we cannot fix your issue.

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

Metadata

Labels

P1 (issue that should be fixed within a few weeks), bug (something that is supposed to be working, but isn't)
