Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): ami-3b6bce43 # Amazon Deep Learning AMI (Ubuntu)
- Ray installed from (source or binary): pip3 install ray
- Ray version: 0.3.1
- Python version: 3.6
- Exact command to reproduce: see below
Describe the problem
I first bring up a Ray cluster with ray create_or_update and then start a remote Jupyter notebook via
ssh -L 8899:localhost:8899 -i /Users/ludwig/.ssh/ray-autoscaler_us-west-2.pem ubuntu@34.213.245.91 jupyter notebook --port=8899
After that, I execute the following code blocks in the Jupyter notebook (which runs a Python 3 kernel):
import numpy as np
import ray
ray.init(redis_address="172.31.15.101:6379")
@ray.remote
def f(_):
    return my_square(np.random.randint(0, 10))

def my_square(x):
    return x * x
ray.get([f.remote(x) for x in range(1)])
This yields the following error:
Remote function __main__.f failed with:
Traceback (most recent call last):
File "<ipython-input-3-31e8a6f51ae7>", line 3, in f
NameError: name 'my_square' is not defined
You can inspect errors by running
ray.error_info()
If this driver is hanging, start a new one with
ray.init(redis_address="172.31.15.101:6379")
---------------------------------------------------------------------------
RayGetError Traceback (most recent call last)
<ipython-input-4-693bc96baeb6> in <module>()
----> 1 ray.get([f.remote(x) for x in range(1)])
~/anaconda3/lib/python3.6/site-packages/ray/worker.py in get(object_ids, worker)
2243 for i, value in enumerate(values):
2244 if isinstance(value, RayTaskError):
-> 2245 raise RayGetError(object_ids[i], value)
2246 return values
2247 else:
RayGetError: Could not get objectid ObjectID(2c2e76ba4dd326dd7b73540ab7933c73420b6bbf). It was created by remote function __main__.f which failed with:
Remote function __main__.f failed with:
Traceback (most recent call last):
File "<ipython-input-3-31e8a6f51ae7>", line 3, in f
NameError: name 'my_square' is not defined
After running the second code block (containing the function definitions), everything is fine.
I don't know the Ray internals, so I can only speculate about the reason. Could it be that the my_square function is not properly packaged with the remote function f because it is defined later?
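The speculation above can be illustrated without Ray. The following is a minimal sketch, assuming that a function is shipped to workers together with a snapshot of the globals it references at export time. `export_snapshot` and the `notebook` dictionary are hypothetical stand-ins written for this illustration; they are not part of Ray and do not show how Ray actually serializes remote functions.

```python
import builtins
import types


def export_snapshot(func, namespace):
    """Hypothetical stand-in for shipping a function to a worker.

    Copies only the globals that `func` references and that already
    exist in `namespace` at the moment of the call.
    """
    captured = {
        name: namespace[name]
        for name in func.__code__.co_names
        if name in namespace
    }
    captured["__builtins__"] = builtins
    return types.FunctionType(func.__code__, captured, func.__name__)


# Simulated notebook namespace: f is defined before its helper.
notebook = {}
exec("def f(x):\n    return my_square(x)", notebook)

# Exported before my_square exists: the snapshot lacks the helper.
early = export_snapshot(notebook["f"], notebook)

exec("def my_square(x):\n    return x * x", notebook)

# Exported again after the helper is defined: the snapshot includes it.
late = export_snapshot(notebook["f"], notebook)

try:
    early(3)
    early_error = None
except NameError as exc:
    early_error = str(exc)

late_result = late(3)
print(early_error)
print(late_result)
```

Under this assumption, the first export fails with a `NameError` for `my_square` while the second succeeds, which would be consistent with the behavior reported above. If the assumption holds, defining `my_square` in a cell that runs before the definition of `f` would be a plausible workaround.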
Source code / logs
The source code and error output are above. For completeness, here is the Ray config file:
# A unique identifier for the head node and workers of this cluster.
cluster_name: ludwig_test_1
# The minimum number of worker nodes to launch in addition to the head
# node. This number should be >= 0.
min_workers: 4
# The maximum number of worker nodes to launch in addition to the head
# node. This takes precedence over min_workers.
max_workers: 4
# The autoscaler will scale up the cluster to this target fraction of resource
# usage. For example, if a cluster of 10 nodes is 100% busy and
# target_utilization is 0.8, it would resize the cluster to 13. This fraction
# can be decreased to increase the aggressiveness of upscaling.
target_utilization_fraction: 0.8
# If a node is idle for this many minutes, it will be removed.
idle_timeout_minutes: 5
# Cloud-provider specific configuration.
provider:
    type: aws
    region: us-west-2
    availability_zone: us-west-2c
# How Ray will authenticate with newly launched nodes.
auth:
    ssh_user: ubuntu
# By default Ray creates a new private keypair, but you can also use your own.
# If you do so, make sure to also set "KeyName" in the head and worker node
# configurations below.
#    ssh_private_key: /path/to/your/key.pem
# Provider-specific config for the head node, e.g. instance type. By default
# Ray will auto-configure unspecified fields such as SubnetId and KeyName.
# For more documentation on available fields, see:
# http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
head_node:
    InstanceType: m5.large
    ImageId: ami-3b6bce43 # Amazon Deep Learning AMI (Ubuntu)
    # You can provision additional disk space with a conf as follows
    # BlockDeviceMappings:
    #     - DeviceName: /dev/sda1
    #       Ebs:
    #           VolumeSize: 100
    # Additional options in the boto docs.
# Provider-specific config for worker nodes, e.g. instance type. By default
# Ray will auto-configure unspecified fields such as SubnetId and KeyName.
# For more documentation on available fields, see:
# http://boto3.readthedocs.io/en/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances
worker_nodes:
    InstanceType: m5.large
    ImageId: ami-3b6bce43 # Amazon Deep Learning AMI (Ubuntu)
    # Run workers on spot by default. Comment this out to use on-demand.
    InstanceMarketOptions:
        MarketType: spot
        # Additional options can be found in the boto docs, e.g.
        #   SpotOptions:
        #       MaxPrice: MAX_HOURLY_PRICE
    # Additional options in the boto docs.
# Files or directories to copy to the head and worker nodes. The format is a
# dictionary from REMOTE_PATH: LOCAL_PATH, e.g.
file_mounts: {
#    "/path1/on/remote/machine": "/path1/on/local/machine",
#    "/path2/on/remote/machine": "/path2/on/local/machine",
}
# List of shell commands to run to set up nodes.
setup_commands:
    # Note: if you're developing Ray, you probably want to create an AMI that
    # has your Ray repo pre-cloned. Then, you can replace the pip installs
    # below with a git checkout <your_sha> (and possibly a recompile).
    - pip install -U ray==0.3.1
# Custom commands that will be run on the head node after common setup.
head_setup_commands:
    - pip install boto3==1.4.8 # 1.4.8 adds InstanceMarketOptions
# Custom commands that will be run on worker nodes after common setup.
worker_setup_commands: []
# Command to start ray on the head node. You don't need to change this.
head_start_ray_commands:
    - ray stop
    - ray start --head --redis-port=6379 --autoscaling-config=~/ray_bootstrap_config.yaml
# Command to start ray on worker nodes. You don't need to change this.
worker_start_ray_commands:
    - ray stop
    - ray start --redis-address=$RAY_HEAD_IP:6379
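
The comment on target_utilization_fraction in the config gives one worked figure: 10 fully busy nodes with a target of 0.8 are resized to 13. A minimal sketch of arithmetic that reproduces that figure, assuming the target size is the used capacity divided by the target fraction, rounded up. The function name `target_cluster_size` and the formula are assumptions inferred from the comment, not taken from the Ray source.

```python
import math


def target_cluster_size(num_nodes, busy_fraction, target_utilization_fraction):
    """Hypothetical helper: nodes needed so that usage drops to the target.

    Assumes size = ceil(num_nodes * busy_fraction / target_utilization_fraction).
    """
    used_capacity = num_nodes * busy_fraction
    return math.ceil(used_capacity / target_utilization_fraction)


# The example from the config comment: 10 nodes, 100% busy, target 0.8.
print(target_cluster_size(10, 1.0, 0.8))
```

With min_workers and max_workers both set to 4 as in this config, the cluster size is fixed, so this scaling would not come into play here.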