Closed
Labels: bug (Something that is supposed to be working; but isn't)
Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
- Ray installed from (source or binary): source
- Ray version: 215d526
- Python version: 3.6.2 Anaconda
- Exact command to reproduce:

On machine 1:
    ray start --head --redis-port=6379 --num-workers=0
On machine 2:
    ray start --redis-address <head-node-ip>:6379 --num-workers=0
On machine 1:
    cd ray/python/ray/rllib
    python train.py --run=ES --env=CartPole-v0 --redis-address=<head-node-ip>:6379
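Since the failure is intermittent, a small harness can help put a number on how often the last command fails. The following is only a sketch: `failure_rate` is a hypothetical helper, not part of Ray, and it assumes a failed run exits with a non-zero status (which an uncaught Python exception does).

```python
import subprocess
import sys


def failure_rate(cmd, runs=10, timeout=None):
    """Run `cmd` `runs` times and return the fraction of runs that failed.

    A run counts as failed if it exits non-zero or exceeds `timeout` seconds.
    """
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            failed = result.returncode != 0
        except subprocess.TimeoutExpired:
            # A hung driver is treated as a failure as well.
            failed = True
        if failed:
            failures += 1
    return failures / runs


# Example invocation (assumption: run from ray/python/ray/rllib on machine 1,
# with <head-node-ip> replaced by the real address):
# failure_rate([sys.executable, "train.py", "--run=ES", "--env=CartPole-v0",
#               "--redis-address=<head-node-ip>:6379"], runs=10)
```

Repeated runs against the same cluster are not necessarily independent, so the resulting fraction is a rough indicator rather than a precise probability.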
About half of the time, the final command (train.py) fails with:
$ python train.py --run=ES --env=CartPole-v0 --redis-address=172.31.5.255:6379
/home/ubuntu/anaconda3/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
== Status ==
Using FIFO scheduling algorithm.
Result logdir: /home/ubuntu/ray_results/default
- ES_CartPole-v0_0: PENDING
Unified logger created with logdir '/home/ubuntu/ray_results/default/ES_CartPole-v0_0_2018-01-19_01-37-30wdhanz66'
== Status ==
Using FIFO scheduling algorithm.
Resources used: 1/8 CPUs, 0/0 GPUs
Result logdir: /home/ubuntu/ray_results/default
- ES_CartPole-v0_0: RUNNING
Remote function __init__ failed with:
Traceback (most recent call last):
File "/home/ubuntu/ray3/python/ray/worker.py", line 771, in _process_task
*arguments)
File "/home/ubuntu/ray3/python/ray/actor.py", line 196, in actor_method_executor
return method(actor, *args)
File "/home/ubuntu/ray3/python/ray/rllib/agent.py", line 127, in __init__
self._init()
File "/home/ubuntu/ray3/python/ray/rllib/es/es.py", line 157, in _init
noise_id = create_shared_noise.remote()
File "/home/ubuntu/ray3/python/ray/worker.py", line 2509, in func_call
objectids = _submit_task(function_id, args)
File "/home/ubuntu/ray3/python/ray/worker.py", line 2364, in _submit_task
return worker.submit_task(function_id, args)
File "/home/ubuntu/ray3/python/ray/worker.py", line 543, in submit_task
self.task_driver_id.id()][function_id.id()]
KeyError: b'Z`\xd9\xd5?/\x88\x04>\xa4Xph\xb9\xe3\xca\xf4\xa1\x1b\x13'
You can inspect errors by running
ray.error_info()
If this driver is hanging, start a new one with
ray.init(redis_address="172.31.5.255:6379")
Remote function train failed with:
Traceback (most recent call last):
File "/home/ubuntu/ray3/python/ray/worker.py", line 771, in _process_task
*arguments)
File "/home/ubuntu/ray3/python/ray/actor.py", line 196, in actor_method_executor
return method(actor, *args)
File "/home/ubuntu/ray3/python/ray/rllib/agent.py", line 145, in train
"Agent initialization failed, see previous errors")
ValueError: Agent initialization failed, see previous errors
You can inspect errors by running
ray.error_info()
If this driver is hanging, start a new one with
ray.init(redis_address="172.31.5.255:6379")
Error processing event: Traceback (most recent call last):
File "/home/ubuntu/ray3/python/ray/tune/trial_runner.py", line 162, in _process_events
result = ray.get(result_id)
File "/home/ubuntu/ray3/python/ray/worker.py", line 2240, in get
raise RayGetError(object_ids, value)
ray.worker.RayGetError: Could not get objectid ObjectID(a87f1adc2ec2e19f0199e246b9f733c6ea16750c). It was created by remote function train which failed with:
Remote function train failed with:
Traceback (most recent call last):
File "/home/ubuntu/ray3/python/ray/worker.py", line 771, in _process_task
*arguments)
File "/home/ubuntu/ray3/python/ray/actor.py", line 196, in actor_method_executor
return method(actor, *args)
File "/home/ubuntu/ray3/python/ray/rllib/agent.py", line 145, in train
"Agent initialization failed, see previous errors")
ValueError: Agent initialization failed, see previous errors
Stopping ES_CartPole-v0_0 Actor timed out, but moving on...
== Status ==
Using FIFO scheduling algorithm.
Resources used: 0/8 CPUs, 0/0 GPUs
Result logdir: /home/ubuntu/ray_results/default
- ES_CartPole-v0_0: ERROR
Traceback (most recent call last):
File "train.py", line 82, in <module>
num_cpus=args.num_cpus, num_gpus=args.num_gpus)
File "/home/ubuntu/ray3/python/ray/tune/tune.py", line 82, in run_experiments
raise TuneError("Trial did not complete", trial)
ray.tune.error.TuneError: ('Trial did not complete', <ray.tune.trial.Trial object at 0x7f30baab6c18>)
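The first traceback ends in a two-level lookup, `...[self.task_driver_id.id()][function_id.id()]`, and the `KeyError` carries a 20-byte key. From the log alone it is not clear whether the driver ID or the function ID was the missing key. The sketch below is not Ray's code; `FunctionTable` and `describe_missing` are hypothetical names that only mimic the shape of that lookup to show how the two cases could be told apart when debugging.

```python
class FunctionTable:
    """Hypothetical stand-in for a per-driver function registry."""

    def __init__(self):
        # driver_id (bytes) -> {function_id (bytes) -> properties}
        self._table = {}

    def register(self, driver_id, function_id, properties):
        self._table.setdefault(driver_id, {})[function_id] = properties

    def has_driver(self, driver_id):
        return driver_id in self._table

    def lookup(self, driver_id, function_id):
        # Same shape as the failing expression in the traceback:
        # a KeyError here can come from either level of the nested dict.
        return self._table[driver_id][function_id]


def describe_missing(table, driver_id, function_id):
    """Report which level of the nested lookup is missing, if any."""
    try:
        table.lookup(driver_id, function_id)
    except KeyError:
        if not table.has_driver(driver_id):
            return "driver_id missing"
        return "function_id missing"
    return "registered"


table = FunctionTable()
driver = b"d" * 20
func = b"f" * 20
print(describe_missing(table, driver, func))
table.register(driver, b"g" * 20, {})
print(describe_missing(table, driver, func))
table.register(driver, func, {})
print(describe_missing(table, driver, func))
```

Either outcome would be consistent with a lookup that runs before the corresponding entry has been registered on that worker, but the log does not confirm this.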