Closed
Labels
tune (Tune-related issues)
Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- Ray installed from (source or binary): pip install https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.8.0.dev4-cp36-cp36m-manylinux1_x86_64.whl
- Ray version: 0.8.0.dev4
- Python version: 3.6.7
- Exact command to reproduce:
Describe the problem
I run 5 trials with ray.tune. Every time, one of the trials fails at the end of training with the following error: AssertionError: Resource invalid: Resources(cpu=3, gpu=0.33, memory=0, object_store_memory=0, extra_cpu=0, extra_gpu=0, extra_memory=0, extra_object_store_memory=0, custom_resources={}, extra_custom_resources={}).
When I trace back the error, I end up in the following function (ray/tune/resources.py):
    def is_nonnegative(self):
        all_values = [self.cpu, self.gpu, self.extra_cpu, self.extra_gpu]
        all_values += list(self.custom_resources.values())
        all_values += list(self.extra_custom_resources.values())
        return all(v >= 0 for v in all_values)
It seems custom_resources and extra_custom_resources are empty (both are {} in the error message). It is strange that the error only occurs in one of the trials. Is this a bug, or are there any suggestions on how to fix it?
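For reference, here is a minimal sketch of that check applied to the values from the error message. `ResourcesSketch` is a hypothetical stand-in that models only the fields read by is_nonnegative; it is not Ray's actual class. Under these assumptions the check passes, because empty dicts contribute no values to the list. This suggests the empty custom_resources fields alone would not trigger the assertion.

```python
from collections import namedtuple

# Hypothetical stand-in for the Resources object in the error message;
# only the fields read by is_nonnegative are modelled here.
_Fields = namedtuple(
    "_Fields",
    ["cpu", "gpu", "extra_cpu", "extra_gpu",
     "custom_resources", "extra_custom_resources"])


class ResourcesSketch(_Fields):
    def is_nonnegative(self):
        # Same logic as the function quoted above.
        all_values = [self.cpu, self.gpu, self.extra_cpu, self.extra_gpu]
        all_values += list(self.custom_resources.values())
        all_values += list(self.extra_custom_resources.values())
        return all(v >= 0 for v in all_values)


# Values copied from the AssertionError message.
reported = ResourcesSketch(
    cpu=3, gpu=0.33, extra_cpu=0, extra_gpu=0,
    custom_resources={}, extra_custom_resources={})

print(reported.is_nonnegative())  # True
```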
Source code / logs
This is how I call tune.run:
    tune.run(
        ModelTrainerMT,
        resources_per_trial={
            'cpu': config['ncpu'],
            'gpu': config['ngpu'],
        },
        num_samples=1,
        config=best_config,
        local_dir=store,
        raise_on_failed_trial=True,
        verbose=1,
        with_server=False,
        ray_auto_init=False,
        scheduler=early_stopping_scheduler,
        loggers=[JsonLogger, CSVLogger],
        checkpoint_at_end=True,
        reuse_actors=True,
        stop={'epoch': 2 if args.test else config['max_t']}
    )
Traceback
2019-09-06 09:56:45,526 ERROR trial_runner.py:557 -- Error processing event.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 552, in _process_trial
    self.trial_executor.stop_trial(trial)
  File "/opt/conda/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 246, in stop_trial
    self._return_resources(trial.resources)
  File "/opt/conda/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 388, in _return_resources
    "Resource invalid: {}".format(resources))
AssertionError: Resource invalid: Resources(cpu=3, gpu=0.33, memory=0, object_store_memory=0, extra_cpu=0, extra_gpu=0, extra_memory=0, extra_object_store_memory=0, custom_resources={}, extra_custom_resources={})
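The traceback shows the assertion firing while a trial's resources are being returned in stop_trial. One way an assertion of this kind can fail even though every printed value is non-negative is if the check runs on a running total while the message prints the amount being returned. The sketch below illustrates that pattern only. `ToyLedger` and its methods are hypothetical, the unbalanced second return is contrived, and none of this is taken from Ray's implementation.

```python
class ToyLedger:
    """Hypothetical bookkeeping: tracks the GPU fraction currently handed out."""

    def __init__(self):
        self.committed_gpu = 0.0

    def commit(self, gpu):
        self.committed_gpu += gpu

    def give_back(self, gpu):
        self.committed_gpu -= gpu
        # The check is on the running total, but the message prints the
        # amount being returned, which is itself non-negative.
        assert self.committed_gpu >= 0, "Resource invalid: gpu={}".format(gpu)


ledger = ToyLedger()
ledger.commit(0.33)
ledger.give_back(0.33)      # balanced: the total is back to 0.0

try:
    ledger.give_back(0.33)  # unbalanced return drives the total negative
except AssertionError as err:
    print(err)              # Resource invalid: gpu=0.33
```

If something similar happens here, logging the running total next to the returned resources would show which side of the bookkeeping is off.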