Closed
Labels: P0 (issues that should be fixed in short order), bug (something that is supposed to be working, but isn't)
Description
What is the problem?
Using a recent nightly build of Ray/RLlib, you can't train on GPUs with TensorFlow 1.14 because of an API mismatch: rollout_worker.py assumes that tf.config has a list_physical_devices function, but in 1.14 the only device-listing API is tf.config.experimental_list_devices, so you get:
AttributeError: module 'tensorflow._api.v1.config' has no attribute 'list_physical_devices'
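For reference, here is a minimal snippet (not part of the original report; it assumes a plain TF 1.14 install) that triggers the same mismatch outside RLlib:

import tensorflow as tf

print(tf.__version__)                         # 1.14.0
print(tf.config.experimental_list_devices())  # works in 1.14, returns device name strings
tf.config.list_physical_devices("GPU")        # raises the AttributeError above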
Here's the code in question in rollout_worker.py:
if (ray.is_initialized() and
        ray.worker._mode() != ray.worker.LOCAL_MODE):
    # Check available number of GPUs
    if not ray.get_gpu_ids():
        logger.debug(
            "Creating policy evaluation worker {}".format(
                worker_index) +
            " on CPU (please ignore any CUDA init errors)")
    elif (policy_config["framework"] in ["tf2", "tf", "tfe"] and
            not tf.config.list_physical_devices("GPU")) or \
            (policy_config["framework"] == "torch" and
            not torch.cuda.is_available()):
        raise RuntimeError(
            "GPUs were assigned to this worker by Ray, but "
            "your DL framework ({}) reports GPU acceleration is "
            "disabled. This could be due to a bad CUDA- or {} "
            "installation.".format(
                policy_config["framework"],
                policy_config["framework"]))

vs. the API in tensorflow/_api/v1/config/__init__.py:
from tensorflow.python.eager.context import list_devices as experimental_list_devices

And here's the full stacktrace:
Traceback (most recent call last):
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 471, in _process_trial
    result = self.trial_executor.fetch_result(trial)
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 430, in fetch_result
    result = ray.get(trial_future[0], DEFAULT_GET_TIMEOUT)
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/worker.py", line 1532, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::PPO.train() (pid=7711, ip=10.128.0.4)
  File "python/ray/_raylet.pyx", line 433, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 468, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 472, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 473, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 426, in ray._raylet.execute_task.function_executor
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 88, in __init__
    Trainer.__init__(self, config, env, logger_creator)
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 475, in __init__
    super().__init__(config, logger_creator)
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/tune/trainable.py", line 232, in __init__
    self.setup(copy.deepcopy(self.config))
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 639, in setup
    self._init(self.config, self.env_creator)
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 102, in _init
    env_creator, self._policy, config, self.config["num_workers"])
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 709, in _make_workers
    logdir=self.logdir)
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 67, in __init__
    RolloutWorker, env_creator, policy, 0, self._local_config)
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/rllib/evaluation/worker_set.py", line 296, in _make_worker
    extra_python_environs=extra_python_environs)
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/ray/rllib/evaluation/rollout_worker.py", line 415, in __init__
    not tf.config.list_physical_devices("GPU")) or \
  File "/home/andrew/miniconda3/envs/ray_nightly_tf14/lib/python3.7/site-packages/tensorflow/python/util/deprecation_wrapper.py", line 106, in __getattr__
    attr = getattr(self._dw_wrapped_module, name)
AttributeError: module 'tensorflow._api.v1.config' has no attribute 'list_physical_devices'
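A version-tolerant device check would avoid the crash. Here is a minimal sketch of what I mean (my suggestion, not the current rollout_worker.py code), falling back to the 1.14 API when the newer one is missing:

import tensorflow as tf

def gpu_devices():
    # TF 1.15+ and 2.x expose tf.config.list_physical_devices, which
    # returns PhysicalDevice objects.
    if hasattr(tf.config, "list_physical_devices"):
        return tf.config.list_physical_devices("GPU")
    # TF 1.14 only has the experimental listing, which returns fully
    # qualified device name strings such as
    # "/job:localhost/replica:0/task:0/device:GPU:0".
    return [d for d in tf.config.experimental_list_devices() if "GPU" in d]

The elif in rollout_worker.py could then test not gpu_devices() and behave the same on both API generations, since both branches return an empty list when no GPU is visible.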
Reproduction (REQUIRED)
Ray: latest nightly wheel as of 2020-07-22
TensorFlow: 1.14
Python: 3.7
OS: Ubuntu 20.04
from ray import tune
from ray.rllib.agents.ppo import PPOTrainer

tune.run(PPOTrainer,
         config={
             "env": "CartPole-v0",
             "num_workers": 4,
             "num_envs_per_worker": 2,
             "num_gpus": 0.5,
             "num_gpus_per_worker": 0.1,
         })
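Until the check is fixed upstream, one possible stopgap for users pinned to TF 1.14 is to alias the experimental listing under the expected name before RLlib's check runs. This is an untested sketch: the real list_physical_devices returns PhysicalDevice objects rather than strings, so the alias only satisfies truthiness-style checks like the one in rollout_worker.py, and it would have to run inside every Ray worker process, not just the driver:

import tensorflow as tf

# Untested stopgap sketch: expose the TF 1.14 experimental device listing
# under the name that newer TF versions (and rollout_worker.py) expect.
if not hasattr(tf.config, "list_physical_devices"):
    tf.config.list_physical_devices = lambda device_type=None: [
        d for d in tf.config.experimental_list_devices()
        if device_type is None or device_type in d
    ]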