[rllib] TorchPolicy GPU not detected (IndexError) #17425

@juliusfrost

What is the problem?

Ray version and other system information (Python version, TensorFlow version, OS): Windows, PyTorch

Sometimes ray.get_gpu_ids() does not list any GPUs even though num_gpus=1, and I get the following IndexError:

(pid=19720) ray.exceptions.RayActorError: The actor died because of an error raised in its creation task, ray::RolloutWorker.__init__() (pid=2656, ip=192.168.50.5)
(pid=19720)   File "python\ray\_raylet.pyx", line 535, in ray._raylet.execute_task
(pid=19720)   File "python\ray\_raylet.pyx", line 485, in ray._raylet.execute_task.function_executor
(pid=19720)   File "C:\Users\Julius\Anaconda3\envs\minerl-rllib\lib\site-packages\ray\_private\function_manager.py", line 563, in actor_method_executor
(pid=19720)     return method(__ray_actor, *args, **kwargs)
(pid=19720)   File "C:\Users\Julius\Anaconda3\envs\minerl-rllib\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 550, in __init__
(pid=19720)     self._build_policy_map(
(pid=19720)   File "C:\Users\Julius\Anaconda3\envs\minerl-rllib\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1345, in _build_policy_map
(pid=19720)     self.policy_map.create_policy(name, orig_cls, obs_space, act_space,
(pid=19720)   File "C:\Users\Julius\Anaconda3\envs\minerl-rllib\lib\site-packages\ray\rllib\policy\policy_map.py", line 127, in create_policy
(pid=19720)     self[policy_id] = class_(observation_space, action_space,
(pid=19720)   File "C:\Users\Julius\Anaconda3\envs\minerl-rllib\lib\site-packages\ray\rllib\policy\policy_template.py", line 256, in __init__
(pid=19720)     self.parent_cls.__init__(
(pid=19720)   File "C:\Users\Julius\Anaconda3\envs\minerl-rllib\lib\site-packages\ray\rllib\policy\torch_policy.py", line 159, in __init__
(pid=19720)     self.device = self.devices[0]
(pid=19720) IndexError: list index out of range
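The last frame of the traceback fails on self.devices[0], which suggests the device list was empty when the policy was constructed. As a rough illustration of that failure mode (not RLlib's actual code), here is a minimal sketch with a hypothetical helper, select_devices, that builds device names from the GPU ids visible to a worker and raises a descriptive error instead of a bare IndexError. The helper name, its arguments, and the error message are assumptions made for this example only.

```python
from typing import List, Sequence


def select_devices(gpu_ids: Sequence[int], num_gpus: float) -> List[str]:
    """Hypothetical helper (not part of RLlib): map visible GPU ids to device names."""
    if num_gpus <= 0:
        # No GPU requested, so the CPU is the only device.
        return ["cpu"]
    # Keep at most num_gpus devices, indexed by position in the visible-id list.
    devices = [f"cuda:{i}" for i, _ in enumerate(gpu_ids) if i < num_gpus]
    if not devices:
        # This is the situation described in the report: a GPU was requested,
        # but the worker saw an empty id list, so devices[0] would raise IndexError.
        raise RuntimeError(
            f"num_gpus={num_gpus} was requested, but no GPU ids were reported "
            "to this worker (gpu_ids is empty)."
        )
    return devices


# Simulate the reported case: num_gpus=1 but an empty id list.
try:
    device = select_devices(gpu_ids=[], num_gpus=1)[0]
except RuntimeError as err:
    print(f"GPU setup failed: {err}")
```

In a real worker, gpu_ids would presumably come from ray.get_gpu_ids(); printing that value inside the failing actor would be one way to confirm whether it is actually empty there.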

Reproduction (REQUIRED)

Please provide a short code snippet (less than 50 lines if possible) that can be copy-pasted to reproduce the issue. The snippet should have no external library dependencies (i.e., use fake or mock data / environments):

Not exactly sure what is required to reproduce it... will update when I find out

If the code snippet cannot be run by itself, the issue will be closed with "needs-repro-script".

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.

@richardliaw @sven1977

Labels

  • bug: Something that is supposed to be working, but isn't
  • triage: Needs triage (e.g. priority, bug/not-bug, and owning component)