Description
I am in the final stages of a project I've been working on for a while in RLlib. When I try to train my model on the GPU (using the Tune API with config["num_gpus"] = 1), I can't get it to run without throwing errors.
Specifically, when I try to train my agent, I get an error thrown from here (Line 157), essentially telling me that len(self.devices) is 0 and that no GPUs are being detected.
Initially I thought it was because my GPU was not set up to work with PyTorch (the framework I am using for this project), but after running a simple test with torch.cuda.is_available(), torch.cuda.device(0), and torch.cuda.get_device_name(0), I can see that my GPU (an RTX 2060 Max-Q, for reference) is recognized by Torch.
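For anyone who wants to reproduce the sanity check I ran, here is a minimal sketch (the diagnostic-style output format is my own; the torch.cuda calls are the standard PyTorch API):

```python
# Minimal CUDA sanity check for PyTorch.
# Prints one "check:" line regardless of environment, so it is safe
# to run on machines without a GPU or without torch installed.
try:
    import torch
except ImportError:
    print("check: torch-missing")
else:
    if torch.cuda.is_available():
        # Device 0 is the first visible GPU (subject to CUDA_VISIBLE_DEVICES).
        print(f"check: cuda-ok ({torch.cuda.get_device_name(0)})")
    else:
        print("check: no-cuda")
```

If this prints the name of your GPU but RLlib still reports zero devices, the problem is likely in how Ray assigns GPU resources to the trainer process rather than in PyTorch itself.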
Has anyone encountered this error before, and are there any workarounds? I saw a suggestion to remove config["num_gpus"] = 1 from the tune.run config (https://www.gitmemory.com/issue/ray-project/ray/16459/862005565), but that just causes the PyTorch policies to run on my CPU (where they do train properly), which is not what I want.
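For context, my setup is essentially the standard Tune/RLlib pattern below (a simplified sketch; "PPO" and "CartPole-v0" stand in for my actual algorithm and environment):

```python
# Sketch of the tune.run call that triggers the error.
# "PPO" and "CartPole-v0" are placeholders for my actual setup.
from ray import tune

tune.run(
    "PPO",
    config={
        "env": "CartPole-v0",
        "framework": "torch",   # PyTorch policies
        "num_gpus": 1,          # removing this line avoids the error,
                                # but training then falls back to CPU
    },
)
```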
Thanks for your help