[rllib] How do I get my GPU to work with my Torch Policies? config["num_gpus"] = 1 introduces errors;  #17174

@akshaygh0sh

Description

I am in the final stages of a project I've been working on in RLlib for a while now, and when I try to train my model on the GPU (via the Tune API with config["num_gpus"] = 1), I can't get it to run without throwing errors.

Specifically, when I try to train my agent, an error is thrown from here (line 157), essentially telling me that len(self.devices) is 0 and that no GPUs are being detected.

Initially I thought my GPU was not set up to work with PyTorch (the framework I am using for my project), but after running a simple test with torch.cuda.is_available(), torch.cuda.device(0), and torch.cuda.get_device_name(0), I can see that my GPU is being recognized by Torch (an RTX 2060 Max-Q, for reference).
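For completeness, the quick sanity check described above can be run as a short script (a minimal sketch; it only verifies that PyTorch itself sees the GPU, independent of RLlib):

```python
import torch

# Does PyTorch see a CUDA device at all?
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # How many devices are visible, and what is device 0 called?
    print("Device count:", torch.cuda.device_count())
    print("Device name:", torch.cuda.get_device_name(0))
```

On my machine this reports the RTX 2060 as expected, so the problem seems to be in how RLlib picks up the GPU rather than in the Torch install itself.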

Has anyone encountered this error before, and are there any workarounds? I saw a suggestion (https://www.gitmemory.com/issue/ray-project/ray/16459/862005565) to remove config["num_gpus"] = 1 from the tune.run config, but that just seems to make the PyTorch policies run on my CPU (where they train properly), which is not what I want.
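For context, the setup that triggers the error looks roughly like this (a minimal sketch; "PPO" and "CartPole-v1" are illustrative stand-ins for my actual trainer and custom environment):

```python
import ray
from ray import tune

ray.init()

config = {
    "env": "CartPole-v1",    # stand-in for my actual environment
    "framework": "torch",
    "num_gpus": 1,           # removing this line avoids the error but trains on CPU
}

# Training crashes with "len(self.devices) is 0" when num_gpus is set.
tune.run("PPO", config=config, stop={"training_iteration": 1})
```

Dropping the "num_gpus" key lets training proceed, but then the policies are placed on the CPU.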

Thanks for your help
