Get GPU with freer memory - small bug to correct (in the doc) #563

@fedor-goncharov

While working on a workstation with 2 GPUs, I noticed that the utility function get_freer_gpu from utils.nn.py
sometimes does not select the right GPU, even though the message it prints is correct.

This is a well-known issue: nvidia-smi lists GPUs in order of their IDs on the PCI bus, whereas the CUDA runtime
by default enumerates them in order of performance capability (fastest first), so the same index can refer to different devices.

https://discuss.pytorch.org/t/gpu-device-ordering/60785
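
To make the mismatch concrete, here is a minimal sketch of how a "freest GPU" lookup based on nvidia-smi could work. It is not the library's actual get_freer_gpu implementation; the helper names parse_free_memory, freest_gpu_index and query_nvidia_smi are hypothetical. The point is that the index returned follows nvidia-smi's PCI bus ordering, so it only matches the CUDA device index when both tools agree on the ordering.

```python
import subprocess


def parse_free_memory(csv_text):
    """Parse 'index, memory.free' CSV rows into {gpu_index: free_MiB}."""
    free = {}
    for line in csv_text.strip().splitlines():
        idx, mem = (field.strip() for field in line.split(","))
        free[int(idx)] = int(mem)
    return free


def freest_gpu_index(csv_text):
    """Return the nvidia-smi index of the GPU with the most free memory."""
    free = parse_free_memory(csv_text)
    return max(free, key=free.get)


def query_nvidia_smi():
    """Query nvidia-smi for free memory per GPU (requires an NVIDIA driver)."""
    return subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        text=True,
    )


# Made-up sample output for a 2-GPU machine (values in MiB).
sample = "0, 1024\n1, 8192\n"
print(freest_gpu_index(sample))  # prints 1
```

On a real machine one would call freest_gpu_index(query_nvidia_smi()). The resulting index is a PCI-bus-ordered index, which is why it can point at the wrong device when the CUDA runtime uses its default ordering.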

To make get_freer_gpu work correctly, you only need to set one environment variable:

export CUDA_DEVICE_ORDER=PCI_BUS_ID

Do this in the shell that runs your .py script or that launches your Jupyter notebook (in a notebook you can, e.g., use the %env magic command).
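
The same setting can also be applied from inside Python. This is only a sketch, under the assumption that the variable is set before CUDA is initialized (the safest place is at the very top of the script, before importing torch), since the CUDA runtime reads it when it starts up:

```python
import os

# Assumption: this runs before the first CUDA call in the process
# (ideally before `import torch`), otherwise it may have no effect.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"

print(os.environ["CUDA_DEVICE_ORDER"])  # prints PCI_BUS_ID
```

With this in place, device indices seen by CUDA should follow the same PCI bus order that nvidia-smi reports.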

I think there is no need to change anything in the code itself; it is enough to add a line to the docs, e.g. "if you plan to let the library
select the freer GPU, set the environment variable ... ".

P.S. Sorry for filing this as an issue. I wanted to make a PR, but the docs build seems to have some errors right now,
so maybe one of the main developers could add such a comment to the docs. If you would rather not do that, just ping me to make a PR :)
I could do that in the end :)

Labels: type: bug (Something isn't working)