Description
While working on a workstation with 2 GPUs, I noticed that the utility function get_freer_gpu from utils.nn.py
sometimes does not select the right GPU, even though the message is correct.
This is a well-known issue: nvidia-smi lists GPUs in order of their PCI bus IDs, whereas the CUDA runtime
by default enumerates them in order of performance capabilities (fastest first):
https://discuss.pytorch.org/t/gpu-device-ordering/60785
To make get_freer_gpu work correctly, you only need to set an environment variable:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
Do this in the shell that runs your .py script or that hosts your Jupyter notebook (in a notebook, you can use the %env magic command instead).
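If exporting the variable in the shell is inconvenient, the same setting can probably be applied from Python, as long as it happens before CUDA is initialized (in practice, before the first CUDA call, so ideally at the very top of the script). The sketch below is only an illustration: pick_freer_gpu is a hypothetical stand-in, not the library's get_freer_gpu, and it assumes the free-memory figures come from something like `nvidia-smi --query-gpu=index,memory.free --format=csv,noheader,nounits`.

```python
import os

# Make the CUDA runtime enumerate GPUs by PCI bus ID, matching nvidia-smi.
# This has to be set before CUDA is initialized in this process.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"


def pick_freer_gpu(smi_csv: str) -> int:
    """Hypothetical helper (not the library's implementation).

    Takes rows of "index, free_memory_MiB" text and returns the index
    of the GPU with the most free memory.
    """
    rows = [line.split(",") for line in smi_csv.strip().splitlines() if line.strip()]
    free_by_index = {int(idx): int(free) for idx, free in rows}
    return max(free_by_index, key=free_by_index.get)


# Made-up sample text in the "index, memory.free" layout, not real measurements.
sample = "0, 1024\n1, 10240\n"
print(pick_freer_gpu(sample))
```

With PCI_BUS_ID ordering in effect, an index chosen from nvidia-smi output refers to the same physical card when it is passed to CUDA as a device id.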
I think nothing needs to change in the code itself; it would be enough to add a line to the docs, e.g. "if you plan to let the library
select the freer GPU, set the environment variable ...".
P.S. Sorry for filing this as an issue. I wanted to make a PR, but the docs build seems to have some errors right now,
so maybe one of the main developers could add such a note to the docs. If you would rather not, just ping me and I will make a PR :)