Add NCCL support for pytorch_distributed_simple.py #150
toshihikoyanase wants to merge 6 commits into optuna:main
Conversation
@not522 Could you review this PR?
I found that this example did not work on a PC that had NVIDIA GPUs but no NCCL support.
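The fallback this PR motivates can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual diff; `select_backend` is a hypothetical helper name:

```python
def select_backend(cuda_available: bool, nccl_available: bool) -> str:
    """Pick a torch.distributed backend: NCCL requires both CUDA devices
    and an NCCL-enabled build; otherwise fall back to the CPU-capable Gloo."""
    if cuda_available and nccl_available:
        return "nccl"
    return "gloo"

# In the actual script one would feed in runtime checks, e.g.:
#   import torch
#   backend = select_backend(torch.cuda.is_available(),
#                            torch.distributed.is_nccl_available())
print(select_backend(True, False))  # gloo: GPUs present but no NCCL support
```

Keeping the decision in a pure function like this makes it easy to test without GPUs or a distributed launcher.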
This pull request has not seen any recent activity.
Sorry for my late response. I have been busy recently and do not have time to review this PR, so could you please reassign it to another reviewer?
This pull request was closed automatically because it had not seen any recent activity. If you want to discuss it, you can reopen it freely.
I will review this PR along with optuna/optuna#4268. @toshihikoyanase Could you fix the CI errors?
Let me close this PR since it uses the old API. I'll create a new one.
Motivation
This is an alternative approach to #145.
Description of the changes
- Use the device argument of optuna.integration.TorchDistributedTrial.
- Read MASTER_ADDR and MASTER_PORT from the environment variables if they exist.

I confirmed that this script worked with 2 nodes.
See https://pytorch.org/docs/stable/distributed.html for details.