[c10d] Make C10d support CPU only build#11513
[c10d] Make C10d support CPU only build#11513teng-li wants to merge 3 commits intopytorch:masterfrom
Conversation
apaszke
left a comment
There was a problem hiding this comment.
I'm not an expert in our build system nor c10d, but looks ok after a cursory look. I have some comments that could be improved. It's not very nice that we have to sprinkle that many ifdefs, and I'm not sure if that will work fine with our binaries, so it might be better to consult someone else too.
| endif() | ||
| else() | ||
| message(STATUS "Building C10D without CUDA support") | ||
| endif() |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| add_definitions(-DUSE_CUDA=1) | ||
| endif() | ||
| else() | ||
| message(STATUS "Building C10D without CUDA support") |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| #ifdef USE_CUDA | ||
| , | ||
| cuda_(false) | ||
| #endif |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| for (size_t i = 0; i < srcSizes.size(); i++) { | ||
| #ifdef USE_CUDA | ||
| deviceGuard.set_index(key.type->is_cuda() ? key.devices[i] : -1); | ||
| #else |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| c10d_add_test(ProcessGroupGlooAsyncTest.cpp c10d c10d_cuda_test) | ||
| if(DISTRIBUTED_NCCL_FOUND) | ||
| c10d_add_test(ProcessGroupNCCLTest.cpp c10d c10d_cuda_test) | ||
| endif() |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
| if(USE_CUDA AND CUDA_FOUND) | ||
| cuda_add_library(c10d_cuda_test CUDATest.cu) | ||
| target_link_libraries(c10d_cuda_test c10d) | ||
| endif() |
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
facebook-github-bot
left a comment
There was a problem hiding this comment.
teng-li has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
* master: (165 commits) Aibench for asr decoder Explicitly set locale on docs build. (pytorch#11595) Documentation for debugging JIT Fused weightnorm for ATen (pytorch#10842) Move Type, Tensor, TensorMethods to core. Add reminder % to the jit Fix reloading modules back into python (pytorch#11552) Add trigonometry functions to docs/source/onnx.rst Add EndToEndHybridModel CUDA tests (pytorch#11544) minor formatting error log (pytorch#11528) Warn that export+import module always load onto the CPU (pytorch#11485) caffe2::StorageImpl use at::DataPtr (pytorch#11282) Sync all libnccl soversions, not just libnccl.so.1 (pytorch#11575) Document BatchNorm and update default behavior (pytorch#11484) Typo fix in randomness.rst (pytorch#11571) Move some bmm/baddbmm to ATen (pytorch#11292) Make c10d test work on CPU only build (pytorch#11567) Clean up some C++ cruftiness in the script lexer. Allow setting deletion constant Make C10d support CPU only build (pytorch#11513) ...
Summary: This makes torch.distributed works for CPU only build. Also added one more CI test case to cover MPI CPU build. All CI tests should cover this change Pull Request resolved: pytorch#11513 Differential Revision: D9784546 Pulled By: teng-li fbshipit-source-id: 0976a6b0fd199670926f0273e17ad7d2805e42e7
This makes torch.distributed works for CPU only build.
Also added one more CI test case to cover MPI CPU build.
All CI tests should cover this change