Conversation
91c8ca6 to
3b4ca0c
Compare
test/test_c10d.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
|
@pytorchbot retest this please |
|
test failures look real |
3b4ca0c to
3d92883
Compare
test/test_c10d.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
test/test_c10d.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
test/test_c10d.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
test/test_c10d.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
test/test_c10d.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
test/test_c10d.py
Outdated
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
This comment was marked as off-topic.
This comment was marked as off-topic.
Sorry, something went wrong.
3d92883 to
f549c3f
Compare
f549c3f to
23e8275
Compare
facebook-github-bot
left a comment
There was a problem hiding this comment.
@goldsborough has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Before I can rewrite portions of the c10d DDP in C++ I need proper tests in place to make sure I am not breaking anything as I port code. There were no tests for the c10d DDP in place so I wrote some. I refactored the c10d tests to derive some tests cases from a general `MultiGPUTestCase` and followed lots of patterns from `test_distributed.py` w.r.t. how tests are skipped (such that the main process doesn't initialize CUDA, which I found is a super important detail!!!). I am largely unfamiliar with this code so feel free to scrutinize. The DDP test code itself is also largely taken from `test_distributed.py` but more inlined which I find easier to read. Pull Request resolved: pytorch#9670 Differential Revision: D8977724 Pulled By: goldsborough fbshipit-source-id: 186eab38a72384d7992a2ec5c89f304ad42d5944
Summary: This PR depends on the tests added in #9670. It moves the first, tiny function from the c10d DDP to C++: `dist_broadcast_coalesced`. Let me know if ` torch/csrc/distributed/c10d/ddp.h` will be a good place to put these rewritten functions. pietern The controller you requested could not be found. apaszke Pull Request resolved: #9729 Differential Revision: D8985308 Pulled By: goldsborough fbshipit-source-id: dc459fe9040273714044152063585e746974752f
Summary: Before I can rewrite portions of the c10d DDP in C++ I need proper tests in place to make sure I am not breaking anything as I port code. There were no tests for the c10d DDP in place so I wrote some. I refactored the c10d tests to derive some tests cases from a general `MultiGPUTestCase` and followed lots of patterns from `test_distributed.py` w.r.t. how tests are skipped (such that the main process doesn't initialize CUDA, which I found is a super important detail!!!). I am largely unfamiliar with this code so feel free to scrutinize. The DDP test code itself is also largely taken from `test_distributed.py` but more inlined which I find easier to read. Pull Request resolved: pytorch#9670 Differential Revision: D8977724 Pulled By: goldsborough fbshipit-source-id: 186eab38a72384d7992a2ec5c89f304ad42d5944
Summary: This PR depends on the tests added in pytorch#9670. It moves the first, tiny function from the c10d DDP to C++: `dist_broadcast_coalesced`. Let me know if ` torch/csrc/distributed/c10d/ddp.h` will be a good place to put these rewritten functions. pietern The controller you requested could not be found. apaszke Pull Request resolved: pytorch#9729 Differential Revision: D8985308 Pulled By: goldsborough fbshipit-source-id: dc459fe9040273714044152063585e746974752f
Before I can rewrite portions of the c10d DDP in C++ I need proper tests in place to make sure I am not breaking anything as I port code. There were no tests for the c10d DDP in place so I wrote some.
I refactored the c10d tests to derive some tests cases from a general
MultiGPUTestCaseand followed lots of patterns fromtest_distributed.pyw.r.t. how tests are skipped (such that the main process doesn't initialize CUDA, which I found is a super important detail!!!).I am largely unfamiliar with this code so feel free to scrutinize. The DDP test code itself is also largely taken from
test_distributed.pybut more inlined which I find easier to read.Test Plan:
@apaszke @pietern @teng-li