
[c10d] Distributed Data Parallel CPU module for C10D #11168

Closed
teng-li wants to merge 1 commit into pytorch:master from teng-li:DDPCPU

Conversation

@teng-li
Contributor

@teng-li teng-li commented Sep 1, 2018

Distributed Data Parallel CPU module for c10d. This is basically the same code as the Distributed Data Parallel CPU module for THD, since c10d now has the same front-end interface as torch.distributed.

We will keep both in the first release and remove the THD one once c10d is stable enough.

Tests are fully covered, just as for THD.
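For context, a minimal usage sketch of a CPU DDP module of this kind follows. The class name DistributedDataParallelCPU and the gloo backend setup are assumptions based on the THD counterpart described above, not code taken from this PR's diff:

```python
# Hypothetical usage sketch; the exact import path of the c10d-based
# module at the time of this PR may differ from what is shown here.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallelCPU  # assumed name

# Initialize the process group with a CPU-capable backend (gloo).
dist.init_process_group(backend="gloo", init_method="env://")

model = nn.Linear(10, 10)
ddp_model = DistributedDataParallelCPU(model)  # allreduces grads across ranks

optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
criterion = nn.MSELoss()

inputs, targets = torch.randn(4, 10), torch.randn(4, 10)
optimizer.zero_grad()
loss = criterion(ddp_model(inputs), targets)
loss.backward()   # gradient allreduce happens as part of backward
optimizer.step()
```

Because the front-end interface matches torch.distributed, the same training script should look identical whether the process group is backed by THD or by c10d.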

@teng-li teng-li requested a review from pietern September 1, 2018 02:23
@teng-li teng-li added the oncall: distributed label Sep 1, 2018
@teng-li
Contributor Author

teng-li commented Sep 5, 2018

@pytorchbot retest this please

@pietern
Contributor

pietern commented Sep 5, 2018

This doesn't overlap communication with autograd the way the CUDA version does.

Do you plan to add this later? Not blocking, of course.
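To illustrate what overlapping means here: the CUDA version launches allreduce operations as individual gradients become ready during backward, rather than running a single blocking allreduce after backward finishes. A rough sketch of the hook-based idea, under the assumption of an initialized process group; the helper names are made up, and the real CUDA implementation additionally buckets gradients:

```python
# Illustrative sketch of overlapping gradient allreduce with backward;
# not the code from this PR. Assumes dist.init_process_group() was called.
# Helper names (attach_overlap_hooks, finish_allreduce) are hypothetical.
import torch.distributed as dist

def attach_overlap_hooks(model):
    """Register per-parameter hooks that start an async allreduce as
    soon as each gradient is produced during backward."""
    handles = []  # (work, grad) pairs for in-flight allreduces

    def make_hook():
        def hook(grad):
            # Non-blocking allreduce: communication overlaps with the
            # remainder of the backward pass. (Simplified: no bucketing;
            # in-place update races are ignored for clarity.)
            work = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)
            handles.append((work, grad))
            return grad
        return hook

    for p in model.parameters():
        if p.requires_grad:
            p.register_hook(make_hook())
    return handles

def finish_allreduce(handles, world_size):
    """Wait for outstanding allreduces, then average the gradients."""
    for work, grad in handles:
        work.wait()
        grad.div_(world_size)
    handles.clear()
```

After loss.backward() returns, a call like finish_allreduce(handles, dist.get_world_size()) would drain any remaining work; the CUDA module manages this internally through its own autograd hooks.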

@teng-li
Contributor Author

teng-li commented Sep 5, 2018

@pietern This was the version written by the open-source community. Depending on the need, we can add that later.

@pietern
Contributor

pietern commented Sep 5, 2018

Sounds good.

Contributor

@facebook-github-bot facebook-github-bot left a comment


teng-li has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

petrex pushed a commit to petrex/pytorch that referenced this pull request Sep 6, 2018
* upstream/master: (26 commits)
  cudnn 7 upgrade with spatialBN fix (pytorch#11291)
  Ignore FuseGraph Call on Windows (pytorch#11015)
  defer resolution of mkl to a cmake wrapper library (pytorch#11298)
  Cleanup dependency of distributed flags (pytorch#11221)
  Move minimal wrapdim functionality to core, remove THTensor include i… (pytorch#11283)
  Change includes from ATen/Storage.h to ATen/core/Storage.h (pytorch#11217)
  Fix scalar tensor assert in fusion compiler (pytorch#10952)
  Add dead code elimination pass (pytorch#10101)
  Distributed Data Parallel CPU module for C10D (pytorch#11168)
  Back out "[pt1][tensor] Add strides to caffe2::Tensor"
  Fix conv gradient conversion (pytorch#11312)
  Bag of clang tidy fixes for torch/csrc/ and torch/csrc/autograd (pytorch#11050)
  Sparse tensor printing; add NotImplemented autograd fn (pytorch#10181)
  Add convertToCaffe2Proto to python API
  fix doc for functional.dropout* (pytorch#10417)
  typo fix Tranpose2D -> Transpose2D (pytorch#11281)
  Remove THFinalizer
  Forward declarations of needed curand functions (pytorch#10911)
  nomnigraph - simplify core graph API and test (pytorch#11256)
  Small fixes to cppdocs for sync script (pytorch#11300)
  ...
PenghuiCheng pushed a commit to PenghuiCheng/pytorch that referenced this pull request Sep 11, 2018
Summary:
Distributed Data Parallel CPU module for c10d. This is basically the same code as the Distributed Data Parallel CPU module for THD, since c10d now has the same front-end interface as torch.distributed.

We will keep both in the first release and remove the THD one once c10d is stable enough.

Tests are fully covered, just as for THD.
Pull Request resolved: pytorch#11168

Differential Revision: D9674963

Pulled By: teng-li

fbshipit-source-id: ecf52a7189374ca7930c2be305218167fdd822a7
@ezyang ezyang added the merged label Jun 26, 2019

Labels

oncall: distributed (Add this issue/PR to distributed oncall triage queue)

4 participants