use allgatherv for sparse all reduce #23917
zhaojuanmao wants to merge 1 commit into pytorch:master
Conversation
pietern
left a comment
Nice work! Mostly style/naming nits.
Also, I think that setAllGatherVOutput can be named just setOutput since the counts setter will be applicable to e.g. gatherv, scatterv as well.
Whitespace -- you can run clang-format to fix up style before committing.
This can be set as const outside the loop (and even at the beginning of the function).
This is not the number of dimensions but the number of elements in the dense dimensions, so wouldn't denseNumel (or similar) be more descriptive here?
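For readers less familiar with sparse COO layout, here is a small Python sketch of the distinction being made (illustrative only; the tensor shape and variable names are made up, not taken from this PR):

```python
import torch

# A 3-D tensor converted to sparse COO with 1 sparse dimension and 2 dense dimensions.
t = torch.randn(4, 3, 5).to_sparse(1).coalesce()
values = t.values()              # shape: [nnz, 3, 5]

dense_dims = t.dense_dim()       # 2  -- the number of dense dimensions
dense_numel = values[0].numel()  # 15 -- the number of elements in the dense dimensions
print(dense_dims, dense_numel)   # the quantity used here is the latter, hence "denseNumel"
```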
The tests show something is wrong in the implementation.
@zhaojuanmao It needs a rebase now because there seems to be a conflict with master. Do you have time to continue this?
The CUDA test failed and needs some time to debug; I can possibly work on it next week.
Force-pushed from 70f8d17 to 2c04cc3
Force-pushed from 7806acd to c345402
Force-pushed from c345402 to 2837625
pietern
left a comment
Can you clarify the need for the contiguous() call?
Don't worry about the const-ness in the loop body if this doesn't require further changing.
The inputs are always coalesced before running the algorithm and I was thinking that that implies they'll be contiguous.
Is this not the case?
Same as above -- I don't think this is needed.
@pietern Some tensors copied from CUDA are not contiguous; this is what I printed out locally while debugging. If I directly print out the values through the tensor data pointer, some of them are garbage values.
This was against tensors when running tests? Do you have a repro for this? I'm surprised this was not an issue before, because the
This is the same for the existing code; I printed it out locally and some tensors copied from CUDA are not contiguous. But the existing code calls copy_ into contiguous buffer tensors before calling allgather, and that copy_ is similar to a contiguous() call.
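To illustrate the equivalence described above, here is a minimal sketch (not the ProcessGroupGloo code itself) showing that copy_ into a freshly allocated buffer produces contiguous data, much like calling .contiguous():

```python
import torch

x = torch.arange(12).reshape(3, 4).t()      # transposed view, non-contiguous
buf = torch.empty(x.size(), dtype=x.dtype)  # fresh allocation, contiguous
buf.copy_(x)                                # copy_ fills the contiguous buffer

print(x.is_contiguous(), buf.is_contiguous())  # False True
print(torch.equal(x.contiguous(), buf))        # True
```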
Yes, it is for the tensors in the unit tests. I printed the values and the is_contiguous() flag: for the tensors whose data pointer shows garbage values, is_contiguous() is false.
Ah yes, of course! Can you create an issue for this? We call coalesce, which gives the underlying code the opportunity to coalesce as well. I don't think there is a good reason for not creating indices/values in a non-coalesced tensor if you're creating it anyway, so perhaps this is a legit bug. Let's merge this now and then update if this is indeed a bug and there is a resolution.
Force-pushed from 2837625 to 04960df
I asked whether it is expected to have a non-contiguous indices tensor after calling coalesce() on a sparse tensor here: https://discuss.pytorch.org/t/indices-tensor-is-not-contiguous-after-calling-sparse-tensor-coalesce/56198. Based on the code at https://our.internmc.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/native/sparse/SparseTensor.cpp?lines=350, it looks like the indices tensor should be contiguous. If the answer is that this is not expected, I will create a bug issue.
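A minimal check along the lines of that forum question (assuming a CUDA device is available; this is an illustration, not a confirmed repro of the failure seen in the unit tests):

```python
import torch

# Build a sparse COO tensor on CUDA, coalesce it, and inspect whether the
# indices/values tensors report themselves as contiguous.
i = torch.tensor([[0, 2, 1], [1, 0, 2]])
v = torch.tensor([3.0, 4.0, 5.0])
s = torch.sparse_coo_tensor(i, v, (3, 3), device="cuda").coalesce()

print(s.indices().is_contiguous(), s.values().is_contiguous())
# Copy back to CPU, as the allreduce path does, and check again.
print(s.indices().cpu().is_contiguous(), s.values().cpu().is_contiguous())
```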
@pytorchbot retest
Force-pushed from 04960df to d1eeacc
Force-pushed from d1eeacc to 612c2c5
The "ensure no tabs" linter failure is for unrelated changes on master (which should be fixed separately).
@pietern Yesterday the macOS build failed because it somehow could not recognize the "long" symbol. I changed it to int64_t and it now passes the tests. Will land it soon.
facebook-github-bot
left a comment
@zhaojuanmao is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Per pytorch#22226, the current sparse allreduce in ProcessGroupGloo pads the indices and values tensors to the maximum length across all processes and then performs a regular allgather (because they'll have equal size across processes). Instead, we can use allgatherv. This is mostly a win for memory usage if there is severe size imbalance between processes.
Closes pytorch#22226
Pull Request resolved: pytorch#23917
Test Plan:
buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_basics
buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_basics_cuda
buck run mode/dev-nosan caffe2/test:c10d -- test_c10d.ProcessGroupGlooTest.test_sparse_allreduce_checks
Differential Revision: D16664985
Pulled By: zhaojuanmao
fbshipit-source-id: a9a139da2b64617d2bb7f0b12f272e920140e5d1
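As a rough illustration of the memory argument in the summary, here is a single-process Python sketch (not the actual C++ ProcessGroupGloo implementation) comparing the pad-to-max allgather layout with an allgatherv-style layout driven by per-rank element counts:

```python
import torch

# Simulated per-rank values tensors with a severe size imbalance.
per_rank = [torch.randn(n) for n in (3, 1000, 5)]

# allgather-style: every rank pads its data to the largest length before gathering.
max_len = max(t.numel() for t in per_rank)
padded = [torch.zeros(max_len) for _ in per_rank]
for dst, src in zip(padded, per_rank):
    dst[: src.numel()].copy_(src)
allgather_elems = sum(t.numel() for t in padded)   # 3000 elements gathered

# allgatherv-style: exchange counts first, then gather exactly-sized chunks.
counts = [t.numel() for t in per_rank]
output = torch.empty(sum(counts))
offset = 0
for t, n in zip(per_rank, counts):
    output[offset : offset + n].copy_(t)
    offset += n
allgatherv_elems = output.numel()                  # 1008 elements gathered

print(allgather_elems, allgatherv_elems)
```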
Force-pushed from 612c2c5 to e7800ca
@zhaojuanmao merged this pull request in ed09704.
#23917 switched to using allgatherv instead of allgather for gloo sparse all-reduce. This PR removes a comment saying to use allgatherv if available, since that has already been done.
Pull Request resolved: #87018
Approved by: https://github.com/H-Huang