
[1.6 cherry pick] Add guard for non-default stream in DDP's autograd engine callback #41151

Merged
malfet merged 1 commit into release/1.6 from cherrypick_ddp_stream on Jul 9, 2020

Conversation

@rohan-varma
Contributor

rohan-varma commented on Jul 8, 2020

Port of #40115, to merge into release/1.6.

Original commit description:

Summary:
Pull Request resolved: #40115

Closes #37790
Closes #37944

A user may wish to run DDP's forward and backward passes under a non-default CUDA stream, such as one created with `torch.cuda.Stream()` and made current via `with torch.cuda.stream(stream):`. In this case the user is responsible for synchronizing this stream with the other streams used in the program (per the documentation at https://pytorch.org/docs/stable/notes/cuda.html#cuda-semantics), but DDP currently has a bug that causes gradient synchronization under non-default streams to be incorrect.

If a user does the following:

```
stream = torch.cuda.Stream()
with torch.cuda.stream(stream):
    model = DDP(...)
    loss = model(input).sum()
    loss.backward()
    grad = model.module.weight.grad
    # All-reduce an independent copy; if DDP has synchronized correctly,
    # the result matches the gradient DDP already averaged across ranks.
    average = grad.clone()
    dist.all_reduce(average)
    average /= dist.get_world_size()
```

There is a chance that `average` and `grad` will not be equal. This is because the CUDA kernels corresponding to the `all_reduce` call may run before `loss.backward()`'s kernels have finished. Specifically, DDP copies the all-reduced gradients back into the model parameters' `.grad` fields in an autograd engine callback, but this callback runs on the default stream. Note that this can also be worked around by the application synchronizing with the current (default) stream, although the application should not be expected to do so, since it never uses that stream itself.
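
For reference, that manual workaround would look roughly like this (a hedged sketch, not part of this PR): the application waits on the default stream, where the copy-back callback is enqueued, before reading the gradients.

```
# Hypothetical workaround (not this PR's fix): before reading gradients that
# DDP copies back in its autograd engine callback, wait for the default
# stream, since that is where the copy-back kernels are enqueued today.
torch.cuda.default_stream().synchronize()
grad = model.module.weight.grad
```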

This PR fixes the issue by passing the current stream into DDP's callback.
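
Conceptually, the change looks like the following Python-level sketch (illustration only; the actual change lives in DDP's C++ reducer, and `reducer` / `copy_allreduced_grads_to_params` are hypothetical stand-ins for its internals):

```
import torch

def make_copy_back_callback(reducer):
    # Capture whichever stream is current when backward is launched ...
    stream = torch.cuda.current_stream()

    def callback():
        # ... and perform the gradient copy-back under that stream, so it is
        # ordered after the application's backward kernels instead of racing
        # with them from the default stream.
        with torch.cuda.stream(stream):
            reducer.copy_allreduced_grads_to_params()

    return callback
```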

Tested by adding a unit test, `test_DistributedDataParallel_non_default_stream`, that fails without this PR.
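
Roughly, the kind of check such a test performs looks like this (a simplified sketch under the assumed names `model` and `input`, not the actual test code):

```
# Simplified sketch of the new test's check: run DDP forward/backward entirely
# under a side stream, then verify that every rank holds identical, correctly
# averaged gradients.
stream = torch.cuda.Stream()
with torch.cuda.stream(stream):
    loss = model(input).sum()
    loss.backward()

# Order the default stream after the side stream before reading gradients.
torch.cuda.current_stream().wait_stream(stream)
for p in model.module.parameters():
    expected = p.grad.clone()
    dist.all_reduce(expected)
    expected /= dist.get_world_size()
    # Without the guard this assertion is flaky; with it, it always holds.
    assert torch.allclose(p.grad, expected)
```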
ghstack-source-id: 106481208

Differential Revision: D22073353

fbshipit-source-id: 70da9b44e5f546ff8b6d8c42022ecc846dff033e

malfet merged commit 77ffb25 into release/1.6 on Jul 9, 2020
facebook-github-bot deleted the cherrypick_ddp_stream branch on January 27, 2021
