Skip to content

[redo] Fix SyncBatchNorm forward pass for non-default process group#43861

Closed
vkuzo wants to merge 1 commit intogh/vkuzo/134/basefrom
gh/vkuzo/134/head
Closed

[redo] Fix SyncBatchNorm forward pass for non-default process group#43861
vkuzo wants to merge 1 commit intogh/vkuzo/134/basefrom
gh/vkuzo/134/head

Conversation

@vkuzo
Copy link
Copy Markdown
Contributor

@vkuzo vkuzo commented Aug 30, 2020

Stack from ghstack:

Summary:

This is a redo of #38874, and
fixing my original bug from
#38246.

Test Plan:

run DDP + SyncBN on two process groups:
https://gist.github.com/vkuzo/8501121d0d25261b70110aae4465ae0a
the script fails before this PR and works after this PR

Reviewers:

Subscribers:

Tasks:

Tags:

Differential Revision: D23418816

Summary:

This is a redo of #38874, and
fixing my original bug from
#38246.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]
@vkuzo vkuzo requested a review from apaszke as a code owner August 30, 2020 16:25
vkuzo added a commit that referenced this pull request Aug 30, 2020
Summary:

This is a redo of #38874, and
fixing my original bug from
#38246.

Test Plan: CI

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: fb76ed7
Pull Request resolved: #43861
@vkuzo vkuzo changed the title Fix SyncBatchNorm forward pass for non-default process group [redo] Fix SyncBatchNorm forward pass for non-default process group Aug 30, 2020
@vkuzo vkuzo requested a review from ngimel August 30, 2020 16:30
@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 30, 2020

Codecov Report

Merging #43861 into gh/vkuzo/134/base will increase coverage by 0.00%.
The diff coverage is 0.00%.

Impacted file tree graph

@@                Coverage Diff                 @@
##           gh/vkuzo/134/base   #43861   +/-   ##
==================================================
  Coverage              69.31%   69.32%           
==================================================
  Files                    378      378           
  Lines                  46745    46745           
==================================================
+ Hits                   32403    32404    +1     
+ Misses                 14342    14341    -1     
Impacted Files Coverage Δ
torch/nn/modules/_functions.py 63.30% <0.00%> (ø)
torch/testing/_internal/expecttest.py 78.57% <0.00%> (+1.02%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8a41fa4...bc60649. Read the comment docs.

@dr-ci
Copy link
Copy Markdown

dr-ci Bot commented Aug 30, 2020

💊 CI failures summary and remediations

As of commit bc60649 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 2/2 non-CircleCI failure(s)

Extra GitHub checks: 1 failed


codecov.io: 1 failed


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 1 time.

Copy link
Copy Markdown
Collaborator

@ngimel ngimel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to add test?

@vkuzo
Copy link
Copy Markdown
Contributor Author

vkuzo commented Sep 2, 2020

would be good to add test?

@ngimel would you be ok with a manual test for this instance (100% ok to say yes)? There are some issues with my devgpu which are preventing me from running the OSS distributed test suite properly, so just hoping to plan whether I need to block a few hours to fix my env for this PR.

@ngimel
Copy link
Copy Markdown
Collaborator

ngimel commented Sep 2, 2020

Yeah, manual test is ok, our OSS CI probably just has 2 gpus and won't allow you to create non-default process group anyway.

@vkuzo
Copy link
Copy Markdown
Contributor Author

vkuzo commented Sep 2, 2020

Yeah, manual test is ok, our OSS CI probably just has 2 gpus and won't allow you to create non-default process group anyway.

great, thanks, added manual test to test plan

@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request has been merged in b167402.

@facebook-github-bot facebook-github-bot deleted the gh/vkuzo/134/head branch September 6, 2020 14:16
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…ytorch#43861)

Summary:
Pull Request resolved: pytorch#43861

This is a redo of pytorch#38874, and
fixing my original bug from
pytorch#38246.

Test Plan:
CI

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23418816

fbshipit-source-id: 2a3a3d67fc2d03bb0bf30a87cce4e805ac8839fb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants