Conversation
Previous PR, which added support for variable batch sizes across processes, doesn't account for mean_dy/mean_dy_xmu on the backward path. This produces a wrong dgrad in those cases.
I have repro code and have verified the fix on my local machine.
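The gradient bug can be illustrated with a small, self-contained sketch (pure Python with hypothetical values; not the actual CUDA kernel or the repro code from this PR). With unequal per-process batch sizes, the global mean of dy must be weighted by each process's element count; taking an unweighted average of the per-process means gives a different, incorrect result:

```python
# Hypothetical illustration: aggregating mean_dy across processes when
# batch sizes differ. Each "process" contributes (sum_of_dy, count).

def global_mean(per_process_sums, per_process_counts):
    # Correct: divide the all-reduced sum of dy by the total count.
    return sum(per_process_sums) / sum(per_process_counts)

def naive_mean(per_process_sums, per_process_counts):
    # Incorrect when counts differ: unweighted average of local means.
    local_means = [s / c for s, c in zip(per_process_sums, per_process_counts)]
    return sum(local_means) / len(local_means)

# Two "processes" with batch sizes 2 and 4 (made-up dy sums).
sums, counts = [2.0, 8.0], [2, 4]
print(global_mean(sums, counts))  # 10/6 ≈ 1.6667
print(naive_mean(sums, counts))   # (1.0 + 2.0)/2 = 1.5 — diverges
```

With equal counts the two formulas coincide, which is why the bug only surfaces when batch sizes vary across processes.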
💊 Build failures summary (as of commit 8a0c630): the XLA job pytorch_xla_linux_xenial_py3_6_clang7_test is failing.
cc'ing @ptrblck
Test added, should be good for review now. @ngimel |
ngimel
left a comment
Looks good, I have minor comments.
facebook-github-bot
left a comment
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Previous PR pytorch#22248, which added support for variable batch sizes across processes, doesn't account for mean_dy/mean_dy_xmu on the backward path, which produces a wrong dgrad. Pull Request resolved: pytorch#36382 Differential Revision: D20984446 Pulled By: ngimel fbshipit-source-id: 80066eee83760b275d61e2cdd4e86facca5577fd
Summary: Pull Request resolved: #66 The sync_bn batch norm has been using an inefficient implementation, NaiveSyncBatchNorm. This is because in pytorch<=1.5, nn.SyncBatchNorm had an incorrect gradient when the batch size differed across workers, so NaiveSyncBatchNorm was implemented instead. This issue has been fixed in pytorch/pytorch#36382, so we change the implementation back to the faster nn.SyncBatchNorm. Since nn.SyncBatchNorm only has a GPU implementation, the original NaiveSyncBatchNorm is still used for the unit tests that run with a single process on CPU. Reviewed By: wat3rBro Differential Revision: D35300913 fbshipit-source-id: ec8107b78bd2a4be5e9452e6aeac6b89f0f04930
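For reference, the core idea behind a "naive" sync batch norm can be sketched in a single process (pure Python with made-up worker batches; the real implementation all-reduces tensors across GPUs rather than summing Python lists): each worker contributes sum(x) and sum(x²), and the reduced totals yield the global mean and variance used to normalize.

```python
# Hypothetical single-process simulation of the NaiveSyncBatchNorm idea.
# A plain sum over workers stands in for the distributed all-reduce.

def sync_bn_stats(worker_batches):
    total = sum(len(b) for b in worker_batches)                 # all-reduced count
    s = sum(sum(b) for b in worker_batches)                     # all-reduced sum(x)
    sq = sum(sum(v * v for v in b) for b in worker_batches)     # all-reduced sum(x^2)
    mean = s / total
    var = sq / total - mean * mean                              # E[x^2] - E[x]^2
    return mean, var

# Two "workers" with unequal batch sizes 2 and 4.
mean, var = sync_bn_stats([[1.0, 3.0], [2.0, 4.0, 6.0, 8.0]])
print(mean, var)
```

Because the statistics are computed from count-weighted global sums, this formulation stays correct with unequal per-worker batch sizes; the trade-off is extra communication and no use of the fused cuDNN/CUDA batch-norm kernels, which is why nn.SyncBatchNorm is faster.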