Revert "Move batch_norm to ATen/native, speed up (#12368)"#13191
Revert "Move batch_norm to ATen/native, speed up (#12368)"#13191houseroad wants to merge 1 commit intopytorch:masterfrom
Conversation
This reverts commit dc211c7.
facebook-github-bot
left a comment
There was a problem hiding this comment.
houseroad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
Please revert to unbreak master. Thank you!
…On Fri, Oct 26, 2018 at 18:42 bddppq ***@***.***> wrote:
@bddppq <https://github.com/bddppq> requested your review on: #13191
<#13191> Revert "Move batch_norm
to ATen/native, speed up (#12368
<#12368>)".
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#13191 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AFaWZW9UqjZ89jiMICly_OcsknJFe_c6ks5uo4_vgaJpZM4X9Dj4>
.
|
|
I could really use a hint which test is failing. |
|
@t-vi here is the log of failing test: https://circleci.com/gh/pytorch/pytorch/129750?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link |
Summary: Revert #12368 since it's causing onnx related test cases failing. pytorch/pytorch#12368 SsnL The controller you requested could not be found. Pull Request resolved: pytorch/pytorch#13191 Reviewed By: BIT-silence Differential Revision: D12810778 Pulled By: houseroad fbshipit-source-id: 1c373b92628580097cffcd237dccc5b3d8697577
|
So I merge master into the my batch norm branch, built and installed onnx from third_party/onnx and ran and couldn't get the failure locally. :( |
|
Since I already revert your changes in the master, to reproduce the problem, I would suggest to do the following: then run This test became flaky, but locally, I can always reproduce the problem. |
|
@t-vi, please let me know if you still cannot reproduce the problem, I will dig into it on my side. :-) |
|
@t-vi also forgot to mentioning, thanks a lot for the contribution, it's very useful. we just temporarily revert it to unbreak the CI test, I will help you push it back. |
|
I do appreciate your help with reproducing it and once I get to that, I hope we'll be able to fix it soon, too. |
|
(Silly question, but) so it's the batchnorm on CPU that is problematic? I try to test that more systematically. Unfortunately, I cannot seem to reproduce the error. What I did was roughly (also I installed future (past) and torchvision). I ran the test |
|
No worries, I have uploaded the docker to my repo: https://hub.docker.com/r/houseroad/onnx-docker/tags/ after launching docker, try |
|
Thank you for this! I can reproduce the error. Whyever it seemed a good idea at the time, not using sum inside parallel_for seems to fix the problem: If it does for you, too, I'd have another go at the PR. |
|
@t-vi Awesome, it works for me on the local machine. |
|
cc: @colesbury that seems like a bug with sum_parallel more than anything else? |
…torch#13191) Summary: Revert pytorch#12368 since it's causing onnx related test cases failing. pytorch#12368 SsnL The controller you requested could not be found. Pull Request resolved: pytorch#13191 Reviewed By: BIT-silence Differential Revision: D12810778 Pulled By: houseroad fbshipit-source-id: 1c373b92628580097cffcd237dccc5b3d8697577
Revert #12368 since it's causing onnx related test cases failing.
#12368
@ssnl @t-vi