Fix SpectralNorm with DataParallel #12671
Conversation
Autograd in eval mode still has problems, but I decided to fix that in a later PR due to BC complications.
@ssnl I read through this and it seems solid! 👍
Looks all reasonable to me, but I lack the Distributed expertise for that to mean much. Out of curiosity: what is the BC break when you recompute the weight in eval mode instead of detaching?
@crcrpar @t-vi Thanks for looking! @t-vi The problem with recomputing in eval mode is that we only store …
facebook-github-bot left a comment:
SsnL is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
I updated my code with the latest spectral_norm implementation (I just replaced spectral_norm.py and function.py), but I got the following error: `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`. The error disappears if I switch back to the old spectral_norm implementation. @ssnl
@YaoshengFu I can't reproduce the error you see. Could you install the nightly and check if the error still happens?
I have re-installed the latest version of pytorch from source and it still has the same error. I have tried it on different projects and didn't see a difference. For example, you can try running the code from this repo: https://github.com/rosinality/sagan-pytorch. Just replace the spectral_norm implementation in model.py or model_resnet.py (which seems to be copied from the previous official implementation) with the official one and run it; you should see the same error (at least I did).
@YaoshengFu Thanks, I will!
Summary: Problems with SN and DP after #12671:

1. In eval mode, `weight_orig` is not getting the correct gradient (#12737). Fix: keep the `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)`, even in eval.
2. In training mode, the `weight` buffer of the parallelized module is never updated if someone touches `weight_orig` and/or `weight` and makes them stop sharing storage, so in eval the weight used is wrong. Fix: make `weight` not a buffer anymore and always calculate it as above.
3. #12671 changed SN to update `u` in-place to make DP work correctly, but that breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the first forward are changed by the second forward. Fix: this PR clones `u` and `v` before using them.

To maintain BC, I added a hook interface for producing and loading state_dicts. This is ugly and we should really have a better interface for spectral_norm, but for the purpose of fixing this issue I make this patch. Even with a better interface, a BC mechanism for loading legacy state_dicts would still be needed.

cc crcrpar
Pull Request resolved: #13350
Differential Revision: D12931044
Pulled By: SsnL
fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
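For illustration, a minimal sketch of the recompute-always scheme described above (a hypothetical standalone helper, not the actual torch.nn.utils.spectral_norm code):

```python
import torch
import torch.nn.functional as F

def sn_weight(weight_orig, u, v, n_power_iterations=1, eps=1e-12):
    # View the weight as a 2-D matrix, as spectral_norm does for conv weights.
    w_mat = weight_orig.reshape(weight_orig.size(0), -1)
    with torch.no_grad():
        for _ in range(n_power_iterations):
            # Power iteration: u and v approach the leading singular vectors.
            v.copy_(F.normalize(torch.mv(w_mat.t(), u), dim=0, eps=eps))
            u.copy_(F.normalize(torch.mv(w_mat, v), dim=0, eps=eps))
        # Clone before use so a second forward's in-place update of u/v does
        # not clobber the tensors needed to backprop the first forward (fix 3).
        u, v = u.clone(), v.clone()
    # sigma = u^T @ W_orig @ v; recomputed on every forward, even in eval,
    # so weight_orig always receives the correct gradient (fix 1).
    sigma = torch.dot(u, torch.mv(w_mat, v))
    return weight_orig / sigma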
Summary: Pull Request resolved: #37032

DataParallel requires all params and buffers of child modules to be updated in place, because of how it implements model replication during the forward pass (see #12671 for context); any params or buffers not updated in place are lost and not propagated back to the master. This diff updates all observers and fake_quant modules to do their parameter updates in-place, which enables static quant and QAT to work correctly with DataParallel. Depends on #32684 and #37185.

Test Plan: the script at https://gist.github.com/vkuzo/78b06c01f23f98ee2aaaeb37e55f8d40 failed before and passes after the diff. Added integration and unit tests to cover:

```
python test/test_quantization.py TestDistributed
```

Imported from OSS
Differential Revision: D21206454
fbshipit-source-id: df6b4b04d0ae0f7ef582c82d81418163019e96f7
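A sketch of the pattern this diff enforces (a toy observer for illustration, not the actual torch.quantization module):

```python
import torch
import torch.nn as nn

class MinMaxObserverSketch(nn.Module):
    """Toy observer tracking the running min/max of activations."""
    def __init__(self):
        super().__init__()
        self.register_buffer('min_val', torch.tensor(float('inf')))
        self.register_buffer('max_val', torch.tensor(float('-inf')))

    def forward(self, x):
        # In-place update: the first replica's buffer shares storage with the
        # parallelized module, so DataParallel sees the new values.
        self.min_val.copy_(torch.min(self.min_val, x.min()))
        self.max_val.copy_(torch.max(self.max_val, x.max()))
        # Broken under DataParallel: rebinding creates a fresh tensor on the
        # replica, which is discarded after forward().
        # self.min_val = torch.min(self.min_val, x.min())
        return x
```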
Summary: There were two problems with SN + DP:

1. In SN, the updated `_u` vector is saved back to the module via a `setattr`. However, in DP everything is run on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.

Fixes are:

1. Update the `_u` vector in-place so that, through the storage shared between the first replica and the parallelized module, the update is retained.
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.

cc crcrpar taesung89 yaoshengfu
Fixes pytorch#11476
Pull Request resolved: pytorch#12671
Differential Revision: D10410232
Pulled By: SsnL
fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
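A toy demonstration of why fix 1 works; DataParallel's replication is stood in for here by manually sharing a buffer between two modules (names are made up):

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('u', torch.zeros(3))

master, replica = M(), M()
replica.u = master.u  # stand-in for DP handing the first replica a shared buffer

replica.u = torch.ones(3)       # setattr rebinds: the update is lost to the master
print(master.u)                 # tensor([0., 0., 0.])

replica.u = master.u            # share again
replica.u.copy_(torch.ones(3))  # in-place write goes through the shared storage
print(master.u)                 # tensor([1., 1., 1.])
```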
There were two problems with SN + DP:

1. In SN, the updated `_u` vector is saved back to the module via a `setattr`. However, in DP, everything is run on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.

Fixes are:

1. Update the `_u` vector in-place so that, through the storage shared between the first replica and the parallelized module, the update is retained.
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.

cc @crcrpar @taesung89 @t-vi @YaoshengFu
Fixes #11476
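Finally, a minimal end-to-end usage sketch of SN under DP with the fixed behavior (assumes a machine with at least two GPUs; the final check is illustrative):

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

net = nn.Sequential(spectral_norm(nn.Linear(20, 10)), nn.ReLU()).cuda()
net = nn.DataParallel(net, device_ids=[0, 1])

x = torch.randn(8, 20, device='cuda')
u_before = net.module[0].weight_u.clone()
net(x)  # forward runs on replicas; u is updated in place on device 0
# The power-iteration update survived the replicated forward:
assert not torch.equal(u_before, net.module[0].weight_u)
```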