Fix more spectral norm bugs#13350
Conversation
@YaoshengFu This PR will fix the spectral norm bug you see.
I really appreciate you guys. I have one question to check my understanding. Is this correct?
@crcrpar Almost!
Sorry for my late response. The changes are good, but I have some questions about the test code.
Thank you for your comments @crcrpar :)
I am still having a problem with spectral norm complaining about in-place changes when it is followed by batch normalization and distributed across 2 or more GPUs using DataParallel. This architecture arises in BigGAN (https://arxiv.org/abs/1809.11096). Interestingly, this is not an issue with group normalization, instance normalization, or no normalization, and it is also not an issue on a single GPU.
So, to get this sorted out a bit: how many cases do we have? with/without DP × eval/training × weight.requires_grad=True/False × ???
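As a sanity check on the combinatorics, the three binary axes listed above already give eight cases. A small sketch of that test matrix (the axis names here are illustrative, not the actual test parameter names):

```python
from itertools import product

# Each binary axis doubles the number of cases to cover.
axes = {
    "data_parallel": (False, True),        # with/without DP
    "training": (False, True),             # eval vs. training mode
    "weight_requires_grad": (False, True), # requires_grad on weight_orig
}

cases = [dict(zip(axes, values)) for values in product(*axes.values())]
print(len(cases))  # 8 combinations
```

Any further axis (the "???" above) would double the count again.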
@zmurez Even with this patch?
@t-vi Yeah, I think those 8 cases are all we need to consider. |
I think so. It complained regardless of normalization and number of GPUs prior to adding this patch. However, I just copied the relevant lines into my own spectral_norm.py file instead of pulling the entire branch, so it is possible I missed something, but that seems unlikely since all the other cases are fixed. Note that I am still using the latest stable release. To get this patch to work, I also grabbed a copy of the normalize function and implemented my own chain_matmul (a single if statement in this case, with 3 matrices). Does this bug exist for you? If not, I guess I will have to consider updating to the unstable version. Thanks!
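For reference, the three-matrix special case of chain_matmul mentioned above really does reduce to a single comparison. A hypothetical numpy stand-in (not the code actually used in the PR or in torch):

```python
import numpy as np

def chain_matmul3(A, B, C):
    """Multiply three matrices, choosing the cheaper association.

    For shapes (p, q), (q, r), (r, s):
      (A @ B) @ C costs p*q*r + p*r*s scalar multiplications,
      A @ (B @ C) costs q*r*s + p*q*s.
    A single if statement picks the cheaper one; the result is the same.
    """
    p, q = A.shape
    r = B.shape[1]
    s = C.shape[1]
    if p * q * r + p * r * s <= q * r * s + p * q * s:
        return (A @ B) @ C
    return A @ (B @ C)
```

For spectral norm the product is u @ W @ v with two vectors, so either association is cheap; the choice only matters for larger middle dimensions.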
@zmurez It is possible that other changes are needed. Do you have a small repro script I can try on my build?
t-vi left a comment
Thanks for adding all the new tests! They seem comprehensive, and I think the patch is good with them.
facebook-github-bot left a comment
@ssnl is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
It seems that when spectral norm is applied to a conv module, u is randomly initialized, and v is then computed such that the power-iteration invariant holds.
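For intuition, here is a small numpy sketch of that initialization and one power-iteration step (not PyTorch's implementation; l2_normalize is an illustrative stand-in for the eps-guarded normalize used by spectral norm):

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Eps-guarded L2 normalization, mirroring torch.nn.functional.normalize.
    return x / (np.linalg.norm(x) + eps)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))  # conv weights are flattened to 2D before SN

# u is drawn randomly; v is then computed so that the invariant
# v = normalize(W^T u) holds from the start.
u = l2_normalize(rng.standard_normal(4))
v = l2_normalize(W.T @ u)

# One power-iteration step; sigma estimates the largest singular value of W,
# and more iterations tighten the estimate.
u = l2_normalize(W @ v)
sigma = u @ W @ v
```

The normalized weight is then W / sigma, so the module's effective weight has (approximately) unit spectral norm.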
@zmurez Good point. I'll think about ways to fix it.
For the initialization, I think it should not be terribly important whether u or v is initialized randomly, so that could likely be changed. For loading the state dict: this only happens when you load an "old version" state dict. How about warning about the performance cost and (potentially) offering a switch to re-compute u via a forward pass instead of solving for v? (I think it might be safe to "do the right, slow thing" by default rather than the other way round.)
I agree with t-vi. Random initialization of |
facebook-github-bot left a comment
@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: In `broadcast_coalesced`, since multiple variables can be "views" of a big flattened tensor, they can share the same version counter. However, this base flat tensor is not exposed, and they don't share any memory locations, so this is not necessary. Furthermore, it can cause problems: e.g., when two buffers are broadcast together in `DataParallel` and one of them is modified in-place during `forward` but the other is needed in backward, the autograd engine will complain. Fixes the bug discovered at #13350 (comment).

edit: This is a very real problem, e.g., consider using Spectral Norm + Batch Norm together.

Pull Request resolved: #13594
Differential Revision: D12967311
Pulled By: SsnL
fbshipit-source-id: 52998dbabe149f575cf0fb79e7016f0b95e4b9e5
Summary: Problems with SN and DP after pytorch#12671:

1. In eval mode, `weight_orig` is not getting the correct gradient (pytorch#12737). Fix: keep the `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)`, even in eval.
2. In training mode, the `weight` buffer of the parallelized module is never updated if someone touches `weight_orig` and/or `weight` and makes them stop sharing storage, so in eval the weight used is wrong. Fix: make `weight` not a buffer anymore and always calculate it as above.
3. pytorch#12671 changed SN to update `u` in-place to make DP work correctly, but that breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the first forward are changed in the second forward. Fix: this PR clones `u` and `v` before using them.

To maintain BC, I added a hook interface for producing and loading state_dict. This is ugly, and we should really have a better interface for spectral_norm, but for the purpose of fixing this issue I make this patch. Even with a better interface, a BC mechanism for loading legacy state_dicts would still be needed.

cc @t-vi @crcrpar

Pull Request resolved: pytorch#13350
Differential Revision: D12931044
Pulled By: SsnL
fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
Summary: Causing a problem with spectral norm, although SN won't use that anymore after pytorch#13350.

Pull Request resolved: pytorch#13352
Differential Revision: D14209562
Pulled By: ezyang
fbshipit-source-id: f5e3183e1e7050ac5a66d203de6f8cf56e775134
Problems with SN and DP after #12671:

1. In eval mode, `weight_orig` is not getting the correct gradient (SpectralNorm in eval doesn't connect grad to `weight_orig`, #12737). Fix: keep the `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)`, even in eval.
2. In training mode, the `weight` buffer of the parallelized module is never updated if someone touches `weight_orig` and/or `weight` and makes them stop sharing storage, so in eval the weight used is wrong. Fix: make `weight` not a buffer anymore and always calculate it as above.
3. Fix SpectralNorm with DataParallel (#12671) changed SN to update `u` in-place to make DP work correctly, but that breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the first forward are changed in the second forward. Fix: this PR clones `u` and `v` before using them.

To maintain BC, I added a hook interface for producing and loading state_dict. This is ugly, and we should really have a better interface for spectral_norm, but for the purpose of fixing this issue I make this patch. Even with a better interface, a BC mechanism for loading legacy state_dicts would still be needed.

cc @t-vi @crcrpar
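The `W = W_orig / (u @ W_orig @ v)` calculation above can be sketched numerically. This is a hedged numpy illustration of the idea, not the PR's torch code (the name `spectral_normalize` is hypothetical, and the autograd-relevant cloning of `u` and `v` only appears here as a comment):

```python
import numpy as np

def spectral_normalize(weight_orig, u, v):
    """Always recompute the normalized weight from weight_orig, u, and v.

    sigma = u @ W_orig @ v approximates the largest singular value of
    W_orig when u and v approximate its top singular vectors. In the
    actual PR, u and v are cloned before use so that a second forward
    (e.g., D(real) - D(fake)) does not clobber the tensors saved for the
    first forward's backward pass.
    """
    sigma = u @ weight_orig @ v
    return weight_orig / sigma

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 5))

# Use the exact top singular vectors here, so sigma equals the top
# singular value and the result has spectral norm exactly 1.
U, S, Vt = np.linalg.svd(W)
W_sn = spectral_normalize(W, U[:, 0], Vt[0])
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # ~1.0
```

In practice u and v come from power iteration rather than an SVD, so the normalized weight's spectral norm is only approximately 1, but recomputing `weight` this way on every forward is what makes eval mode and DataParallel replicas see a correct, gradient-connected weight.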