Out of memory with higher-order gradients involving batchnorm2d #3983

@andreasrobinson

Description

There appears to be a memory leak when using higher-order gradients with BatchNorm2d, closely related to previous bugs that have already been fixed. I'm using release version 0.2.0 (though I also reproduced the bug with source revision 99141e6... from master).

I'm using the test code from bug report #2287, which is marked as fixed and merged into 0.2.0.

Here is the error (after a couple of iterations on a GTX 1070):

0
1
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "batch_norm_test.py", line 42, in <module>
    loss.backward()
  File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/variable.py", line 156, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
    variables, grad_variables, retain_graph)
  File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/nn/_functions/thnn/batchnorm_double_backwards.py", line 97, in batchnorm_double_backwards_fn
    gG = ggI * first_back_grad_input(gO, 1)
  File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/variable.py", line 829, in __mul__
    return self.mul(other)
  File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/variable.py", line 339, in mul
    return Mul.apply(self, other)
  File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/_functions/basic_ops.py", line 48, in forward
    return a.mul(b)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCStorage.cu:66

Here is the code to reproduce the issue, copied from the ticket referenced above:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class BatchNormTest(nn.Module):
    def __init__(self, c, num_classes=2):
        super(BatchNormTest, self).__init__()
        self.bn = nn.BatchNorm2d(c)

    def forward(self, x):
        out = x
        out = self.bn(out)
        out = F.relu(out)
        return out

c = 100
net = BatchNormTest(c)
use_cuda = True
inputs = Variable(torch.rand(100,c,100,100), requires_grad=True)
if use_cuda:
    net.cuda()
    inputs = inputs.cuda()

T = 100
for i in range(T):
    output = net(inputs)
    loss1 = torch.sum(output)
    # First backward: gradient of loss1 w.r.t. the input, with create_graph=True
    # so the gradient itself can be differentiated again.
    grad_params = torch.autograd.grad(loss1, inputs, create_graph=True)

    grad = grad_params[0]
    loss = torch.sum(grad)

    # Second backward: goes through the batchnorm double-backward path.
    loss.backward()
    print(i)
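
To check whether memory actually grows across iterations (rather than the first pass simply being too large), one option is to log the allocated device memory each pass. The snippet below is only a sketch: the helper name report_cuda_memory is hypothetical, and torch.cuda.memory_allocated() is assumed from newer PyTorch releases (it is not available in 0.2.0, where the same trend can instead be watched externally with nvidia-smi).

import torch

def report_cuda_memory(step):
    # Hypothetical helper for illustration only.
    # torch.cuda.memory_allocated() reports bytes currently held by tensors
    # on the default CUDA device (assumed API from newer PyTorch releases,
    # not present in 0.2.0).
    allocated_mb = torch.cuda.memory_allocated() / (1024.0 ** 2)
    print("iteration %d: %.1f MiB allocated" % (step, allocated_mb))

Calling report_cuda_memory(i) at the end of each loop iteration should show the allocation climbing on every pass if the graph built by the previous double backward is not being freed.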
