There seems to be a memory leak when using higher-order gradients with batch norm, closely related to previously reported bugs that have already been fixed. I'm using release version 0.2.0 (though I also reproduced the bug with source revision 99141e6... from master).
I'm using the same test code as this bug report (#2287), which is marked as fixed and merged into 0.2.0.
Here is the error (after a couple of iterations on a GTX 1070):
0
1
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
File "batch_norm_test.py", line 42, in <module>
loss.backward()
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/variable.py", line 156, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
variables, grad_variables, retain_graph)
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/nn/_functions/thnn/batchnorm_double_backwards.py", line 97, in batchnorm_double_backwards_fn
gG = ggI * first_back_grad_input(gO, 1)
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/variable.py", line 829, in __mul__
return self.mul(other)
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/variable.py", line 339, in mul
return Mul.apply(self, other)
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/_functions/basic_ops.py", line 48, in forward
return a.mul(b)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCStorage.cu:66
Here is the code to reproduce the issue, copied from the ticket referenced above:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class BatchNormTest(nn.Module):
    def __init__(self, c, num_classes=2):
        super(BatchNormTest, self).__init__()
        self.bn = nn.BatchNorm2d(c)

    def forward(self, x):
        out = x
        out = self.bn(out)
        out = F.relu(out)
        return out

c = 100
net = BatchNormTest(c)
use_cuda = True
inputs = Variable(torch.rand(100, c, 100, 100), requires_grad=True)
if use_cuda:
    net.cuda()
    inputs = inputs.cuda()

T = 100
for i in range(T):
    output = net(inputs)
    loss1 = torch.sum(output)
    grad_params = torch.autograd.grad(loss1, inputs, create_graph=True)
    grad = grad_params[0]
    loss = torch.sum(grad)
    loss.backward()
    print(i)
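For reference, the same double-backward pattern can be exercised on CPU with the functional API. This is a minimal sketch written against the modern PyTorch interface (no Variable wrapper, and `F.batch_norm` called directly); it only illustrates the gradient-of-gradient computation that triggers the issue, and does not by itself reproduce the CUDA leak:

```python
import torch
import torch.nn.functional as F

# Minimal CPU sketch of the double-backward pattern from the repro above,
# using the modern PyTorch API. The leak itself is CUDA-specific.
x = torch.rand(4, 3, 8, 8, requires_grad=True)
weight = torch.ones(3, requires_grad=True)
bias = torch.zeros(3, requires_grad=True)

# training=True computes batch statistics, so no running stats are needed.
out = F.relu(F.batch_norm(x, None, None, weight, bias, training=True))
loss1 = out.sum()

# First-order gradient, kept in the graph so it can be differentiated again.
(grad,) = torch.autograd.grad(loss1, x, create_graph=True)
loss = grad.sum()
loss.backward()  # second-order backward through batch norm

print(x.grad.shape)
```

On a CUDA build, wrapping the last three lines in the original loop and watching `nvidia-smi` (or, on newer versions, `torch.cuda.memory_allocated()`) per iteration shows whether allocated memory grows without bound.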