There seems to be a memory leak when using higher-order gradients with batch norm, closely related to previously reported bugs that have already been fixed. I'm using release version 0.2.0 (though I also reproduced the bug with source revision 99141e6... from master).
I'm using the same test code as this bug report (#2287), which is marked as fixed and merged into 0.2.0.
Here is the error (after a couple of iterations on a GTX 1070):
0
1
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
File "batch_norm_test.py", line 42, in <module>
loss.backward()
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/variable.py", line 156, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/__init__.py", line 98, in backward
variables, grad_variables, retain_graph)
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/nn/_functions/thnn/batchnorm_double_backwards.py", line 97, in batchnorm_double_backwards_fn
gG = ggI * first_back_grad_input(gO, 1)
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/variable.py", line 829, in __mul__
return self.mul(other)
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/variable.py", line 339, in mul
return Mul.apply(self, other)
File "/home/andreas/anaconda2/envs/env/lib/python2.7/site-packages/torch/autograd/_functions/basic_ops.py", line 48, in forward
return a.mul(b)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCStorage.cu:66
Here is the code to reproduce the issue, copied from the ticket referenced above:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class BatchNormTest(nn.Module):
    def __init__(self, c, num_classes=2):
        super(BatchNormTest, self).__init__()
        self.bn = nn.BatchNorm2d(c)

    def forward(self, x):
        out = x
        out = self.bn(out)
        out = F.relu(out)
        return out

c = 100
net = BatchNormTest(c)
use_cuda = True
inputs = Variable(torch.rand(100, c, 100, 100), requires_grad=True)
if use_cuda:
    net.cuda()
    inputs = inputs.cuda()

T = 100
for i in range(T):
    output = net(inputs)
    loss1 = torch.sum(output)
    grad_params = torch.autograd.grad(loss1, inputs, create_graph=True)
    grad = grad_params[0]
    loss = torch.sum(grad)
    loss.backward()
    print(i)
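For reference, the same double-backward pattern can be exercised on CPU with the functional API. This is a minimal sketch written against the modern PyTorch interface (no Variable wrapper, and `F.batch_norm` called directly); it only illustrates the gradient-of-gradient computation that triggers the issue, and does not by itself reproduce the CUDA leak:

```python
import torch
import torch.nn.functional as F

# Minimal CPU sketch of the double-backward pattern from the repro above,
# using the modern PyTorch API. The leak itself is CUDA-specific.
x = torch.rand(4, 3, 8, 8, requires_grad=True)
weight = torch.ones(3, requires_grad=True)
bias = torch.zeros(3, requires_grad=True)

# training=True computes batch statistics, so no running stats are needed.
out = F.relu(F.batch_norm(x, None, None, weight, bias, training=True))
loss1 = out.sum()

# First-order gradient, kept in the graph so it can be differentiated again.
(grad,) = torch.autograd.grad(loss1, x, create_graph=True)
loss = grad.sum()
loss.backward()  # second-order backward through batch norm

print(x.grad.shape)
```

On a CUDA build, wrapping the last three lines in the original loop and watching `nvidia-smi` (or, on newer versions, `torch.cuda.memory_allocated()`) per iteration shows whether allocated memory grows without bound.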