The gradient changes between identical passes, even though the inputs, outputs, and the module's internal state stay the same.
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.backends import cudnn

cudnn.enabled = True
net = nn.BatchNorm1d(3).cuda()
net.eval()
idat = Variable(torch.rand(4,3).cuda(), requires_grad=True)
grad = torch.rand(4,3).cuda()
# 1st pass
res = net(idat)
res.backward(grad)
grad0 = idat.grad.data.cpu()
# 2nd pass
idat.grad.data.zero_()
res = net(idat)
res.backward(grad)
grad1 = idat.grad.data.cpu()
print((grad1 == grad0).all())
Even when I run the forward pass in evaluation mode and the backward pass in training mode, the gradient still changes between passes.
It works fine with cudnn disabled.
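For reference, here is a minimal CPU-only sketch of the same two-pass determinism check (an assumption on my part: using the modern tensor API with `requires_grad` instead of `Variable`). On CPU, where cudnn is not involved, the two gradients come out identical, which is the behavior I would expect on GPU as well:

```python
import torch
import torch.nn as nn

# Hypothetical CPU repro of the two-pass check; cudnn plays no role here.
torch.manual_seed(0)
net = nn.BatchNorm1d(3)
net.eval()
idat = torch.rand(4, 3, requires_grad=True)
grad = torch.rand(4, 3)

grads = []
for _ in range(2):
    if idat.grad is not None:
        idat.grad.zero_()      # reset accumulated gradient between passes
    res = net(idat)
    res.backward(grad)
    grads.append(idat.grad.clone())

# Without cudnn the backward pass is deterministic, so both gradients match.
print(torch.equal(grads[0], grads[1]))
```

Running this prints `True`, matching the "works fine with cudnn disabled" case above.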