I ran into unexpected behaviour with copy.deepcopy applied to a Variable. The gradient buffer of the Variable is not copied.
import copy
import torch

a = torch.autograd.Variable(torch.ones(1))
a.grad = torch.autograd.Variable(torch.ones(1))
b = copy.deepcopy(a)
print(b.grad)  # prints None: the gradient buffer was not copied
I think it would be a good idea to copy the gradient buffer during a deep copy. My use case is recording the gradient of a model's parameter space for optimization research. This would also be useful for debugging/development of complex models that involve atypical gradient operations.
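In the meantime, a possible workaround for this use case (a minimal, untested sketch, not an official API) is to copy the grad buffer by hand after the deepcopy:

import copy
import torch

a = torch.autograd.Variable(torch.ones(1))
a.grad = torch.autograd.Variable(torch.ones(1))

b = copy.deepcopy(a)
# Copy the gradient buffer manually, since deepcopy currently skips it.
if a.grad is not None:
    b.grad = copy.deepcopy(a.grad)
print(b.grad)  # now holds a copy of a.grad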
This is handled here, in pytorch/torch/autograd/variable.py (lines 89 to 97 at 5760b03):
def __deepcopy__(self, memo):
    if not self.is_leaf:
        raise RuntimeError("Only Variables created explicitly by the user "
                           "(graph leaves) support the deepcopy protocol at the moment")
    result = type(self)(self.data.clone())
    result.requires_grad = self.requires_grad
    result.volatile = self.volatile
    memo[id(self)] = result
    return result
A solution would be to also copy the grad attribute of the current Variable, which would require recursing into the deep copy, since the grad attribute is itself a Variable.
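A minimal sketch of how the existing __deepcopy__ above could be extended (the grad handling is my own untested suggestion, and it assumes the copy module is available in that file):

def __deepcopy__(self, memo):
    if not self.is_leaf:
        raise RuntimeError("Only Variables created explicitly by the user "
                           "(graph leaves) support the deepcopy protocol at the moment")
    result = type(self)(self.data.clone())
    result.requires_grad = self.requires_grad
    result.volatile = self.volatile
    # Register the copy before recursing so any back-references to this
    # Variable reachable from .grad resolve to the new copy via memo.
    memo[id(self)] = result
    # Proposed addition: recursively deep-copy the gradient buffer as well.
    if self.grad is not None:
        result.grad = copy.deepcopy(self.grad, memo)
    return result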
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @albanD @gqchen @pearu @nikitaved @soulitzer @ssnl