[BC-breaking] Remove Variable::Impl and DifferentiableViewImpl #17072

yf225 wants to merge 198 commits into pytorch:master
Conversation
aten/src/ATen/SparseTensorImpl.cpp (outdated)

```diff
- AT_ASSERT(!indices.is_variable() && !values.is_variable()); // They should be plain tensors! // TODO: change this to check `.requires_grad()` and `GradMode::is_enabled()` when Variable and Tensor are merged
+ AT_ASSERT((!(indices.is_variable() && indices.requires_grad()) &&
+            !(values.is_variable() && values.requires_grad()))
+           || at::NonVariableTypeMode::is_enabled()); // TODO: use `compute_requires_grad()` after it's moved to ATen
```
I will work on moving compute_requires_grad() to ATen in the next PR.
facebook-github-bot left a comment
@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
torch/nn/modules/module.py (outdated)

```python
with torch.no_grad():
    param = fn(param)
    if param._grad is not None:
        param._grad = fn(param._grad)
```
I need to debug why we need this change
Update: For some test cases, `param._grad` is already a tensor with `allow_tensor_metadata_change` set to false, and changing its storage in `set_data()` will throw an error. To fix this problem, we should just do `param._grad = fn(param._grad)`, and use `with torch.no_grad()` to avoid accumulating gradients.
New code is more idiomatic anyway, sgtm.
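The fix agreed on above can be sketched with a torch-free stand-in: when an object forbids metadata changes, mutating it in place through a `set_data`-style call fails, while rebinding the reference always works. The names here (`Grad`, `set_data`, `fn`) are illustrative stand-ins, not PyTorch APIs.

```python
# Torch-free stand-in: rebind the reference instead of mutating an
# object whose metadata is locked (analogous to a tensor with
# allow_tensor_metadata_change == false).
class Grad:
    """Stand-in for a gradient tensor with a metadata-change flag."""
    def __init__(self, data, allow_metadata_change=True):
        self.data = data
        self.allow_metadata_change = allow_metadata_change

    def set_data(self, new_data):
        # In-place mutation: fails when metadata changes are disallowed.
        if not self.allow_metadata_change:
            raise RuntimeError("metadata change not allowed")
        self.data = new_data

def fn(g):
    # Stand-in for the transform Module._apply runs (e.g. moving to CUDA).
    return Grad([x * 2 for x in g.data])

grad = Grad([1.0, 2.0], allow_metadata_change=False)
try:
    grad.set_data([2.0, 4.0])  # old approach: raises on locked metadata
    mutated = True
except RuntimeError:
    mutated = False

grad = fn(grad)  # new approach: rebind the reference; always works
```

Rebinding is also why the real fix wraps the transform in `with torch.no_grad()`: the replacement happens at the Python-reference level, outside autograd.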
test/test_autograd.py (outdated)

```python
def test_accumulate_grad_tensor_reference(self):
    def _test_grad_tensor(params_grad_tensor, backward_grad_tensor, should_preserve_reference):
        params = torch.tensor([1.5, 1.5]).requires_grad_()
        with torch.no_grad():
```
I don't think you need `torch.no_grad` here.
aten/src/ATen/OpaqueTensorImpl.h (outdated)

```cpp
c10::intrusive_ptr<TensorImpl> shallow_copy_and_detach(
    const c10::VariableVersion& version_counter,
    bool allow_tensor_metadata_change) const override {
  //AT_ASSERT(false);
```

```cpp
 * Shallow-copies data from another TensorImpl into this TensorImpl.
 */
void shallow_copy_from(c10::intrusive_ptr<TensorImpl> impl) override {
  copy_tensor_data(
```
this doesn't check `allow_tensor_metadata_change`, right? Is that intentional?
I added comment at https://github.com/pytorch/pytorch/pull/17072/files#diff-7f54d357d87885923a1536316f17ac6bR869 to explain this behavior.
c10/core/TensorImpl.h (outdated)

```cpp
 *
 * For usage of `version_counter` and `allow_tensor_metadata_change`, see NOTE [ TensorImpl Shallow-Copying ].
 */
virtual void copy_tensor_data(
```
a few issues with this function:

- This is just used by the public API functions `shallow_copy_and_detach` and `shallow_copy_from`, right? Then it should be private.
- This doesn't need to be virtual; actually, notice you don't ever use `this`, so this doesn't even need to be a non-static method.
- Because it's a static method and you aren't using virtual dispatch, you can use the correct static types for the inputs. So you don't have to do scary static_casts in the derived types.
err, you probably need to make it protected so the derived types can call it.
```cpp
 * Shallow-copies data from another TensorImpl into this TensorImpl.
 */
virtual void shallow_copy_from(c10::intrusive_ptr<TensorImpl> impl) {
  copy_tensor_data(
```
- it's scary not to have an ASSERT in here -- I know you check the only call site to give a nice error message, but still, you want to guard the unsafe places.
- if you follow the arguments above that `copy_tensor_data` should have correct static types, you should do the static cast of `impl` in here, after the ASSERT.
I added the static cast and the ASSERT in the TensorImpl derived types' `shallow_copy_from()`.
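The design that this thread converges on (a protected static helper with exact types, plus a type-check ASSERT in each derived type's `shallow_copy_from` before the downcast) can be sketched as a Python analogue. This is not the actual C++ code; class and method names below (`TensorImplBase`, `SparseImpl`, `_copy_tensor_data`) are simplified stand-ins.

```python
# Python analogue of the pattern discussed above: a shared static copy
# helper, guarded by a type check at the unsafe call site.
class TensorImplBase:
    def __init__(self, sizes, version_counter=0):
        self.sizes = list(sizes)
        self.version_counter = version_counter
        self.allow_tensor_metadata_change = True

    @staticmethod
    def _copy_tensor_data(src, dest, version_counter, allow_tensor_metadata_change):
        # Shared by shallow_copy_and_detach / shallow_copy_from; static, so
        # no virtual dispatch and the argument types are exact ("protected"
        # by the leading-underscore convention here).
        dest.sizes = list(src.sizes)
        dest.version_counter = version_counter
        dest.allow_tensor_metadata_change = allow_tensor_metadata_change

class SparseImpl(TensorImplBase):
    def __init__(self, sizes, indices):
        super().__init__(sizes)
        self.indices = indices

    def shallow_copy_from(self, impl):
        # Guard the unsafe place: refuse impls of a different type, so the
        # field access below is always valid (the ASSERT + static_cast in
        # the C++ version).
        assert type(impl) is SparseImpl, "incompatible TensorImpl type"
        TensorImplBase._copy_tensor_data(impl, self, impl.version_counter, True)
        self.indices = impl.indices  # subclass-specific fields

a = SparseImpl([2, 2], indices=[[0, 1]])
b = SparseImpl([3], indices=[[2]])
b.shallow_copy_from(a)
```

Keeping the helper static means the base and derived copy paths share one implementation, while each derived `shallow_copy_from` stays responsible for validating and copying its own extra fields.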
```cpp
 *
 * For usage of `version_counter` and `allow_tensor_metadata_change`, see NOTE [ TensorImpl Shallow-Copying ].
 */
static void copy_tensor_data(
```
Did it not work to make it protected?
It is protected now, with the keyword at https://github.com/pytorch/pytorch/pull/17072/files#diff-7f54d357d87885923a1536316f17ac6bR1410.
```cpp
/*src_impl=*/this,
/*dest_impl=*/impl.get(),
/*version_counter=*/version_counter,
/*allow_tensor_metadata_change=*/allow_tensor_metadata_change);
```
don't we need to refresh numel here?
```cpp
/*src_impl=*/opaque_impl,
/*dest_impl=*/this,
/*version_counter=*/version_counter(),
/*allow_tensor_metadata_change=*/allow_tensor_metadata_change());
```
don't we need to refresh numel here?
facebook-github-bot left a comment
@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:

As part of the Variable/Tensor merge work (pytorch/pytorch#13638), we make the following changes in this PR:

1. Remove the `Variable::Impl` class and the `DifferentiableViewImpl` class
2. Change all `Variable.data()` call sites to either use `Variable` directly, or use `Variable.tensor_data()`
3. Remove the `Variable.data()` API
4. Add `Variable.variable_data()` that matches `tensor.data` in the Python API, which creates a new `Variable` that shares the same storage and tensor metadata with the original `Variable`, but with a completely new autograd history.

After this PR, Variable doesn't wrap a Tensor internally anymore, and both Variable and Tensor use the same TensorImpl class as their `impl_`. The only difference is that Variable always has AutogradMeta in its TensorImpl, but Tensor doesn't.

**Note that this PR is BC-breaking in the following use cases:**

**Use Case 1:** Previously, `x.data = y` worked even if `x` and `y` were of different TensorImpl types (e.g. `x` is a CPU dense tensor whose impl is of type TensorImpl, while `y` is a CPU sparse tensor whose impl is of type SparseTensorImpl). However, after this PR, `x.data = y` no longer works if `x` and `y` are of different TensorImpl types, because the underlying implementation `variable.set_data(tensor)` no longer works if `variable` and `tensor` have different TensorImpl types.

**Use Case 2:** If a tensor `x`'s `grad` is sparse, accumulating dense gradients to `x` will change the tensor that `x.grad` is pointing to. This is better illustrated with the following example:

```python
params = torch.tensor([1.5, 1.5]).requires_grad_()
with torch.no_grad():
    # Change gradient to a sparse tensor
    params.grad = torch.sparse_coo_tensor(torch.tensor([[1, 1]]).long(), torch.tensor([1., 1.]))
grad_saved = params.grad
params.backward(torch.tensor([1.5, 1.5]))
assert id(grad_saved) == id(params.grad)  # This will fail after this PR
```

The assertion in the last line will fail after this PR, because adding dense gradients to sparse gradients will change the `params.grad` tensor reference.

Pull Request resolved: pytorch/pytorch#17072
Differential Revision: D14075257
Pulled By: yf225
fbshipit-source-id: 0e681df641270dea586042dd26db59f2e76b5957
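Use Case 1 can be illustrated with a torch-free toy model of the new `set_data()` behavior, where the impl types must now match. The classes and function below are simplified stand-ins for the real C++ `TensorImpl` hierarchy, not PyTorch code.

```python
# Toy model of the BC break in Use Case 1 (illustrative, not PyTorch):
# after this PR, variable.set_data(tensor) requires matching impl types.
class TensorImpl:
    pass

class SparseTensorImpl(TensorImpl):
    pass

def set_data(variable_impl, tensor_impl):
    # The new behavior: refuse mismatched TensorImpl types up front.
    if type(variable_impl) is not type(tensor_impl):
        raise RuntimeError(
            "x.data = y no longer works when x and y have "
            "different TensorImpl types")
    # ... shallow-copy tensor metadata here ...

dense, sparse = TensorImpl(), SparseTensorImpl()
set_data(dense, TensorImpl())      # same impl type: still works
try:
    set_data(dense, sparse)        # dense <- sparse: now an error
    bc_break_triggered = False
except RuntimeError:
    bc_break_triggered = True
```

Code that previously swapped a dense tensor's `.data` for a sparse one (or vice versa) needs to create a fresh tensor instead of assigning through `.data`.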
Wow, epic work!!!

This broke XLA build. cc @ailzhang

Hooray!
Summary: After #17072, we are allowed to pass Variables into ATen ops, thus there is no need to unwrap input variables in the c10 call path. Note that since Caffe2 still expects inputs to be pure Tensors, we moved the unwrapping logic to the Caffe2 wrapper.

Pull Request resolved: #21620
Differential Revision: D15763560
Pulled By: yf225
fbshipit-source-id: 5375f0e51eb320f380ae599ebf98e6b259f0bff8