Parametrizations depending on several inputs #58488
lezcano wants to merge 12 commits into pytorch:master
Conversation
💊 CI failures summary (Dr. CI): as of commit ccc0a87, ci.pytorch.org reports 1 failure.
Codecov Report

```diff
@@            Coverage Diff            @@
##           master   #58488    +/-   ##
=========================================
  Coverage   76.46%   76.46%
=========================================
  Files        1992     1992
  Lines      199937   199980       +43
=========================================
+ Hits       152879   152913       +34
- Misses      47058    47067        +9
```
albanD left a comment:
From a quick look, it looks quite good!
I think you can simplify a lot of code by making the params always a sequence, as mentioned below.
Also, I'm not sure I understand what prevents you from chaining two parametrizations that have the following forwards:

```python
# param 1
def forward(self, p1, p2):
    return p1, p2

# param 2
def forward(self, p1, p2):
    return p1 + p2
```
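Concretely, a hedged sketch of the chaining being asked about (the `Pair` and `Add` modules are illustrative, not from the PR; `register_parametrization` is the existing public API):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Pair(nn.Module):
    # "param 1" from the snippet above: two tensors in, two tensors out
    def forward(self, p1, p2):
        return p1, p2
    def right_inverse(self, Z):
        return Z, torch.zeros_like(Z)

class Add(nn.Module):
    # "param 2": two tensors in, one tensor out
    def forward(self, p1, p2):
        return p1 + p2

layer = nn.Linear(3, 3)
# Per the "From offline" answer further down, the first registration is what
# fails: Pair's forward returns several tensors, and each output would need
# to be exposed as a property of its own.
parametrize.register_parametrization(layer, "weight", Pair())
parametrize.register_parametrization(layer, "weight", Add())
```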
Do you have a code sample that reproduces this outside of reparametrization?
Is it because you have a `t.weight` that is contiguous and a `t.weight.grad` that is also contiguous, but then you do `t.weight.set_(other)` with a non-contiguous `other`, while `t.weight.grad` stays contiguous?
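A hedged sketch of the scenario the question describes, standalone and outside of any parametrization:

```python
import torch

t = torch.nn.Parameter(torch.zeros(2, 3))
t.grad = torch.zeros(2, 3)
assert t.is_contiguous() and t.grad.is_contiguous()

other = torch.zeros(3, 2).t()  # a non-contiguous (transposed) view
with torch.no_grad():
    t.set_(other)              # t now shares other's non-contiguous layout

print(t.is_contiguous())       # False
print(t.grad.is_contiguous())  # True: the grad tensor was never touched
```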
A common trick we use here is to get `value` to always be a sequence by wrapping single Tensors into a tuple of size 1 (see the sketch below):

- That simplifies the logic here, as you don't need to special-case the single Tensor.
- You can always handle `original0` instead of `original` when there is a single Tensor.
- Calling into the forward can be done with common code as well.
- If you really need to unpack it into a single Tensor (for the use of `set_`, for example), you can always save a boolean along with the sequence.
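A minimal sketch of the suggested normalization (the helper `as_tuple` is hypothetical, not part of the PR):

```python
import torch
from torch import Tensor

def as_tuple(value):
    # Wrap a lone Tensor into a 1-tuple so the single- and multi-tensor
    # cases share one code path; remember the original kind for set_.
    if isinstance(value, Tensor):
        return (value,), True
    return tuple(value), False

# Both cases are handled uniformly afterwards:
originals, was_tensor = as_tuple(torch.ones(3))
many, _ = as_tuple((torch.ones(3), torch.zeros(3)))
assert len(originals) == 1 and len(many) == 2
```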
I tried doing that, but the treatment of these two cases is actually different enough not to be worth it: it clutters the implementation too much.
This comes from the fact that we reuse the tensor to avoid changing the id of the parameter and so on. Note, for example, that the exceptions raised in the if-else below are fundamentally different.
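For context, a hedged sketch of why reusing the tensor (keeping its id) matters: an optimiser created before the parametrization is registered keeps a reference to the original parameter object.

```python
import torch
import torch.nn as nn

layer = nn.Linear(3, 3)
opt = torch.optim.SGD(layer.parameters(), lr=0.1)  # holds a reference to layer.weight

# Writing the new value into the existing tensor keeps the object id valid,
# so the optimiser keeps updating the right parameter:
with torch.no_grad():
    layer.weight.set_(torch.randn(3, 3))
assert layer.weight is opt.param_groups[0]["params"][0]
```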
From offline: it's not possible to register parametrizations with several outputs, as each of these outputs would need to be a property on its own.
Force-pushed from 806919e to e8e586f.
@albanD this is ready for another round of reviews. Now the errors are all detected and thrown when registering a new parametrization. This is very good, as we test that the `forward` and the `right_inverse` are consistent at registration time.

There used to be a very annoying problem before: if you had made a mistake when writing your parametrization, you would only hit the error when you accessed the parametrized attribute, far from the cause.

This early testing should get rid of this problem and, in general, should error early, which is nice.
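A hedged sketch of the kind of registration-time check described (names are illustrative, not the PR's actual code):

```python
import torch
from torch import Tensor

def check_at_registration(param, X: Tensor) -> None:
    # Run right_inverse and forward once, eagerly, so a buggy
    # parametrization errors here rather than on first attribute access.
    with torch.no_grad():
        Y = param.right_inverse(X)
        Z = param(Y) if isinstance(Y, Tensor) else param(*Y)
    if Z.dtype != X.dtype or Z.shape != X.shape:
        raise ValueError("forward(right_inverse(X)) changed the metadata of X")
```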
albanD left a comment:

You committed some submodule updates by mistake, I think.
Looks good to me. Mainly some small naming updates.
nit: why not name them `original` and `new` in your code sample?
It was more difficult to track all the names, and I was not able to come up with meaningful ones. `X`, `Y`, `Z` made for more concise lines.
I don't think we want to enforce the shape equality for `X` and `Y`. In particular, `set_` will work just fine with another Tensor of a different size.
It is not necessary, but I thought that, in the same way that we disallow changing the dtype so that users do not encounter weird behaviours, we should also disallow changing the shape. That being said, I can remove this constraint.
On second thought, let's not be too conservative. I'm removing this constraint, as I could imagine changing the shape being useful.
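For illustration, a small hedged example of the point above: `set_` happily rebinds to a source of a different size.

```python
import torch

t = torch.nn.Parameter(torch.zeros(2, 3))
with torch.no_grad():
    t.set_(torch.randn(6, 6))  # different shape: set_ adopts the new size/stride/storage
print(t.shape)                 # torch.Size([6, 6])
```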
I think that you would prefer to match exactly the two properties of the original: `requires_grad`, and whether it is a `Parameter` or not.
Users can set `requires_grad=False` on Parameters, or create attributes that require gradients but are not Parameters.
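A hedged sketch of the property-matching being suggested (hypothetical helper; assumes `new` is a freshly created leaf tensor):

```python
import torch
from torch import Tensor
from torch.nn import Parameter

def with_properties_of(new: Tensor, original: Tensor) -> Tensor:
    # Mirror the original exactly instead of assuming that
    # Parameter implies requires_grad=True (it need not).
    if isinstance(original, Parameter):
        return Parameter(new, requires_grad=original.requires_grad)
    return new.requires_grad_(original.requires_grad)
```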
I'm not sure we can make that assumption here.

You most likely want to double-check things here, especially below, for the number of Tensors it returns when a Sequence is involved.
For the one above, I understand that you don't want to use `original`/`new` in the sample to keep names short.
But I'm not sure why `X` and `Z` are switched here. It would be better to have consistent naming throughout.
This is ready for another review, @albanD. The invariants preserved and tested for are the following:

1. `X = module.weight; Y = right_inverse(X); Z = forward(Y)` is well-formed, where `Y` may be a `Tensor` or a `Sequence` of Tensors and is unpacked into `forward` when it is a sequence.
2. `X.dtype == Z.dtype` and `X.shape == Z.shape`.
3. When the parametrization is one-to-one, `X.dtype == Y.dtype`, so that `X.set_(Y)` is possible.

I believe that this is a minimal way of describing the set of invariants that we need. I hope that I have not missed any of them in the testing nor in the exceptions in the code. It was quite tricky to convince myself that everything was correct...
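As a concrete (hedged) instance, a weight_norm-style two-tensor parametrization satisfying these invariants; the module is illustrative, not the PR's code:

```python
import torch
import torch.nn as nn

class WeightNormLike(nn.Module):
    def forward(self, g, v):
        # invariant 2: the result has the dtype and shape of the original
        return g * v / v.norm()

    def right_inverse(self, Z):
        # many-to-one: returns a sequence, so this must be registered first
        return Z.norm(), Z / Z.norm()

X = torch.randn(4, 4)
p = WeightNormLike()
Y = p.right_inverse(X)                            # invariant 1: a sequence of Tensors
Z = p(*Y)
assert Z.dtype == X.dtype and Z.shape == X.shape  # invariant 2 holds
```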
Better testing
Better docs
Force-pushed from d0b0b26 to ba56b21.
```python
raise ValueError("'right_inverse' must return a Tensor or a Sequence of tensors (list, tuple...). "
                 f"Got {type(new).__name__}")
```

```python
# If it is a sequence of one tensor, we unpack it
```
Do we really want to do that? Also, this is not documented?
The idea was that in the one-input case we can reuse the original tensor, which is good. Now, this is such a weird case... What do you think we should do?
If people return `(foo,)` instead of `foo`, I'm sure they have a good reason to do it, so I would lean towards removing this.
Also, the way it works right now, you will pass `foo` to their forward and not `(foo,)`, which is not consistent.
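A hedged illustration of the inconsistency: with unpack-on-save, a `right_inverse` that deliberately returns a 1-tuple still gets its `forward` called with a bare tensor.

```python
import torch
import torch.nn as nn

class Halve(nn.Module):
    def forward(self, x):       # written to expect a bare tensor...
        return 2 * x
    def right_inverse(self, z):
        return (z / 2,)         # ...even though this returns a 1-tuple

p = Halve()
z = torch.ones(3)
saved = p.right_inverse(z)           # (tensor,)
saved = saved[0]                     # the unpacking under discussion
assert torch.allclose(p(saved), z)   # forward received foo, not (foo,)
```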
| """ | ||
| # All the exceptions in this function should almost never throw. | ||
| # They could throw if, for example, right_inverse function does not return the same dtype | ||
| # for every input, which should most likely be caused by a bug in the code |
nit: maybe this could be clearer if we say that "right_inverse function returns a different dtype when given a different input"
Use collections.abc
Change the handling of a sequence of one tensor
Reword some bits
@albanD corrected!
```diff
 def forward(self) -> Tensor:
     # Unpack the originals for the first parametrization
-    if self.ntensors == 1:
+    if self.is_tensor == 1:
```
These are bool checks, no? You don't want the `== 1`?
Oooopsss. Sorry for that and thanks for the catch!
@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```diff
 weight_data = weight.data.clone()
 with torch.no_grad():
-    weight.set_(weight_data)
+    weight.copy_(weight_data)
```
This is the only place I can think of that relates to the HUD CI failure.
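For context, a hedged note on the difference the diff above makes (an assumption about the failure mode, not a confirmed diagnosis): `copy_` writes into the existing storage, while `set_` rebinds the tensor to the source's storage, which matters for tests that rely on storage identity, such as the cuDNN flattened-weight check.

```python
import torch

t = torch.zeros(3)
ptr = t.data_ptr()

t.copy_(torch.ones(3))
assert t.data_ptr() == ptr   # copy_ keeps the same storage

t.set_(torch.ones(3))
assert t.data_ptr() != ptr   # set_ rebinds to the source's (new) storage
```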
This seems to be breaking `test_cudnn_weight_format` in master: https://app.circleci.com/pipelines/github/pytorch/pytorch/336608/workflows/717d1325-5ae0-4b73-abf8-bced763e9dbb/jobs/14137264
This pull request has been reverted by 5c1d17e.
Summary: Resubmit of #58488. There was a line that had been changed in `test_nn.py`, as caught in #58488 (comment). I reverted that line, which should never have been changed. I reckon that should solve the issue.

Pull Request resolved: #60530
Reviewed By: ngimel
Differential Revision: D29329865
Pulled By: albanD
fbshipit-source-id: 8dfd0cd968fe26a3924dae7ca366af2c8a8639b3
Summary: Makes it possible for the first registered parametrization to depend on a number of parameters rather than just one. Examples of these types of parametrizations are `torch.nn.utils.weight_norm` and low-rank parametrizations via the multiplication of an `n x k` tensor by a `k x m` tensor with `k <= m, n`.

Follows the plan outlined in pytorch#33344 (comment). A short summary of the idea is: we call `right_inverse` when registering a parametrization to generate the tensors that we are going to save. If `right_inverse` returns a sequence of tensors, then we save them as `original0`, `original1`... If it returns a `Tensor` or a sequence of length 1, we save it as `original`.

We only allow many-to-one parametrizations in the first parametrization registered. The next parametrizations need to be one-to-one.

There were a number of choices in the implementation:

If `right_inverse` returns a sequence of parameters, then we unpack it in the forward. This is to allow writing code such as:

```python
class Sum(nn.Module):
    def forward(self, X, Y):
        return X + Y

    def right_inverse(self, Z):
        return Z, torch.zeros_like(Z)
```

rather than having to unpack a list or a tuple manually within the `forward` function.

At the moment the errors are a bit all over the place. This is to avoid having to check some properties of `forward` and `right_inverse` when they are registered. I left this like this for now, but I believe it'd be better to call these functions when they are registered to make sure the invariants hold and throw errors as soon as possible. The invariants are the following:

1. The following code should be well-formed:

```python
X = module.weight
Y = param.right_inverse(X)
assert isinstance(Y, Tensor) or isinstance(Y, collections.abc.Sequence)
Z = param(Y) if isinstance(Y, Tensor) else param(*Y)
```

in other words, if `Y` is a `Sequence` of `Tensor`s (we check also that the elements of the sequence are Tensors), then it is of the same length as the number of parameters `param.forward` accepts.

2. Always: `X.dtype == Z.dtype and X.shape == Z.shape`. This is to protect users from shooting themselves in the foot, as it's too odd for a parametrization to change the metadata of a tensor.

3. If it's one-to-one: `X.dtype == Y.dtype`. This is to be able to do `X.set_(Y)`, so that if a user first instantiates the optimiser and then puts the parametrisation, we reuse `X` and the user does not need to add a new parameter to the optimiser. Alas, this is not possible when the parametrisation is many-to-one. The current implementation of `spectral_norm` and `weight_norm` does not seem to care about this, so this would not be a regression. I left a warning in the documentation though, as this case is a bit tricky.

I'm still missing going over the formatting of the documentation; I'll do that tomorrow.

Pull Request resolved: pytorch#58488
Reviewed By: soulitzer
Differential Revision: D29100708
Pulled By: albanD
fbshipit-source-id: b9e91f439cf6b5b54d5fa210ec97c889efb9da38
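A hedged end-to-end sketch of the feature, reusing the `Sum` parametrization from the summary above (the `original0`/`original1` naming is the one described there):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class Sum(nn.Module):
    def forward(self, X, Y):
        return X + Y
    def right_inverse(self, Z):
        return Z, torch.zeros_like(Z)

layer = nn.Linear(3, 3)
parametrize.register_parametrization(layer, "weight", Sum())

# right_inverse returned two tensors, so they are saved as original0/original1
print(layer.parametrizations.weight.original0.shape)  # torch.Size([3, 3])
print(layer.parametrizations.weight.original1.shape)  # torch.Size([3, 3])

# Accessing .weight recomputes forward(original0, original1)
print(layer.weight.shape)                             # torch.Size([3, 3])
```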