Create linalg.tensordot #63478
Conversation
```yaml
  python_module: linalg
  variants: function

- func: linalg_tensordot.out(Tensor self, Tensor other, int[] dims_self, int[] dims_other, *, Tensor(a!) out) -> Tensor(a!)
```
I think it's worth investing the time to port the implementation to be a "structured kernel". See https://github.com/pytorch/rfcs/blob/rfc-0005/RFC-0005-structured-kernel-definitions.md#structured-keyword-proposal
```python
from torch import _VF
from torch._C import _linalg  # type: ignore[attr-defined]

__all__ = [
```
Removing tensordot from __all__ would decouple torch.tensordot and torch.functional.tensordot. Then you'd get it as an alias to torch._C._linalg.linalg_tensordot, and it should be possible to add a separate docstring in _torch_docs.py that simply refers to torch.linalg.tensordot.
I don't think this is doable: if I made torch.tensordot an alias to torch._C._linalg.linalg_tensordot, it would have a different signature and behavior than torch.linalg.tensordot, because the latter has a Python implementation.
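For reference, the decoupling proposed above might have looked roughly like the sketch below. The wiring is my illustration (it follows the _add_docstr pattern used in torch/nn/functional.py), and, as the reply notes, the resulting function would not behave identically to torch.linalg.tensordot:

```python
# Hypothetical sketch of the proposed decoupling (not this PR's code): bind
# torch.tensordot directly to the C++ op and attach a separate docstring that
# refers to torch.linalg.tensordot, instead of exporting the Python functional.
from torch._C import _add_docstr, _linalg  # type: ignore[attr-defined]

tensordot = _add_docstr(_linalg.linalg_tensordot, r"""
tensordot(self, other, dims_self, dims_other) -> Tensor

Alias for :func:`torch.linalg.tensordot`.
""")
```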
```yaml
- func: linalg_multi_dot.out(Tensor[] tensors, *, Tensor(a!) out) -> Tensor(a!)
  python_module: linalg

- func: linalg_tensordot(Tensor self, Tensor other, int[] dims_self, int[] dims_other) -> Tensor
```
The signature in Python is tensordot(a, b, dims=2, out=None), where dims can be an int, a Tuple[List[int], List[int]], a List[List[int]] containing two lists, or a Tensor. Can all the logic on the dims argument be ported from Python to C++? If not, maybe the Python signature should be modified so that the mapping from Python to C++ is more direct and no glue code in functional.py is necessary?
Honestly, I don't know the answer to this question. The fact that the logic was originally written in Python makes me suspect that there was a good reason to do that, but I don't really know if the original reason still holds.
E.g., maybe the dispatcher cannot handle such a complex overload?
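For context, the Python-side glue under discussion is roughly the following: a condensed sketch of the dims normalization done in torch/functional.py (names and error handling simplified, not verbatim):

```python
from typing import List, Tuple, Union
import torch

def _normalize_dims(dims: Union[int, Tuple[List[int], List[int]], torch.Tensor]):
    # Turn the flexible Python-level `dims` argument into the two explicit
    # integer lists that the C++ linalg_tensordot overload expects.
    if isinstance(dims, (tuple, list)):
        dims_a, dims_b = list(dims[0]), list(dims[1])   # explicit per-tensor dims
    elif isinstance(dims, torch.Tensor):
        if dims.numel() > 1:                            # 2-row tensor of dims
            dims_a, dims_b = dims[0].tolist(), dims[1].tolist()
        else:                                           # scalar tensor acts like an int
            d = int(dims.item())
            dims_a, dims_b = list(range(-d, 0)), list(range(d))
    elif isinstance(dims, int):
        dims_a = list(range(-dims, 0))                  # last `dims` dims of a
        dims_b = list(range(dims))                      # first `dims` dims of b
    else:
        raise RuntimeError("unsupported dims type")
    return dims_a, dims_b
```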
Creating a linalg.tensordot alias is unfortunately tricky because of all the special handling we have for functionals. To add it we'd probably need to create linalg/functional.py and then also bind the function defined there in functional.py? There are some other issues with tensordot itself that may be interesting to address first.
I filed #65989 for the correctness issue (and a suggestion to fix the docs). PRs addressing these issues would be extremely interesting if you'd prefer to start with those, @antocuni. For the docs in particular, it'd be helpful to borrow a page from NumPy and show the common cases as well as their equivalent computations. The current torch docs are nearly indecipherable because they use undefined terms, and the mathematical formula is (at best) very obtuse.
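As a starting point for those docs, the common cases and their equivalent computations could be shown along these lines (runnable against stock torch.tensordot; the examples are mine, not from the PR):

```python
import torch

a = torch.randn(3, 4)
b = torch.randn(4, 5)

# dims=1 contracts a's last dim with b's first dim: an ordinary matrix product
assert torch.allclose(torch.tensordot(a, b, dims=1), a @ b)

# the pair-of-lists form names the contracted dims explicitly
assert torch.allclose(torch.tensordot(a, b, dims=([1], [0])), a @ b)

# dims=0 contracts nothing: an outer product
x, y = torch.randn(3), torch.randn(5)
assert torch.allclose(torch.tensordot(x, y, dims=0), torch.outer(x, y))

# dims=2 contracts the last two dims of a with the first two of b,
# i.e. a matrix product over the flattened contracted dims
a2, b2 = torch.randn(2, 3, 4), torch.randn(3, 4, 5)
assert torch.allclose(torch.tensordot(a2, b2, dims=2),
                      a2.reshape(2, 12) @ b2.reshape(12, 5))
```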
Done. I just pushed, let's see if the tests pass.
I already started/tried in #64819 but it can't be done at the moment because apparently
I'm confused: according to #65989 the correctness problem is already solved, isn't it? Is there anything else which should be done?
Having better docs always sounds like a good idea; I'd be happy to work on that. But it doesn't sound like a blocker for this PR, can we merge it first?
mruberry left a comment:
This looks really good, @antocuni. I made an inline comment about preserving BC. There are a few other places that use tensordot that may need an update:
https://github.com/pytorch/pytorch/blob/master/docs/source/amp.rst
and
(we should be sure both tensordot and linalg.tensordot are handled properly)
and the last change I'd like to suggest: while linalg.tensordot is the suggested user-facing op, internally we don't change the implementation of tensordot and instead alias linalg.tensordot to it. This might avoid some internal refactoring issues (which we don't expect to apply in the future).
@mruberry I applied the changes you suggested and moved the implementation of
```diff
 class M(torch.nn.Module):
     def forward(self, x, y):
-        output = torch.tensordot(x, y, 2)
+        output = torch.linalg.tensordot(x, y, 2)
```
Should we keep one redundant test for torch.tensordot()?
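If so, it could be as small as a duplicate module using the old spelling; a sketch (the module name is mine, and it assumes this PR's branch where both spellings exist):

```python
# Hypothetical redundant test module: keep exercising the torch.tensordot
# alias explicitly even though the main test now uses torch.linalg.tensordot.
class M_alias(torch.nn.Module):
    def forward(self, x, y):
        return torch.tensordot(x, y, 2)
```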
```python
b = torch.randn(4, 5, 6, 7, device=device)
c = torch.tensordot(a, b, dims=2).cpu()
# case 4: dims is an integer
a = torch.randn(2, 3, 4, 5, dtype=dtype, device=device)
```
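The hunk above cuts off mid-case; presumably "case 4" continues like the earlier cases by comparing against NumPy. A guess at the missing lines (the shapes, and np being the test file's numpy import, are assumptions):

```python
# Assumed continuation of "case 4: dims is an integer" (not the PR's verbatim code)
b = torch.randn(4, 5, 6, dtype=dtype, device=device)
c = torch.tensordot(a, b, dims=2).cpu()
cn = torch.from_numpy(np.tensordot(a.cpu().numpy(), b.cpu().numpy(), axes=2))
self.assertEqual(c, cn)
```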
```python
self.assertRaises(RuntimeError, lambda: torch.lstsq(torch.randn(0, 0), torch.randn(0, 0)))
self.assertRaises(RuntimeError, lambda: torch.lstsq(torch.randn(0,), torch.randn(0, 0)))

@dtypes(*floating_types())
```
We can probably just run this in TF32 still, unless there's a particular reason to test double too?
yes, there is no particular reason. I switched to testing only torch.float32 in 33345f7
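The edit in 33345f7 presumably just narrows the decorator; a sketch using the standard test decorators (the test name is illustrative):

```python
# Assumed shape of the 33345f7 change: run only float32, which is also the
# dtype that exercises the TF32 path on supported GPUs.
@dtypes(torch.float32)
def test_tensordot(self, device, dtype):
    ...
```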
```diff
-    a = torch.tensordot(torch.tensor(0.), torch.tensor(0.), 0)
     # case 8: dims=0
+    a = torch.linalg.tensordot(torch.tensor(0.), torch.tensor(0.), 0)
```
These tensors need to use the device and dtype, too
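That fix would presumably look like the following, threading the test's device/dtype parameters through (exact formatting is a guess):

```python
# Presumed fix: the scalar tensors should respect the test's device and dtype.
a = torch.linalg.tensordot(torch.tensor(0., dtype=dtype, device=device),
                           torch.tensor(0., dtype=dtype, device=device), 0)
```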
```diff
     if out is None:
-        return _VF.tensordot(a, b, dims_a, dims_b)  # type: ignore[attr-defined]
+        return torch._C._linalg.linalg_tensordot(a, b, dims_a, dims_b)  # type: ignore[attr-defined]
```
I'm a little confused by this part still. This functional has a bunch of logic before calling torch._C._linalg.linalg_tensordot(), but when torch.linalg.tensordot() is called, how does it implement all the above logic?
torch.linalg.tensordot is the very same Python function as torch.functional.tensordot:

```python
>>> import torch
>>> torch.linalg.tensordot is torch.functional.tensordot
True
```
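Presumably the binding then boils down to re-exporting the functional, e.g. in torch/linalg/__init__.py (my guess at the wiring, not a quote from the PR):

```python
# Hypothetical wiring: torch.linalg.tensordot re-exports the Python functional,
# so both entry points share the dims-normalization logic in torch/functional.py.
from torch.functional import tensordot  # noqa: F401
```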
@mruberry hopefully addressed your latest comments. I also updated to the latest master and fixed conflicts (tests still running at the time of writing, hopefully they will be green)
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
PR stack (ghstack-style but done manually):

- port linalg.tensordot to structured kernels: #64819

Fixes #61649.

As the title says, this introduces torch.linalg.tensordot and makes torch.tensordot an alias to it.

However, tensordot is a bit different than most other linalg operators, so it is worth some discussion on how to proceed. Unlike the other operators, the user entry point of tensordot is implemented in Python, in torch/functional.py, which in turn calls torch._C._linalg.linalg_tensordot.

What I did so far in this PR is:

1. introduce at::linalg_tensordot and make it the main implementation
2. make at::tensordot a C++ alias to at::linalg_tensordot
3. keep the Python-level logic in torch/functional.py, shared by both torch.tensordot and torch.linalg.tensordot
4. update the docs to use torch.linalg.tensordot everywhere
5. update the tests to use torch.linalg.tensordot everywhere

I am not fully sure about point (3), though. One possible alternative is to move the implementation to e.g. a newly created torch/linalg/functional.py, but this creates more problems, because there are other pieces of code and/or tests which expect all these functions to be inside torch/functional.py.

The other problem of the current solution is that since torch.tensordot and torch.linalg.tensordot are the very same Python-level function, they have the very same docstring and thus the same generated documentation page. Maybe it is possible to tweak Sphinx to use a custom text for torch.tensordot instead of using the docstring, but I haven't investigated yet.

I opted for the current solution because it was the easiest to implement, and I wanted to get some feedback before spending time on more complex ones. Do you think it is enough, or should we try to differentiate more between torch.tensordot and torch.linalg.tensordot?

cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @lezcano @rgommers @pmeier @asmeurer @leofang @AnirudhDagar @asi1024 @emcastillo @kmaehashi
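With the PR applied as described, the aliasing can be sanity-checked like this (a quick illustration, not part of the PR):

```python
import torch

a = torch.randn(2, 3, 4)
b = torch.randn(3, 4, 5)

# same function object, hence identical signature, behavior, and docstring
assert torch.tensordot is torch.linalg.tensordot
assert torch.allclose(torch.tensordot(a, b, dims=2),
                      torch.linalg.tensordot(a, b, dims=2))
```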