Some PyTorch operations, like `max` and `svd`, can return a (named)tuple of tensors. There are two issues with their behavior:
- they do not always return a tuple
- when they do return a tuple, they can fill the extra tensors in surprising ways (for example, with zeros instead of actual results)
For example:
```python
import torch

# max over an entire tensor returns the maximum value in the tensor
t = torch.rand((2, 2))
t
: tensor([[0.7156, 0.0664],
          [0.6009, 0.6874]])

torch.max(t)
: tensor(0.7156)

# max with additional arguments returns a (named)tuple
torch.max(t, dim=1)
: torch.return_types.max(values=tensor([0.7156, 0.6874]), indices=tensor([0, 1]))
```
```python
# By default, svd returns a tuple of three tensors U, S, and V
t = torch.randn(5, 3)
u, s, v = torch.svd(t)

# When compute_uv=False, the u and v tensors are filled with zeros
t = torch.randn(5, 3)
u, s, v = torch.svd(t, compute_uv=False)
(u, s, v)
: (tensor([[0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.],
           [0., 0., 0., 0., 0.]]),
   tensor([3.2669, 1.8042, 0.9421]),
   tensor([[0., 0., 0.],
           [0., 0., 0.],
           [0., 0., 0.]]))
```
I propose we improve the consistency of PyTorch's UX by doing the following:
- Update these functions to consistently return a tuple (in general, a function should always return the same type, regardless of its arguments)
- Always return the same number of elements, with uncomputed elements represented as `None` in Python (how they're represented in C++ is left as a follow-up detail)
In the above snippets, this would change the return values of `max` to `(values, None)` and `(values, indices)`, and the return values of `svd` to `(u, s, v)` and `(None, s, None)`. (In practice the `max` example is more nuanced, since we would likely deprecate `max` in favor of `amax` for the cases where it produces a single tensor.)
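To make the proposed `svd` behavior concrete, here is a minimal sketch that emulates it with a wrapper over today's `torch.svd`. `svd_proposed` and `SVDResult` are hypothetical names used for illustration only, not part of PyTorch:

```python
import torch
from collections import namedtuple

# Hypothetical result type for illustration; under the proposal,
# svd itself would return something like this.
SVDResult = namedtuple("SVDResult", ["U", "S", "V"])

def svd_proposed(t, compute_uv=True):
    # Sketch of the proposed behavior: always return three elements,
    # with uncomputed factors represented as None rather than
    # zero-filled tensors.
    u, s, v = torch.svd(t, compute_uv=compute_uv)
    if not compute_uv:
        return SVDResult(None, s, None)
    return SVDResult(u, s, v)

t = torch.randn(5, 3)
u, s, v = svd_proposed(t, compute_uv=False)  # unpacking always works
assert u is None and v is None
svd_proposed(t).S                            # access by name always works
```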
This approach has some nice properties. Beyond making the return type of these functions consistent, it means that unpacking the return value always works and that each return value can always be accessed by name. In the `max` example above, by contrast, adding a `dim` argument to the call requires rewriting the left-hand side of the expression to keep the rest of the program working, and today no single left-hand side survives changes to how `max` is called.
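To see the unpacking problem concretely, here is the current behavior (sample values are illustrative):

```python
import torch

t = torch.rand((2, 2))

# Today the left-hand side must change with the arguments:
values = torch.max(t)                   # a single tensor
values, indices = torch.max(t, dim=1)   # a namedtuple of two tensors

# And no single left-hand side works for both calls; this raises
# "TypeError: iteration over a 0-d tensor":
# values, indices = torch.max(t)
```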
I expect this will also make shape propagation logic easier.
This proposal, unfortunately, diverges from NumPy, which also has functions whose return type depends on their arguments. I think this divergence is OK, however. We put PyTorch's UX principles before NumPy compatibility, and if developers struggle with this behavior we can consider providing a more stringently NumPy-compatible namespace for them. Another example of PyTorch's UX principles coming before NumPy compatibility: a PyTorch function always performs the same operation as the method of the same name, while in NumPy a function and its corresponding method can behave differently.
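For a concrete point of comparison, NumPy's `svd` changes its return type with its arguments, so matching NumPy exactly would preserve the inconsistency this proposal removes:

```python
import numpy as np

a = np.random.randn(5, 3)

# np.linalg.svd's return type depends on compute_uv:
u, s, vh = np.linalg.svd(a)             # a tuple of three arrays
s = np.linalg.svd(a, compute_uv=False)  # only the singular values
```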
@ngimel collaborated on this proposal.
cc @mruberry @rgommers @heitorschueroff @ezyang (shape prop), @suo (jit concerns), @rgommers (NumPy perspective)