The docs for nn.Softmax, which are essentially copied over from Torch7's nn.Softmax, say "Applies the SoftMax function to an n-dimensional input Tensor, rescaling them so that the elements of the n-dimensional output Tensor lie in the range (0, 1) and sum to 1." This is ambiguous with respect to what exactly should sum to 1 (i.e., which dimension or dimensions the softmax is performed over). What in fact happens, also a holdover from Torch7, is:
- For a 1D input, the softmax takes place over dimension 0 (this was needed in Torch7 to support inputs without a batch dimension, which are not allowed in PyTorch). This is still reasonable default behavior, because there's no reason to use nn.Softmax when you have a batch of scalars.
- For a 2D input, the softmax takes place over dimension 1. This is the most common case, and the desired behavior.
- For a 3D input, the softmax takes place over dimension 0. This is usually wrong in PyTorch, since dimension 0 is usually the batch dimension. It may, however, be the right thing to do if dimension 0 is the timestep dimension of a timestep x batch x feature tensor; this is the desired behavior in RNNs with attention.
- For a 4D input, the softmax again takes place over dimension 1.
- For a 5D+ input, THNN gives a RuntimeError.
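The implicit dimension choice above can be sketched numerically. The helper below is a reconstruction for illustration (using NumPy, not PyTorch's actual THNN code; the function names are ours): dimension 0 for 1D and 3D inputs, dimension 1 for 2D and 4D inputs, and an error for 5D+.

```python
import numpy as np

def legacy_softmax_dim(ndim):
    # Implicit dimension choice described above (a reconstruction,
    # not PyTorch's actual source): dim 0 for 1D/3D, dim 1 for 2D/4D.
    if ndim in (1, 3):
        return 0
    if ndim in (2, 4):
        return 1
    raise RuntimeError("softmax not implemented for %dD input" % ndim)

def softmax(x, axis):
    # Numerically stable softmax over a single axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

x2 = np.random.randn(4, 5)                 # batch x features
y2 = softmax(x2, legacy_softmax_dim(x2.ndim))
print(np.allclose(y2.sum(axis=1), 1.0))    # rows sum to 1

x3 = np.random.randn(7, 4, 5)              # timestep x batch x features
y3 = softmax(x3, legacy_softmax_dim(x3.ndim))
print(np.allclose(y3.sum(axis=0), 1.0))    # dim 0 sums to 1
```

Note how the 3D case normalizes over dimension 0, which is only what you want if that dimension is the timestep dimension rather than the batch dimension.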
If all of this is intentional, then we should document it. The ideal solution is probably to keep this behavior and document it but add an optional dim argument, as in TensorFlow, that lets the user pick a dimension to softmax over. For reference, the default behavior in TF is to softmax over the last dimension, while the default behavior in Chainer (which can also IIRC be overridden by the user as of v2) is to softmax over dimension 1. It is occasionally also useful to be able to softmax over multiple dimensions at once, but this is easy to emulate with .view() so it's not necessary to have in core.
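Emulating a softmax over multiple dimensions with a reshape, as suggested above, might look like the following sketch (NumPy's reshape standing in for .view(); the function name is hypothetical):

```python
import numpy as np

def softmax_last_two_dims(x):
    # Flatten the last two dimensions into one, softmax over the
    # flattened axis, then restore the original shape -- the
    # .view() trick described above.
    flat = x.reshape(x.shape[:-2] + (-1,))
    shifted = flat - flat.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    out = e / e.sum(axis=-1, keepdims=True)
    return out.reshape(x.shape)

x = np.random.randn(2, 3, 4)
y = softmax_last_two_dims(x)
print(np.allclose(y.sum(axis=(-2, -1)), 1.0))  # each 3x4 slice sums to 1
```

Since the reshape is a constant-time view in both frameworks, there is no performance argument for supporting multiple dimensions natively.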
It may also be useful to add a torch.softmax with the same behavior as F.softmax, to apply the softmax operation to Tensors.