
Sparse softmax support (CPU) #36305

Closed
pearu wants to merge 21 commits into pytorch:master from Quansight:pearu/23651-softmax-sparse

Conversation

@pearu (Collaborator) commented Apr 9, 2020

This PR implements softmax support for sparse tensors.

Sparse softmax is related to dense softmax as follows: if the values of unspecified sparse tensor entries are taken to be -inf, they have the effect of being ignored ("zero entries ignored"). This relation is used here to test the correctness of the results (a minimal sketch follows).

Resolves #23651 for CPU.
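
A minimal sketch of that testing relation (illustrative values, not the actual test code; it uses the torch.sparse.softmax entry point added by this PR):

```python
import torch

# A small sparse tensor with some unspecified entries.
i = torch.tensor([[0, 0, 1], [0, 2, 1]])
v = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
s = torch.sparse_coo_tensor(i, v, (2, 3)).coalesce()

# Dense reference: fill the unspecified entries with -inf and apply the
# dense softmax; exp(-inf) == 0, so those entries are ignored.
d = torch.full(s.shape, float("-inf"), dtype=torch.float64)
d[s.indices()[0], s.indices()[1]] = s.values()
expected = torch.softmax(d, dim=1)

# The sparse softmax keeps the input's sparsity pattern; to_dense()
# fills the unspecified entries of the result with 0, matching exp(-inf).
result = torch.sparse.softmax(s, dim=1).to_dense()
assert torch.allclose(result, expected)
```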

- [x] sparse softmax
  - [x] CPU C++ implementation
  - [x] unittests
  - [x] update softmax documentation
  - [x] autograd support
- [x] sparse log_softmax
  - [x] CPU C++ implementation
  - [x] unittests
  - [x] update log_softmax documentation
  - [x] autograd support

@pearu self-assigned this on Apr 9, 2020
@pearu changed the title from Sparse softmax support to Sparse softmax support [WIP] on Apr 9, 2020
@pearu changed the title from Sparse softmax support [WIP] to Sparse softmax support (CPU) [WIP] on Apr 16, 2020
@pearu force-pushed the pearu/23651-softmax-sparse branch from 1842a1c to cb88366 on April 16, 2020 14:24
@pearu requested a review from apaszke as a code owner on April 20, 2020 13:25
@ezyang (Contributor) commented Apr 20, 2020

> The sparse softmax is related to dense softmax when the values of unspecified sparse tensor entries are taken to be -inf that will have the effect of "zero entries ignored"

We had a bunch of discussions about whether or not it is appropriate to treat unspecified sparse tensor entries as things other than zero; specifically, whether or not we should be allowed to overload the old function name in this situation (some discussion about this at #1369). Did you folks at Quansight come to a decision about what should be done here?

@pearu (Collaborator, Author) commented Apr 21, 2020

@ezyang we have not discussed the overloading vs name-prefixing options yet, but we have discussed the idea of introducing a so-called fill value for sparse tensors that would set the default value of unspecified sparse tensor entries.

The fill value feature would also be useful for the sparse softmax case, as it would allow users to choose the behavior:

  • Using fill_value=0 would give the nice invariant softmax(sparse, dim).to_dense() == softmax(sparse.to_dense(), dim), while the result of softmax(sparse, dim) would still be a sparse tensor (with a non-zero fill_value).
  • Using fill_value=-inf [the current PR] gives a mathematical basis for softmax(sparse, dim) being a sparse tensor (as requested in issue Add sparse softmax/log_softmax functionality (ignore zero entries) #23651).

In addition, the fill value feature has the potential to eliminate the overloading vs name-prefixing question, and in general it would bridge the gap between sparse and dense tensors in a consistent way.

A fill value has been implemented in https://sparse.pydata.org/, and from #9674 (comment) I see you have already discussed this option as well; a small illustration follows.
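
For illustration, a sketch of the fill value semantics using pydata/sparse (its COO constructor accepts a fill_value argument; the values here are arbitrary):

```python
import numpy as np
import sparse  # https://sparse.pydata.org/

# A 1-D COO array with entries specified at positions 0 and 3; every
# unspecified entry takes the fill value instead of an implicit zero.
x = sparse.COO(coords=[[0, 3]], data=[1.0, 2.0], shape=(4,), fill_value=-np.inf)
print(x.todense())   # [  1. -inf -inf   2.]
print(x.fill_value)  # -inf
```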

For this PR, we have two choices:

  1. sparse softmax is exposed as softmax (and log_softmax) [the current behavior] via overloading
  2. sparse softmax is exposed as spsoftmax (and splog_softmax)

In the long term, with the prospect of having the fill value feature available, option 1 makes sense to me.
On the other hand, option 2 seems to be the current practice in PyTorch, and I would be OK changing the PR accordingly.

So, I am not certain atm which approach would be preferable to users. Thoughts?

@ezyang (Contributor) commented Apr 21, 2020

If y'all are prepared to take on the maintenance burden of a fill value, I think I would also agree that this would be best, and then we don't have to introduce sparse versions of all of our functions. Without fill, (1) will technically require BC-breaking changes in the future, as you will turn an operation that previously implicitly converted zero-filled tensors to neginf-filled tensors into an error. It will save you work in the future if you do (2) first.

@pearu (Collaborator, Author) commented Apr 22, 2020

OK, it makes sense. Here's a plan:

  • Expose sparse softmax as torch.sparse.softmax, defined for sparse tensors with an assumed fill value of negative infinity (equivalent to ignoring unspecified sparse tensor entries); a usage sketch follows this list.
  • Expose sparse log softmax as torch.sparse.log_softmax with the fill value as above.
  • Reserve torch.softmax on sparse tensors for sparse tensors that support the fill value feature. The result of torch.softmax will differ from that of torch.sparse.softmax because the default fill value for sparse tensors will be zero; the results will be equal iff the fill value of a sparse tensor is specified as negative infinity.
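
A minimal sketch of the first two points (the tensor contents are illustrative; it assumes the torch.sparse entry points behave as agreed above):

```python
import torch

i = torch.tensor([[0, 0, 1], [0, 1, 1]])
v = torch.tensor([0.5, 1.5, 2.0], dtype=torch.float64)
s = torch.sparse_coo_tensor(i, v, (2, 2)).coalesce()

# The sparse-specific entry points live in the torch.sparse namespace
# and assume a -inf fill value, i.e. unspecified entries are ignored.
p = torch.sparse.softmax(s, dim=1)
lp = torch.sparse.log_softmax(s, dim=1)

# Both results are sparse with the input's sparsity pattern, and
# log_softmax agrees with log(softmax) on the specified entries.
assert torch.allclose(lp.values().exp(), p.values())
```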

@ezyang (Contributor) commented Apr 22, 2020

SGTM.

@pearu force-pushed the pearu/23651-softmax-sparse branch 2 times, most recently from 9144b22 to 7661c03 on April 26, 2020 19:15
@pearu changed the title from Sparse softmax support (CPU) [WIP] to Sparse softmax support (CPU) on Apr 27, 2020
@pearu (Collaborator, Author) commented Apr 27, 2020

@apaszke @ezyang this PR is ready for review. The PR implements sparse softmax support for CPU; CUDA support is planned as a follow-up PR.

@ezyang (Contributor) commented Apr 27, 2020

@pearu can you get someone from Quansight to review first?

@hameerabbasi self-requested a review on April 29, 2020 14:34
@nikitaved self-requested a review on April 29, 2020 15:33
Comment threads on aten/src/ATen/native/sparse/SoftMax.cpp (outdated)
@nikitaved (Collaborator) commented Apr 30, 2020

All the loops could be replaced with TensorIterator loops, I think... TensorIterator loops over all dimensions not involved in the reduction, and the kernel loop iterates over the reduction dimension if you perform the reduction and scaling all at once. Otherwise you could iterate over each element and then rescale it appropriately, knowing which value to rescale it with. TensorIterator also brings multithreading for free.

@ngimel added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on May 5, 2020
@pearu force-pushed the pearu/23651-softmax-sparse branch from 5d652ec to f7f2101 on May 5, 2020 21:14
@pearu requested a review from nikitaved on May 5, 2020 21:23
Comment threads on aten/src/ATen/native/sparse/SoftMax.cpp (outdated)
@pearu force-pushed the pearu/23651-softmax-sparse branch from 85a58cf to 7d095a3 on May 12, 2020 14:24
Comment thread torch/sparse/__init__.py

```python
if TYPE_CHECKING:
    from torch import dtype as DType
else:
    # TorchScript does not support the real dtype annotation here
    # (see the discussion below), so fall back to a runtime alias.
    DType = int
```
Collaborator:
@ezyang do you happen to know why this is needed? @pearu checked and found that TorchScript doesn't support torch.types._dtype in the type annotation. That seems like it's going to give problems in other places.

Contributor:
Yes, we're going to need to clue in TorchScript about this, unfortunately >:(

@pearu (Collaborator, Author) commented May 12, 2020

Here are some benchmark results for torch.sparse.softmax (v1.6.0.a) and tensorflow.sparse.softmax (v2.2.0), both on CPU; values are float64 scalars, and the timing repeat is 1000 (a reproduction sketch follows the table):

| sizes           | nnz    | dimension | pytorch sparse | pytorch dense | tensorflow |
|-----------------|--------|-----------|----------------|---------------|------------|
| (100, 100)      | 1000   | 0         | 127.68 us      | 66.16 us      | N/A        |
| (100, 100)      | 1000   | 1         | 97.27 us       | 10.36 us      | 234.18 us  |
| (100, 100)      | 10000  | 0         | 595.16 us      | 67.35 us      | N/A        |
| (100, 100)      | 10000  | 1         | 513.72 us      | 10.22 us      | 490.75 us  |
| (100, 100, 100) | 1000   | 0         | 280.73 us      | 1013.29 us    | N/A        |
| (100, 100, 100) | 1000   | 1         | 230.72 us      | 793.75 us     | N/A        |
| (100, 100, 100) | 1000   | 2         | 225.87 us      | 554.37 us     | 324.38 us  |
| (100, 100, 100) | 10000  | 0         | 1371.77 us     | 1135.67 us    | N/A        |
| (100, 100, 100) | 10000  | 1         | 1228.75 us     | 691.85 us     | N/A        |
| (100, 100, 100) | 10000  | 2         | 1171.73 us     | 533.57 us     | 1294.87 us |
| (100, 100, 100) | 100000 | 0         | 7957.96 us     | 1413.64 us    | N/A        |
| (100, 100, 100) | 100000 | 1         | 6750.87 us     | 752.53 us     | N/A        |
| (100, 100, 100) | 100000 | 2         | 5801.34 us     | 532.25 us     | 6444.39 us |
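
The benchmark script itself is not part of this thread; a hedged sketch of how one row (sizes (100, 100), nnz 1000, dimension 1) could be reproduced:

```python
import timeit
import torch

size, nnz, dim = (100, 100), 1000, 1
# Random indices may collide, so the coalesced nnz can end up slightly
# below the target; duplicate entries are summed by coalesce().
i = torch.stack([torch.randint(n, (nnz,)) for n in size])
v = torch.randn(nnz, dtype=torch.float64)
s = torch.sparse_coo_tensor(i, v, size).coalesce()
d = s.to_dense()

# Total time for 1000 repeats, reported per call in microseconds.
t_sp = timeit.timeit(lambda: torch.sparse.softmax(s, dim=dim), number=1000)
t_dn = timeit.timeit(lambda: torch.softmax(d, dim=dim), number=1000)
print(f"sparse: {t_sp * 1e3:.2f} us/call, dense: {t_dn * 1e3:.2f} us/call")
```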

Some notes:

  • torch.sparse.softmax is generally faster than tensorflow.sparse.softmax. That is pretty good considering that the PyTorch sparse softmax supports more features than the TF one.
  • softmax performance is higher for a larger dimension argument; this is likely related to better locality of the index pools when using row-wise storage.
  • pytorch sparse vs dense timings show that there exists a density region (< 1%) where sparse storage gives considerably better performance than dense storage. For relatively dense tensors, sparse storage can be up to 10x slower.

@ezyang (Contributor) commented May 13, 2020

Failure is legit:


May 12 18:18:16   test_softmax (__main__.TestSparse) ... /var/lib/jenkins/workspace/aten/src/ATen/native/sparse/SoftMax.cpp:348:35: runtime error: division by zero
May 12 18:18:17     #0 0x7f50670b37cd in .omp_outlined. (/opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so+0xc2517cd)
May 12 18:18:17     #1 0x7f5053405712 in __kmp_invoke_microtask (/opt/conda/lib/libiomp5.so+0x122712)
May 12 18:18:17     #2 0x7f5053393ffe in __kmp_invoke_task_func (/opt/conda/lib/libiomp5.so+0xb0ffe)
May 12 18:18:17     #3 0x7f5053393059 in __kmp_launch_thread (/opt/conda/lib/libiomp5.so+0xb0059)
May 12 18:18:17     #4 0x7f5053405bd7 in _INTERNALdc942ded::__kmp_launch_worker(void*) (/opt/conda/lib/libiomp5.so+0x122bd7)
May 12 18:18:17     #5 0x7f508a9cd6b9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76b9)
May 12 18:18:17     #6 0x7f508a70341c in clone /build/glibc-LK5gWL/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109
May 12 18:18:17 

@facebook-github-bot (Contributor) left a comment

@ezyang is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor)

@ezyang merged this pull request in 48c0331.

@cpuhrsch (Contributor)

Interesting to read the conversations in this PR again in light of maskedtensor cc @george-qi

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary: duplicates the PR description above.
Pull Request resolved: pytorch#36305

Differential Revision: D21566540

Pulled By: ezyang

fbshipit-source-id: a632ea69c38622f960721482e442efeb8d0a54fc

Labels

Merged, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add sparse softmax/log_softmax functionality (ignore zero entries)

10 participants