Softmax/LogSoftMax refactor (wrapped up)#3245

Merged
soumith merged 2 commits into master from softmax_refactor on Oct 25, 2017
Conversation

@apaszke
Contributor

@apaszke apaszke commented Oct 23, 2017

These commits wrap up the previous Softmax refactor. All the important changes are on the CUDA side. Once I unified the code, I also added a special instantiation that makes the kernels faster in certain cases (small inner dim, large softmax dim; this might be useful in NLP for short sequences).

tl;dr CUDA Softmax now supports a dim argument and is usually 4x-256x faster than the previous implementation (which had no spatial implementation at all). It now also shares kernels with LogSoftmax, and some of the optimizations benefited the log case too, giving up to a 64x speedup in certain cases.
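For reference, the semantics of a softmax/log-softmax over an arbitrary dim can be sketched as a NumPy reduction along that axis (an illustrative reference, not the CUDA code from this PR):

```python
import numpy as np

def softmax(x, dim):
    # Subtract the max along `dim` for numerical stability,
    # then exponentiate and normalize along that same axis.
    shifted = x - x.max(axis=dim, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=dim, keepdims=True)

def log_softmax(x, dim):
    # Log-softmax reuses the same shifted values:
    # x - max - log(sum(exp(x - max))), avoiding a separate log(softmax).
    shifted = x - x.max(axis=dim, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=dim, keepdims=True))
```

Sharing the shifted intermediate is what lets the two operations share kernels: only the final normalization step differs.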


Here are the plots showing old/new timing ratios for different sizes of dim and of the innermost dimensions (on the left of dim). Parallelizing over the batch is easy, so batch size is fixed at 64 in all plots. Red dots are improvements, blue are regressions. Note that the plot is logarithmic in all axes (so z = 8 means 2^8x faster).

Softmax

Benefits from this diff all over the place; the old kernel was written in quite an archaic way.

[plot: Softmax old/new speedup ratios]

LogSoftmax

Benefits from adding a custom kernel for the cases where inner_size is no longer 1 (so we can't use the super-fast kernel) but dim_size is large (so using a single thread to reduce the values is slow). It is enabled only for the subset of the space where it provided speedups.
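For context, "spatial" here refers to the case where dim is not the last axis: the tensor can be viewed as (outer_size, dim_size, inner_size), with one independent reduction per (outer, inner) pair. A NumPy sketch of that decomposition (an illustration of the indexing, not the kernel itself):

```python
import numpy as np

def spatial_softmax(x, dim):
    # View the tensor as (outer_size, dim_size, inner_size); the softmax
    # reduction runs over the middle axis, one independent slice per
    # (outer, inner) pair. inner_size == 1 recovers the contiguous case.
    outer = int(np.prod(x.shape[:dim]))
    inner = int(np.prod(x.shape[dim + 1:]))
    v = x.reshape(outer, x.shape[dim], inner)
    shifted = v - v.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return (e / e.sum(axis=1, keepdims=True)).reshape(x.shape)
```

When inner_size is small but dim_size is large, each (outer, inner) slice is long but there are few of them, which is the regime the new instantiation targets.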

[plot: LogSoftmax old/new speedup ratios]

Overall times

These are log plots (in all axes) of the absolute time (no more ratios) for the new algorithm. Softmax on the left, LogSoftmax on the right:

[plot: absolute times for the new kernels]

@apaszke
Contributor Author

apaszke commented Oct 23, 2017

I forgot that the z axis is also log in all plots, so what looked like a 2-8x speedup is really 4x-256x. I've updated the description.

Added a new instantiation of the spatial kernel for
low inner_size and larger dim_size.
Contributor

@killeent killeent left a comment


This mostly looks good to me; I didn't check the validity of the algorithm in any way.

ReduceOp<T> r;
shared += threadIdx.y * blockDim.x;

__syncthreads();



template <typename T, typename AccumT>
struct MaxFloat



template <template<typename> class Reduction, typename AccumT>
__device__ __forceinline__ AccumT
blockReduce(AccumT* smem, AccumT val,
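(The truncated signature above is quoted from the kernel under review.) As a rough model of what such a block-wide reduction does — an assumed pattern, not the actual CUDA code — each step halves the number of active threads, combining pairs of shared-memory slots until one value remains:

```python
def block_reduce(smem, op):
    # Model of a shared-memory tree reduction: at each step, slot i
    # absorbs slot i + stride, and the active range halves until the
    # result sits in smem[0]. Assumes len(smem) is a power of two,
    # mirroring a power-of-two CUDA block size; on the GPU each step
    # would be followed by a __syncthreads() barrier.
    stride = len(smem) // 2
    while stride > 0:
        for i in range(stride):  # one "thread" per active slot
            smem[i] = op(smem[i], smem[i + stride])
        stride //= 2
    return smem[0]
```

The same template works for the max pass and the sum pass of softmax by swapping the combining op.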


int last = size % (ILP * blockDim.x);

// Body (unroll by ILP times)
for (; offset < size - last; offset += blockDim.x * ILP) {
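The loop quoted above splits the range into an ILP-unrolled body and a `last`-element tail. A plain-Python model of which indices the body covers (assumed names, for illustration only):

```python
def ilp_body_tail(size, block_dim, ilp):
    # Mirror the kernel's partition: the body covers the first
    # size - (size % (ilp * block_dim)) elements, each outer iteration
    # advancing by block_dim * ilp, with each "thread" t making ilp
    # strided accesses; the remaining `last` elements form the tail loop.
    last = size % (ilp * block_dim)
    body = []
    for offset in range(0, size - last, block_dim * ilp):
        for t in range(block_dim):   # one entry per thread
            for j in range(ilp):     # the unrolled accesses
                body.append(offset + j * block_dim + t)
    tail = list(range(size - last, size))
    return sorted(body), tail
```

Every element below the cutoff is touched exactly once by the body, so the tail loop only has to mop up the final `last` elements.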


@ezyang
Copy link
Contributor

ezyang commented Oct 24, 2017

@apaszke, do you want to wait until someone reviews the math, or merge it sooner?

@apaszke
Copy link
Contributor Author

apaszke commented Oct 24, 2017

No preference. If anyone is up for reviewing it then I'll wait; otherwise there's no point. I can add more reference functions to make sure it's all OK.

@soumith soumith merged commit b3642b3 into master Oct 25, 2017
@soumith soumith deleted the softmax_refactor branch November 21, 2017 18:20
