use batched gemm from mkl on torch.bmm when mkl is available #11365
Closed
mingfeima wants to merge 1 commit into pytorch:master from
Conversation
Collaborator
Hi,
Best regards, Thomas
P.S.: Also, I think that it would be best to do this for baddbmm/bmm in one go.
ssnl reviewed Sep 7, 2018
    namespace at { namespace native {

    Tensor bmm_mkl(const Tensor& self, const Tensor& tensor) {
      throw std::runtime_error("bmm: ATen not compiled with MKL support");
Collaborator
Very nice. Thanks a lot! Although it seems to have a bug currently (see the CI failures).
Collaborator (Author)
Closing this, as it has been folded into #11292.
This PR uses MKL batched gemm for torch.bmm when MKL is available. The current logic for torch.bmm is to do batch_size iterations of gemm. From a performance point of view, this is fine when the gemm size is large enough; in many cases, however, the gemm size is relatively small and the loop is not efficient. One scenario is the globalAttention calculation in NMT, where

mat1: N * 1 * T
mat2: N * T * H

N refers to the batch size, T to the timestep, and H to the hidden size. Here the gemm sizes are relatively small, and MKL provides batched gemm APIs which are beneficial when dealing with batches of small gemms.
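For illustration, here is a minimal sketch of calling MKL's batched gemm API (cblas_dgemm_batch) for a batch of identically shaped row-major matrices. This is not the code from this PR; the function name and the layout and grouping choices are assumptions made for the example.

```cpp
// Minimal sketch (not this PR's implementation): compute c[i] = a[i] * b[i]
// for i in [0, batch) via MKL's batched gemm, where each a[i] is m x k,
// each b[i] is k x n, and each c[i] is m x n, all row-major.
#include <mkl.h>

void batched_gemm_sketch(const double** a, const double** b, double** c,
                         MKL_INT batch, MKL_INT m, MKL_INT n, MKL_INT k) {
  CBLAS_TRANSPOSE trans = CblasNoTrans;
  double alpha = 1.0;
  double beta = 0.0;
  MKL_INT lda = k, ldb = n, ldc = n;  // row-major leading dimensions
  // A single "group": every gemm in the batch shares the same shape,
  // transpose flags, and scalars, so MKL can schedule them together
  // instead of running batch_size independent gemm calls.
  cblas_dgemm_batch(CblasRowMajor, &trans, &trans,
                    &m, &n, &k,
                    &alpha, a, &lda,
                    b, &ldb,
                    &beta, c, &ldc,
                    /*group_count=*/1, /*group_size=*/&batch);
}
```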
The following script is used for benchmarking and testing the PR. On a Xeon Skylake 8180 (2 sockets * 28 cores), it runs in 0.81 ms without the PR and 0.45 ms with the PR.
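The script itself is not reproduced above, so here is a hypothetical ATen micro-benchmark in the same spirit, timing bmm on the attention-style shapes described earlier. The sizes, iteration count, and output format are assumptions, not the PR's actual settings or measured results.

```cpp
// Hypothetical benchmark sketch; not the script used for the numbers above.
#include <ATen/ATen.h>
#include <chrono>
#include <iostream>

int main() {
  // Illustrative attention-style shapes: mat1 is N x 1 x T, mat2 is N x T x H.
  int64_t N = 128, T = 50, H = 512;
  at::Tensor mat1 = at::randn({N, 1, T});
  at::Tensor mat2 = at::randn({N, T, H});

  // Warm up so one-time allocation/dispatch costs are excluded from timing.
  for (int i = 0; i < 10; ++i) {
    at::bmm(mat1, mat2);
  }

  const int iters = 1000;
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iters; ++i) {
    at::bmm(mat1, mat2);  // result shape: N x 1 x H
  }
  auto end = std::chrono::steady_clock::now();

  double ms = std::chrono::duration<double, std::milli>(end - start).count();
  std::cout << "bmm: " << (ms / iters) << " ms per call" << std::endl;
  return 0;
}
```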