
Fast gemm for einsum #24509

Merged
asmorkalov merged 5 commits into opencv:4.x from Abdurrahheem:ash/dev_einsum_fast_gemm on Nov 16, 2023

Conversation

@Abdurrahheem (Contributor) commented Nov 7, 2023

This PR adds performance tests for the Einsum layer with FastGemm. See below the results of the performance tests on different inputs.

Notation:

  • WX: windows10_x64
  • MX: macos_x64
  • MA: macos_arm64
  • UX: ubuntu_x64
  • UA: ubuntu_arm64

All times are in milliseconds (ms).
Gemm (cv::gemm) is the baseline backend for matrix multiplication.


Benchmarks (the ↓ arrow indicates an increase in inference speed compared to einsum with gemm):

| Equation | Input Mat Dims | UX (ms) | UA (ms) | MX (ms) | MA (ms) | WX (ms) |
|----------|----------------|---------|---------|---------|---------|---------|
| "ij, jk -> ik" | [2, 3], [3, 2] | 0.04 ± 0.00 | - | - | - | - |
| "ij, jk -> ik" | [20, 30], [30, 20] | 0.07 ± 0.00 | - | - | - | - |
| "ij, jk -> ik" | [113, 127], [127, 113] | 1.17 ± 0.02 ↓ ~48% | - | - | - | - |
| "imkj, injs -> imnks" | [1, 4, 7, 9], [1, 5, 9, 8] | 0.10 ± 0.00 | - | - | - | - |
| "imkj, injs -> imnks" | [1, 4, 70, 90], [1, 5, 90, 80] | 5.75 ± 0.10 ↓ ~37% | - | - | - | - |
| "imkj, injs -> imnks" | [1, 4, 73, 91], [1, 5, 91, 57] | 5.58 ± 0.12 ↓ ~48% | - | - | - | - |
| "ij -> i" | [30, 40] | 0.03 ± 0.00 | - | - | - | - |
| "ij -> i" | [113, 374] | 0.13 ± 0.00 | - | - | - | - |
| "...ij -> ...i" | [30, 40] | 0.03 ± 0.00 | - | - | - | - |
| "...ij -> ...i" | [113, 374] | 0.13 ± 0.00 | - | - | - | - |
| "...ij, ...jk -> ...ik" | [40, 50], [50, 80] | 0.26 ± 0.00 | - | - | - | - |
| "...ij, ...jk -> ...ik" | [47, 51], [51, 83] | 0.28 ± 0.01 | - | - | - | - |
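For readers unfamiliar with einsum notation: the first equation above, "ij, jk -> ik", sums over the repeated index j, which is exactly a matrix multiplication. A minimal plain-C++ sketch of that contraction (illustrative only, not the OpenCV implementation):

```cpp
#include <cstddef>
#include <vector>

// Sketch of "ij, jk -> ik": A is M x K and B is K x N, both row-major;
// the repeated index j is contracted, yielding an M x N result.
std::vector<float> einsum_ij_jk_ik(const std::vector<float>& A,
                                   const std::vector<float>& B,
                                   std::size_t M, std::size_t K, std::size_t N) {
    std::vector<float> C(M * N, 0.0f);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t k = 0; k < N; ++k)
            for (std::size_t j = 0; j < K; ++j)   // contraction over j
                C[i * N + k] += A[i * K + j] * B[j * N + k];
    return C;
}
```

The point of this PR is that this inner contraction is dispatched to FastGemm instead of a naive loop.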

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Comment on lines +44 to +45
FastGemmOpt opt;
opt.init();
Contributor
I propose to make it layer-wide, but not initialize each time.
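The suggestion can be sketched as follows. FastGemmOpt here is a minimal stand-in for the real OpenCV struct, and EinsumLayerSketch is a hypothetical name; only the initialize-once pattern is the point:

```cpp
// Hedged sketch: keep the FastGemm options as a layer member and run
// init() once in the constructor rather than on every forward() call.
struct FastGemmOpt {
    bool initialized = false;
    void init() { initialized = true; } // real code probes CPU features here
};

class EinsumLayerSketch {
public:
    EinsumLayerSketch() { opt.init(); }              // platform detection runs once
    bool forward() const { return opt.initialized; } // forward() reuses cached options
private:
    FastGemmOpt opt;
};
```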

@fengyuentau (Member)

How should I interpret your collected results? Is it faster or slower compared with cv::gemm?

@asmorkalov asmorkalov added this to the 4.9.0 milestone Nov 8, 2023
@Abdurrahheem (Contributor, Author)

> How should I interpret your collected results? Is it faster or slower compared with cv::gemm?

Sorry, I have added a description. Inference is faster with FastGemm.

@Abdurrahheem Abdurrahheem force-pushed the ash/dev_einsum_fast_gemm branch from 82f1c13 to b470932 Compare November 8, 2023 17:43
@asmorkalov (Contributor)

ARMv7 NEON (Jetson TK1):

Geometric mean (ms)

                                                       Name of Test                                                        4.x-baseline-1 4.x-fastgemm-1 4.x-fastgemm-1
                                                                                                                                                               vs      
                                                                                                                                                         4.x-baseline-1
                                                                                                                                                           (x-factor)  
einsum::Layer_Einsum::Eqiation=...ij -> ...i, InputSize=1, OutputSize=1, InputShape={{30, 40}}                                 0.027          0.027           1.03     
einsum::Layer_Einsum::Eqiation=...ij -> ...i, InputSize=1, OutputSize=1, InputShape={{113, 374}}                               0.120          0.120           1.00     
einsum::Layer_Einsum::Eqiation=...ij, ...jk -> ...ik, InputSize=2, OutputSize=1, InputShape={{40, 50}, {50, 80}}               0.459          0.269           1.71     
einsum::Layer_Einsum::Eqiation=...ij, ...jk -> ...ik, InputSize=2, OutputSize=1, InputShape={{47, 51}, {51, 83}}               0.523          0.292           1.79     
einsum::Layer_Einsum::Eqiation=ij -> i, InputSize=1, OutputSize=1, InputShape={{30, 40}}                                       0.027          0.026           1.03     
einsum::Layer_Einsum::Eqiation=ij -> i, InputSize=1, OutputSize=1, InputShape={{113, 374}}                                     0.121          0.120           1.01     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{2, 3}, {3, 2}}                            0.058          0.053           1.08     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{20, 30}, {30, 20}}                        0.135          0.119           1.14     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{113, 127}, {127, 113}}                    3.635          2.044           1.78     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 7, 9}, {1, 5, 9, 8}}         0.118          0.125           0.95     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 70, 90}, {1, 5, 90, 80}}     30.297         10.084          3.00     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 73, 91}, {1, 5, 91, 57}}     23.522         8.031           2.93  
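For reference, the x-factor column above is simply the baseline time divided by the FastGemm time, so values above 1.0 mean the FastGemm build is faster (e.g. 30.297 / 10.084 ≈ 3.00). A trivial sketch of that computation:

```cpp
// x-factor as reported by the perf tooling: baseline time over new time.
// A value above 1.0 means the FastGemm build is faster.
double x_factor(double baseline_ms, double fastgemm_ms) {
    return baseline_ms / fastgemm_ms;
}
```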

@asmorkalov (Contributor)

x86 without AVX2:

Geometric mean (ms)

                                                       Name of Test                                                        4.x-baseline-1 4.x-fastgemm-1 4.x-fastgemm-1
                                                                                                                                                               vs      
                                                                                                                                                         4.x-baseline-1
                                                                                                                                                           (x-factor)  
einsum::Layer_Einsum::Eqiation=...ij -> ...i, InputSize=1, OutputSize=1, InputShape={{30, 40}}                                 0.007          0.007           0.96     
einsum::Layer_Einsum::Eqiation=...ij -> ...i, InputSize=1, OutputSize=1, InputShape={{113, 374}}                               0.043          0.044           1.00     
einsum::Layer_Einsum::Eqiation=...ij, ...jk -> ...ik, InputSize=2, OutputSize=1, InputShape={{40, 50}, {50, 80}}               0.254          0.153           1.66     
einsum::Layer_Einsum::Eqiation=...ij, ...jk -> ...ik, InputSize=2, OutputSize=1, InputShape={{47, 51}, {51, 83}}               0.289          0.163           1.77     
einsum::Layer_Einsum::Eqiation=ij -> i, InputSize=1, OutputSize=1, InputShape={{30, 40}}                                       0.007          0.007           0.97     
einsum::Layer_Einsum::Eqiation=ij -> i, InputSize=1, OutputSize=1, InputShape={{113, 374}}                                     0.042          0.044           0.97     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{2, 3}, {3, 2}}                            0.007          0.007           1.10     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{20, 30}, {30, 20}}                        0.038          0.030           1.28     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{113, 127}, {127, 113}}                    0.760          0.630           1.21     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 7, 9}, {1, 5, 9, 8}}         0.050          0.044           1.14     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 70, 90}, {1, 5, 90, 80}}     4.641          3.224           1.44     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 73, 91}, {1, 5, 91, 57}}     3.735          2.685           1.39     

@asmorkalov (Contributor)

So the fastGemm integration totally makes sense. I propose to extract platform detection and run it once in the layer constructor. Everything else looks good to me.

@Abdurrahheem (Contributor, Author)

> So the fastGemm integration totally makes sense. I propose to extract platform detection and run it once in the layer constructor. Everything else looks good to me.

Since my last comment, I have fixed the test issues mentioned by @dkurt and fixed platform detection. Should I collect results with the new platform detection, or can we merge the PR without them? In my opinion there will not be much of a difference in the results, so we can merge the PR.

@Abdurrahheem Abdurrahheem marked this pull request as ready for review November 13, 2023 08:41
@asmorkalov (Contributor)

@dkurt Do you have any other remarks? If not, I propose to merge the PR after the constructor fix.

@asmorkalov (Contributor) left a comment

👍

for (size_t i = 0; i < output.size(); i++) {
Mat output_slice = output_buffer.row(i);
output[i].copyTo(output_slice);
}
Member
hconcat?

Contributor Author
I tried using concat, but for some reason it fails the inference. Can you showcase the usage you had in mind?

Contributor
What is the error message?

Member
Excuse me, vconcat for sure:

// ...
output.emplace_back(tmp_output.reshape(1, 1));
// ...
Mat output_buffer;
cv::vconcat(output, output_buffer);

int outputDim[] = {static_cast<int>(output.size()), M, N};
output_buffer = output_buffer.reshape(1, 3, &outputDim[0]);
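In plain C++ terms (cv::Mat replaced by std::vector, names hypothetical), the vconcat-then-reshape suggestion amounts to stacking the flattened per-batch outputs into one contiguous buffer that is later viewed as {batch, M, N}:

```cpp
#include <vector>

// Each per-batch output is flattened to one row (the reshape(1, 1) above),
// then all rows are stacked into a single contiguous buffer (the vconcat),
// which the caller reinterprets as a {batch, M, N} tensor.
std::vector<float> stack_rows(const std::vector<std::vector<float>>& rows) {
    std::vector<float> buffer;
    for (const auto& r : rows)
        buffer.insert(buffer.end(), r.begin(), r.end());
    return buffer;
}
```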

@dkurt (Member) commented Nov 16, 2023

@Abdurrahheem, branch was pushed to origin by mistake: https://github.com/opencv/opencv/tree/ash/dev_einsum_fast_gemm

Please do locally:

git remote set-url --push origin ""

To avoid pushing to OpenCV:

$ git remote -v
dkurt   https://github.com/dkurt/opencv (fetch)
dkurt   https://github.com/dkurt/opencv (push)
origin  https://github.com/opencv/opencv (fetch)
origin   (push)

@asmorkalov asmorkalov merged commit 8c10545 into opencv:4.x Nov 16, 2023
IskXCr pushed a commit to Haosonn/opencv that referenced this pull request Dec 20, 2023
…_gemm

Fast gemm for einsum opencv#24509

## This PR adds performance tests for Einsum Layer with FastGemm. See below results of performance test on different inputs

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
@asmorkalov asmorkalov mentioned this pull request Jan 19, 2024
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024