Einsum Layer Performance Test #24445

Merged
asmorkalov merged 6 commits into opencv:4.x from Abdurrahheem:ash/dev_einsum_pref
Nov 8, 2023

@Abdurrahheem
Contributor

@Abdurrahheem Abdurrahheem commented Oct 24, 2023

This PR adds performance tests for the Einsum layer. Results for different inputs are listed below.

Notation:

  • WX: windows10_x64
  • MX: macos_x64
  • MA: macos_arm64
  • UX: ubuntu_x64
  • UA: ubuntu_arm64

All timings are in milliseconds (ms).
Gemm is the backend used for matrix multiplication.


Benchmarks:

| Equation                | Input Mat Dims                    | UX (ms)        | UA (ms) | MX (ms) | MA (ms) | WX (ms) |
|-------------------------|-----------------------------------|----------------|---------|---------|---------|---------|
| "ij, jk -> ik"          | [2, 3], [3, 2]                    | 0.04 ± 0.00    | -       | -       | -       | -       |
| "ij, jk -> ik"          | [20, 30], [30, 20]                | 0.08 ± 0.00    | -       | -       | -       | -       |
| "ij, jk -> ik"          | [113, 127], [127, 113]            | 2.41 ± 0.05    | -       | -       | -       | -       |
| "imkj, injs -> imnks"   | [1, 4, 7, 9], [1, 5, 9, 8]        | 0.11 ± 0.00    | -       | -       | -       | -       |
| "imkj, injs -> imnks"   | [1, 4, 70, 90], [1, 5, 90, 80]    | 15.49 ± 0.46   | -       | -       | -       | -       |
| "imkj, injs -> imnks"   | [1, 4, 73, 91], [1, 5, 91, 57]    | 11.53 ± 0.06   | -       | -       | -       | -       |
| "ij -> i"               | [30, 40]                          | 0.03 ± 0.00    | -       | -       | -       | -       |
| "ij -> i"               | [113, 374]                        | 0.13 ± 0.00    | -       | -       | -       | -       |
| "...ij -> ...i"         | [30, 40]                          | 0.03 ± 0.00    | -       | -       | -       | -       |
| "...ij -> ...i"         | [113, 374]                        | 0.13 ± 0.00    | -       | -       | -       | -       |
| "...ij, ...jk -> ...ik" | [40, 50], [50, 80]                | 0.37 ± 0.01    | -       | -       | -       | -       |
| "...ij, ...jk -> ...ik" | [47, 51], [51, 83]                | 0.43 ± 0.01    | -       | -       | -       | -       |

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • [x] I agree to contribute to the project under Apache 2 License.
  • [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • [x] The PR is proposed to the proper branch
  • [ ] There is a reference to the original bug report and related work
  • [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • [x] The feature is well documented and sample code can be built with the project CMake

@Abdurrahheem Abdurrahheem self-assigned this Oct 24, 2023
@Abdurrahheem Abdurrahheem marked this pull request as ready for review October 24, 2023 18:33
@fengyuentau
Member

IIRC, cv::gemm is used in the einsum layer. Could you use fastGemm and make a comparison?

@asmorkalov
Contributor

[ RUN      ] Layer_Einsum.einsum/5, where GetParam() = 
Eqiation: 	imkj, injs -> imnks
InputSize: 	2
OutputSize: 	1
InputShape 0: 	100 400 700 900 
InputShape 1: 	100 500 900 800 

[ERROR:0@1279.725] global net_impl.cpp:1197 getLayerShapesRecursively OPENCV/DNN: []:(_input): getMemoryShapes() post validation failed. inputs=2 outputs=2/2 blobs=0 inplace=0
[ERROR:0@1279.728] global net_impl.cpp:1204 getLayerShapesRecursively     input[0] = [ 100 400 700 900 ]
[ERROR:0@1279.728] global net_impl.cpp:1204 getLayerShapesRecursively     input[1] = [ 100 500 900 800 ]
[ERROR:0@1279.728] global net_impl.cpp:1208 getLayerShapesRecursively     output[0] = [ 100 400 700 900 ]
[ERROR:0@1279.728] global net_impl.cpp:1208 getLayerShapesRecursively     output[1] = [ 100 500 900 800 ]
[ERROR:0@1279.728] global net_impl.cpp:1214 getLayerShapesRecursively Exception message: OpenCV(4.8.0-dev) /home/ci/opencv/modules/dnn/src/net_impl.cpp:1193: error: (-2:Unspecified error) in function 'void cv::dnn::dnn4_v20230620::Net::Impl::getLayerShapesRecursively(int, cv::dnn::dnn4_v20230620::Net::Impl::LayersShapesMap&)'
>  (expected: 'total(os[i]) > 0'), where
>     'total(os[i])' is -569803776
> must be greater than
>     '0' is 0

/home/ci/opencv/modules/ts/src/ts_perf.cpp:1965: Failure
Failed
Expected: PerfTestBody() doesn't throw an exception.
  Actual: it throws cv::Exception:
  OpenCV(4.8.0-dev) /home/ci/opencv/modules/dnn/src/net_impl.cpp:1193: error: (-2:Unspecified error) in function 'void cv::dnn::dnn4_v20230620::Net::Impl::getLayerShapesRecursively(int, cv::dnn::dnn4_v20230620::Net::Impl::LayersShapesMap&)'
>  (expected: 'total(os[i]) > 0'), where
>     'total(os[i])' is -569803776
> must be greater than
>     '0' is 0

params    = 
Eqiation: 	imkj, injs -> imnks
InputSize: 	2
OutputSize: 	1
InputShape 0: 	100 400 700 900 
InputShape 1: 	100 500 900 800 

termination reason:  unhandled exception
bytesIn   =          0
bytesOut  =          0
samples   =          0 of 1
outliers  =          0
frequency =          0
[  FAILED  ] Layer_Einsum.einsum/5, where GetParam() = 
Eqiation: 	imkj, injs -> imnks
InputSize: 	2
OutputSize: 	1
InputShape 0: 	100 400 700 900 
InputShape 1: 	100 500 900 800 
 (1288349 ms)
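
The negative `total(os[i])` in the log above is consistent with a 32-bit signed overflow: the element count of shape `[100 400 700 900]` is 25,200,000,000, which does not fit in 32 bits. The sketch below only reproduces that arithmetic; `wrap_int32` is a hypothetical helper written for this illustration, not an OpenCV function, and it assumes the total is accumulated in a 32-bit signed integer.

```python
from math import prod


def wrap_int32(value: int) -> int:
    """Reinterpret an integer as a two's-complement 32-bit signed value."""
    value &= 0xFFFFFFFF  # keep only the low 32 bits
    return value - (1 << 32) if value >= (1 << 31) else value


# Shape taken from the failing test case in the log.
shape = [100, 400, 700, 900]

exact_total = prod(shape)                 # arbitrary-precision product
wrapped_total = wrap_int32(exact_total)   # what a 32-bit accumulator would hold

print(exact_total)    # 25200000000
print(wrapped_total)  # -569803776
```

The wrapped value matches the number reported in the exception message, which suggests the test shapes were simply too large for the shape validation rather than the equation being wrong.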

@asmorkalov
Contributor

Abduragim will add fastGemm in the next iteration.

@asmorkalov asmorkalov added this to the 4.9.0 milestone Oct 26, 2023
Contributor

@asmorkalov asmorkalov left a comment

👍

@asmorkalov
Contributor

@dkurt @fengyuentau I want to merge the PR. fastGemm will be integrated in the next one to simplify the performance comparison. Do you have any concerns?

Member

@fengyuentau fengyuentau left a comment

Also, what is the time cost on CI for these tests? Is it tolerable (< 1000 ms, for example)?

@Abdurrahheem
Contributor Author

> Also what is the time cost on CI for these tests? is it tolerable (< 1000ms for example)?

@asmorkalov

@asmorkalov
Contributor

On my old PC without AVX2: 17 tests from 1 test case ran. (15615 ms total)
The longest case is:

[ RUN      ] Layer_Einsum.einsum/7, where GetParam() = Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 700, 900}, {1, 5, 900, 800}}
.
[ PERFSTAT ]    (samples=10   mean=1273.04   median=1274.26   min=1261.88   stddev=6.78 (0.5%))

@fengyuentau
Member

@Abdurrahheem Could you collect the perf results from the detail pages and fill in your table in the first comment?

@asmorkalov
Contributor

@fengyuentau I propose to rerun the benchmark locally and update the PR. CI runs perf tests with a single iteration and concurrently with other builds, so the numbers are not reliable.
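
To illustrate the point about single iterations: a run with one sample reports zero spread by construction, so it says nothing about run-to-run variation. The sketch below uses made-up timings and a hypothetical `summarize` helper. It uses the population standard deviation, which is an assumption and may differ from what OpenCV's perf framework computes.

```python
import statistics


def summarize(samples_ms):
    """Summarize timing samples (in ms) with mean, median, min and spread."""
    mean = statistics.fmean(samples_ms)
    # Population standard deviation: defined even for a single sample (0.0).
    stddev = statistics.pstdev(samples_ms)
    return {
        "samples": len(samples_ms),
        "mean": mean,
        "median": statistics.median(samples_ms),
        "min": min(samples_ms),
        "stddev": stddev,
        "stddev_pct": 100.0 * stddev / mean if mean else 0.0,
    }


# Illustrative values only, not measurements from this PR.
several = summarize([10.0, 12.0, 14.0])
single = summarize([12.0])

print(several["mean"], several["stddev"])
print(single["mean"], single["stddev"])
```

With several samples the spread becomes visible, while the single-sample summary always reports a standard deviation of zero regardless of how noisy the machine is.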

@fengyuentau
Member

ARM64: ~3.5s

[ RUN ] Layer_Einsum.einsum/7, where GetParam() = Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 700, 900}, {1, 5, 900, 800}}
[ PERFSTAT ] (samples=1 mean=3578.70 median=3578.70 min=3578.70 stddev=0.00 (0.0%))

X64: ~1.8s

[ RUN ] Layer_Einsum.einsum/7, where GetParam() = Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 700, 900}, {1, 5, 900, 800}}
[ PERFSTAT ] (samples=1 mean=1797.72 median=1797.72 min=1797.72 stddev=0.00 (0.0%))

Win-X64: ~7.6s

[ RUN ] Layer_Einsum.einsum/7, where GetParam() = Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 700, 900}, {1, 5, 900, 800}}
[ PERFSTAT ] (samples=1 mean=7660.13 median=7660.13 min=7660.13 stddev=0.00 (0.0%))

I propose to make it a smaller scale.
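
A rough work estimate can help when choosing a smaller case. The sketch below is an illustration under stated assumptions, not the layer's actual algorithm: it assumes a naive contraction whose multiply-accumulate count is the product of all distinct label sizes, it ignores ellipsis and broadcasting, and the helper names (`label_sizes`, `output_shape`, `naive_mac_count`) are hypothetical.

```python
from math import prod


def label_sizes(equation, input_shapes):
    """Map each einsum label to its dimension size (no ellipsis support)."""
    lhs = equation.split("->")[0]
    terms = [term.strip() for term in lhs.split(",")]
    sizes = {}
    for term, shape in zip(terms, input_shapes):
        if len(term) != len(shape):
            raise ValueError(f"rank mismatch for term {term!r}")
        for label, dim in zip(term, shape):
            # A label shared between inputs must have the same size in both.
            if sizes.setdefault(label, dim) != dim:
                raise ValueError(f"inconsistent size for label {label!r}")
    return sizes


def output_shape(equation, input_shapes):
    """Output shape is the size of each label on the right-hand side."""
    sizes = label_sizes(equation, input_shapes)
    rhs = equation.split("->")[1].strip()
    return [sizes[label] for label in rhs]


def naive_mac_count(equation, input_shapes):
    """Multiply-accumulates of a naive loop nest over every distinct label."""
    return prod(label_sizes(equation, input_shapes).values())


eq = "imkj, injs -> imnks"
large = [[1, 4, 700, 900], [1, 5, 900, 800]]
small = [[1, 4, 70, 90], [1, 5, 90, 80]]

print(output_shape(eq, large))     # [1, 4, 5, 700, 800]
print(naive_mac_count(eq, large))  # 10080000000
print(naive_mac_count(eq, small))  # 10080000
```

Under that assumption, shrinking each of the three large dimensions by a factor of 10 cuts the estimated work by a factor of 1000.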

@asmorkalov
Contributor

@Abdurrahheem friendly reminder.

@Abdurrahheem
Contributor Author

Abdurrahheem commented Nov 3, 2023

> @fengyuentau I propose to rerun the benchmark locally and update the PR. CI runs perf tests with single iteration and concurrently with other builds. The numbers are not reliable.

Currently I can only test locally on Ubuntu, since I do not have access to the other platforms.

@Abdurrahheem
Contributor Author

Updated the table with performance results.

@Abdurrahheem Abdurrahheem mentioned this pull request Nov 8, 2023
6 tasks
Member

@fengyuentau fengyuentau left a comment

Thank you!

@asmorkalov asmorkalov merged commit 9d0c8a9 into opencv:4.x Nov 8, 2023
IskXCr pushed a commit to Haosonn/opencv that referenced this pull request Dec 20, 2023
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
@asmorkalov asmorkalov mentioned this pull request Jan 19, 2024
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024