Einsum Layer Performance Test by Abdurrahheem · Pull Request #24445 · opencv/opencv

Abdurrahheem · 2023-10-24T18:32:09Z

This PR adds performance tests for the Einsum layer. Results for different inputs are listed below.

Notation:

WX: windows10_x64
MX: macos_x64
MA: macos_arm64
UX: ubuntu_x64
UA: ubuntu_arm64

All timings are in milliseconds (ms). A dash (-) means the platform was not measured.
Gemm is the backend used for matrix multiplication.

Benchmarks:

Equation	Input Mat Dims	UX (ms)	UA (ms)	MX (ms)	MA (ms)	WX (ms)
"ij, jk -> ik"	[2, 3], [3, 2]	0.04 ± 0.00	-	-	-	-
"ij, jk -> ik"	[20, 30], [30, 20]	0.08 ± 0.00	-	-	-	-
"ij, jk -> ik"	[113, 127], [127, 113]	2.41 ± 0.05	-	-	-	-
"imkj, injs -> imnks"	[1, 4, 7, 9], [1, 5, 9, 8]	0.11 ± 0.00	-	-	-	-
"imkj, injs -> imnks"	[1, 4, 70, 90], [1, 5, 90, 80]	15.49 ± 0.46	-	-	-	-
"imkj, injs -> imnks"	[1, 4, 73, 91], [1, 5, 91, 57]	11.53 ± 0.06	-	-	-	-
"ij -> i"	[30, 40]	0.03 ± 0.00	-	-	-	-
"ij -> i"	[113, 374]	0.13 ± 0.00	-	-	-	-
"...ij -> ...i"	[30, 40]	0.03 ± 0.00	-	-	-	-
"...ij -> ...i"	[113, 374]	0.13 ± 0.00	-	-	-	-
"...ij, ...jk -> ...ik"	[40, 50], [50, 80]	0.37 ± 0.01	-	-	-	-
"...ij, ...jk -> ...ik"	[47, 51], [51, 83]	0.43 ± 0.01	-	-	-	-
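For readers unfamiliar with the equation strings in the table, here is a minimal reference sketch of what two of them compute. It is a naive loop version for illustration only, not the Einsum layer's implementation; the `Matrix` type and the function names are hypothetical helpers and not part of OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major dense matrix used only for this illustration.
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// "ij, jk -> ik": index j appears in both inputs but not in the output,
// so it is summed over. This is an ordinary matrix product.
Matrix einsum_ij_jk_ik(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows && "shared index j must have the same extent");
    Matrix out(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t k = 0; k < b.cols; ++k) {
            double acc = 0.0;
            for (std::size_t j = 0; j < a.cols; ++j) {
                acc += a.at(i, j) * b.at(j, k);
            }
            out.at(i, k) = acc;
        }
    }
    return out;
}

// "ij -> i": index j is missing from the output, so each row is summed.
std::vector<double> einsum_ij_i(const Matrix& a) {
    std::vector<double> out(a.rows, 0.0);
    for (std::size_t i = 0; i < a.rows; ++i) {
        for (std::size_t j = 0; j < a.cols; ++j) {
            out[i] += a.at(i, j);
        }
    }
    return out;
}
```

With a [2, 3] input and a [3, 2] input, `einsum_ij_jk_ik` returns a [2, 2] result, which corresponds to the smallest case in the table.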

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

fengyuentau · 2023-10-25T03:05:15Z

IIRC, cv::gemm is used in the einsum layer. Could you switch to fastGemm and compare the two?

asmorkalov · 2023-10-25T05:42:24Z

[ RUN      ] Layer_Einsum.einsum/5, where GetParam() = 
Eqiation: 	imkj, injs -> imnks
InputSize: 	2
OutputSize: 	1
InputShape 0: 	100 400 700 900 
InputShape 1: 	100 500 900 800 

[ERROR:0@1279.725] global net_impl.cpp:1197 getLayerShapesRecursively OPENCV/DNN: []:(_input): getMemoryShapes() post validation failed. inputs=2 outputs=2/2 blobs=0 inplace=0
[ERROR:0@1279.728] global net_impl.cpp:1204 getLayerShapesRecursively     input[0] = [ 100 400 700 900 ]
[ERROR:0@1279.728] global net_impl.cpp:1204 getLayerShapesRecursively     input[1] = [ 100 500 900 800 ]
[ERROR:0@1279.728] global net_impl.cpp:1208 getLayerShapesRecursively     output[0] = [ 100 400 700 900 ]
[ERROR:0@1279.728] global net_impl.cpp:1208 getLayerShapesRecursively     output[1] = [ 100 500 900 800 ]
[ERROR:0@1279.728] global net_impl.cpp:1214 getLayerShapesRecursively Exception message: OpenCV(4.8.0-dev) /home/ci/opencv/modules/dnn/src/net_impl.cpp:1193: error: (-2:Unspecified error) in function 'void cv::dnn::dnn4_v20230620::Net::Impl::getLayerShapesRecursively(int, cv::dnn::dnn4_v20230620::Net::Impl::LayersShapesMap&)'
>  (expected: 'total(os[i]) > 0'), where
>     'total(os[i])' is -569803776
> must be greater than
>     '0' is 0

/home/ci/opencv/modules/ts/src/ts_perf.cpp:1965: Failure
Failed
Expected: PerfTestBody() doesn't throw an exception.
  Actual: it throws cv::Exception:
  OpenCV(4.8.0-dev) /home/ci/opencv/modules/dnn/src/net_impl.cpp:1193: error: (-2:Unspecified error) in function 'void cv::dnn::dnn4_v20230620::Net::Impl::getLayerShapesRecursively(int, cv::dnn::dnn4_v20230620::Net::Impl::LayersShapesMap&)'
>  (expected: 'total(os[i]) > 0'), where
>     'total(os[i])' is -569803776
> must be greater than
>     '0' is 0

params    = 
Eqiation: 	imkj, injs -> imnks
InputSize: 	2
OutputSize: 	1
InputShape 0: 	100 400 700 900 
InputShape 1: 	100 500 900 800 

termination reason:  unhandled exception
bytesIn   =          0
bytesOut  =          0
samples   =          0 of 1
outliers  =          0
frequency =          0
[  FAILED  ] Layer_Einsum.einsum/5, where GetParam() = 
Eqiation: 	imkj, injs -> imnks
InputSize: 	2
OutputSize: 	1
InputShape 0: 	100 400 700 900 
InputShape 1: 	100 500 900 800 
 (1288349 ms)
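The negative value in the log is consistent with the element count wrapping around a 32-bit signed integer: 100 * 400 * 700 * 900 = 25,200,000,000, which does not fit in 32 bits. The sketch below reproduces that arithmetic. It is an inference from the numbers in the log, not a reading of the OpenCV source, and `total64` and `wrapped_total32` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Element count accumulated in 64 bits; large enough for the shapes in this thread.
std::int64_t total64(const std::vector<int>& shape) {
    std::int64_t n = 1;
    for (int d : shape) {
        assert(d >= 0 && "dimensions must be non-negative");
        n *= d;
    }
    return n;
}

// The value a 32-bit signed count would end up holding after wraparound:
// keep the low 32 bits of the product and reinterpret them as two's complement.
std::int32_t wrapped_total32(const std::vector<int>& shape) {
    std::uint32_t n = 1;
    for (int d : shape) {
        n *= static_cast<std::uint32_t>(d);  // unsigned overflow is well defined (mod 2^32)
    }
    std::int64_t low = n;  // in the range 0 .. 2^32 - 1
    if (low >= (std::int64_t{1} << 31)) {
        low -= (std::int64_t{1} << 32);  // map the upper half to negative values
    }
    return static_cast<std::int32_t>(low);
}
```

For the shape `{100, 400, 700, 900}`, `total64` gives 25200000000 and `wrapped_total32` gives -569803776, the same value reported for `total(os[i])` in the failure above.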

asmorkalov · 2023-10-25T05:43:00Z

Abduragim will add fastGemm in the next iteration.

modules/dnn/perf/perf_einsum.cpp

asmorkalov

👍

asmorkalov · 2023-10-26T10:37:34Z

@dkurt @fengyuentau I want to merge the PR. fastGemm will be integrated in the next PR to simplify the performance comparison. Do you have any concerns?

fengyuentau

Also, what is the time cost on CI for these tests? Is it tolerable (< 1000 ms, for example)?

modules/dnn/perf/perf_einsum.cpp

Abdurrahheem · 2023-10-26T11:02:19Z

> Also, what is the time cost on CI for these tests? Is it tolerable (< 1000 ms, for example)?

@asmorkalov

asmorkalov · 2023-10-26T13:59:15Z

On my old PC without AVX2: 17 tests from 1 test case ran. (15615 ms total)
The longest case is:

[ RUN      ] Layer_Einsum.einsum/7, where GetParam() = Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 700, 900}, {1, 5, 900, 800}}
.
[ PERFSTAT ]    (samples=10   mean=1273.04   median=1274.26   min=1261.88   stddev=6.78 (0.5%))
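As a reading aid for the PERFSTAT fields (samples, mean, median, min, stddev), here is a minimal sketch of how such a summary can be computed from raw timings. The `PerfSummary` struct and `summarize` function are hypothetical, and the use of the sample standard deviation (n - 1 divisor, defined as 0 for a single sample) is an assumption; OpenCV's perf framework may apply different formulas or outlier filtering.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary record mirroring the fields printed in a PERFSTAT line.
struct PerfSummary {
    std::size_t samples;
    double mean;
    double median;
    double min;
    double stddev;
};

// Summarize timings given in milliseconds. Takes the vector by value so it can be sorted.
PerfSummary summarize(std::vector<double> t) {
    assert(!t.empty() && "at least one sample is required");
    std::sort(t.begin(), t.end());
    const std::size_t n = t.size();

    double sum = 0.0;
    for (double x : t) sum += x;
    const double mean = sum / static_cast<double>(n);

    const double median = (n % 2 == 1) ? t[n / 2] : 0.5 * (t[n / 2 - 1] + t[n / 2]);

    double sq = 0.0;
    for (double x : t) sq += (x - mean) * (x - mean);
    // Assumption: sample standard deviation; a single sample carries no spread information.
    const double stddev = (n > 1) ? std::sqrt(sq / static_cast<double>(n - 1)) : 0.0;

    return PerfSummary{n, mean, median, t.front(), stddev};
}
```

Under these assumptions, a run with one sample always reports a spread of zero, so a zero stddev from a single-sample run says nothing about how stable the timing is.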

fengyuentau · 2023-10-27T06:37:50Z

@Abdurrahheem Could you collect the perf results from the detail pages and fill in the table in your first comment?

asmorkalov · 2023-10-27T11:30:14Z

@fengyuentau I propose to rerun the benchmark locally and update the PR. CI runs perf tests with a single iteration and concurrently with other builds, so the numbers are not reliable.

fengyuentau · 2023-10-30T08:28:48Z

ARM64: ~3.5s

[ RUN ] Layer_Einsum.einsum/7, where GetParam() = Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 700, 900}, {1, 5, 900, 800}}
[ PERFSTAT ] (samples=1 mean=3578.70 median=3578.70 min=3578.70 stddev=0.00 (0.0%))

X64: ~1.8s

[ RUN ] Layer_Einsum.einsum/7, where GetParam() = Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 700, 900}, {1, 5, 900, 800}}
[ PERFSTAT ] (samples=1 mean=1797.72 median=1797.72 min=1797.72 stddev=0.00 (0.0%))

Win-X64: ~7.6s

[ RUN ] Layer_Einsum.einsum/7, where GetParam() = Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 700, 900}, {1, 5, 900, 800}}
[ PERFSTAT ] (samples=1 mean=7660.13 median=7660.13 min=7660.13 stddev=0.00 (0.0%))

I propose reducing the scale of this test case.
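To put the size of this case in perspective, the following sketch counts the multiply-accumulate operations a naive evaluation of "imkj, injs -> imnks" would perform. It is a back-of-the-envelope estimate under the assumption of a plain loop nest; the actual layer may organize the work differently. `Shape4` and `macs_imkj_injs` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Shape4 = std::array<std::int64_t, 4>;

// "imkj, injs -> imnks": the output has i*m*n*k*s elements,
// and each element is a sum over the contracted index j.
std::int64_t macs_imkj_injs(const Shape4& a, const Shape4& b) {
    assert(a[0] == b[0] && "batch index i must match");
    assert(a[3] == b[2] && "contracted index j must match");
    const std::int64_t i = a[0], m = a[1], k = a[2], j = a[3];
    const std::int64_t n = b[1], s = b[3];
    return i * m * n * k * s * j;
}
```

For `{1, 4, 700, 900}` and `{1, 5, 900, 800}` this gives 10,080,000,000 multiply-accumulates, while `{1, 4, 70, 90}` and `{1, 5, 90, 80}` gives 10,080,000. Shrinking k, j, and s by a factor of 10 each reduces the work by a factor of 1000.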

asmorkalov · 2023-11-03T05:45:39Z

@Abdurrahheem friendly reminder.

Abdurrahheem · 2023-11-03T19:13:04Z

> @fengyuentau I propose to rerun the benchmark locally and update the PR. CI runs perf tests with a single iteration and concurrently with other builds, so the numbers are not reliable.

Currently I can only test locally on Ubuntu, since I don't have access to other platforms.

Abdurrahheem · 2023-11-07T17:35:24Z

Updated the table with performance results.

fengyuentau

Thank you!

Abdurrahheem added 2 commits October 24, 2023 19:18

1st iteration of perf_einsum

65cdc6d

working perf tests

ea84c2b

Abdurrahheem added the category: dnn label Oct 24, 2023

Abdurrahheem requested review from asmorkalov, dkurt and fengyuentau October 24, 2023 18:32

Abdurrahheem self-assigned this Oct 24, 2023

Abdurrahheem marked this pull request as ready for review October 24, 2023 18:33

added new tests

fac3d71

asmorkalov added the test label Oct 26, 2023

asmorkalov reviewed Oct 26, 2023

asmorkalov added this to the 4.9.0 milestone Oct 26, 2023

fix stdout format & more tests

44c322f

asmorkalov approved these changes Oct 26, 2023

fengyuentau reviewed Oct 26, 2023

PR fix

f6f543f

removed long tests

a0f90f6

Abdurrahheem mentioned this pull request Nov 8, 2023

Fast gemm for einsum #24509

Merged

6 tasks

fengyuentau approved these changes Nov 8, 2023

asmorkalov merged commit 9d0c8a9 into opencv:4.x Nov 8, 2023

asmorkalov mentioned this pull request Jan 19, 2024

5.x merge 4.x #24862

Merged
