FastGemmOpt opt;
opt.init();
I propose making it a layer-wide member rather than initializing it each time.
How should I interpret the results you collected? Is it faster or slower compared with
Sorry, I have added a description. Inference is faster with FastGemm.
82f1c13 to b470932
armv7 neon (jetson tk1):
x86 without AVX2:
So the fastGemm integration totally makes sense. I propose extracting the platform detection and running it once in the layer constructor. Everything else looks good to me.
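The "detect once in the constructor" suggestion can be sketched in plain C++ as follows. This is only an illustration of the pattern, not the actual OpenCV code: `GemmOptSketch`, `EinsumLayerSketch`, `init_calls`, and the body of `forward()` are hypothetical stand-ins, and the real `FastGemmOpt::init()` is assumed to be the place where CPU feature detection happens.

```cpp
#include <cassert>

// Hypothetical stand-in for the options struct seen in the diff.
// The counter exists only to show how often detection runs.
struct GemmOptSketch {
    bool initialized = false;
    static int init_calls;

    void init() {
        ++init_calls;        // the real code would probe CPU features here
        initialized = true;
    }
};
int GemmOptSketch::init_calls = 0;

// Hypothetical layer: the options are a member, initialized once at
// construction and reused by every forward() call.
class EinsumLayerSketch {
public:
    EinsumLayerSketch() { opt_.init(); }

    void forward() const {
        // No re-detection here; just rely on the member set up earlier.
        assert(opt_.initialized);
    }

private:
    GemmOptSketch opt_;
};
```

With this layout, constructing one layer and calling `forward()` any number of times triggers detection exactly once, which is the cost the review comment aims to avoid.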
fixes to performance test
In my last comment I have fixed the test issues mentioned by @dkurt and fixed platform detection. Should I collect results with the new platform detection, or can we merge the PR without them? In my opinion the results will not differ much, so we can merge the PR.
@dkurt Do you have any other remarks? If not, I propose merging the PR after the constructor fix.
for (size_t i = 0; i < output.size(); i++) {
    Mat output_slice = output_buffer.row(i);
    output[i].copyTo(output_slice);
}
I tried using concat, but for some reason inference fails. Could you show the usage you had in mind?
Sorry, I meant vconcat:
// ...
output.emplace_back(tmp_output.reshape(1, 1));
// ...
Mat output_buffer;
cv::vconcat(output, output_buffer);
int outputDim[] = {static_cast<int>(output.size()), M, N};
output_buffer = output_buffer.reshape(1, 3, &outputDim[0]);
…encv into ash/dev_einsum_fast_gemm
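To make the memory layout behind the vconcat suggestion concrete, here is a minimal sketch in plain C++ without OpenCV. It assumes each per-batch result has been flattened to a single row of M*N values (as `reshape(1, 1)` does in the snippet above) and that stacking those rows yields a contiguous row-major buffer that can be read as [batch, M, N]. The helpers `stackRows` and `at3d` are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stack flattened M*N slices one after another, which mirrors what a
// vertical concatenation of 1 x (M*N) rows produces in memory.
std::vector<float> stackRows(const std::vector<std::vector<float>>& slices,
                             std::size_t M, std::size_t N) {
    std::vector<float> buffer;
    buffer.reserve(slices.size() * M * N);
    for (const auto& s : slices) {
        assert(s.size() == M * N);  // every slice must hold one M x N result
        buffer.insert(buffer.end(), s.begin(), s.end());
    }
    return buffer;
}

// Read the stacked buffer as a row-major [batch, M, N] tensor.
float at3d(const std::vector<float>& buffer, std::size_t M, std::size_t N,
           std::size_t b, std::size_t m, std::size_t n) {
    return buffer[(b * M + m) * N + n];
}
```

Because the stacked rows are already contiguous, viewing them as three-dimensional needs no further copy, which is why a single concatenation followed by a reshape can replace the per-row `copyTo` loop.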
@Abdurrahheem, the branch was pushed to origin by mistake: https://github.com/opencv/opencv/tree/ash/dev_einsum_fast_gemm Please do the following locally: To avoid pushing to OpenCV:
…_gemm Fast gemm for einsum opencv#24509
## This PR adds performance tests for Einsum Layer with FastGemm. See below results of performance test on different inputs
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
This PR adds performance tests for Einsum Layer with FastGemm. See below results of performance test on different inputs
Notation:
All data in ms (milliseconds).
Gemm is the backend for matrix multiplication.
Benchmarks: (arrow indicates increase in inference speed compared to einsum with gemm)
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.