Vulkan backend for NaryEltwiseLayer in DNN module#24768
asmorkalov merged 13 commits into opencv:4.x
Hi @Haosonn, thanks for your contribution!
Yes. In the previous Vulkan patch, I focused only on integrated graphics. Our Vulkan backend still needs a lot of optimization. In my opinion, the first priority is supporting more layers, so that we could reduce the number of calling
That is hard to do, because we cannot predict whether the layer after NaryEltwiseLayer is supported by Vulkan. Some frameworks use a fast transfer strategy: MNN's Vulkan backend, for example, has two different implementations, VkBuffer and VkImage, and VkImage is much faster for GPU-CPU data transfer.
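To make the point about layer coverage concrete, here is a minimal sketch of a simplified cost model. It assumes a linear chain of layers in which data crosses the host/device boundary only where adjacent layers run on different devices; `countTransfers` and the boolean support flags are hypothetical and do not reflect how the OpenCV DNN module actually schedules copies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: onGpu[i] says whether layer i can run on the GPU backend.
// Counts host<->device copies for a linear chain, assuming the network input
// and the final output both live in host memory.
std::size_t countTransfers(const std::vector<bool>& onGpu)
{
    std::size_t transfers = 0;
    bool dataOnGpu = false;              // the input starts in host memory
    for (bool layerOnGpu : onGpu) {
        if (layerOnGpu != dataOnGpu) {   // an upload or a download is needed
            ++transfers;
            dataOnGpu = layerOnGpu;
        }
    }
    if (dataOnGpu) {
        ++transfers;                     // copy the final result back to the host
    }
    return transfers;
}
```

Under this model a chain of three layers with an unsupported layer in the middle needs four copies, while the same chain with all three layers supported needs two. Every unsupported layer between supported ones adds a download and an upload.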
fengyuentau
left a comment
@zihaomu Please review this PR as well.
Force-pushed from 4ae98b5 to 836f0d1.
Several tests failed:
Also see https://pullrequest.opencv.org/buildbot/builders/precommit_linux64/builds/105934/steps/test_objdetect/logs/stdio, which looks like a memory issue.

@Haosonn @fengyuentau please rebase and fix conflicts.
add several test cases Update Update Update Update Update
add a preheat calculation
& uncomment some operators in OpNary constructor
@zihaomu @fengyuentau Could you take a look again?
fengyuentau
left a comment
LGTM 👍 Thanks for the contribution!
zihaomu
left a comment
Thanks for your contribution! 👍
Vulkan backend for NaryEltwiseLayer in DNN module opencv#24768

We improve Vulkan backend for ``NaryEltwiseLayer`` in DNN module by:

- add a basic framework for Vulkan backend in ``NaryEltwiseLayer``
- add a compute shader for binary forwarding (an imitation of what has been done in native OpenCV backend including broadcasting and eltwise-operation)
- typo fixed:
  - Wrong info output in ``context.cpp``

Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function ``copyToHost``, and we are going to fix that by

- find out the best ``VkMemoryProperty`` for various discrete GPUs
- prevent ``copyToHost`` in middle layers during forwarding, (i.e. keep data in GPU memory)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: IskXCr <IskXCr@outlook.com>
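The description mentions binary forwarding with broadcasting and an element-wise operation. As a rough illustration of that idea, here is a CPU-side sketch in plain C++. It is not the shader or the OpenCV implementation from this PR: `broadcastStrides` and `naryBinary` are hypothetical names, and both shapes are assumed to already have the same rank (padded with leading 1s). Each iteration of the outer loop corresponds to the work a single shader invocation would do for one output element.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Row-major strides; a dimension of size 1 gets stride 0 so it is broadcast.
static std::vector<std::size_t> broadcastStrides(const std::vector<std::size_t>& shape)
{
    std::vector<std::size_t> strides(shape.size(), 0);
    std::size_t step = 1;
    for (std::size_t i = shape.size(); i-- > 0;) {
        strides[i] = (shape[i] == 1) ? 0 : step;
        step *= shape[i];
    }
    return strides;
}

// Applies `op` element-wise to a and b with broadcasting.
// Assumption: shapeA and shapeB have the same rank.
std::vector<float> naryBinary(const std::vector<float>& a, const std::vector<std::size_t>& shapeA,
                              const std::vector<float>& b, const std::vector<std::size_t>& shapeB,
                              const std::function<float(float, float)>& op)
{
    assert(shapeA.size() == shapeB.size());
    const std::size_t rank = shapeA.size();
    std::vector<std::size_t> outShape(rank);
    std::size_t total = 1;
    for (std::size_t i = 0; i < rank; ++i) {
        // Dimensions must match or one of them must be 1.
        assert(shapeA[i] == shapeB[i] || shapeA[i] == 1 || shapeB[i] == 1);
        outShape[i] = shapeA[i] > shapeB[i] ? shapeA[i] : shapeB[i];
        total *= outShape[i];
    }
    const std::vector<std::size_t> sa = broadcastStrides(shapeA);
    const std::vector<std::size_t> sb = broadcastStrides(shapeB);

    std::vector<float> out(total);
    for (std::size_t idx = 0; idx < total; ++idx) {
        // Unravel the flat output index and accumulate the input offsets.
        std::size_t rem = idx, offA = 0, offB = 0;
        for (std::size_t d = rank; d-- > 0;) {
            const std::size_t coord = rem % outShape[d];
            rem /= outShape[d];
            offA += coord * sa[d];
            offB += coord * sb[d];
        }
        out[idx] = op(a[offA], b[offB]);
    }
    return out;
}
```

For example, adding a `{2, 3}` tensor to a `{1, 3}` tensor reuses the single row of the second input for both output rows, because its stride along the first axis is 0.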
This patch causes FP16 test failures: #24954
I see performance degradation for this test case with 1/2/4 threads (no threading in implementation anyway) on 12700K:
To reviewers: PRs with optimization or other non-trivial implementation changes should have performance reports attached.
Pow is not supported yet in the Vulkan backend, so I guess something else happened?

The regression is on CPU, not Vulkan.

It seems even stranger to me that this patch made very limited changes to the CPU implementation yet affected CPU performance, and only for Pow. Let me investigate.

Update: Oh, I see, use
@opencv-alalek Do you know how to force opencv_perf_* to run 100 samples? I found they can run anywhere from 10 to 100 samples, which may lead to some mistakes.

@fengyuentau, there is
TEST_CYCLE_N(100)
{
…
}
Or you may use
Sorry, I missed the thing that you already found
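The concern about sample counts can be illustrated outside the OpenCV perf framework. The sketch below is a standalone stand-in, not OpenCV code: `timeFixedSamples` and `median` are hypothetical helpers that time a body a fixed number of times and summarize the result, so that two runs being compared always use the same number of samples.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper, not part of OpenCV: time `body` exactly `n` times
// and return each duration in milliseconds.
std::vector<double> timeFixedSamples(const std::function<void()>& body, std::size_t n)
{
    std::vector<double> ms;
    ms.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        body();
        const auto t1 = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return ms;
}

// Median of the samples; takes a copy so the caller's order is untouched.
double median(std::vector<double> v)
{
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t mid = v.size() / 2;
    return (v.size() % 2 != 0) ? v[mid] : 0.5 * (v[mid - 1] + v[mid]);
}
```

Calling `timeFixedSamples(body, 100)` for both the baseline and the patched build keeps the sample count identical, which removes one source of noise when comparing medians.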