[cv::transform] Enable CV_SIMD for the 16U case on AArch64. #19164
opencv-pushbot merged 1 commit into opencv:3.4
alalek
left a comment
This patch should go into 3.4 branch first.
We will merge changes from 3.4 into master regularly (weekly/bi-weekly).
Please:
- change the "base" branch of this PR: master => 3.4 (use the "Edit" button near the PR title)
- rebase your commits from master onto the 3.4 branch. For example:
  git rebase -i --onto upstream/3.4 upstream/master
  (check the list of your commits, then save and quit (Esc + ":wq" + Enter)),
  where upstream is configured by following this GitHub guide and fetched (git fetch upstream).
- push the rebased commits into the source branch of your fork (with the --force option)
Note: there is no need to re-open the PR; apply the changes in place.
modules/core/src/matmul.simd.hpp
Outdated
  transform_32f( const float* src, float* dst, const float* m, int len, int scn, int dcn )
  {
- #if CV_SIMD && !defined(__aarch64__) && !defined(_M_ARM64)
+ #if CV_SIMD
transform_32f
Should be transform_16u as stated here: #19163 (comment)
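For context on what the guarded function computes, here is a minimal scalar sketch of a 16U transform. It is not the OpenCV implementation: the names transform_16u_ref and targets_aarch64 are hypothetical, and the rounding and clamping are assumptions meant to approximate saturating conversion to unsigned 16-bit. The matrix layout (dcn rows of scn + 1 coefficients, with the last column acting as a constant offset) follows the documented behaviour of cv::transform.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical scalar reference for illustration only; NOT OpenCV code.
// m is a row-major dcn x (scn + 1) matrix; the last column is an offset.
inline void transform_16u_ref(const std::uint16_t* src, std::uint16_t* dst,
                              const float* m, int len, int scn, int dcn)
{
    for (int i = 0; i < len; ++i, src += scn, dst += dcn)
    {
        for (int j = 0; j < dcn; ++j)
        {
            const float* row = m + j * (scn + 1);
            float acc = row[scn];  // constant offset term
            for (int k = 0; k < scn; ++k)
                acc += row[k] * static_cast<float>(src[k]);
            // Assumption: clamp to [0, 65535], then round to nearest.
            const float clamped = std::min(std::max(acc, 0.0f), 65535.0f);
            dst[j] = static_cast<std::uint16_t>(std::lrint(clamped));
        }
    }
}

// Returns true when compiling for 64-bit Arm, using the same two macros
// that appear in the guard removed by the diff.
inline bool targets_aarch64()
{
#if defined(__aarch64__) || defined(_M_ARM64)
    return true;
#else
    return false;
#endif
}
```

A vectorized path would process several pixels per iteration and is expected to produce the same results as a scalar loop of this kind. The old guard simply kept that path from being compiled on builds where targets_aarch64() would return true.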
Hi, thank you for looking after my patch over the weekend. Sorry for not getting back to you earlier, but I have a strict rule that forbids me from entering my home office over the weekend :).

First of all, thank you for fixing the commit. My local changes covered both 32F and 16U when I ran the perf executables. I saw the results and decided to upstream the 16U changes. Upstreaming the 32F case was unintentional.

One curiosity: how come the patch needed to be merged into 3.4 and not master?

BTW, what is the preferred communication channel of the OpenCV community for discussing development topics that are not bugs/perf issues? Is there a mailing list or a public discussion forum for developers?

Kind regards!
Francesco
You need to read.
As for the forum, there is a place for exactly this kind of discussion. It was created recently; the old one will be deprecated and the new one used instead.
Performance uplift (x-factor) ranges from 2.20 to 2.50 on the following perf tests of the core module:
- Mat_Transform::Size_MatType
- Transform::OCL_TransformFixture
The associated issue is #19163.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request