Convert demosaic functions to HAL (2.6x faster for VSX) #15764
opencv-pushbot merged 1 commit into opencv:3.4 from ChipKerchner:demosaicingToHal
Thank you for working on this!
I'm not sure why this compilation is taking so long and timing out after almost 2 hours. I've compiled it with gcc for x86, Power, and ARM without problems. I've also run opencv_test_imgproc on all 3 platforms without issues. Any suggestions?
@ChipKerchner Sorry, it is a CI failure.
savuor left a comment:
The universal intrinsics code is correct, since it's directly translated from the SSE2 code.
By the way, did you compare the performance of the NEON code against the universal intrinsics code on the same platform?
Although the NEON intrinsics code is more compact and intuitive, the less code we have to support, the better.
I have no way of comparing NEON code performance, only correctness. If someone else can check whether the NEON or the HAL code is faster on ARM, please do so. If the HAL code is faster, I'll eliminate the NEON code.
savuor left a comment:
OK, let's approve this.
If we find out that the NEON code is slower, we can remove it in a separate PR.
👍
Great! I'll probably open a second PR for the other sections of demosaicing at a later time.
Convert demosaic functions to HAL (2.6x faster for VSX).
Added bayerRGBA to the vectorized functions (it was missing from x86), and added alpha to the parameter list.
Left the NEON code in, since NEON has much better instructions for interleaving data, and added HAL bayerRGB_EA for NEON.
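For readers unfamiliar with the interleaving step mentioned in the commit message, here is a minimal scalar sketch of what it means to pack separate channel planes into packed 3-channel or 4-channel pixels, with a constant alpha byte in the 4-channel case. The function name `interleave`, the B, G, R channel order, and the use of `std::vector` are illustrative assumptions, not the code from this PR or OpenCV's API. On ARM, the NEON store intrinsics `vst3q_u8` and `vst4q_u8` (from `arm_neon.h`) perform this packing for 16 pixels in a single instruction, which is the kind of advantage the commit message refers to.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not OpenCV's API): packs separate B, G, R planes into
// one interleaved buffer. When withAlpha is true, a constant alpha byte is
// appended to every pixel (BGRA); otherwise the output is 3-channel BGR.
// A NEON build could do the same 16 pixels at a time with vst3q_u8/vst4q_u8.
std::vector<uint8_t> interleave(const std::vector<uint8_t>& b,
                                const std::vector<uint8_t>& g,
                                const std::vector<uint8_t>& r,
                                bool withAlpha, uint8_t alpha)
{
    const std::size_t n = b.size();              // assumes b, g, r have equal length
    const std::size_t cn = withAlpha ? 4 : 3;    // channels per output pixel
    std::vector<uint8_t> out(n * cn);
    for (std::size_t i = 0; i < n; ++i) {
        uint8_t* px = &out[i * cn];              // start of pixel i in the packed buffer
        px[0] = b[i];
        px[1] = g[i];
        px[2] = r[i];
        if (withAlpha)
            px[3] = alpha;                       // same alpha value for every pixel
    }
    return out;
}
```

For example, `interleave({1, 2}, {3, 4}, {5, 6}, true, 255)` yields `{1, 3, 5, 255, 2, 4, 6, 255}`. Passing alpha as a parameter, rather than hard-coding 255, lets one routine serve both the RGB and RGBA variants.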