Convert demosaic functions to HAL (2.6x faster for VSX)#15764

Merged
opencv-pushbot merged 1 commit into opencv:3.4 from ChipKerchner:demosaicingToHal
Oct 25, 2019

Conversation

@ChipKerchner
Contributor

@ChipKerchner ChipKerchner commented Oct 23, 2019

Convert demosaic functions to HAL (2.6x faster for VSX).

Added bayerRGBA to the vectorized functions (it was missing on x86), and added alpha to the parameter list.

Leaving NEON code in since it has much better instructions for interleaving data - plus added HAL bayerRGB_EA for NEON.

force_builders=linux,docs,Custom,Linux AVX2,ARMv7
buildworker:Custom=linux-1
build_image:Custom=mips64el
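
For readers unfamiliar with the interleaving step mentioned above, here is a minimal scalar sketch of what a 4-channel interleaved store does: three separately computed channel planes plus a constant alpha are packed into 4-byte pixels. The helper name `interleave4` and its signature are hypothetical and are not taken from this PR. In vector code the same job is done by OpenCV's `v_store_interleave` universal intrinsic or by NEON's `vst4` family.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV or of this PR): pack three channel
// planes into interleaved 4-channel pixels, filling channel 3 with `alpha`.
// Assumes c0, c1 and c2 all have the same length.
std::vector<std::uint8_t> interleave4(const std::vector<std::uint8_t>& c0,
                                      const std::vector<std::uint8_t>& c1,
                                      const std::vector<std::uint8_t>& c2,
                                      std::uint8_t alpha)
{
    const std::size_t n = c0.size();
    std::vector<std::uint8_t> dst(n * 4);
    for (std::size_t i = 0; i < n; ++i)
    {
        dst[4 * i + 0] = c0[i];   // first colour channel
        dst[4 * i + 1] = c1[i];   // second colour channel
        dst[4 * i + 2] = c2[i];   // third colour channel
        dst[4 * i + 3] = alpha;   // constant alpha for every pixel
    }
    return dst;
}
```

Calling `interleave4({1, 2}, {3, 4}, {5, 6}, 255)` yields the bytes `1 3 5 255 2 4 6 255`. A SIMD version processes a whole register of pixels per store, which is why the quality of the platform's interleave instructions matters here.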

@alalek
Member

alalek commented Oct 24, 2019

Thank you for working on this!
Could you please take a look at the compilation issues?

@ChipKerchner
Contributor Author

ChipKerchner commented Oct 24, 2019

> Could you please take a look at the compilation issues?

I'm not sure why this compilation is taking so long and timing out after almost 2 hrs. I've compiled it with gcc for x86, Power, and ARM without problems. I've also run opencv_test_imgproc on all 3 platforms without issue.

Any suggestions?

@alalek
Member

alalek commented Oct 24, 2019

@ChipKerchner Sorry, it is a CI failure.

Contributor

@savuor savuor left a comment

Universal intrinsics code is correct since it's directly translated from SSE2.

By the way, did you compare the NEON code's performance against the universal intrinsics' performance on the same platform?
Although NEON intrinsics code is more compact and intuitive, the less code we have to support, the better.

@ChipKerchner
Contributor Author

> By the way, did you compare the NEON code's performance against the universal intrinsics' performance on the same platform?
> Although NEON intrinsics code is more compact and intuitive, the less code we have to support, the better.

I have no way of comparing NEON code performance, only correctness. If someone else can check whether the NEON or the HAL code is faster on ARM, please do so. If HAL is faster, I'll eliminate the NEON code.
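
For anyone with ARM hardware who wants to run that comparison, a minimal timing sketch might look like the following. The helper `bestTimeMs` is hypothetical and uses only the C++ standard library; it takes the best of several runs to reduce noise from other processes. It assumes the two code paths are available as separate callables, for example the same demosaicing call in two builds, one with the NEON branch and one with the HAL branch.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not part of OpenCV): run `fn` `runs` times and return
// the best wall-clock time in milliseconds, or -1.0 if `runs` is not positive.
double bestTimeMs(const std::function<void()>& fn, int runs)
{
    double best = -1.0;
    for (int i = 0; i < runs; ++i)
    {
        const auto t0 = std::chrono::steady_clock::now();
        fn();  // the code path under test
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (best < 0.0 || ms < best)
            best = ms;  // keep the fastest run seen so far
    }
    return best;
}
```

The callable would wrap the conversion being measured on a fixed input image, and the two resulting numbers should only be compared when taken on the same machine with the same input size.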

@savuor savuor self-requested a review October 25, 2019 11:57
Contributor

@savuor savuor left a comment

OK, let's approve that.
If we find that the NEON code is slower, we can remove it in a separate PR.

@savuor savuor self-assigned this Oct 25, 2019
@savuor
Contributor

savuor commented Oct 25, 2019

👍

@ChipKerchner
Contributor Author

Great! I'll probably have a 2nd PR for the other sections of demosaicing at a later time.

@opencv-pushbot opencv-pushbot merged commit c46f119 into opencv:3.4 Oct 25, 2019
@alalek alalek mentioned this pull request Oct 29, 2019
@ChipKerchner ChipKerchner deleted the demosaicingToHal branch November 5, 2019 17:54