Skip to content

Fix compilation on arm64 with FP16 when disabled#24203

Merged
asmorkalov merged 1 commit intoopencv:4.xfrom
thesamesam:arm64-fp16
Sep 4, 2023
Merged

Fix compilation on arm64 with FP16 when disabled#24203
asmorkalov merged 1 commit intoopencv:4.xfrom
thesamesam:arm64-fp16

Conversation

@thesamesam
Copy link
Copy Markdown
Contributor

If building with -mcpu=native or any other setting which implies the current CPU has FP16 but with intrinsics disabled, we mistakenly try to use it even though convolution.hpp conditionally defines it correctly based on whether we should use it. convolution.cpp on the other hand was mismatched and trying to use it if the CPU supported it, even if not enabled in the build system.

Make the guards match.

Bug: https://bugs.gentoo.org/913031

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@thesamesam thesamesam force-pushed the arm64-fp16 branch 2 times, most recently from 2f3a130 to f6211de Compare August 28, 2023 06:46
@zihaomu zihaomu requested review from vpisarev and zihaomu and removed request for vpisarev August 28, 2023 11:39
@zihaomu
Copy link
Copy Markdown
Member

zihaomu commented Aug 28, 2023

Thanks for your contribution. How about doing the following:
https://github.com/opencv/opencv/blob/4.x/modules/dnn/src/layers/cpu_kernels/convolution.hpp#L17

#if define(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC) && CV_FP16 // check FP16 FMA.
#define CONV_ARM_FP16 1
#endif

So that, we can use a single flag to determine if the user wants to enable FP16.

If building with -mcpu=native or any other setting which implies the current
CPU has FP16 but with intrinsics disabled, we mistakenly try to use it even
though convolution.hpp conditionally defines it correctly based on whether
we should *use it*. convolution.cpp on the other hand was mismatched and
trying to use it if the CPU supported it, even if not enabled in the build
system.

Make the guards match.

Bug: https://bugs.gentoo.org/913031
Signed-off-by: Sam James <sam@gentoo.org>
@thesamesam
Copy link
Copy Markdown
Contributor Author

Ah, good idea, thank you! Done.

Copy link
Copy Markdown
Member

@zihaomu zihaomu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants