Fix compilation on some ARM architectures.#22080
Conversation
modules/core/src/system.cpp
Outdated
| have[CV_CPU_NEON] = true; | ||
| #endif | ||
| #if (defined __ARM_FP && (((__ARM_FP & 0x2) != 0) && defined __ARM_NEON__)) | ||
| #if (defined __ARM_FP && (((__ARM_FP & 0x2) != 0) && (defined __ARM_NEON__ || (defined __ARM_NEON && defined __aarch64__)))) |
There was a problem hiding this comment.
If condition is reused then it makes sense to make setting of this flag as nested code.
#if (defined __ARM_NEON__ || (defined __ARM_NEON && defined __aarch64__))
have[CV_CPU_NEON] = true;
#if (defined __ARM_FP && ((__ARM_FP & 0x2) != 0)
have[CV_CPU_FP16] = true;
#endif
#endif
There was a problem hiding this comment.
Things are getting complicated for Armv7/Armv8 and build/run time.
__ARM_NEON__ and __ARM_NEON is defined in Armv7, but only __ARM_NEON is defined in Arm8
Moreover, what the predefined macro express is if the compiler is ready for NEON/FP16 instructions and intended to enable them, when the have array express is if they are available on actual device at run time.
It's much more easier to separate the block in Armv7 part and Armv8 part like following.
#elif (defined __clang__ || defined __APPLE__)
#if defined __arm__
// Armv7
// Check if NEON/FP16 instructions are provided on the processor (not by compiler)
// i.e. something equivalent to check at line 563 or 581
have[CV_CPU_NEON] = some_flags & some_check1;
have[CV_CPU_FP16] = some_flags & some_check2;
#elif defined __aarch64__
// Armv8
have[CV_CPU_NEON] = true;
have[CV_CPU_FP16] = true;
// It's guaranteed that NEON/FP16 are supported on Armv8 processor
// So this flag can be decided at compile time.
#endif
#endif
Now, there are places to argue, since this section is clearly for iOS.
Since the variety of processor is limited and relatively few, you may go and check EVERY iPhone/iPad processor ever made and if EVERY processor supports NEON/FP16, it's safe to enable the flag at compile time.
This part is quite a mess, and you are trying to place another rock on an unstable piled stone tower.
If you are trying to modify this part, I suggest to totally re-write this (for iOS) part.
There was a problem hiding this comment.
This condition looks strange: https://github.com/opencv/opencv/blame/4.6.0/modules/core/src/system.cpp#L607
#elif (defined __clang__ || defined __APPLE__)
Perhaps modern Apple platforms are already NEON/FP16 capable.
No idea how __clang__ check could help with detection.
Code has been added here (8+ years): https://github.com/opencv/opencv/blame/bdb088dccab11c9031e92414e02a0964ff614562/modules/core/src/system.cpp#L340
@vrabaud Do you observe issues with Apple? Or with other ARM platforms?
There was a problem hiding this comment.
The condition looks strange.
Yes, that's true.
Perhaps modern Apple platforms are already NEON/FP16 capable.
I've done a quick search and the last "Armv7 iPhone" is iPhone5C, which is released on 2013
ref: https://en.wikipedia.org/wiki/IPhone_5C
First released September 20, 2013
CPU 1.3 GHz dual core 32-bit ARMv7-A "Swift"
Probably it's safe to drop "32bit Arm iPhone" support, but I vote to keep the support situation, since it's 3.4 branch, and drop it in "next" branch
No idea how clang check could help with detection.
It's not helping, probably it's better to refactor this part.
Also, the build for latest apple M1 & M2 arrives here, so it's a quite big surgery to refactor this part.
So, I feel the same question to ask
@vrabaud Do you observe issues with Apple? Or with other ARM platforms?
If @vrabaud couldl describe more in detail about the issue you are facing, that would help to decide which way to go.
There was a problem hiding this comment.
For reference, I think I found the correct API to call on iOS
API sysctlbyname
Detail of API
Sample on SO
According to the sample code
#if defined __arm__
// Armv7
size_t size;
int neon;
int half;
size = sizeof(neon);
sysctlbyname("hw.optional.AdvSIMD", &neon, &size, NULL, 0);
sysctlbyname("hw.optional.AdvSIMD_HPFPCvt", &half, &size, NULL, 0);
have[CV_CPU_NEON] = neon == 1;
have[CV_CPU_FP16] = half == 1;
#elif defined __aarch64__
// Armv8
have[CV_CPU_NEON] = true;
have[CV_CPU_FP16] = true;
#endif
But still, covering with __clang__ && __APPLE__ might not work due to Mac OS on Apple M1&M2 CPU, which makes things more complicated....
There was a problem hiding this comment.
Hi, my initial issue is that I have an internal ARM toolchain that defines __ARM_NEON but not __ARM_NEON__ (which is legacy according to https://gcc.gnu.org/legacy-ml/gcc-patches/2015-01/msg02717.html). My problem has nothing to do with iOS.
There was a problem hiding this comment.
Right, thank you
If possible, could you describe more about your platform please?
Probably is better to create another "if" section for your platform rather than populating this iOS part
Any specific macro to hook your platform?
Is it Armv8 or Armv7?
There was a problem hiding this comment.
I unfortunately have no idea, sorry, this is all internal. Could a fix then be the following where we leave __APPLE__ untouched but just modify __clang__ ?
#elif (defined __APPLE__)
#if (defined __ARM_NEON__ || (defined __ARM_NEON && defined __aarch64__))
have[CV_CPU_NEON] = true;
#endif
#if (defined __ARM_FP && (((__ARM_FP & 0x2) != 0) && defined __ARM_NEON__))
have[CV_CPU_FP16] = true;
#endif
#elif (defined __clang__)
#if (defined __ARM_NEON__ || (defined __ARM_NEON && defined __aarch64__))
have[CV_CPU_NEON] = true;
#if (defined __ARM_FP && ((__ARM_FP & 0x2) != 0))
have[CV_CPU_FP16] = true;
#endif
#endif
#endif
There was a problem hiding this comment.
I don't have a right to decide, but I vote for making another ifdef at this part.
No one knows what side effect could happen when we modify the iOS part, so better to keep it as-is for now.
Doesn't you compiler accepts -dM -E option? which should dump hundreds of macros.
If you have anything specific and unique, that will help the affect to be minimum.
There was a problem hiding this comment.
I pushed the updated fix.
This condition is the same as the line above.
alalek
left a comment
There was a problem hiding this comment.
LGTM 👍
@tomoaki0705 Thank you for the review!
This condition is the same as the line above.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request