added detection & dispatching of some modern NEON instructions (NEON_FP16, NEON_BF16) #24420
vpisarev merged 6 commits into opencv:4.x
… method to detect various NEON extensions, such as FP16 SIMD arithmetic, BF16 SIMD arithmetic, SIMD dotprod, etc. It could be extended to other instruction sets if necessary.
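The signal-based detection idea can be sketched as follows: install a SIGILL handler, run a probe that executes one instruction from the extension, and treat a trap as "not supported". This is a minimal illustration under our own assumptions, not the code from this PR (which later reverted to a method without signal()). It uses `sigaction()` plus `sigsetjmp()`/`siglongjmp()` rather than plain `signal()`, because their semantics are more consistently specified across POSIX systems. The names `probe_instruction`, `sigill_handler`, `raise_sigill` and `return_normally` are hypothetical; the last two are stand-ins that let the mechanism run on any host.

```cpp
#include <cassert>
#include <csignal>
#include <setjmp.h>  // sigjmp_buf, sigsetjmp, siglongjmp (POSIX)
#include <signal.h>  // struct sigaction, sigemptyset (POSIX)

// Jump buffer used to return from the SIGILL handler to the probe site.
static sigjmp_buf g_probe_env;

static void sigill_handler(int) {
    // Unwind back to sigsetjmp() in probe_instruction() with value 1.
    siglongjmp(g_probe_env, 1);
}

// Runs `fn` and reports whether it completed without raising SIGILL.
// On a real target, `fn` would execute a single instruction from the
// extension being probed (hypothetical here).
bool probe_instruction(void (*fn)()) {
    assert(fn != nullptr);
    struct sigaction sa = {};
    struct sigaction old_sa = {};
    sa.sa_handler = sigill_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, &old_sa);

    volatile bool supported = false;
    // savemask=1: SIGILL is blocked while the handler runs, so the original
    // signal mask must be restored when siglongjmp() lands here.
    if (sigsetjmp(g_probe_env, 1) == 0) {
        fn();              // may trap with SIGILL
        supported = true;  // reached only if fn() returned normally
    }
    sigaction(SIGILL, &old_sa, nullptr);  // restore the previous disposition
    return supported;
}

// Stand-in probes: one behaves like an unsupported instruction, one like a
// supported one.
void raise_sigill() { std::raise(SIGILL); }
void return_normally() {}
```

On an AArch64 target the probe passed in would contain, for example, a single FP16 vector instruction; emitting it requires inline assembly or a separately compiled file, which is left out of this sketch.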
It looks like the mentioned features are supported in …
Link: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signal?view=msvc-170
The actual auxv flags and values are defined in this header: https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/uapi/asm/hwcap.h#L106
Also, some libraries parse …
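On Linux/AArch64, the auxv route boils down to reading `AT_HWCAP` and `AT_HWCAP2` with `getauxval()` from `<sys/auxv.h>` and testing individual feature bits. A minimal sketch, assuming the bit values below, which we took from the kernel's `hwcap.h` (`HWCAP_ASIMDHP`, `HWCAP_ASIMDDP`, `HWCAP2_BF16`); please double-check them against the linked header. The names `NeonFeatures`, `decode_hwcaps`, `detect_neon_features` and the `kHwcap*` constants are hypothetical, not OpenCV API. Decoding is kept separate from the query so the logic can be exercised on non-ARM hosts.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval(), AT_HWCAP, AT_HWCAP2
#endif

// Bit values as we read them in arch/arm64/include/uapi/asm/hwcap.h.
constexpr std::uint64_t kHwcapAsimdHp = 1ull << 10;  // HWCAP_ASIMDHP: FP16 SIMD arithmetic
constexpr std::uint64_t kHwcapAsimdDp = 1ull << 20;  // HWCAP_ASIMDDP: SIMD dot product
constexpr std::uint64_t kHwcap2Bf16 = 1ull << 14;    // HWCAP2_BF16: BF16 instructions

struct NeonFeatures {
    bool fp16 = false;
    bool dotprod = false;
    bool bf16 = false;
};

// Pure decoding step: maps raw AT_HWCAP / AT_HWCAP2 words to feature flags.
NeonFeatures decode_hwcaps(std::uint64_t hwcap, std::uint64_t hwcap2) {
    NeonFeatures f;
    f.fp16 = (hwcap & kHwcapAsimdHp) != 0;
    f.dotprod = (hwcap & kHwcapAsimdDp) != 0;
    f.bf16 = (hwcap2 & kHwcap2Bf16) != 0;  // note: BF16 lives in the second word
    return f;
}

// Queries the running kernel on Linux/AArch64; reports no features elsewhere.
NeonFeatures detect_neon_features() {
#if defined(__linux__) && defined(__aarch64__)
    return decode_hwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    return decode_hwcaps(0, 0);
#endif
}
```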
      ocv_update(CPU_FP16_IMPLIES "NEON")
      else()
    - ocv_update(CPU_KNOWN_OPTIMIZATIONS "NEON;FP16;NEON_DOTPROD")
    + ocv_update(CPU_KNOWN_OPTIMIZATIONS "NEON;FP16;NEON_DOTPROD;FP16_SIMD;BF16_SIMD")
FP16_SIMD;BF16_SIMD
How does the scope of these instructions correlate with other platforms?
... and with the available universal intrinsics?
Until that is clear, it is better to add an ARM_ prefix.
…ions without signal()
* renamed FP16_SIMD & BF16_SIMD to NEON_FP16 and NEON_BF16, respectively
@asmorkalov, @opencv-alalek, thank you for the comments and for the links! All your concerns have been addressed :)
    ocv_update(CPU_NEON_DOTPROD_IMPLIES "NEON")
    ocv_update(CPU_NEON_FP16_FLAGS_ON "-march=armv8.2-a+fp16")
    ocv_update(CPU_NEON_FP16_IMPLIES "NEON")
    ocv_update(CPU_NEON_BF16_FLAGS_ON "-march=armv8.2-a+fp16+bf16")
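A translation unit built with flags like `-march=armv8.2-a+fp16` gets the ACLE feature macro `__ARM_FEATURE_FP16_VECTOR_ARITHMETIC` from the compiler, and dispatched NEON_FP16 code can key off it. The sketch below assumes exactly that and nothing about OpenCV's own dispatch machinery; `add_via_fp16` is a hypothetical name. It narrows four floats at a time to FP16, adds them with `vadd_f16`, and widens the result back; on any other build (including x86 hosts) only the scalar FP32 loop is compiled.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
#include <arm_neon.h>
#endif

// dst[i] = a[i] + b[i]. With FP16 vector arithmetic enabled, full groups of
// four are added in half precision; the tail (or the whole array, when the
// FP16 path is not compiled in) is added in FP32, so results for values that
// are not exactly representable in FP16 can differ between the two paths.
void add_via_fp16(const float* a, const float* b, float* dst, std::size_t n) {
    assert(a != nullptr && b != nullptr && dst != nullptr);
    std::size_t i = 0;
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
    for (; i + 4 <= n; i += 4) {
        float16x4_t ha = vcvt_f16_f32(vld1q_f32(a + i));  // narrow FP32 -> FP16
        float16x4_t hb = vcvt_f16_f32(vld1q_f32(b + i));
        float16x4_t hs = vadd_f16(ha, hb);                // FP16 SIMD add
        vst1q_f32(dst + i, vcvt_f32_f16(hs));             // widen FP16 -> FP32
    }
#endif
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];  // scalar FP32 path
    }
}
```

The test inputs are chosen to be exact in FP16, so both paths give identical results.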
…FP16, NEON_BF16) (opencv#24420)
* added more or less cross-platform (based on POSIX signal() semantics) method to detect various NEON extensions, such as FP16 SIMD arithmetic, BF16 SIMD arithmetic, SIMD dotprod, etc. It could be propagated to other instruction sets if necessary.
* hopefully fixed compile errors
* continue to fix CI
* another attempt to fix build on Linux aarch64
* * reverted to the original method to detect special arm neon instructions without signal()
  * renamed FP16_SIMD & BF16_SIMD to NEON_FP16 and NEON_BF16, respectively
  * removed extra whitespaces
Currently, platform-specific (NEON) code is required to use these instructions. Universal intrinsics for FP16 and BF16 arithmetic may be added later. Note that even modern ARM platforms do not provide a full set of BF16 operations; what they offer is mostly instructions for implementing BF16xBF16 to FP32 matrix multiplication.
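To make the "BF16xBF16 to FP32" point concrete, here is a portable scalar reference for such a dot product, of the kind a NEON BF16 kernel could be tested against. It is a sketch under our own assumptions: BF16 values are stored as raw `uint16_t` bit patterns (the upper half of an FP32), conversion from FP32 rounds to nearest-even, and products are accumulated in FP32 in index order. The hardware BFDOT/BFMMLA instructions may round intermediate sums differently, so their results can differ in the last bits. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// FP32 -> BF16 with round-to-nearest-even; NaNs stay (quiet) NaNs.
std::uint16_t float_to_bf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u) {  // NaN: keep it quiet
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Add 0x7fff plus the lowest kept bit, so exact ties round to even.
    std::uint32_t rounding = 0x7fffu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + rounding) >> 16);
}

// BF16 -> FP32 is exact: the 16 bits become the upper half of the float.
float bf16_to_float(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Dot product of two BF16 vectors with an FP32 accumulator. Each product of
// two BF16 values (8 significant bits each) is exact in FP32; only the
// accumulation rounds.
float dot_bf16(const std::uint16_t* a, const std::uint16_t* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        acc += bf16_to_float(a[i]) * bf16_to_float(b[i]);
    }
    return acc;
}
```

Keeping the storage as plain `uint16_t` avoids depending on a compiler-specific BF16 type, which is why the conversions are written out by hand.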
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.