added detection & dispatching of some modern NEON instructions (NEON_FP16, NEON_BF16) by vpisarev · Pull Request #24420 · opencv/opencv

vpisarev · 2023-10-17T23:11:12Z

Currently, platform-specific (NEON) code is required to make use of these instructions. Universal intrinsics for FP16 and BF16 arithmetic may be added later. Note that even modern ARM platforms don't provide a full set of BF16 operations; the available ones are mostly instructions for implementing BF16 × BF16 → FP32 matrix multiplication.
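For readers new to the format, the sketch below shows in scalar C++ what "BF16 × BF16 → FP32" means. This is not code from this PR. BF16 is the upper 16 bits of an IEEE-754 FP32 value. The helper names are hypothetical. The dot product accumulates in FP32, which is the per-element shape of what BF16 instructions such as BFDOT and BFMMLA compute. The hardware's intermediate rounding differs from this loop, so results need not be bit-identical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// FP32 -> BF16: keep sign, 8 exponent bits and the top 7 mantissa bits.
std::uint16_t fp32_to_bf16(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    if (std::isnan(x))                               // keep NaNs quiet and non-zero
        return static_cast<std::uint16_t>((u >> 16) | 0x0040u);
    // Round to nearest, ties to even, on the 16 low bits being dropped.
    std::uint32_t bias = 0x7FFFu + ((u >> 16) & 1u);
    return static_cast<std::uint16_t>((u + bias) >> 16);
}

// BF16 -> FP32 is exact: the dropped mantissa bits become zero.
float bf16_to_fp32(std::uint16_t h) {
    std::uint32_t u = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// BF16 x BF16 products accumulated in FP32.
float dot_bf16_fp32(const std::uint16_t* a, const std::uint16_t* b, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += bf16_to_fp32(a[i]) * bf16_to_fp32(b[i]);
    return acc;
}
```

Because BF16 keeps FP32's exponent range, conversion is a simple 16-bit truncation with rounding. That is why BF16 products can be accumulated directly in FP32 registers.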

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake


asmorkalov · 2023-10-18T06:47:11Z

It looks like the mentioned features are reported in /proc/self/auxv, so there is no need to rework the approach.
See the Arm feature names for each profile: https://developer.arm.com/downloads/-/exploration-tools/feature-names-for-a-profile and the HWCAP constants: https://www.kernel.org/doc/html/v6.1/arm64/elf_hwcaps.html
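The sketch below is a minimal example, assuming Linux on AArch64 with glibc. It is not the code that was merged. The same AT_HWCAP/AT_HWCAP2 words exposed in /proc/self/auxv can be read with getauxval(). The bit positions come from the kernel HWCAP documentation linked above: HWCAP_ASIMDHP, HWCAP_ASIMDDP, and HWCAP2_BF16. They are spelled out as constants so the sketch also compiles on hosts without arm64 headers. The struct and function names are hypothetical.

```cpp
#include <cstdint>
#if defined(__linux__)
#include <sys/auxv.h>   // getauxval, AT_HWCAP, AT_HWCAP2
#endif

// Bit positions as documented for arm64 in the kernel's elf_hwcaps / uapi hwcap.h.
constexpr std::uint64_t kHwcapAsimdHp = 1ull << 10;   // HWCAP_ASIMDHP: FP16 vector arithmetic
constexpr std::uint64_t kHwcapAsimdDp = 1ull << 20;   // HWCAP_ASIMDDP: SDOT/UDOT dot product
constexpr std::uint64_t kHwcap2Bf16   = 1ull << 14;   // HWCAP2_BF16: BF16 instructions

struct NeonExtensions {
    bool fp16 = false;
    bool dotprod = false;
    bool bf16 = false;
};

// Pure decoding step, separated out so it can be tested on any host.
NeonExtensions decode_hwcaps(std::uint64_t hwcap, std::uint64_t hwcap2) {
    NeonExtensions e;
    e.fp16 = (hwcap & kHwcapAsimdHp) != 0;
    e.dotprod = (hwcap & kHwcapAsimdDp) != 0;
    e.bf16 = (hwcap2 & kHwcap2Bf16) != 0;
    return e;
}

NeonExtensions detect_neon_extensions() {
#if defined(__linux__) && defined(__aarch64__)
    return decode_hwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    return NeonExtensions{};   // not AArch64 Linux: report no extensions
#endif
}
```

Keeping the decode step separate from the getauxval() call lets the bit logic be unit-tested on x86 CI machines.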

asmorkalov · 2023-10-18T06:51:20Z

> The SIGILL and SIGTERM signals aren't generated under Windows. They're included for ANSI compatibility. Therefore, you can set signal handlers for these signals by using signal, and you can also explicitly generate these signals by calling [raise](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/raise?view=msvc-170).

Link: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signal?view=msvc-170
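The PR's first commits probed for instructions using POSIX signal() semantics. That approach was reverted in 85e5ec3. The sketch below shows the general technique, not the PR's code. It installs a SIGILL handler, runs a candidate instruction, and jumps back with siglongjmp if the instruction traps. It uses sigaction rather than signal() because signal()'s handler semantics vary between systems. `probe_instruction` is a hypothetical name. As the MSVC documentation quoted above says, Windows never generates SIGILL, so this pattern cannot detect anything there.

```cpp
#include <csignal>
#include <setjmp.h>
#include <signal.h>

// Jump target shared between probe_instruction() and the SIGILL handler.
static sigjmp_buf g_probe_env;

static void on_sigill(int) {
    siglongjmp(g_probe_env, 1);   // abandon the faulting code and resume in the prober
}

// Runs `candidate` once; returns false if it raised SIGILL, true otherwise.
// Not thread-safe: the handler and jump buffer are process-wide.
bool probe_instruction(void (*candidate)()) {
    struct sigaction sa{}, old{};
    sa.sa_handler = on_sigill;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGILL, &sa, &old) != 0)
        return false;

    volatile bool supported = false;
    if (sigsetjmp(g_probe_env, 1) == 0) {   // 1: restore the signal mask after the jump
        candidate();
        supported = true;
    }
    sigaction(SIGILL, &old, nullptr);       // put the previous handler back
    return supported;
}
```

On AArch64, `candidate` would be a tiny function containing a single instruction from the extension being probed, built with the matching -march flags. The process-wide handler and the Windows gap are among the reasons a kernel-provided source such as auxv is simpler.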

opencv-alalek · 2023-10-18T09:14:33Z

> /proc/self/auxv

The header with the actual auxv flags/values: https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/uapi/asm/hwcap.h#L106

Some libraries also parse /proc/cpuinfo, but auxv is preferable.
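For comparison, the /proc/cpuinfo route comes down to a whole-token search on the "Features" line. arm64 Linux prints names such as `asimdhp`, `asimddp`, and `bf16` on that line. The sketch below uses hypothetical helper names and assumes that layout. It only illustrates why auxv is the cleaner source: parsing text is more fragile than testing bits.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// True if `feature` appears as a whole token on a "Features" line of
// /proc/cpuinfo-style text, e.g. "Features : fp asimd asimdhp asimddp bf16".
bool cpuinfo_has_feature(const std::string& cpuinfo, const std::string& feature) {
    std::istringstream lines(cpuinfo);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.rfind("Features", 0) != 0)   // only lines starting with "Features"
            continue;
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;
        std::istringstream tokens(line.substr(colon + 1));
        std::string tok;
        while (tokens >> tok)                 // exact match, so "asimd" != "asimdhp"
            if (tok == feature)
                return true;
    }
    return false;
}

// Reads a whole text file; returns an empty string if it cannot be read.
std::string read_text_file(const char* path) {
    std::ifstream in(path);
    std::ostringstream ss;
    if (in)
        ss << in.rdbuf();
    return ss.str();
}
```

Typical use would be `cpuinfo_has_feature(read_text_file("/proc/cpuinfo"), "bf16")`.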

opencv-alalek · 2023-10-18T09:22:32Z

cmake/OpenCVCompilerOptimizations.cmake

    ocv_update(CPU_FP16_IMPLIES "NEON")
  else()
-    ocv_update(CPU_KNOWN_OPTIMIZATIONS "NEON;FP16;NEON_DOTPROD")
+    ocv_update(CPU_KNOWN_OPTIMIZATIONS "NEON;FP16;NEON_DOTPROD;FP16_SIMD;BF16_SIMD")


> FP16_SIMD;BF16_SIMD

How does the scope of these instructions correlate with other platforms,
... and with the available universal intrinsics?

Until that is clear, it is better to add an ARM_ prefix.
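Listing the new entries in CPU_KNOWN_OPTIMIZATIONS serves the "dispatching" half of the PR title. It allows extra kernel variants to be built with those flags and lets one be chosen at run time. The sketch below shows the general shape of that run-time choice with hypothetical names (`CpuFeatures`, `select_dot_bf16`). OpenCV generates its dispatch through its own CMake and macro machinery, not hand-written code like this.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical feature flags, e.g. filled once at startup from getauxval().
struct CpuFeatures {
    bool neon_fp16 = false;
    bool neon_bf16 = false;
};

using DotBf16Fn = float (*)(const std::uint16_t*, const std::uint16_t*, std::size_t);

static float bf16_to_fp32(std::uint16_t h) {
    std::uint32_t u = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// Portable baseline that every build can run.
float dot_bf16_baseline(const std::uint16_t* a, const std::uint16_t* b, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += bf16_to_fp32(a[i]) * bf16_to_fp32(b[i]);
    return acc;
}

// Placeholder for a variant that would live in its own source file compiled with
// the NEON_BF16 flags; here it forwards to the baseline so the sketch builds anywhere.
float dot_bf16_neon_bf16(const std::uint16_t* a, const std::uint16_t* b, std::size_t n) {
    return dot_bf16_baseline(a, b, n);
}

// Pick the kernel once; callers then go through the returned pointer.
DotBf16Fn select_dot_bf16(const CpuFeatures& cpu) {
    return cpu.neon_bf16 ? dot_bf16_neon_bf16 : dot_bf16_baseline;
}
```

The key design point is that only the specialized source file gets the extension flags. The dispatcher itself stays baseline code, so it runs safely on CPUs without BF16.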


vpisarev · 2023-10-18T19:06:15Z

@asmorkalov, @opencv-alalek, thank you for the comments and for the links! All your concerns have been addressed :)

tomoaki0705 · 2023-12-06T23:46:58Z

cmake/OpenCVCompilerOptimizations.cmake

    ocv_update(CPU_NEON_DOTPROD_IMPLIES "NEON")
+    ocv_update(CPU_NEON_FP16_FLAGS_ON "-march=armv8.2-a+fp16")
+    ocv_update(CPU_NEON_FP16_IMPLIES "NEON")
+    ocv_update(CPU_NEON_BF16_FLAGS_ON "-march=armv8.2-a+fp16+bf16")


Do we really need to combine +bf16 and +fp16? @vpisarev
I'm discussing this on #24588 and I'd like to hear your original thoughts about this line.
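For a given toolchain, the question above can be checked empirically. Compile a small file with each candidate flag set, for example `-march=armv8.2-a+bf16` versus `-march=armv8.2-a+fp16+bf16`, and see which Arm C Language Extensions (ACLE) feature macros get defined. The sketch below assumes GCC or Clang targeting AArch64. It reports only what the compiler enabled for this translation unit, not what the CPU supports at run time.

```cpp
#include <cstdio>

// Each constant mirrors one ACLE feature macro for the current translation unit.
constexpr bool kNeon =
#if defined(__ARM_NEON)
    true;
#else
    false;
#endif

constexpr bool kFp16VectorArith =
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
    true;
#else
    false;
#endif

constexpr bool kBf16VectorArith =
#if defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
    true;
#else
    false;
#endif

constexpr bool kDotProd =
#if defined(__ARM_FEATURE_DOTPROD)
    true;
#else
    false;
#endif

// Prints what the -march flags of this build actually switched on.
void print_build_features() {
    std::printf("NEON=%d FP16-vector=%d BF16-vector=%d DOTPROD=%d\n",
                kNeon, kFp16VectorArith, kBf16VectorArith, kDotProd);
}
```

Building it once per flag set and comparing the printed lines shows whether `+bf16` on its own already yields the BF16 macro without `+fp16`.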


vpisarev added 2 commits October 18, 2023 02:06

added more or less cross-platform (based on POSIX signal() semantics) method to detect various NEON extensions, such as FP16 SIMD arithmetics, BF16 SIMD arithmetics, SIMD dotprod etc. It could be propagated to other instruction sets if necessary.

1d94c39

hopefully fixed compile errors

a00824b

vpisarev changed the title ~~added more or less cross-platform (based on POSIX signal() semantics) mechanism to detect of various modern NEON instructions~~ added more or less cross-platform (based on POSIX signal() semantics) mechanism to detect various modern NEON instructions Oct 18, 2023

vpisarev changed the title ~~added more or less cross-platform (based on POSIX signal() semantics) mechanism to detect various modern NEON instructions~~ added more or less cross-platform (based on POSIX signal() semantics) mechanism to detect some modern NEON instructions Oct 18, 2023

vpisarev added 2 commits October 18, 2023 04:02

continue to fix CI

c3ed12b

another attempt to fix build on Linux aarch64

d3f0488

asmorkalov added the labels "category: core", "platform: arm" (ARM boards related issues: RPi, NVIDIA TK/TX, etc), and "pr: Discussion Required" on Oct 18, 2023

opencv-alalek reviewed Oct 18, 2023


* reverted to the original method to detect special arm neon instructions without signal()
* renamed FP16_SIMD & BF16_SIMD to NEON_FP16 and NEON_BF16, respectively

85e5ec3

vpisarev changed the title ~~added more or less cross-platform (based on POSIX signal() semantics) mechanism to detect some modern NEON instructions~~ added detection & dispatching of some modern NEON instructions (NEON_FP16, NEON_BF16) Oct 18, 2023

removed extra whitespaces

eb5ecc9

vpisarev merged commit ba4d6c8 into opencv:4.x Oct 18, 2023

opencv-alalek added the "optimization" and "category: build/install" labels on Oct 18, 2023

opencv-alalek added this to the 4.9.0 milestone Oct 18, 2023

asmorkalov mentioned this pull request on Nov 3, 2023 in (5.x) Merge 4.x #24486 (merged)

vpisarev deleted the fp16bf16_arithm branch November 20, 2023 01:03

tomoaki0705 reviewed Dec 6, 2023

vpisarev merged 6 commits into opencv:4.x from vpisarev:fp16bf16_arithm
