Skip to content

build: prepare to build for ARM64 on Windows with Visual Studio#24698

Merged
asmorkalov merged 1 commit intoopencv:4.xfrom
tomoaki0705:support_arm64_windows
Dec 19, 2023
Merged

build: prepare to build for ARM64 on Windows with Visual Studio#24698
asmorkalov merged 1 commit intoopencv:4.xfrom
tomoaki0705:support_arm64_windows

Conversation

@tomoaki0705
Copy link
Copy Markdown
Contributor

@tomoaki0705 tomoaki0705 commented Dec 14, 2023

Sorry to post many PRs, but I came across with building for Windows on ARM64.
I faced several errors that assuming GCC/Clang on Aarch64.

  1. Only assuming GCC/Clang option for Aarch64.

ocv_update(CPU_NEON_DOTPROD_FLAGS_ON "-march=armv8.2-a+dotprod")
ocv_update(CPU_NEON_DOTPROD_IMPLIES "NEON")
ocv_update(CPU_NEON_FP16_FLAGS_ON "-march=armv8.2-a+fp16")
ocv_update(CPU_NEON_FP16_IMPLIES "NEON")
ocv_update(CPU_NEON_BF16_FLAGS_ON "-march=armv8.2-a+fp16+bf16")

CMake will go through, but let's keep things correct.

  1. Failure of try_compile due to missing pragmas

This is a typical pragma for Arm try_compile

#if defined __GNUC__ && (defined __arm__ || defined __aarch64__)

and it ends up here

#else
#error "FP16 is not supported"
#endif

As far as I checked on Visual Studio 2022, ARM64 can handle NEON, FP16, NEON_FP16 and NEON_DOTPROD.
I couldn't make it compile on NEON_BF16.

  1. Fail in CV_FP16

The compiler can compile NEON and FP16, but it doesn't have a native single half type like on Arm + GCC.
And it hits this line, which is native x86_64.

#if CV_FP16
__m128 v = _mm_load_ss(&x);
w = (ushort)_mm_cvtsi128_si32(_mm_cvtps_ph(v, 0));
#else

So for this issue, I need to rework the FP16, but for this PR, I remove this from the focus of this PR

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=ARMv7,ARMv8

@opencv-alalek opencv-alalek added optimization category: build/install platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc labels Dec 14, 2023
@opencv-alalek opencv-alalek added this to the 4.9.0 milestone Dec 14, 2023
@asmorkalov asmorkalov requested a review from vpisarev December 14, 2023 15:45
@tomoaki0705
Copy link
Copy Markdown
Contributor Author

tomoaki0705 commented Dec 15, 2023

Hmm, I think I have to give up re-working the FP16 part.

_ARM64_DISTINCT_NEON_TYPES

Thanks to @shibayan in #21630, this macro seems to be important.
This macro is added in CMake directly

if(AARCH64 AND NOT MSVC_VERSION LESS 1930)
set(OPENCV_EXTRA_FLAGS "${OPENCV_EXTRA_FLAGS} /D _ARM64_DISTINCT_NEON_TYPES")
endif()

I found one comment on community of the Visual Studio

Note you are not intended to use it by directly defining the _ARM64_DISTINCT_NEON_TYPES symbol per this blog post.

And the blog post says following.

New ARM64 compiler flags: /Zc:arm64-aliased-neon-types- and /Zc:arm64-aliased-neon-types. When you pass /Zc:arm64-aliased-neon-types- to cl.exe, the compiler will treat NEON intrinsic types as distinct types for ARM64 as defined by the Procedure Call Standard for the Arm 64-bit Architecture, which is consistent with Clang and GCC.

Well, the quoted comment doesn't appear directly, so I "guess" the blog was updated after the comment.
Anyway, the original bug mentioned in #16027 was fixed in Visual Studio 2022 (17.2). and rather directly adding the macro definition, we should use /Zc:arm64-aliased-neon-types-

Still, there are other things to consider.
Changing the option doesn't help the situation.

FP16 instructions disappear

I face the same situation as the previous comment

There is, however, still a compiler issue here: When using /Zc:arm64-aliased-neon-types-, all of the half-precision intrinsics are missing: vcvt_f32_f16, vcvt_f16_f32, etc.

I see the same issue.
Below is a portion of arm_neon.h on my VS 2022 (17.8)

#if !defined(_ARM64_DISTINCT_NEON_TYPES)
#define vget_lane_f16(Dm, lane)     neon_dups16((Dm), (lane))

#define vcvt_f32_f16(src)               __n128_to_float32x4_t(neon_fcvtl_32(__float16x4_t_to_n64(src)))

#define vcvt_f16_f32(src)               __n64_to_float16x4_t(neon_fcvtn_32(__float32x4_t_to_n128(src)))

Please note that specifying /Zc:arm64-aliased-neon-types- is equivalent to define _ARM64_DISTINCT_NEON_TYPES so all f16 functions will not appear and end up undeclared identifier/identifier not found

Reversing the switch neither works

The flag /Zc:arm64-aliased-neon-types- is actually an opt in version of /Zc:arm64-aliased-neon-types . There is a difference at the end of the option.
So reverting this flag will bring back fp16 instructions.
Still, this ends up in no vreinterpret_xx_xx instructions in arm64_neon.h

//vreinterpret
#if !defined(_ARM64_DISTINCT_NEON_TYPES)
#define vreinterpret_f32_s8(a)         (a)
#define vreinterpret_f32_s16(a)        (a)

So at this point, (it's the last thing I want to say, but) the VS + ARM64 is not fully ready, yet.
To check the situation, here's some test from godbolt

workaround

The only way which works is fulfilling all following options

  • Set either /D _ARM64_DISTINCT_NEON_TYPES or /Zc:arm64-aliased-neon-types- using cmake
  • Specify -DCPU_BASELINE=NEON instead of FP16 when configuring.
    this will allow user to use every NEON instructions but FP16.

Additional notes

I tried both CMake 3.25.0 and 3.25.1, and at least in 3.25.0, there's a bug which fails to handle try_compile on Windows + ARM64.
Who anyone facing same issue, hope it helps you.

@asmorkalov
Copy link
Copy Markdown
Contributor

@tomoaki0705 I merged the first your patch. Please rebase and fix conflicts.

@tomoaki0705 tomoaki0705 force-pushed the support_arm64_windows branch from ae6850e to 5d4c507 Compare December 19, 2023 00:32
@tomoaki0705
Copy link
Copy Markdown
Contributor Author

tomoaki0705 commented Dec 19, 2023

I pushed the resolved version.
For FP16 issue on ARM64, it's really not clear how the future Visual Studio will handle it in the future, so I left it as-is.
This means, by configuring as

cmake -G"Visual Studio 17 2022" -A ARM64 -DCMAKE_SYSTEM_NAME=Windows  -DCMAKE_SYSTEM_PROCESSOR=ARM64  ..

Will end up as an compile error

error C2065: '__m128': undeclared identifier

and we have to remember that it's a known issue.

I also had to update the ocv_cpu_aarch64_baseline_merge_feature_options

CMake Error at cmake/OpenCVCompilerOptimizations.cmake:494 (string):
  string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
  cmake/OpenCVCompilerOptimizations.cmake:600 (ocv_cpu_aarch64_baseline_merge_feature_options)
  cmake/OpenCVCompilerOptions.cmake:342 (include)
  CMakeLists.txt:656 (include)


CMake Error at cmake/OpenCVCompilerOptimizations.cmake:494 (string):
  string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
  cmake/OpenCVCompilerOptimizations.cmake:600 (ocv_cpu_aarch64_baseline_merge_feature_options)
  cmake/OpenCVCompilerOptions.cmake:342 (include)
  CMakeLists.txt:656 (include)


@tomoaki0705 tomoaki0705 force-pushed the support_arm64_windows branch from 5d4c507 to b6ec9b9 Compare December 19, 2023 00:40
@asmorkalov asmorkalov merged commit 283407e into opencv:4.x Dec 19, 2023
@tomoaki0705 tomoaki0705 deleted the support_arm64_windows branch December 19, 2023 11:15
@asmorkalov asmorkalov mentioned this pull request Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category: build/install optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants