build: prepare to build for ARM64 on Windows with Visual Studio by tomoaki0705 · Pull Request #24698 · opencv/opencv

tomoaki0705 · 2023-12-14T14:03:38Z

Sorry to post many PRs, but I came across with building for Windows on ARM64.
I faced several errors that assuming GCC/Clang on Aarch64.

Only assuming GCC/Clang option for Aarch64.

opencv/cmake/OpenCVCompilerOptimizations.cmake

Lines 352 to 356 in 3d1f646

    
           ocv_update(CPU_NEON_DOTPROD_FLAGS_ON "-march=armv8.2-a+dotprod") 
        
           ocv_update(CPU_NEON_DOTPROD_IMPLIES "NEON") 
        
           ocv_update(CPU_NEON_FP16_FLAGS_ON "-march=armv8.2-a+fp16") 
        
           ocv_update(CPU_NEON_FP16_IMPLIES "NEON") 
        
           ocv_update(CPU_NEON_BF16_FLAGS_ON "-march=armv8.2-a+fp16+bf16")

CMake will go through, but let's keep things correct.

Failure of try_compile due to missing pragmas

This is a typical pragma for Arm try_compile

opencv/cmake/checks/cpu_neon_fp16.cpp

Line 3 in 3d1f646

#if defined __GNUC__ && (defined __arm__ || defined __aarch64__)

and it ends up here

opencv/cmake/checks/cpu_neon_fp16.cpp

Lines 38 to 40 in 3d1f646

    
           #else 
        
           #error "FP16 is not supported" 
        
           #endif

As far as I checked on Visual Studio 2022, ARM64 can handle NEON, FP16, NEON_FP16 and NEON_DOTPROD.
I couldn't make it compile on NEON_BF16.

Fail in CV_FP16

The compiler can compile NEON and FP16, but it doesn't have a native single half type like on Arm + GCC.
And it hits this line, which is native x86_64.

opencv/modules/core/include/opencv2/core/cvdef.h

Lines 903 to 906 in 3d1f646

    
           #if CV_FP16 
        
               __m128 v = _mm_load_ss(&x); 
        
               w = (ushort)_mm_cvtsi128_si32(_mm_cvtps_ph(v, 0)); 
        
           #else

So for this issue, I need to rework the FP16, but for this PR, I remove this from the focus of this PR

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

force_builders=ARMv7,ARMv8

tomoaki0705 · 2023-12-15T15:03:21Z

Hmm, I think I have to give up re-working the FP16 part.

`_ARM64_DISTINCT_NEON_TYPES`

Thanks to @shibayan in #21630, this macro seems to be important.
This macro is added in CMake directly

opencv/cmake/OpenCVCompilerOptions.cmake

Lines 336 to 338 in 3d1f646

    
             if(AARCH64 AND NOT MSVC_VERSION LESS 1930) 
        
               set(OPENCV_EXTRA_FLAGS "${OPENCV_EXTRA_FLAGS} /D _ARM64_DISTINCT_NEON_TYPES") 
        
             endif()

I found one comment on community of the Visual Studio

Note you are not intended to use it by directly defining the _ARM64_DISTINCT_NEON_TYPES symbol per this blog post.

And the blog post says following.

New ARM64 compiler flags: /Zc:arm64-aliased-neon-types- and /Zc:arm64-aliased-neon-types. When you pass /Zc:arm64-aliased-neon-types- to cl.exe, the compiler will treat NEON intrinsic types as distinct types for ARM64 as defined by the Procedure Call Standard for the Arm 64-bit Architecture, which is consistent with Clang and GCC.

Well, the quoted comment doesn't appear directly, so I "guess" the blog was updated after the comment.
Anyway, the original bug mentioned in #16027 was fixed in Visual Studio 2022 (17.2). and rather directly adding the macro definition, we should use /Zc:arm64-aliased-neon-types-

Still, there are other things to consider.
Changing the option doesn't help the situation.

`FP16` instructions disappear

I face the same situation as the previous comment

There is, however, still a compiler issue here: When using /Zc:arm64-aliased-neon-types-, all of the half-precision intrinsics are missing: vcvt_f32_f16, vcvt_f16_f32, etc.

I see the same issue.
Below is a portion of arm_neon.h on my VS 2022 (17.8)

#if !defined(_ARM64_DISTINCT_NEON_TYPES)
#define vget_lane_f16(Dm, lane)     neon_dups16((Dm), (lane))

#define vcvt_f32_f16(src)               __n128_to_float32x4_t(neon_fcvtl_32(__float16x4_t_to_n64(src)))

#define vcvt_f16_f32(src)               __n64_to_float16x4_t(neon_fcvtn_32(__float32x4_t_to_n128(src)))

Please note that specifying /Zc:arm64-aliased-neon-types- is equivalent to define _ARM64_DISTINCT_NEON_TYPES so all f16 functions will not appear and end up undeclared identifier/identifier not found

Reversing the switch neither works

The flag /Zc:arm64-aliased-neon-types- is actually an opt in version of /Zc:arm64-aliased-neon-types . There is a difference at the end of the option.
So reverting this flag will bring back fp16 instructions.
Still, this ends up in no vreinterpret_xx_xx instructions in arm64_neon.h

//vreinterpret
#if !defined(_ARM64_DISTINCT_NEON_TYPES)
#define vreinterpret_f32_s8(a)         (a)
#define vreinterpret_f32_s16(a)        (a)

So at this point, (it's the last thing I want to say, but) the VS + ARM64 is not fully ready, yet.
To check the situation, here's some test from godbolt

workaround

The only way which works is fulfilling all following options

Set either /D _ARM64_DISTINCT_NEON_TYPES or /Zc:arm64-aliased-neon-types- using cmake
Specify -DCPU_BASELINE=NEON instead of FP16 when configuring.
this will allow user to use every NEON instructions but FP16.

Additional notes

I tried both CMake 3.25.0 and 3.25.1, and at least in 3.25.0, there's a bug which fails to handle try_compile on Windows + ARM64.
Who anyone facing same issue, hope it helps you.

asmorkalov · 2023-12-18T16:59:21Z

@tomoaki0705 I merged the first your patch. Please rebase and fix conflicts.

tomoaki0705 · 2023-12-19T00:40:54Z

I pushed the resolved version.
For FP16 issue on ARM64, it's really not clear how the future Visual Studio will handle it in the future, so I left it as-is.
This means, by configuring as

cmake -G"Visual Studio 17 2022" -A ARM64 -DCMAKE_SYSTEM_NAME=Windows  -DCMAKE_SYSTEM_PROCESSOR=ARM64  ..

Will end up as an compile error

error C2065: '__m128': undeclared identifier

and we have to remember that it's a known issue.

I also had to update the ocv_cpu_aarch64_baseline_merge_feature_options

CMake Error at cmake/OpenCVCompilerOptimizations.cmake:494 (string):
  string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
  cmake/OpenCVCompilerOptimizations.cmake:600 (ocv_cpu_aarch64_baseline_merge_feature_options)
  cmake/OpenCVCompilerOptions.cmake:342 (include)
  CMakeLists.txt:656 (include)


CMake Error at cmake/OpenCVCompilerOptimizations.cmake:494 (string):
  string sub-command REPLACE requires at least four arguments.
Call Stack (most recent call first):
  cmake/OpenCVCompilerOptimizations.cmake:600 (ocv_cpu_aarch64_baseline_merge_feature_options)
  cmake/OpenCVCompilerOptions.cmake:342 (include)
  CMakeLists.txt:656 (include)

opencv-alalek added optimization category: build/install platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc labels Dec 14, 2023

opencv-alalek added this to the 4.9.0 milestone Dec 14, 2023

asmorkalov requested a review from vpisarev December 14, 2023 15:45

tomoaki0705 force-pushed the support_arm64_windows branch from ae6850e to 5d4c507 Compare December 19, 2023 00:32

prepare to build for ARM64 on Windows with Visual Studio

b6ec9b9

tomoaki0705 force-pushed the support_arm64_windows branch from 5d4c507 to b6ec9b9 Compare December 19, 2023 00:40

vpisarev approved these changes Dec 19, 2023

View reviewed changes

asmorkalov assigned vpisarev Dec 19, 2023

asmorkalov merged commit 283407e into opencv:4.x Dec 19, 2023

tomoaki0705 deleted the support_arm64_windows branch December 19, 2023 11:15

tomoaki0705 mentioned this pull request Dec 22, 2023

cmake: brush up the cmake part of merging option #24745

Merged

6 tasks

asmorkalov mentioned this pull request Jan 19, 2024

5.x merge 4.x #24862

Merged

tomoaki0705 mentioned this pull request Feb 21, 2024

Windows ARM build is broken for FP16 arithmetics #25052

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

build: prepare to build for ARM64 on Windows with Visual Studio#24698

build: prepare to build for ARM64 on Windows with Visual Studio#24698
asmorkalov merged 1 commit intoopencv:4.xfrom
tomoaki0705:support_arm64_windows

tomoaki0705 commented Dec 14, 2023 •

edited by opencv-alalek

Loading

Uh oh!

tomoaki0705 commented Dec 15, 2023 •

edited

Loading

Uh oh!

asmorkalov commented Dec 18, 2023

Uh oh!

tomoaki0705 commented Dec 19, 2023 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	ocv_update(CPU_NEON_DOTPROD_FLAGS_ON "-march=armv8.2-a+dotprod")
	ocv_update(CPU_NEON_DOTPROD_IMPLIES "NEON")
	ocv_update(CPU_NEON_FP16_FLAGS_ON "-march=armv8.2-a+fp16")
	ocv_update(CPU_NEON_FP16_IMPLIES "NEON")
	ocv_update(CPU_NEON_BF16_FLAGS_ON "-march=armv8.2-a+fp16+bf16")

	#if CV_FP16
	__m128 v = _mm_load_ss(&x);
	w = (ushort)_mm_cvtsi128_si32(_mm_cvtps_ph(v, 0));
	#else

Uh oh!

Conversation

tomoaki0705 commented Dec 14, 2023 • edited by opencv-alalek Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Readiness Checklist

Uh oh!

tomoaki0705 commented Dec 15, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

_ARM64_DISTINCT_NEON_TYPES

FP16 instructions disappear

Reversing the switch neither works

workaround

Additional notes

Uh oh!

asmorkalov commented Dec 18, 2023

Uh oh!

tomoaki0705 commented Dec 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tomoaki0705 commented Dec 14, 2023 •

edited by opencv-alalek

Loading

tomoaki0705 commented Dec 15, 2023 •

edited

Loading

`_ARM64_DISTINCT_NEON_TYPES`

`FP16` instructions disappear

tomoaki0705 commented Dec 19, 2023 •

edited

Loading