Skip to content

Add arm64EC pipeline. Disable all SSE usage for ARM64EC#2491

Closed
cbezault wants to merge 6 commits intomicrosoft:mainfrom
cbezault:arm64ec3
Closed

Add arm64EC pipeline. Disable all SSE usage for ARM64EC#2491
cbezault wants to merge 6 commits intomicrosoft:mainfrom
cbezault:arm64ec3

Conversation

@cbezault
Copy link
Contributor

@cbezault cbezault commented Jan 19, 2022

Add ARM64EC to the pipeline. We do this via the pre-existing VCLIBS_TARGET_ARCHITECTURE cmake variable.
To compile ARM64EC code one uses the ARM64 targeting compiler and passes /arm64EC.
We also need to explicitly pass /machine:arm64ec to the linker.

When compiling vector_algorithms.cpp some of the intrinsics did not have specialized EC versions available. Either way we shouldn't rely on the emulation of SSE intrinsics.

Resolves #2310.

@cbezault cbezault requested a review from a team as a code owner January 19, 2022 22:49
@StephanTLavavej StephanTLavavej added the infrastructure Related to repository automation label Jan 20, 2022
@StephanTLavavej

This comment has been minimized.

Copy link
Contributor

@CaseyCarter CaseyCarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't "approve" without seeing this run, but what's here so far looks good and I want to set my "last review" point.

_Advance_bytes(_First, 16);
} while (_First != _Stop_at);
}
#endif // !defined(_M_ARM64EC)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If _M_ARM64EC contains only fallback, it should not be called at all; instead _USE_STD_VECTOR_ALGORITHMS should be set to 0, like for _M_HYBRID.

Maybe they should even not exist, like for _M_ARM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should mimic the behavior of _M_HYBRID I just put this PR out as it was to get the idea out there.

Copy link
Contributor

@strega-nil-ms strega-nil-ms May 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't agree with this; I believe that arm64EC should be treated like ARM64 for purposes of vector instructions, since it is "just" ARM64 code with an emulator-friendly ABI.

However, I'll note that we currently don't have vector algorithms for ARM64, so the outcome should be the same.

@cbezault
Copy link
Contributor Author

We really should consider creating NEON versions of all of these vector algorithms for ARM64.

@AlexGuteniev
Copy link
Contributor

We really should consider creating NEON versions of all of these vector algorithms for ARM64.

Tracked as #813

(I'm not participating in it as I don't own an ARM)

@CaseyCarter
Copy link
Contributor

(I'm not participating in it as I don't own an ARM)

Write all the code you like - but it will cost you an ARM and a leg!

@StephanTLavavej
Copy link
Member

I've pushed a conflict-free merge with main plus the new pool from #2496 so we can begin to gather test results here.

@StephanTLavavej
Copy link
Member

Ok, looks like we've got actionable build failures. Example:

FAILED: stl/CMakeFiles/msvcp_objects.dir/src/_toupper.cpp.obj 
C:\PROGRA~1\MICROS~1\2022\Preview\VC\Tools\MSVC\1431~1.311\bin\Hostx64\arm64\cl.exe  /nologo /TP -DCRTDLL2 -DNTDDI_VERSION=NTDDI_WIN10 -DSTRICT -DWIN32_LEAN_AND_MEAN -D_ALLOW_ITERATOR_DEBUG_LEVEL_MISMATCH -D_ARM64_ -D_CRTBLD -D_CRT_DECLARE_NONSTDC_NAMES=1 -D_CRT_STDIO_ARBITRARY_WIDE_SPECIFIERS -D_DLL -D_HAS_OLD_IOSTREAMS_MEMBERS=1 -D_STL_CONCRT_SUPPORT -D_STL_WIN32_WINNT=0x0A00 -D_VCRT_ALLOW_INTERNALS -D_VCRT_WIN32_WINNT=0x0A00 -D_WIN32_WINNT=0x0A00 -I"C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.31103\crt\src\vcruntime" -I"C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.31103\crt\src\concrt" -IC:\a\1\s\stl\inc /arm64EC /diagnostics:caret /W4 /WX /w14265 /w15038 /d1FastFail /guard:cf /Z7 /Gm- /Gy /Zp8 /std:c++latest /permissive- /Zc:threadSafeInit- /Zl /ZH:SHA_256 /O2 /Os /GL /EHsc /showIncludes /Fostl\CMakeFiles\msvcp_objects.dir\src\_toupper.cpp.obj /Fdstl\CMakeFiles\msvcp_objects.dir\ /FS -c C:\a\1\s\stl\src\_toupper.cpp
C:\Program Files\Microsoft Visual Studio\2022\Preview\VC\Tools\MSVC\14.31.31103\include\intrin.h(962,18): fatal error C1083: Cannot open include file: 'softintrin.h': No such file or directory
        #include <softintrin.h>
                 ^

This looks "good" in the sense that I see /arm64EC being passed to Hostx64\arm64\cl.exe so I think this is a real failure involving intrinsics, and not installer-related stuff.

@cbezault
Copy link
Contributor Author

Interesting, I didn't see that failure locally.

@cbezault
Copy link
Contributor Author

On my personal machine the first Windows SDK that has softintrin.h is 10.0.20348.

@StephanTLavavej
Copy link
Member

Currently we aren't using WinSDK 10.0.20348.0 because Clang 12 targeting ARM64 was incompatible with it. LLVM-51128 was fixed and I am 95% confident that the fix got into Clang 13.0.1. I'm uncertain whether even newer WinSDKs are affected.

@StephanTLavavej
Copy link
Member

I've pushed another merge with main to use the final toolset.

@StephanTLavavej
Copy link
Member

I believe we'll need Clang 14 and update to the latest WinSDK before this PR can make progress. (In theory Clang 13.0.1 should be sufficient, but I don't expect VS to pick up that point release.)

@cbezault
Copy link
Contributor Author

cbezault commented May 18, 2022

Seems like this is unblocked because we're using a newer WinSDK now?

@StephanTLavavej
Copy link
Member

Good catch - VS picked up Clang 13.0.1 and we were able to upgrade to the Win11 SDK 22000. I've removed the blocked label; now merge conflicts need to be resolved.

@StephanTLavavej StephanTLavavej added the ARM64 Related to the ARM64 architecture label May 25, 2022
Copy link
Contributor

@strega-nil-ms strega-nil-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than that, I am of the opinion that the correct thing here is to instead say that ARM64EC ⇒ ¬_USE_STD_VECTOR_ALGORITHMS for now, until we implement vector algorithms for ARM64 in which case we should use the ARM64 vector algorithms.

To me, this means that vector_algorithms.cpp:13 should be changed:

-#if defined(_M_IX86) || defined(_M_X64)
+// equivalent to _USE_STD_VECTOR_ALGORITHMS in <xutility>
+#if (defined(_M_IX86) || defined(_M_X64)) && !defined(_M_CEE_PURE) && !defined(_M_HYBRID) && !defined(_M_ARM64EC)

and then change xutility:24

-#if (defined(_M_IX86) || defined(_M_X64)) && !defined(_M_CEE_PURE) && !defined(_M_HYBRID)
+#if (defined(_M_IX86) || defined(_M_X64)) && !defined(_M_CEE_PURE) && !defined(_M_HYBRID) && !defined(_M_ARM64EC)

add_library(libcpmt${FLAVOR_SUFFIX}_eha OBJECT ${EHA_SOURCES})
target_compile_definitions(libcpmt${FLAVOR_SUFFIX}_eha PRIVATE "${THIS_CONFIG_DEFINITIONS};_ANNOTATE_VECTOR")
target_compile_options(libcpmt${FLAVOR_SUFFIX}_eha PRIVATE "${THIS_CONFIG_COMPILE_OPTIONS};/EHa")
set_target_properties(libcpmt${FLAVOR_SUFFIX}_eha PROPERTIES STATIC_LIBRARY_OPTIONS "/machine:${VCLIBS_TARGET_ARCHITECTURE}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libcpmt*_eha is not a static library, it's an object library (hence should not require STATIC_LIBRARY_OPTIONS)

@strega-nil-ms
Copy link
Contributor

@cbezault are you interested in finishing this PR? otherwise, I'll mark as draft and maybe take it up myself.

@strega-nil-ms strega-nil-ms removed their assignment May 26, 2022
@strega-nil-ms strega-nil-ms marked this pull request as draft May 31, 2022 17:55
@strega-nil-ms
Copy link
Contributor

Marking as draft since there's been no response.

@cbezault
Copy link
Contributor Author

I'm interested in finishing this. I do not agree that ARM64EC should imply ARM64 for the sake of determining if we use the vector algorithms or not. In my other PR that actually adds some native ARM64 vector algorithms I found that binaries that were already running in x64 mode actually performed better when calling into x64 code with emulated SSE intrinsics. This is likely because of the overhead in translating between the x64 and ARM64EC abis as well as things like the Adjuster Thunks outlined in the ARM64EC abi doc. I'm curious to know if there is a way to provide both an ARM64EC variant of these functions and also an x64 variant in the same lib.

@StephanTLavavej
Copy link
Member

Thanks again for working on this PR. Because we have severely limited maintainer capacity for the foreseeable future, and this PR has steadily become outdated relative to main, we're going to close it without merging. #2310 still tracks the need for ARM64EC coverage (and #820 tracks the need for ARM64 proper).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ARM64 Related to the ARM64 architecture infrastructure Related to repository automation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make ARM64EC built on GitHub

6 participants