Add Intel hardware intrinsic APIs to CoreFX #23489

fiigii · 2017-08-22T22:14:57Z

This PR is the complete design of API Proposal: Add Intel hardware intrinsic functions and namespace #22940.

For building with the current code base, const parameter modifiers and [intrinsic] is temporarily removed.

@russellhadley @terrajobst @CarolEidt @mellinoe

fiigii · 2017-08-22T22:22:09Z

@AlexGhiondea @joperezr The building system setting is not correct yet, could you help me?

benaadams · 2017-08-22T22:31:47Z

Oh my! 😄 🎉

jkotas · 2017-08-22T22:51:57Z

src/System.Runtime.Intrinsics.X86/ref/System.Runtime.Intrinsics.X86.cs

+        public static ulong AndNot(ulong left, ulong right) { throw null; }
+        public static uint BitFieldExtract(uint value, uint start, uint length) { throw null; }
+        public static ulong BitFieldExtract(ulong value, ulong start, ulong length) { throw null; }
+        public static uint BitFieldExtract(uint value, uint contr


How did you decide which overloads should get the ref overload - e.g. why is there no ref T overload for StoreAligned or StoreAlignedNonTemporal?

I think it maybe best to skip the ref overloads in the initial iteration of this. There is number of open questions around them.

Currently, only the most common memory-access intrinsics (e.g., Load, Store, BroadcastVector128ToVector256) have ref T overloads, and all the alignment-aware (e.g., Load/StoreAligned, Extract, Insert, etc.) or performance-sensitive (i.e. Gether*) intrinsics just have pointer versions.

On modern Intel CPUs, these "most common memory-access intrinsics" have almost same performance over aligned and unaligned memory.

@jkotas you can't guarantee the location/alignment of a ref unless its fixed; when you'd have a pointer anyway?

I have picked bad examples. @pentp comment has better ones.

Also, the ref APIs are dangerous but they appear safe on the surface. I am not sure whether it is a good idea. I guess it is why the folks opted for not adding them during API review and preferred Span instead. They may deserve another look.

Can implement the Span on top of the ref and have the jit hoist checks for loops; whereas if on pointer with fixed it wouldn't be able to do that? (Also the Span versions can be in corefx; whereas if it was in coreclr it would have versioning issues effecting inlines? - am hazy on versioning inlines)

if on pointer with fixed it wouldn't be able to do that

Yes, you can do this. But I am not sure whether it is really needed. I think it would be fine to start with unsafe APIs only, tell folks to pin, and see where it does not work.

I am not sure whether the pining is really a problem for these sort of APIs. I won't be surprised if folks will want to pin in a lot of cases when using these APIs anyway because of they will want the alignment.

Also the Span versions can be in corefx

If Span versions are in corefx, they would have to be on a different type. Span is CoreLib for .NET Core, and this is .NET Core only feature, so I do not see any issues around this.

@fiigii Could you please delete the ref overloads? We should start with unsafe pointers only. Once we gain more experience with this, we can add ref or Span overloads if they are really needed.

@jkotas Sounds reasonable to me, will remove ref overloads.

justinvp · 2017-08-22T23:31:51Z

src/System.Runtime.Intrinsics.X86/ref/System.Runtime.Intrinsics.X86.cs

@@ -0,0 +1,1115 @@
+namespace System.Runtime.Intrinsics.X86


Needs a license header like the other ref files, e.g.:

corefx/src/System.Runtime/ref/System.Runtime.cs

Lines 1 to 6 in fd3004d

// Licensed to the .NET Foundation under one or more agreements.

// The .NET Foundation licenses this file to you under the MIT license.

// See the LICENSE file in the project root for more information.

// ------------------------------------------------------------------------------

// Changes to this file must follow the http://aka.ms/api-review process.

// ------------------------------------------------------------------------------

justinvp · 2017-08-22T23:32:12Z

src/System.Runtime.Intrinsics.X86/ref/System.Runtime.Intrinsics.X86.cs

+        public static ulong AndNot(ulong left, ulong right) { throw null; }
+        public static uint BitFieldExtract(uint value, uint start, uint length) { throw null; }
+        public static ulong BitFieldExtract(ulong value, ulong start, ulong length) { throw null; }
+        public static uint BitFieldExtract(uint value, uint contr


Should these have explicit values?

Could you explain a little bit more? Github differ does not seem to show the correct code location that you referred.

Not sure what happened with GitHub but these comments were for the enums.

Got it. Will add explicit values for enums.

justinvp · 2017-08-22T23:32:18Z

src/System.Runtime.Intrinsics.X86/ref/System.Runtime.Intrinsics.X86.cs

+        public static ulong AndNot(ulong left, ulong right) { throw null; }
+        public static uint BitFieldExtract(uint value, uint start, uint length) { throw null; }
+        public static ulong BitFieldExtract(ulong value, ulong start, ulong length) { throw null; }
+        public static uint BitFieldExtract(uint value, uint contr


Should these have explicit values?

justinvp · 2017-08-22T23:33:15Z

src/System.Runtime.Intrinsics.X86/src/System/Runtime/Intrinsics/X86/Avx.cs

@@ -0,0 +1,577 @@
+// =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+


Missing license header. Applies throughout.

// Licensed to the .NET Foundation under one or more agreements. // The .NET Foundation licenses this file to you under the MIT license. // See the LICENSE file in the project root for more information.

Thanks for pointing out. Will do.

justinvp · 2017-08-22T23:39:16Z

src/System.Runtime.Intrinsics.X86/src/System/Runtime/Intrinsics/X86/Lzcnt.cs

@@ -0,0 +1,27 @@
+// =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+//
+// Lzcnt.cs


Nit: Should these file headers be removed and replaced with a <Summary/> doc comments on the class? e.g.:

// Licensed to the .NET Foundation under one or more agreements. // The .NET Foundation licenses this file to you under the MIT license. // See the LICENSE file in the project root for more information. using System; using System.Runtime.Intrinsics; namespace System.Runtime.Intrinsics.X86 { /// <summary> /// A class that implements intrinsic functions to provide access to Intel LZCNT instructions. /// </summary> public static class Lzcnt { public static bool IsSupported { get; } // unsigned int _lzcnt_u32 (unsigned int a) public static uint LeadingZeroCount(uint value) { throw new NotImplementedException(); } // unsigned __int64 _lzcnt_u64 (unsigned __int64 a) public static ulong LeadingZeroCount(ulong value) { throw new NotImplementedException(); } } }

Yes let's please use XML docs. I don't know how directly these end up translating to documentation, but it's good to write them like they would go into the docs. So you might miss off "A class that..."

Shall I use XML docs for each function's comment, e.g., // unsigned int _lzcnt_u32 (unsigned int a) ?

pentp · 2017-08-22T23:59:17Z

Shouldn't there be ref overfloads for Sse3.LoadDqu and Avx.LoadDqu as these are specifically for unaligned loads?
And if alignment requirements allow then also Sse2.MaskMove, Sse2.StoreHigh, Sse2.StoreLow, Sse3.LoadAndDuplicate, Avx.MaskLoad, Avx2.ExtractVector128, Avx2.GatherVector*, Avx2.GatherMaskVector*, Avx2.MaskLoad, Avx2.MaskStore.

In general, ref overloads are useful for all instructions that don't explicitly forbid unaligned access because using fixed to pin memory does not align it and the subsequent pointer based overload is still going to do an unaligned access...

fiigii · 2017-08-23T00:12:53Z

Shouldn't there be ref overfloads for Sse3.LoadDqu and Avx.LoadDqu as these are specifically for unaligned loads?

Correct.

danmoseley · 2017-08-23T00:17:46Z

Unsafe code may only appear if compiling with /unsafe is fixed by <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the csproj.

tannergooding · 2017-08-23T00:23:15Z

src/System.Runtime.Intrinsics.X86/System.Runtime.Intrinsics.X86.sln

+		Release|x64 = Release|x64
+	EndGlobalSection
+	GlobalSection(ProjectConfigurationPlatforms) = postSolution
+		{53134B0C-0D57-481B-B84E-D1991E8D54FF}.Debug|x64.ActiveCfg = netcoreapp-Debug|x64


Shouldn't this be AnyCPU?

tannergooding · 2017-08-23T00:25:47Z

src/System.Runtime.Intrinsics.X86/ref/System.Runtime.Intrinsics.X86.csproj

+    <ProjectGuid>{650277B5-9423-4ACE-BB54-2659995B21C7}</ProjectGuid>
+    <IsPartialFacadeAssembly Condition="'$(TargetGroup)' == 'netfx' OR '$(TargetGroup)' == 'net46'">true</IsPartialFacadeAssembly>
+  </PropertyGroup>
+  <PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'net46-Debug|x64'" />


Currently, building system setting is not correct yet, and I am not familiar with this. Need you guys' help 😄

fiigii · 2017-08-23T00:25:58Z

And if alignment requirements allow then also Sse2.MaskMove, Sse2.StoreHigh, Sse2.StoreLow, Sse3.LoadAndDuplicate, Avx.MaskLoad, Avx2.ExtractVector128, Avx2.GatherVector*, Avx2.GatherMaskVector*, Avx2.MaskLoad, Avx2.MaskStore.

I suggest that just have ref T overloads on Load, Store, and Broadcast at the initial iteration.

Additionally, although certain instructions "don't explicitly forbid unaligned access", we design some of the intrinsics to be alignment-aware. For example, sometimes 32-byte alignment cannot guarantee (but 16-byte alignment can), so AVX 256-bit memory load/store would cause much more cache line split. Developers can use Avx.Insert/ExtractVector128 to load/store data 128-bit by 128-bit to get better performance.

tannergooding · 2017-08-23T00:27:27Z

src/System.Runtime.Intrinsics.X86/src/System/Runtime/Intrinsics/X86/Aes.cs

+{
+    public static class Aes 
+    {
+        public static bool IsSupported { get; }


It would be useful to have a comment indicating what the corresponding CPUID check is...

Do you think it is necessary to managed language users?

I would find it useful to be documented here, at the very least so I don't have to go delving into the CoreCLR repo to find the actual check.

I think it is something some users will care about and may help future contributors find the correct location to add new/missing APIs.

tannergooding · 2017-08-23T00:28:21Z

src/System.Runtime.Intrinsics.X86/src/System/Runtime/Intrinsics/X86/Enums.cs

+
+    public enum FloatComparisonMode : byte
+    {
+        EqualOrderedNonSignaling,


Is the order here important? If so, should that be documented?

Yes, will do.

tannergooding · 2017-08-23T00:29:35Z

src/System.Runtime.Intrinsics.X86/src/System/Runtime/Intrinsics/X86/Sse2.cs

+    /// SSE2 class provides access to 128-bit SSE/SSE2 SIMD instructions
+    /// </para>
+
+    public static class Sse2


Did we end up determining whether it was better to have these explicitly separated into an Sse and Sse2 class?

Post x64 are there any CPU that don't have both? Does coreclr run on an x86/x64 CPU without Sse2?

Does coreclr run on an x86/x64 CPU without Sse2?

I do not think so. As I know, the codegen bottom-line of CoreCLR is SSE2.

RyuJIT requires both. However, this is more of a consistency thing than anything else. There was some discussion on the other thread as well with arguments for both sides (joining or splitting).

Personally, I want a clear distinction between each instruction set and think they should not be combined just because the runtime CoreFX is mostly closely tied to requires it.

I also feel that combining them blurs the lines of the requirements for the design of future additions or architectures.

tannergooding · 2017-08-23T00:31:09Z

src/System.Runtime.Intrinsics/src/System/Runtime/Intrinsics/Types.cs

+{
+    // 128 bit types
+    [StructLayout(LayoutKind.Sequential, Size = 16)]
+    public struct Vector128<T> where T : struct 


Should these be in their own files (one for Vector128, one for Vector256)?

How does C#/CoreFX convention deal with this?

Maybe they should be in their own files that is good for extending to future ISA support (i.e. Vector512<T>)?

I think the convention is generally one type per file, but it isn't explicitly documented in the coding style guidelines.

Maybe someone else can comment?

Do you plan to have the actual implementation in CoreLib? If yes, you should not need this file at all.

@jkotas These intrinsic APIs will be implemented in S.P.CoreLib. Do you mean that we do not need all the files under System.Runtime.Intrinsics/src/ and System.Runtime.Intrinsics.X86/src/ in CoreFX?

If these APIs will be implemented in S.P.CoreLib, you do not need the files under src. You just need the files under ref.

Got it. Will change code to address feedback, then move code under src to mscorlib.

tannergooding · 2017-08-23T00:31:32Z

Did we end up determining what to do about scalar overloads?

This refers to both adding them (which will be required if we want managed implementations of the System.Math functions) and the naming convention (since both sqrtss and sqrtps take and return a Vector128<float>).

stephentoub · 2017-08-23T02:07:25Z

src/System.Runtime.Intrinsics.X86/src/System/Runtime/Intrinsics/X86/Sse42.cs

+        public static bool CompareImplicitLength(Vector128<byte> left, Vector128<byte> right, ResultsFlag flag, StringComparisonMode mode)
+        {
+            throw new NotImplementedException(); 
+        }


Nit: the formatting is different in some cases, e.g. everything on one line versus split across multiple lines... is there a reason for when you've used one or the other?

Will make them consistent with everything on one line.

stephentoub · 2017-08-23T02:08:03Z

src/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs

+    [StructLayout(LayoutKind.Sequential, Size = 16)]
+    public struct Vector128<T> where T : struct 
+    {
+        public static bool IsSupported {get;}


We always have the implementations in the refs throw rather than returning dummy values, e.g.

public static bool IsSupported { get { throw null; } }

Thanks! Will fix.

4creators · 2017-08-23T21:45:52Z

Pattern Function<T> used for several method definitions could be problematic as generic parameter restriction to struct does not eliminate invalid generic parameters i.e. ExtractInt<RSAParameters> - this syntax is valid with current definition and code will compile but code is invalid at runtime - Jit will be unable to generate correct intrinsic translation to processor instruction. This is a general problem which pops up in all cases where generic parameter in method definition is restricted to struct only. IMO it would be much better to implement all allowed overloads i.e. for byte, sbyte, short, ushort, int, uint, long, ulong, what would increase number of methods but would eliminate all method misuses. With that change users would not be able to write code which compiles but is invalid during runtime.

4creators · 2017-08-23T21:37:30Z

src/System.Runtime.Intrinsics.X86/src/System/Runtime/Intrinsics/X86/Avx.cs

+        public static long ExtractLong<T>(Vector256<T> value, byte index) where T : struct { throw new NotImplementedException(); }
+        // __int64 _mm256_extract_epi64 (__m256i a, const int index)
+        public static ulong ExtractUlong<T>(Vector256<T> value, byte index) where T : struct { throw new NotImplementedException(); }
+


Pattern ExtractXxx<T> could be problematic as generic parameter restriction to struct does not eliminate invalid method uses i.e. ExtractInt<RSAParameters> - this syntax is valid with current definition but it is invalid at runtime. This is a general problem which pops up in all cases where generic parameter in method definition is used and restricted to struct only.

fiigii · 2017-08-23T22:03:56Z

@4creators The generic type parameters only can be "numerical types" (e.g., byte, short, long, etc), which is same as System.Numerics.Vector<T>. At least, the runtime will restrict passed type parameters correctly.

4creators · 2017-08-23T22:23:13Z

The generic type parameters only can be "numerical types" (e.g., byte, short, long, etc), which is same as System.Numerics.Vector<T>

@fiigii The code in PR does not introduce that restriction so I would agree with the above statement only to the extent that it is an implicit assumption. If it would be implemented in way it is done in Vector<T> implementation it would be OK but unfortunately only to some extent.

The problem with this code is that each generic method may accept different groups of "numerical types" i.e. ExtractUlong<T> in principle would require "numerical integral types" generic parameters while some other methods would require "numerical floating types" and essentially it is quite difficult to implement that kind of generic parameters restrictions on a method by method basis. Unfortunately current C# syntax is not very helpful here.

One can assume that even in the case of ExtractUlong<T> method use of floats as T would be OK as conversion on binary level will succeed but in reality it may lead to unexpected and hard to detect errors. Bcs of this I would still be for changes to the implementation. Anyway base method definition in C/C++ intrinsics requires that parameters to that function would be integral types -> __m256i

__int64 _mm256_extract_epi64 (__m256i a, const int index);

At least, the runtime will restrict passed type parameters correctly.

I do not think that it is OK to defer errors to runtime - there were long discussions about that problem and conclusion was to enforce correctness at compile time - this is why there was an implementation problem of immediate values passed to some intrinsics. Common ground was to get compile time support fro immediate values (const or attribute) from Roslyn.

fiigii · 2017-08-23T22:35:41Z

The code in PR does not introduce that restriction

@4creators Thanks for pointing out. We need to clarify it in documents.

sharwell · 2017-09-01T17:48:54Z

❓ Can someone explain the aversion to ref Vector128<T>? I came in late (from vacation) and see that these overloads were apparently removed but pointer versions were kept. It seems like for most of the operations the opposite is what we want (pointer versions unnecessary; ref Vector128<T>/etc sufficient and preferred).

If we can get the same performance and improved clarity/safety from properly expressing the reference semantics, why bother bloating the metadata with pointer versions of the signatures?

jkotas · 2017-09-01T17:55:23Z

The overloads that were removed were ref T overloads, e.g. Vector128<T> Load<T>(ref T vector).

ref Vector128<T> overloads would be fine with me.

sharwell · 2017-09-01T18:08:48Z

The overloads that were removed were ref T overloads

Hmm. I would only expect to see ref T overloads in cases like broadcast, where only a single element will be referenced. Most¹ of the pointers in the current signatures can be replaced by either ref T or ref Vector128<T>/ref Vector256<T>, and currently I don't see any downside of this.

¹ One could easily argue that signatures involving a pointer from which an offset is taken, e.g. gather, should stay as pointers. I would have no problem with that. My concerns are limited to cases where the referenced memory is exactly the memory at the pointer location.

mellinoe · 2017-09-01T18:14:54Z

Most¹ of the pointers in the current signatures can be replaced by either ref T or ref Vector128/ref Vector256, and currently I don't see any downside of this.

The obvious downside is that you can no longer pass in pointers to those functions. To some extent they are equivalent, but it requires you to use extra conversion operations which you might like to avoid, depending on the surrounding code. I don't see a problem with adding them on top, but it's also something we could consider adding later after we understand the usage patterns a bit better.

sharwell · 2017-09-01T18:21:23Z

The obvious downside is that you can no longer pass in pointers to those functions.

The required conversion would be a NOP in the generated X86 code (post-JIT), but would appear as an additional call to an intrinsic in metadata. The question I have is, in code that uses pointers and intrinsics, how many times would you actually need to make the conversion? It seems most cases would require very few conversions.

fiigii · 2017-09-01T18:39:28Z

The question I have is, in code that uses pointers and intrinsics, how many times would you actually need to make the conversion?

For Load and Store, I am pretty sure that the conversion will be required in most of the situations. The most common SIMD programming pattern is: load a bunch of data from memory to vectors and write back results to memory after SIMD computing complete. Original data in memory is usually organized with its owen format, which is not aware of the SIMD Vector128/256<T>.

fiigii · 2017-09-01T21:34:14Z

Update

Separate SSE intrinsics from Sse2 class. New class Sse added.

sharwell · 2017-09-01T22:34:58Z

Original data in memory is usually organized with its owen format

@fiigii Methods supporting this statement would not be affected by my request. My request specifically targets cases where the target of a load or store does match the layout for the vector.

fiigii · 2017-09-08T15:48:25Z

@dotnet-bot test this please

mmitche · 2017-09-08T21:55:34Z

@dotnet-bot test this please (accidentlaly deleted VMs while in use)

fiigii · 2017-09-08T23:51:05Z

@dotnet-bot test Linux x64 Release Build

fiigii · 2017-09-11T05:31:15Z

@mellinoe I have addressed all the feedback and this PR passed all tests. Could you merge it?

mellinoe · 2017-09-11T17:47:59Z

src/System.Runtime.Intrinsics.X86/src/Configurations.props

+<Project ToolsVersion="14.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
+  <PropertyGroup>
+    <BuildConfigurations>
+      netcoreapp-Windows_NT;


Did we need both Windows_NT and Unix configurations for some reason? I think this should work with just a "netcoreapp" configuration. Other than that it looks good to me to merge.

This change is from @weshaggard 's suggestion #23489 (comment) , for "just a 'netcoreapp' configuration" causes build errors.

Thanks for the reminder. That makes sense.

tannergooding · 2017-09-11T18:38:25Z

🎉 🎈 🎂

ghost · 2018-02-25T13:43:26Z

@fiigii congrats, look like it is amazing work!
I jump in many months after and I have to say that I struggle to understand the usage of this whole feature, do you have a sample somewhere demonstrating a basic usage for a Vector3 or 4 type for instance?
Thanks

fiigii · 2018-02-26T08:19:19Z

do you have a sample somewhere demonstrating a basic usage for a Vector3 or 4 type for instance?

@noahfalk Thank you for you interest on the project. Don't you mean the Vector128<T> and Vector256<T>? If yes, I suggest you to google "SIMD programming" that would give you some examples in C++ and you can treat Vector128<T> as __m128/__m128d/__m128i and Vector256<T> as __m256/__m256d/__m256i in C++.
Here is a good serial of SIMD programming tutorials https://felix.abecassis.me/?s=sse

ghost · 2018-02-26T08:29:57Z

@fiigii thanks for answering! Yes, I know the SIMD assembly counterpart, I wasn't just aware of the purpose of these new types Vector128<T> and Vector256<T>.
So I'm supposed to use the intrinsics methods in System.Runtime.Intrinsics.X86 namespace to manipulate these types, right?

What I meant by Vector3 is if I want to declare a new Vector3 type containing X, Y and Z properties: how should I do?

Now, based on your answer, I believe that I should do something like:

defining a struct
the X, Y, Z as properties
Storing internally an array of 3 floats
Use the S.R.I.X86 methods to do all the loading/storing/arithmetic jobs.

Right?

I'm looking forward to port my 3D Math lib with these new feature and also the struct ref and Span<T> (for List of Vector3 for instance), the speed bump should be great!

fiigii · 2018-02-26T08:54:24Z

new types Vector128 and Vector256.
So I'm supposed to use the intrinsics methods in System.Runtime.Intrinsics.X86 namespace to manipulate these types, right?

Yes.

What I meant by Vector3 is if I want to declare a new Vector3 type containing X, Y and Z properties: how should I do?

You have two ways:

AoS (array of structures), you can use Vector128<T> to contain X, Y, Z and manipulate these vectors via System.Runtime.Intrinsics.X86 intrinsics (or you can use System.Numerics.Vector3 and related intrinsics as well that is a serial of "higher-level" abstract APIs).
SoA (structure of arrays), you can use three Vector128<T> or Vector256<T> to contains 4/8 3D float vectors, which leverages wider-SIMD-registers on newer architectures (AVX/AVX2).

Now, based on your answer, I believe that I should do something like:
Use the S.R.I.X86 methods to do all the loading/storing/arithmetic jobs.

Usually, "defining a struct" may not necessary, and S.R.I.X86 provides unsafe load/store intrinsics that operate over raw pointers.

ghost · 2018-02-26T10:04:35Z

Ok, I understand the intend better now: I should use System.Numerics for common arithmetic, but if I want to perform custom SIMD x86 based code in safe mode, I can rely on your work.
Sorry but I had a lot to catch up with these kind of features in C#.

Btw, with a simple program like this:

using System.Runtime.Intrinsics.X86;
namespace test01
{
    class Program
    {
        static void Main(string[] args)
        {
            var sse = Sse.IsSupported;
            var sse2 = Sse2.IsSupported;
            var sse3 = Sse3.IsSupported;
            var sse41 = Sse41.IsSupported;
            var sse42 = Sse42.IsSupported;
            var avx = Avx.IsSupported;
            var avx2 = Avx2.IsSupported;

            Console.WriteLine($"SSE Supported: {sse}, SSE2 Supported: {sse2}, SSE3 Supported: {sse3}, SSE41 Supported: {sse41}, SSE42 Supported: {sse42}, AVX Supported: {avx}, AVX2 Supported: {avx2}");

            Debugger.Break();
            var v1 = new System.Numerics.Vector4(1, 2, 3, 4);
            var v2 = new System.Numerics.Vector4(10, 11, 12, 12);
            var res = System.Numerics.Vector4.Add(v1, v2);

            Console.WriteLine(res.X);
        }
    }
}

I got only SSE with IsSupported equals true, all the others are false but my CPU (i7-6700) does support them all, any idea? (project is .net Core 2.1.0-preview2-26131-06) and the code runs in x64.

(maybe I should create a dedicated GitHub Issue, please advise)

fiigii · 2018-02-26T10:11:43Z

I got only SSE with IsSupported equals true, all the others are false but my CPU (i7-6700) does

Since other ISA classes are not yet fully implemented in the JIT compiler. If you want to use other implemented intrinsic, please set the environment variable COMPlus_EnableIncompleteISAClass=1.

ghost · 2018-02-26T10:48:51Z

Sorry to bother, but I've set the System Env variable COMPlus_EnableIncompleteISAClass=1 and ran my project under a shell command with the variable on and it's not working better.
I've also set the debug/Env variable setting in Visual Studio, doesn't work either.

fiigii · 2018-02-26T10:52:46Z

@nockawa Sorry, I forgot to say that the variable COMPlus_EnableIncompleteISAClass=1 only work with debug build of CoreCLR. So that you need to download and build CoreCLR from source code, rather than install .NET Core SDK. https://github.com/dotnet/coreclr

Add Intel hardware intrinsic APIs to CoreFX Commit migrated from dotnet/corefx@651372e

dnfclas added the cla-already-signed label Aug 22, 2017

karelz added the area-System.Runtime label Aug 22, 2017

karelz assigned fiigii and joperezr Aug 22, 2017

jkotas reviewed Aug 22, 2017

View reviewed changes

justinvp reviewed Aug 22, 2017

View reviewed changes

tannergooding reviewed Aug 23, 2017

View reviewed changes

stephentoub reviewed Aug 23, 2017

View reviewed changes

4creators reviewed Aug 23, 2017

View reviewed changes

Add Intel hardware intrinsic APIs to CoreFX

06c518b

mellinoe reviewed Sep 11, 2017

View reviewed changes

mellinoe merged commit 651372e into dotnet:master Sep 11, 2017

karelz added this to the 2.1.0 milestone Oct 11, 2017

fiigii mentioned this pull request Apr 20, 2018

Fix SSE2 SetZeroVector128 on float type dotnet/coreclr#17691

Closed

fiigii mentioned this pull request Jan 31, 2020

API Proposal: Add Intel hardware intrinsic functions and namespace dotnet/runtime#23057

Closed

ektrah mentioned this pull request Jan 31, 2020

S.R.CS.Unsafe: Aligned vs Unaligned method name suffix dotnet/runtime#23505

Closed

CarolEidt mentioned this pull request Jan 31, 2020

Intel HW Intrinsics classes should use inheritance dotnet/runtime#25953

Closed

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Merge pull request dotnet/corefx#23489 from fiigii/IntelIntrinsic

fb58506

Add Intel hardware intrinsic APIs to CoreFX Commit migrated from dotnet/corefx@651372e

	// Licensed to the .NET Foundation under one or more agreements.
	// The .NET Foundation licenses this file to you under the MIT license.
	// See the LICENSE file in the project root for more information.
	// ------------------------------------------------------------------------------
	// Changes to this file must follow the http://aka.ms/api-review process.
	// ------------------------------------------------------------------------------

		@@ -0,0 +1,577 @@
		// =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+

Add Intel hardware intrinsic APIs to CoreFX #23489

Add Intel hardware intrinsic APIs to CoreFX #23489

Uh oh!

Conversation

fiigii commented Aug 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fiigii commented Aug 22, 2017

Uh oh!

benaadams commented Aug 22, 2017

Uh oh!

jkotas Aug 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fiigii Aug 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

danmoseley Aug 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pentp commented Aug 22, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fiigii commented Aug 23, 2017

Uh oh!

danmoseley commented Aug 23, 2017

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fiigii commented Aug 23, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

fiigii commented Aug 22, 2017 •

edited

Loading

jkotas Aug 22, 2017 •

edited

Loading

fiigii Aug 22, 2017 •

edited

Loading

danmoseley Aug 23, 2017 •

edited

Loading

pentp commented Aug 22, 2017 •

edited

Loading

fiigii commented Aug 23, 2017 •

edited

Loading

benaadams Aug 23, 2017 •

edited

Loading