Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Conversation

@fiigii
Copy link
Contributor

@fiigii fiigii commented Aug 22, 2017

This PR is the complete design of API Proposal: Add Intel hardware intrinsic functions and namespace #22940.

For building with the current code base, const parameter modifiers and [intrinsic] is temporarily removed.

@russellhadley @terrajobst @CarolEidt @mellinoe

@fiigii
Copy link
Contributor Author

fiigii commented Aug 22, 2017

@AlexGhiondea @joperezr The building system setting is not correct yet, could you help me?

@benaadams
Copy link
Member

Oh my! 😄 🎉

public static ulong AndNot(ulong left, ulong right) { throw null; }
public static uint BitFieldExtract(uint value, uint start, uint length) { throw null; }
public static ulong BitFieldExtract(ulong value, ulong start, ulong length) { throw null; }
public static uint BitFieldExtract(uint value, uint contr
Copy link
Member

@jkotas jkotas Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you decide which overloads should get the ref overload - e.g. why is there no ref T overload for StoreAligned or StoreAlignedNonTemporal?

I think it maybe best to skip the ref overloads in the initial iteration of this. There is number of open questions around them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, only the most common memory-access intrinsics (e.g., Load, Store, BroadcastVector128ToVector256) have ref T overloads, and all the alignment-aware (e.g., Load/StoreAligned, Extract, Insert, etc.) or performance-sensitive (i.e. Gether*) intrinsics just have pointer versions.

Copy link
Contributor Author

@fiigii fiigii Aug 22, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On modern Intel CPUs, these "most common memory-access intrinsics" have almost same performance over aligned and unaligned memory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas you can't guarantee the location/alignment of a ref unless its fixed; when you'd have a pointer anyway?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have picked bad examples. @pentp comment has better ones.

Also, the ref APIs are dangerous but they appear safe on the surface. I am not sure whether it is a good idea. I guess it is why the folks opted for not adding them during API review and preferred Span instead. They may deserve another look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can implement the Span on top of the ref and have the jit hoist checks for loops; whereas if on pointer with fixed it wouldn't be able to do that? (Also the Span versions can be in corefx; whereas if it was in coreclr it would have versioning issues effecting inlines? - am hazy on versioning inlines)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if on pointer with fixed it wouldn't be able to do that

Yes, you can do this. But I am not sure whether it is really needed. I think it would be fine to start with unsafe APIs only, tell folks to pin, and see where it does not work.

I am not sure whether the pining is really a problem for these sort of APIs. I won't be surprised if folks will want to pin in a lot of cases when using these APIs anyway because of they will want the alignment.

Also the Span versions can be in corefx

If Span versions are in corefx, they would have to be on a different type. Span is CoreLib for .NET Core, and this is .NET Core only feature, so I do not see any issues around this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fiigii Could you please delete the ref overloads? We should start with unsafe pointers only. Once we gain more experience with this, we can add ref or Span overloads if they are really needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas Sounds reasonable to me, will remove ref overloads.

@@ -0,0 +1,1115 @@
namespace System.Runtime.Intrinsics.X86
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a license header like the other ref files, e.g.:

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
// ------------------------------------------------------------------------------
// Changes to this file must follow the http://aka.ms/api-review process.
// ------------------------------------------------------------------------------

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

public static ulong AndNot(ulong left, ulong right) { throw null; }
public static uint BitFieldExtract(uint value, uint start, uint length) { throw null; }
public static ulong BitFieldExtract(ulong value, ulong start, ulong length) { throw null; }
public static uint BitFieldExtract(uint value, uint contr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these have explicit values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain a little bit more? Github differ does not seem to show the correct code location that you referred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what happened with GitHub but these comments were for the enums.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Will add explicit values for enums.

public static ulong AndNot(ulong left, ulong right) { throw null; }
public static uint BitFieldExtract(uint value, uint start, uint length) { throw null; }
public static ulong BitFieldExtract(ulong value, ulong start, ulong length) { throw null; }
public static uint BitFieldExtract(uint value, uint contr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these have explicit values?

@@ -0,0 +1,577 @@
// =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing license header. Applies throughout.

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out. Will do.

@@ -0,0 +1,27 @@
// =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
//
// Lzcnt.cs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should these file headers be removed and replaced with a <Summary/> doc comments on the class? e.g.:

// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System;
using System.Runtime.Intrinsics;

namespace System.Runtime.Intrinsics.X86
{
    /// <summary>
    /// A class that implements intrinsic functions to provide access to Intel LZCNT instructions.
    /// </summary>
    public static class Lzcnt
    {
        public static bool IsSupported { get; }

        // unsigned int _lzcnt_u32 (unsigned int a)
        public static uint LeadingZeroCount(uint value) { throw new NotImplementedException(); }
        // unsigned __int64 _lzcnt_u64 (unsigned __int64 a)
        public static ulong LeadingZeroCount(ulong value) { throw new NotImplementedException(); }
    }
}

Copy link
Member

@danmoseley danmoseley Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes let's please use XML docs. I don't know how directly these end up translating to documentation, but it's good to write them like they would go into the docs. So you might miss off "A class that..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I use XML docs for each function's comment, e.g., // unsigned int _lzcnt_u32 (unsigned int a) ?

@pentp
Copy link

pentp commented Aug 22, 2017

Shouldn't there be ref overfloads for Sse3.LoadDqu and Avx.LoadDqu as these are specifically for unaligned loads?
And if alignment requirements allow then also Sse2.MaskMove, Sse2.StoreHigh, Sse2.StoreLow, Sse3.LoadAndDuplicate, Avx.MaskLoad, Avx2.ExtractVector128, Avx2.GatherVector*, Avx2.GatherMaskVector*, Avx2.MaskLoad, Avx2.MaskStore.

In general, ref overloads are useful for all instructions that don't explicitly forbid unaligned access because using fixed to pin memory does not align it and the subsequent pointer based overload is still going to do an unaligned access...

@fiigii
Copy link
Contributor Author

fiigii commented Aug 23, 2017

Shouldn't there be ref overfloads for Sse3.LoadDqu and Avx.LoadDqu as these are specifically for unaligned loads?

Correct.

@danmoseley
Copy link
Member

Unsafe code may only appear if compiling with /unsafe is fixed by <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the csproj.

Release|x64 = Release|x64
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{53134B0C-0D57-481B-B84E-D1991E8D54FF}.Debug|x64.ActiveCfg = netcoreapp-Debug|x64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be AnyCPU?

<ProjectGuid>{650277B5-9423-4ACE-BB54-2659995B21C7}</ProjectGuid>
<IsPartialFacadeAssembly Condition="'$(TargetGroup)' == 'netfx' OR '$(TargetGroup)' == 'net46'">true</IsPartialFacadeAssembly>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'net46-Debug|x64'" />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AnyCPU?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, building system setting is not correct yet, and I am not familiar with this. Need you guys' help 😄

@fiigii
Copy link
Contributor Author

fiigii commented Aug 23, 2017

And if alignment requirements allow then also Sse2.MaskMove, Sse2.StoreHigh, Sse2.StoreLow, Sse3.LoadAndDuplicate, Avx.MaskLoad, Avx2.ExtractVector128, Avx2.GatherVector*, Avx2.GatherMaskVector*, Avx2.MaskLoad, Avx2.MaskStore.

I suggest that just have ref T overloads on Load, Store, and Broadcast at the initial iteration.

Additionally, although certain instructions "don't explicitly forbid unaligned access", we design some of the intrinsics to be alignment-aware. For example, sometimes 32-byte alignment cannot guarantee (but 16-byte alignment can), so AVX 256-bit memory load/store would cause much more cache line split. Developers can use Avx.Insert/ExtractVector128 to load/store data 128-bit by 128-bit to get better performance.

{
public static class Aes
{
public static bool IsSupported { get; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to have a comment indicating what the corresponding CPUID check is...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it is necessary to managed language users?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find it useful to be documented here, at the very least so I don't have to go delving into the CoreCLR repo to find the actual check.

I think it is something some users will care about and may help future contributors find the correct location to add new/missing APIs.


public enum FloatComparisonMode : byte
{
EqualOrderedNonSignaling,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the order here important? If so, should that be documented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will do.

/// SSE2 class provides access to 128-bit SSE/SSE2 SIMD instructions
/// </para>

public static class Sse2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we end up determining whether it was better to have these explicitly separated into an Sse and Sse2 class?

Copy link
Member

@benaadams benaadams Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post x64 are there any CPU that don't have both? Does coreclr run on an x86/x64 CPU without Sse2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does coreclr run on an x86/x64 CPU without Sse2?

I do not think so. As I know, the codegen bottom-line of CoreCLR is SSE2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RyuJIT requires both. However, this is more of a consistency thing than anything else. There was some discussion on the other thread as well with arguments for both sides (joining or splitting).

Personally, I want a clear distinction between each instruction set and think they should not be combined just because the runtime CoreFX is mostly closely tied to requires it.

I also feel that combining them blurs the lines of the requirements for the design of future additions or architectures.

{
// 128 bit types
[StructLayout(LayoutKind.Sequential, Size = 16)]
public struct Vector128<T> where T : struct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be in their own files (one for Vector128, one for Vector256)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does C#/CoreFX convention deal with this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe they should be in their own files that is good for extending to future ISA support (i.e. Vector512<T>)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the convention is generally one type per file, but it isn't explicitly documented in the coding style guidelines.

Maybe someone else can comment?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to have the actual implementation in CoreLib? If yes, you should not need this file at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas These intrinsic APIs will be implemented in S.P.CoreLib. Do you mean that we do not need all the files under System.Runtime.Intrinsics/src/ and System.Runtime.Intrinsics.X86/src/ in CoreFX?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these APIs will be implemented in S.P.CoreLib, you do not need the files under src. You just need the files under ref.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Will change code to address feedback, then move code under src to mscorlib.

@tannergooding
Copy link
Member

tannergooding commented Aug 23, 2017

Did we end up determining what to do about scalar overloads?

This refers to both adding them (which will be required if we want managed implementations of the System.Math functions) and the naming convention (since both sqrtss and sqrtps take and return a Vector128<float>).

public static bool CompareImplicitLength(Vector128<byte> left, Vector128<byte> right, ResultsFlag flag, StringComparisonMode mode)
{
throw new NotImplementedException();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the formatting is different in some cases, e.g. everything on one line versus split across multiple lines... is there a reason for when you've used one or the other?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will make them consistent with everything on one line.

[StructLayout(LayoutKind.Sequential, Size = 16)]
public struct Vector128<T> where T : struct
{
public static bool IsSupported {get;}
Copy link
Member

@stephentoub stephentoub Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always have the implementations in the refs throw rather than returning dummy values, e.g.

public static bool IsSupported { get { throw null; } }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Will fix.

@4creators
Copy link
Contributor

4creators commented Aug 23, 2017

Pattern Function<T> used for several method definitions could be problematic as generic parameter restriction to struct does not eliminate invalid generic parameters i.e. ExtractInt<RSAParameters> - this syntax is valid with current definition and code will compile but code is invalid at runtime - Jit will be unable to generate correct intrinsic translation to processor instruction. This is a general problem which pops up in all cases where generic parameter in method definition is restricted to struct only. IMO it would be much better to implement all allowed overloads i.e. for byte, sbyte, short, ushort, int, uint, long, ulong, what would increase number of methods but would eliminate all method misuses. With that change users would not be able to write code which compiles but is invalid during runtime.

public static long ExtractLong<T>(Vector256<T> value, byte index) where T : struct { throw new NotImplementedException(); }
// __int64 _mm256_extract_epi64 (__m256i a, const int index)
public static ulong ExtractUlong<T>(Vector256<T> value, byte index) where T : struct { throw new NotImplementedException(); }

Copy link
Contributor

@4creators 4creators Aug 23, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern ExtractXxx<T> could be problematic as generic parameter restriction to struct does not eliminate invalid method uses i.e. ExtractInt<RSAParameters> - this syntax is valid with current definition but it is invalid at runtime. This is a general problem which pops up in all cases where generic parameter in method definition is used and restricted to struct only.

@fiigii
Copy link
Contributor Author

fiigii commented Aug 23, 2017

@4creators The generic type parameters only can be "numerical types" (e.g., byte, short, long, etc), which is same as System.Numerics.Vector<T>. At least, the runtime will restrict passed type parameters correctly.

@4creators
Copy link
Contributor

4creators commented Aug 23, 2017

The generic type parameters only can be "numerical types" (e.g., byte, short, long, etc), which is same as System.Numerics.Vector<T>

@fiigii The code in PR does not introduce that restriction so I would agree with the above statement only to the extent that it is an implicit assumption. If it would be implemented in way it is done in Vector<T> implementation it would be OK but unfortunately only to some extent.

The problem with this code is that each generic method may accept different groups of "numerical types" i.e. ExtractUlong<T> in principle would require "numerical integral types" generic parameters while some other methods would require "numerical floating types" and essentially it is quite difficult to implement that kind of generic parameters restrictions on a method by method basis. Unfortunately current C# syntax is not very helpful here.

One can assume that even in the case of ExtractUlong<T> method use of floats as T would be OK as conversion on binary level will succeed but in reality it may lead to unexpected and hard to detect errors. Bcs of this I would still be for changes to the implementation. Anyway base method definition in C/C++ intrinsics requires that parameters to that function would be integral types -> __m256i

__int64 _mm256_extract_epi64 (__m256i a, const int index);

At least, the runtime will restrict passed type parameters correctly.

I do not think that it is OK to defer errors to runtime - there were long discussions about that problem and conclusion was to enforce correctness at compile time - this is why there was an implementation problem of immediate values passed to some intrinsics. Common ground was to get compile time support fro immediate values (const or attribute) from Roslyn.

@fiigii
Copy link
Contributor Author

fiigii commented Aug 23, 2017

The code in PR does not introduce that restriction

@4creators Thanks for pointing out. We need to clarify it in documents.

@sharwell
Copy link

sharwell commented Sep 1, 2017

❓ Can someone explain the aversion to ref Vector128<T>? I came in late (from vacation) and see that these overloads were apparently removed but pointer versions were kept. It seems like for most of the operations the opposite is what we want (pointer versions unnecessary; ref Vector128<T>/etc sufficient and preferred).

If we can get the same performance and improved clarity/safety from properly expressing the reference semantics, why bother bloating the metadata with pointer versions of the signatures?

@jkotas
Copy link
Member

jkotas commented Sep 1, 2017

The overloads that were removed were ref T overloads, e.g. Vector128<T> Load<T>(ref T vector).

ref Vector128<T> overloads would be fine with me.

@sharwell
Copy link

sharwell commented Sep 1, 2017

The overloads that were removed were ref T overloads

Hmm. I would only expect to see ref T overloads in cases like broadcast, where only a single element will be referenced. Most¹ of the pointers in the current signatures can be replaced by either ref T or ref Vector128<T>/ref Vector256<T>, and currently I don't see any downside of this.

¹ One could easily argue that signatures involving a pointer from which an offset is taken, e.g. gather, should stay as pointers. I would have no problem with that. My concerns are limited to cases where the referenced memory is exactly the memory at the pointer location.

@mellinoe
Copy link
Contributor

mellinoe commented Sep 1, 2017

Most¹ of the pointers in the current signatures can be replaced by either ref T or ref Vector128/ref Vector256, and currently I don't see any downside of this.

The obvious downside is that you can no longer pass in pointers to those functions. To some extent they are equivalent, but it requires you to use extra conversion operations which you might like to avoid, depending on the surrounding code. I don't see a problem with adding them on top, but it's also something we could consider adding later after we understand the usage patterns a bit better.

@sharwell
Copy link

sharwell commented Sep 1, 2017

The obvious downside is that you can no longer pass in pointers to those functions.

The required conversion would be a NOP in the generated X86 code (post-JIT), but would appear as an additional call to an intrinsic in metadata. The question I have is, in code that uses pointers and intrinsics, how many times would you actually need to make the conversion? It seems most cases would require very few conversions.

@fiigii
Copy link
Contributor Author

fiigii commented Sep 1, 2017

The question I have is, in code that uses pointers and intrinsics, how many times would you actually need to make the conversion?

For Load and Store, I am pretty sure that the conversion will be required in most of the situations. The most common SIMD programming pattern is: load a bunch of data from memory to vectors and write back results to memory after SIMD computing complete. Original data in memory is usually organized with its owen format, which is not aware of the SIMD Vector128/256<T>.

@fiigii
Copy link
Contributor Author

fiigii commented Sep 1, 2017

Update

Separate SSE intrinsics from Sse2 class. New class Sse added.

@sharwell
Copy link

sharwell commented Sep 1, 2017

Original data in memory is usually organized with its owen format

@fiigii Methods supporting this statement would not be affected by my request. My request specifically targets cases where the target of a load or store does match the layout for the vector.

@fiigii
Copy link
Contributor Author

fiigii commented Sep 8, 2017

@dotnet-bot test this please

@mmitche
Copy link
Member

mmitche commented Sep 8, 2017

@dotnet-bot test this please (accidentlaly deleted VMs while in use)

@fiigii
Copy link
Contributor Author

fiigii commented Sep 8, 2017

@dotnet-bot test Linux x64 Release Build

@fiigii
Copy link
Contributor Author

fiigii commented Sep 11, 2017

@mellinoe I have addressed all the feedback and this PR passed all tests. Could you merge it?

<Project ToolsVersion="14.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
<PropertyGroup>
<BuildConfigurations>
netcoreapp-Windows_NT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we need both Windows_NT and Unix configurations for some reason? I think this should work with just a "netcoreapp" configuration. Other than that it looks good to me to merge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is from @weshaggard 's suggestion #23489 (comment) , for "just a 'netcoreapp' configuration" causes build errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder. That makes sense.

@mellinoe mellinoe merged commit 651372e into dotnet:master Sep 11, 2017
@tannergooding
Copy link
Member

🎉 🎈 🎂

@karelz karelz added this to the 2.1.0 milestone Oct 11, 2017
@ghost
Copy link

ghost commented Feb 25, 2018

@fiigii congrats, look like it is amazing work!
I jump in many months after and I have to say that I struggle to understand the usage of this whole feature, do you have a sample somewhere demonstrating a basic usage for a Vector3 or 4 type for instance?
Thanks

@fiigii
Copy link
Contributor Author

fiigii commented Feb 26, 2018

do you have a sample somewhere demonstrating a basic usage for a Vector3 or 4 type for instance?

@noahfalk Thank you for you interest on the project. Don't you mean the Vector128<T> and Vector256<T>? If yes, I suggest you to google "SIMD programming" that would give you some examples in C++ and you can treat Vector128<T> as __m128/__m128d/__m128i and Vector256<T> as __m256/__m256d/__m256i in C++.
Here is a good serial of SIMD programming tutorials https://felix.abecassis.me/?s=sse

@ghost
Copy link

ghost commented Feb 26, 2018

@fiigii thanks for answering! Yes, I know the SIMD assembly counterpart, I wasn't just aware of the purpose of these new types Vector128<T> and Vector256<T>.
So I'm supposed to use the intrinsics methods in System.Runtime.Intrinsics.X86 namespace to manipulate these types, right?

What I meant by Vector3 is if I want to declare a new Vector3 type containing X, Y and Z properties: how should I do?

Now, based on your answer, I believe that I should do something like:

  • defining a struct
  • the X, Y, Z as properties
  • Storing internally an array of 3 floats
  • Use the S.R.I.X86 methods to do all the loading/storing/arithmetic jobs.

Right?

I'm looking forward to port my 3D Math lib with these new feature and also the struct ref and Span<T> (for List of Vector3 for instance), the speed bump should be great!

@fiigii
Copy link
Contributor Author

fiigii commented Feb 26, 2018

new types Vector128 and Vector256.
So I'm supposed to use the intrinsics methods in System.Runtime.Intrinsics.X86 namespace to manipulate these types, right?

Yes.

What I meant by Vector3 is if I want to declare a new Vector3 type containing X, Y and Z properties: how should I do?

You have two ways:

  1. AoS (array of structures), you can use Vector128<T> to contain X, Y, Z and manipulate these vectors via System.Runtime.Intrinsics.X86 intrinsics (or you can use System.Numerics.Vector3 and related intrinsics as well that is a serial of "higher-level" abstract APIs).
  2. SoA (structure of arrays), you can use three Vector128<T> or Vector256<T> to contains 4/8 3D float vectors, which leverages wider-SIMD-registers on newer architectures (AVX/AVX2).

Now, based on your answer, I believe that I should do something like:
Use the S.R.I.X86 methods to do all the loading/storing/arithmetic jobs.

Usually, "defining a struct" may not necessary, and S.R.I.X86 provides unsafe load/store intrinsics that operate over raw pointers.

@ghost
Copy link

ghost commented Feb 26, 2018

Ok, I understand the intend better now: I should use System.Numerics for common arithmetic, but if I want to perform custom SIMD x86 based code in safe mode, I can rely on your work.
Sorry but I had a lot to catch up with these kind of features in C#.

Btw, with a simple program like this:

using System.Runtime.Intrinsics.X86;
namespace test01
{
    class Program
    {
        static void Main(string[] args)
        {
            var sse = Sse.IsSupported;
            var sse2 = Sse2.IsSupported;
            var sse3 = Sse3.IsSupported;
            var sse41 = Sse41.IsSupported;
            var sse42 = Sse42.IsSupported;
            var avx = Avx.IsSupported;
            var avx2 = Avx2.IsSupported;

            Console.WriteLine($"SSE Supported: {sse}, SSE2 Supported: {sse2}, SSE3 Supported: {sse3}, SSE41 Supported: {sse41}, SSE42 Supported: {sse42}, AVX Supported: {avx}, AVX2 Supported: {avx2}");

            Debugger.Break();
            var v1 = new System.Numerics.Vector4(1, 2, 3, 4);
            var v2 = new System.Numerics.Vector4(10, 11, 12, 12);
            var res = System.Numerics.Vector4.Add(v1, v2);

            Console.WriteLine(res.X);
        }
    }
}

I got only SSE with IsSupported equals true, all the others are false but my CPU (i7-6700) does support them all, any idea? (project is .net Core 2.1.0-preview2-26131-06) and the code runs in x64.

(maybe I should create a dedicated GitHub Issue, please advise)

@fiigii
Copy link
Contributor Author

fiigii commented Feb 26, 2018

I got only SSE with IsSupported equals true, all the others are false but my CPU (i7-6700) does

Since other ISA classes are not yet fully implemented in the JIT compiler. If you want to use other implemented intrinsic, please set the environment variable COMPlus_EnableIncompleteISAClass=1.

@ghost
Copy link

ghost commented Feb 26, 2018

Sorry to bother, but I've set the System Env variable COMPlus_EnableIncompleteISAClass=1 and ran my project under a shell command with the variable on and it's not working better.
I've also set the debug/Env variable setting in Visual Studio, doesn't work either.

@fiigii
Copy link
Contributor Author

fiigii commented Feb 26, 2018

@nockawa Sorry, I forgot to say that the variable COMPlus_EnableIncompleteISAClass=1 only work with debug build of CoreCLR. So that you need to download and build CoreCLR from source code, rather than install .NET Core SDK. https://github.com/dotnet/coreclr

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.