-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Add Intel hardware intrinsic APIs to CoreFX #23489
Conversation
|
@AlexGhiondea @joperezr The building system setting is not correct yet, could you help me? |
|
Oh my! 😄 🎉 |
| public static ulong AndNot(ulong left, ulong right) { throw null; } | ||
| public static uint BitFieldExtract(uint value, uint start, uint length) { throw null; } | ||
| public static ulong BitFieldExtract(ulong value, ulong start, ulong length) { throw null; } | ||
| public static uint BitFieldExtract(uint value, uint contr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did you decide which overloads should get the ref overload - e.g. why is there no ref T overload for StoreAligned or StoreAlignedNonTemporal?
I think it maybe best to skip the ref overloads in the initial iteration of this. There is number of open questions around them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, only the most common memory-access intrinsics (e.g., Load, Store, BroadcastVector128ToVector256) have ref T overloads, and all the alignment-aware (e.g., Load/StoreAligned, Extract, Insert, etc.) or performance-sensitive (i.e. Gether*) intrinsics just have pointer versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On modern Intel CPUs, these "most common memory-access intrinsics" have almost same performance over aligned and unaligned memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas you can't guarantee the location/alignment of a ref unless its fixed; when you'd have a pointer anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have picked bad examples. @pentp comment has better ones.
Also, the ref APIs are dangerous but they appear safe on the surface. I am not sure whether it is a good idea. I guess it is why the folks opted for not adding them during API review and preferred Span instead. They may deserve another look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can implement the Span on top of the ref and have the jit hoist checks for loops; whereas if on pointer with fixed it wouldn't be able to do that? (Also the Span versions can be in corefx; whereas if it was in coreclr it would have versioning issues effecting inlines? - am hazy on versioning inlines)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if on pointer with fixed it wouldn't be able to do that
Yes, you can do this. But I am not sure whether it is really needed. I think it would be fine to start with unsafe APIs only, tell folks to pin, and see where it does not work.
I am not sure whether the pining is really a problem for these sort of APIs. I won't be surprised if folks will want to pin in a lot of cases when using these APIs anyway because of they will want the alignment.
Also the Span versions can be in corefx
If Span versions are in corefx, they would have to be on a different type. Span is CoreLib for .NET Core, and this is .NET Core only feature, so I do not see any issues around this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fiigii Could you please delete the ref overloads? We should start with unsafe pointers only. Once we gain more experience with this, we can add ref or Span overloads if they are really needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas Sounds reasonable to me, will remove ref overloads.
| @@ -0,0 +1,1115 @@ | |||
| namespace System.Runtime.Intrinsics.X86 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a license header like the other ref files, e.g.:
corefx/src/System.Runtime/ref/System.Runtime.cs
Lines 1 to 6 in fd3004d
| // Licensed to the .NET Foundation under one or more agreements. | |
| // The .NET Foundation licenses this file to you under the MIT license. | |
| // See the LICENSE file in the project root for more information. | |
| // ------------------------------------------------------------------------------ | |
| // Changes to this file must follow the http://aka.ms/api-review process. | |
| // ------------------------------------------------------------------------------ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
| public static ulong AndNot(ulong left, ulong right) { throw null; } | ||
| public static uint BitFieldExtract(uint value, uint start, uint length) { throw null; } | ||
| public static ulong BitFieldExtract(ulong value, ulong start, ulong length) { throw null; } | ||
| public static uint BitFieldExtract(uint value, uint contr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these have explicit values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain a little bit more? Github differ does not seem to show the correct code location that you referred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what happened with GitHub but these comments were for the enums.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Will add explicit values for enums.
| public static ulong AndNot(ulong left, ulong right) { throw null; } | ||
| public static uint BitFieldExtract(uint value, uint start, uint length) { throw null; } | ||
| public static ulong BitFieldExtract(ulong value, ulong start, ulong length) { throw null; } | ||
| public static uint BitFieldExtract(uint value, uint contr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these have explicit values?
| @@ -0,0 +1,577 @@ | |||
| // =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing license header. Applies throughout.
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out. Will do.
| @@ -0,0 +1,27 @@ | |||
| // =+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ | |||
| // | |||
| // Lzcnt.cs | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Should these file headers be removed and replaced with a <Summary/> doc comments on the class? e.g.:
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
using System;
using System.Runtime.Intrinsics;
namespace System.Runtime.Intrinsics.X86
{
/// <summary>
/// A class that implements intrinsic functions to provide access to Intel LZCNT instructions.
/// </summary>
public static class Lzcnt
{
public static bool IsSupported { get; }
// unsigned int _lzcnt_u32 (unsigned int a)
public static uint LeadingZeroCount(uint value) { throw new NotImplementedException(); }
// unsigned __int64 _lzcnt_u64 (unsigned __int64 a)
public static ulong LeadingZeroCount(ulong value) { throw new NotImplementedException(); }
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes let's please use XML docs. I don't know how directly these end up translating to documentation, but it's good to write them like they would go into the docs. So you might miss off "A class that..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I use XML docs for each function's comment, e.g., // unsigned int _lzcnt_u32 (unsigned int a) ?
|
Shouldn't there be In general, |
Correct. |
|
|
| Release|x64 = Release|x64 | ||
| EndGlobalSection | ||
| GlobalSection(ProjectConfigurationPlatforms) = postSolution | ||
| {53134B0C-0D57-481B-B84E-D1991E8D54FF}.Debug|x64.ActiveCfg = netcoreapp-Debug|x64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be AnyCPU?
| <ProjectGuid>{650277B5-9423-4ACE-BB54-2659995B21C7}</ProjectGuid> | ||
| <IsPartialFacadeAssembly Condition="'$(TargetGroup)' == 'netfx' OR '$(TargetGroup)' == 'net46'">true</IsPartialFacadeAssembly> | ||
| </PropertyGroup> | ||
| <PropertyGroup Condition="'$(Configuration)|$(Platform)' == 'net46-Debug|x64'" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AnyCPU?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, building system setting is not correct yet, and I am not familiar with this. Need you guys' help 😄
I suggest that just have Additionally, although certain instructions "don't explicitly forbid unaligned access", we design some of the intrinsics to be alignment-aware. For example, sometimes 32-byte alignment cannot guarantee (but 16-byte alignment can), so AVX 256-bit memory load/store would cause much more cache line split. Developers can use |
| { | ||
| public static class Aes | ||
| { | ||
| public static bool IsSupported { get; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to have a comment indicating what the corresponding CPUID check is...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is necessary to managed language users?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would find it useful to be documented here, at the very least so I don't have to go delving into the CoreCLR repo to find the actual check.
I think it is something some users will care about and may help future contributors find the correct location to add new/missing APIs.
|
|
||
| public enum FloatComparisonMode : byte | ||
| { | ||
| EqualOrderedNonSignaling, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the order here important? If so, should that be documented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will do.
| /// SSE2 class provides access to 128-bit SSE/SSE2 SIMD instructions | ||
| /// </para> | ||
|
|
||
| public static class Sse2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we end up determining whether it was better to have these explicitly separated into an Sse and Sse2 class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post x64 are there any CPU that don't have both? Does coreclr run on an x86/x64 CPU without Sse2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does coreclr run on an x86/x64 CPU without Sse2?
I do not think so. As I know, the codegen bottom-line of CoreCLR is SSE2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RyuJIT requires both. However, this is more of a consistency thing than anything else. There was some discussion on the other thread as well with arguments for both sides (joining or splitting).
Personally, I want a clear distinction between each instruction set and think they should not be combined just because the runtime CoreFX is mostly closely tied to requires it.
I also feel that combining them blurs the lines of the requirements for the design of future additions or architectures.
| { | ||
| // 128 bit types | ||
| [StructLayout(LayoutKind.Sequential, Size = 16)] | ||
| public struct Vector128<T> where T : struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be in their own files (one for Vector128, one for Vector256)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does C#/CoreFX convention deal with this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe they should be in their own files that is good for extending to future ISA support (i.e. Vector512<T>)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the convention is generally one type per file, but it isn't explicitly documented in the coding style guidelines.
Maybe someone else can comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to have the actual implementation in CoreLib? If yes, you should not need this file at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas These intrinsic APIs will be implemented in S.P.CoreLib. Do you mean that we do not need all the files under System.Runtime.Intrinsics/src/ and System.Runtime.Intrinsics.X86/src/ in CoreFX?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these APIs will be implemented in S.P.CoreLib, you do not need the files under src. You just need the files under ref.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. Will change code to address feedback, then move code under src to mscorlib.
|
Did we end up determining what to do about This refers to both adding them (which will be required if we want managed implementations of the |
| public static bool CompareImplicitLength(Vector128<byte> left, Vector128<byte> right, ResultsFlag flag, StringComparisonMode mode) | ||
| { | ||
| throw new NotImplementedException(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: the formatting is different in some cases, e.g. everything on one line versus split across multiple lines... is there a reason for when you've used one or the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will make them consistent with everything on one line.
| [StructLayout(LayoutKind.Sequential, Size = 16)] | ||
| public struct Vector128<T> where T : struct | ||
| { | ||
| public static bool IsSupported {get;} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We always have the implementations in the refs throw rather than returning dummy values, e.g.
public static bool IsSupported { get { throw null; } }There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Will fix.
|
Pattern Function<T> used for several method definitions could be problematic as generic parameter restriction to struct does not eliminate invalid generic parameters i.e. |
| public static long ExtractLong<T>(Vector256<T> value, byte index) where T : struct { throw new NotImplementedException(); } | ||
| // __int64 _mm256_extract_epi64 (__m256i a, const int index) | ||
| public static ulong ExtractUlong<T>(Vector256<T> value, byte index) where T : struct { throw new NotImplementedException(); } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pattern ExtractXxx<T> could be problematic as generic parameter restriction to struct does not eliminate invalid method uses i.e. ExtractInt<RSAParameters> - this syntax is valid with current definition but it is invalid at runtime. This is a general problem which pops up in all cases where generic parameter in method definition is used and restricted to struct only.
|
@4creators The generic type parameters only can be "numerical types" (e.g., |
@fiigii The code in PR does not introduce that restriction so I would agree with the above statement only to the extent that it is an implicit assumption. If it would be implemented in way it is done in The problem with this code is that each generic method may accept different groups of "numerical types" i.e. One can assume that even in the case of __int64 _mm256_extract_epi64 (__m256i a, const int index);
I do not think that it is OK to defer errors to runtime - there were long discussions about that problem and conclusion was to enforce correctness at compile time - this is why there was an implementation problem of immediate values passed to some intrinsics. Common ground was to get compile time support fro immediate values (const or attribute) from Roslyn. |
@4creators Thanks for pointing out. We need to clarify it in documents. |
|
❓ Can someone explain the aversion to If we can get the same performance and improved clarity/safety from properly expressing the reference semantics, why bother bloating the metadata with pointer versions of the signatures? |
|
The overloads that were removed were
|
Hmm. I would only expect to see ¹ One could easily argue that signatures involving a pointer from which an offset is taken, e.g. gather, should stay as pointers. I would have no problem with that. My concerns are limited to cases where the referenced memory is exactly the memory at the pointer location. |
The obvious downside is that you can no longer pass in pointers to those functions. To some extent they are equivalent, but it requires you to use extra conversion operations which you might like to avoid, depending on the surrounding code. I don't see a problem with adding them on top, but it's also something we could consider adding later after we understand the usage patterns a bit better. |
The required conversion would be a NOP in the generated X86 code (post-JIT), but would appear as an additional call to an intrinsic in metadata. The question I have is, in code that uses pointers and intrinsics, how many times would you actually need to make the conversion? It seems most cases would require very few conversions. |
For |
UpdateSeparate SSE intrinsics from |
@fiigii Methods supporting this statement would not be affected by my request. My request specifically targets cases where the target of a load or store does match the layout for the vector. |
|
@dotnet-bot test this please |
|
@dotnet-bot test this please (accidentlaly deleted VMs while in use) |
|
@dotnet-bot test Linux x64 Release Build |
|
@mellinoe I have addressed all the feedback and this PR passed all tests. Could you merge it? |
| <Project ToolsVersion="14.0" DefaultTargets="Build" xmlns="http://schemas.microsoft.com/developer/msbuild/2003"> | ||
| <PropertyGroup> | ||
| <BuildConfigurations> | ||
| netcoreapp-Windows_NT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we need both Windows_NT and Unix configurations for some reason? I think this should work with just a "netcoreapp" configuration. Other than that it looks good to me to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is from @weshaggard 's suggestion #23489 (comment) , for "just a 'netcoreapp' configuration" causes build errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder. That makes sense.
|
🎉 🎈 🎂 |
|
@fiigii congrats, look like it is amazing work! |
@noahfalk Thank you for you interest on the project. Don't you mean the |
|
@fiigii thanks for answering! Yes, I know the SIMD assembly counterpart, I wasn't just aware of the purpose of these new types What I meant by Now, based on your answer, I believe that I should do something like:
Right? I'm looking forward to port my 3D Math lib with these new feature and also the |
Yes.
You have two ways:
Usually, "defining a struct" may not necessary, and S.R.I.X86 provides unsafe load/store intrinsics that operate over raw pointers. |
|
Ok, I understand the intend better now: I should use System.Numerics for common arithmetic, but if I want to perform custom SIMD x86 based code in safe mode, I can rely on your work. Btw, with a simple program like this: I got only SSE with (maybe I should create a dedicated GitHub Issue, please advise) |
Since other ISA classes are not yet fully implemented in the JIT compiler. If you want to use other implemented intrinsic, please set the environment variable |
|
Sorry to bother, but I've set the System Env variable |
|
@nockawa Sorry, I forgot to say that the variable |
Add Intel hardware intrinsic APIs to CoreFX Commit migrated from dotnet/corefx@651372e
This PR is the complete design of API Proposal: Add Intel hardware intrinsic functions and namespace #22940.
For building with the current code base,
constparameter modifiers and[intrinsic]is temporarily removed.@russellhadley @terrajobst @CarolEidt @mellinoe