Eliminate range checks from ConcatAsHexSuffix + Instrinsics#18406
Eliminate range checks from ConcatAsHexSuffix + Instrinsics#18406analogrelay merged 7 commits intodotnet:masterfrom
Conversation
| buffer[i + 3] = (char)hexEncodeMap[(number >> 20) & 0xF]; | ||
| buffer[i + 2] = (char)hexEncodeMap[(number >> 24) & 0xF]; | ||
| buffer[i + 1] = (char)hexEncodeMap[(number >> 28) & 0xF]; | ||
| buffer[i] = tupleSeparator; |
There was a problem hiding this comment.
it can be vectorized I think, load tupleNumber into an 256bit vector, do some shuffle magic and dump to a string (ideally - to FastAllocateString)
There was a problem hiding this comment.
FastAllocateString is internal so it can't be used here in ASP.NET Core?
256bit vec is too large, 128bit should work, as there get 9 elements written to buffer. So with Vector128<ushor> it's 8 elements + the remaining 1.
Preferable this should be done in a different PR -- but nice idea!
Nitpicking: uint and little-endian 😉 (but that's irrelevant to the idea).
There was a problem hiding this comment.
@gfoidl @benaadams here is a vectorized string ToHex(int num) : https://pastebin.com/F2ZUYkwG
There was a problem hiding this comment.
Looks good.
Makes it sense to add int -> hex to runtime? I think it's quite common to convert a number to hex. Therefore is int.ToString("X"), but it allocates a string. So there should be something like int.ToString(Span<char>, Format) (I know the API needs work, maybe on a better type like a special formatter) and at this central place all kind of optimization could take place (SIMD or SWAR).
There was a problem hiding this comment.
Makes it sense to add int -> hex to runtime?
That conversation has been going on for years https://github.com/dotnet/corefx/issues/10013
There was a problem hiding this comment.
Gowan then
Method | Branch | Mean | Error | Op/s | Allocated | Precent |
------------------ |--------|---------:|----------:|-------------:|----------:|--------:|
ConcatAsHexSuffix | master | 21.18 ns | 0.1040 ns | 47,221,986.9 | 34.33 MB | |
ConcatAsHexSuffix | pr | 17.26 ns | 0.0844 ns | 57,953,720.1 | 34.33 MB | 123 % |
There was a problem hiding this comment.
I have to search within my brain, but I believe there is a SWAR variant that needs less instructions than this SIMD variant. I'll give an update here...(later today).
There was a problem hiding this comment.
That conversation has been going on for years
Maybe split the hex-part out of PopulateSpanWithHexSuffix to have such a method at least here in aspnetcore?
| return string.Create(length, (str, separator, number), s_populateSpanWithHexSuffix); | ||
| } | ||
|
|
||
| private static void PopulateSpanWithHexSuffix(Span<char> buffer, (string str, char separator, uint number) tuple) |
There was a problem hiding this comment.
I'll start a new comment to keep it clean.
I have a variant with SWAR
SWAR code
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static void Number2HexSwar(int number, ref char dest)
{
Debug.Assert(Bmi2.X64.IsSupported);
ulong input = (ulong)(uint)number; // cast to uint avoids movsxd and uses mov
ulong isolatedNibbles = Bmi2.X64.ParallelBitDeposit(input, 0x0F0F0F0F_0F0F0F0Ful);
Debug.Assert((isolatedNibbles & 0xF0F0F0F0_F0F0F0F0ul) == 0);
// JIT will treat these as constants
ulong packedAscii0 = Packed('0');
ulong packedCorrection = Packed(0x80 - ('9' - '0' + 1));
ulong packed0x80 = Packed(0x80);
ulong packedHexShiftToAF = Packed('A' - '0' - 10);
ulong asciiHexNumbers = isolatedNibbles + packedAscii0;
ulong asciiHexChars = isolatedNibbles + packedCorrection;
ulong flag = asciiHexChars & packed0x80;
ulong hexAFMask = flag - (flag >> 7);
ulong hex = asciiHexNumbers + (hexAFMask & packedHexShiftToAF);
hex = BinaryPrimitives.ReverseEndianness(hex);
ulong hex0 = hex;
ulong hex1 = hex >> 32;
Unsafe.Add(ref Unsafe.As<char, ulong>(ref dest), 0) = Bmi2.X64.ParallelBitDeposit(hex0, 0x00FF00FF_00FF00FFul);
Unsafe.Add(ref Unsafe.As<char, ulong>(ref dest), 1) = Bmi2.X64.ParallelBitDeposit(hex1, 0x00FF00FF_00FF00FFul);
static ulong Packed(ulong a) => a * 0x01010101_01010101ul;
}92ec46c to
619933b
Compare
|
Improved with feedback |
|
With the swar Though |
|
Thanks for the test. So let's leave the SWAR variant out. |
Co-Authored-By: Günther Foidl <gue@korporal.at>
BrennanConroy
left a comment
There was a problem hiding this comment.
The bit manipulation looks good 👍
Anyone else want to take a look?
| // This must be explicity typed as ReadOnlySpan<byte> | ||
| // This then becomes a non-allocating mapping to the data section of the assembly. | ||
| // If it is a var, Span<byte> or byte[], it allocates the byte array per call. | ||
| ReadOnlySpan<byte> hexEncodeMap = new byte[] { (byte)'0', (byte)'1', (byte)'2', (byte)'3', (byte)'4', (byte)'5', (byte)'6', (byte)'7', (byte)'8', (byte)'9', (byte)'A', (byte)'B', (byte)'C', (byte)'D', (byte)'E', (byte)'F' }; |
There was a problem hiding this comment.
nit: this is the same as the asciiUpperCaseData above, so we can just lift it outside the if statement.
|
@halter73 let's get this in soon. |
Co-Authored-By: Brennan <brecon@microsoft.com>

Uses dotnet/runtime#1644 for the non-instrinsic path to eliminate range checks/branches