Optimize FindFirstCharToEncode for JavaScriptEncoder.Default and Relaxed using Sse2 intrinsics.#41933
Conversation
using Sse2 intrinsics.
| <RootNamespace>System.Text.Encodings.Web</RootNamespace> | ||
| <AllowUnsafeBlocks>true</AllowUnsafeBlocks> | ||
| <Configurations>netstandard-Debug;netstandard-Release;netstandard2.1-Debug;netstandard2.1-Release</Configurations> | ||
| <Configurations>netcoreapp-Debug;netcoreapp-Release;netstandard-Debug;netstandard-Release;netstandard2.1-Debug;netstandard2.1-Release</Configurations> |
There was a problem hiding this comment.
Adding a new netcoreapp config to the source implementation (to make use of intrinsics which are only available on core).
cc @ericstj
There was a problem hiding this comment.
So adding a config here is OK, so long as its in packages. One thing you need to be careful about is adding a configuration where an app might rollforward with a newer package and downgrade the inbox implementation with lesser one.
Consider: App targets netcoreapp3.0, references a new package, say from 5.0 wave. App rolls forward to 3.1. 5.0 binary might have higher version than 3.1 inbox, but be missing your feature. One way to mitigate this is to give the netcoreapp build a higher version. Another way to mitigate is to make sure you never target a TFM that would be the target of a rollforward (EG: target netcoreapp3.0 instead of netcoreapp3.1).
...ystem.Text.Encodings.Web/src/System/Text/Encodings/Web/DefaultJavaScriptEncoderBasicLatin.cs
Show resolved
Hide resolved
| // The worst case encoding is 6 output chars per input char: [input] U+FFFF -> [output] "\uFFFF" | ||
| // We don't need to worry about astral code points since they're represented as encoded | ||
| // surrogate pairs in the output. | ||
| public override int MaxOutputCharactersPerInputCharacter => 12; // "\uFFFF\uFFFF" is the longest encoded form |
There was a problem hiding this comment.
The code below is duplicated from
Derive this class from DefaultJavaScriptEncoder, so that the code can be reused (from the base-class)?
There was a problem hiding this comment.
I considered that, but DefaultJavaScriptEncoder is a sealed class and hence we can't derive from it.
There was a problem hiding this comment.
Removing sealed might not be an option (perf-wise), so maybe move the "shared" code to a helper and use from both places?
There was a problem hiding this comment.
I moved the shared code into a common static helper class. TryEncodeUnicodeScalar cannot be moved fully since it calls the virtual WillEncode which is different and extracting out the common snippets outside of that within TryEncodeUnicodeScalar doesn't seem worth it (especially since UnsafeRelaxedJavaScriptEncoder.TryEncodeUnicodeScalar has difference there too).
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/Sse2Helper.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/Sse2Helper.cs
Outdated
Show resolved
Hide resolved
| int processNextSixteen = idx + 16; | ||
| Debug.Assert(processNextSixteen <= utf8Text.Length); | ||
| if (index != 0) | ||
| { |
There was a problem hiding this comment.
Move this block into a separate method, to keep the scan-loop tight?
There was a problem hiding this comment.
Moving this out to another method slows the "all-ascii" code path quite a bit (~2x).
There was a problem hiding this comment.
The culprit here was passing idx by ref.
It was causing a bunch of stack<->register moves:
00007FFD76D8B0DA mov ecx,dword ptr [rsp+48h]
00007FFD76D8B0DE inc ecx
00007FFD76D8B0E0 mov dword ptr [rsp+48h],ecx
| while (idx < processNextSixteen) | ||
| { | ||
| Debug.Assert((ptr + idx) <= (ptr + utf8Text.Length)); | ||
|
|
||
| if (UnicodeUtility.IsAsciiCodePoint(ptr[idx])) | ||
| { | ||
| if (DoesAsciiNeedEncoding(ptr[idx]) == 1) | ||
| { | ||
| goto Return; | ||
| } | ||
| idx++; | ||
| } | ||
| else | ||
| { | ||
| OperationStatus opStatus = UnicodeHelpers.DecodeScalarValueFromUtf8(utf8Text.Slice(idx), out uint nextScalarValue, out int utf8BytesConsumedForScalar); | ||
|
|
||
| Debug.Assert(nextScalarValue <= int.MaxValue); | ||
| if (opStatus != OperationStatus.Done || WillEncode((int)nextScalarValue)) | ||
| { | ||
| goto Return; | ||
| } | ||
|
|
||
| Debug.Assert(opStatus == OperationStatus.Done); | ||
| idx += utf8BytesConsumedForScalar; | ||
| } | ||
| } |
There was a problem hiding this comment.
The same code is used in the sequential path. Refactor to a method?
There was a problem hiding this comment.
I have in-line gotos here making it hard to refactor out into a method. Let me try and see if perf is impacted.
There was a problem hiding this comment.
There's a small regression when I do this (~10%, goes from 12.5 ns to 14 ns, dealing with 7 characters).
Here's basically what I changed:
while (idx < utf8Text.Length)
{
if (NeedsEscapingHelper(ptr, ref idx, utf8Text))
{
goto Return;
}
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private unsafe bool NeedsEscapingHelper(byte* ptr, ref int idx, ReadOnlySpan<byte> utf8Text)
{
Debug.Assert((ptr + idx) <= (ptr + utf8Text.Length));
bool result = true;
if (UnicodeUtility.IsAsciiCodePoint(ptr[idx]))
{
if (DoesAsciiNeedEncoding(ptr[idx]) == 1)
{
goto Return;
}
idx++;
}
else
{
OperationStatus opStatus = UnicodeHelpers.DecodeScalarValueFromUtf8(utf8Text.Slice(idx), out uint nextScalarValue, out int utf8BytesConsumedForScalar);
Debug.Assert(nextScalarValue <= int.MaxValue);
if (opStatus != OperationStatus.Done || WillEncode((int)nextScalarValue))
{
goto Return;
}
Debug.Assert(opStatus == OperationStatus.Done);
idx += utf8BytesConsumedForScalar;
}
result = false;
Return:
return result;
}I want to avoid the code duplication, but for now, leaning towards leaving it as is.
The disassembly grows as well from 99 instructions to 114 (if we just look at the remainder processing while loop).
sharplab.io
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/UnsafeRelaxedJavaScriptEncoder.cs
Show resolved
Hide resolved
| if (needsEscaping == 0) | ||
| { | ||
| needsEscaping = WillEncode(value) ? 1 : -1; | ||
| _asciiNeedsEscaping[value] = needsEscaping; |
There was a problem hiding this comment.
DoesAsciiNeedEncoding is used quite often, so this method should be as tiny as possible.
Therefore move this lazy assignement of the values to the ctor (WillEncode is already known there)?
Then this method is just the array-lookup.
There was a problem hiding this comment.
Which ctor? TextEncoder is an abstract class.
There was a problem hiding this comment.
Which ctor?
Currently there is none, but you can create a protected ctor?
There was a problem hiding this comment.
The types that implement TextEncoder are also what implement WillEncode. If we added a protected ctor to TextEncoder it will end up calling WillEncode on the concrete types before the concrete type's ctor finishes.
So:
DefaultJavaScriptEncoder ctor starts -> Calls base TextEncoder ctor -> Calls DefaultJavaScriptEncoder.WillEncode -> This fails because the DefaultJavaScriptEncoder Ctor isn't complete yet (including setting up _allowedCharacters).
We could, theoretically, copy/paste and override the base-types FindFirstCharacterToEncodeUtf8 for DefaultJavaScriptEncoder and duplicate that logic. For that concrete type, we could optimize the DoesAsciiNeedEncoding and init the array up-front.
There was a problem hiding this comment.
Or call an initialize-method (outside of the loop which uses DoesAsciiNeedEncoding) that ensures the array is populated with data?
There was a problem hiding this comment.
I think the ~80% case has been addressed already (with the built-in encoders). I am less inclined to optimize TextEncoder itself any further since most people wouldn't be calling those methods anyway (at least when it comes to S.T.Json).
We can do that separately, for 5.0/master (I'd like to port this PR to 3.1). Looks like we can get ~15% perf win from doing so.
private bool _asciiCacheNeedsInit = true;
public virtual unsafe int FindFirstCharacterToEncodeUtf8(ReadOnlySpan<byte> utf8Text)
{
if (_asciiCacheNeedsInit)
{
InitializeAsciiCache();
}
// ... rest of the current impl
}
[MethodImpl(MethodImplOptions.NoInlining)]
private void InitializeAsciiCache()
{
Debug.Assert(_asciiCacheNeedsInit);
for (int i = 0; i < _asciiNeedsEscaping.Length; i++)
{
_asciiNeedsEscaping[i] = WillEncode(i) ? 1 : -1;
}
_asciiCacheNeedsInit = false;
}
src/System.Text.Encodings.Web/src/System/Text/Internal/AllowedCharactersBitmap.cs
Outdated
Show resolved
Hide resolved
|
Unrelated infra failure ( Sending Job to Windows.10.Amd64.Client19H1.ES.Open... cc @dotnet/dnceng Unrelated test failure on netcoreapp-Linux-Debug-x64-RedHat.7.Amd64.Open: System.Text.RegularExpressions.Tests.RegexCacheTests.Ctor_Cache_Uses_dictionary_linked_list_switch_does_not_throw https://github.com/dotnet/corefx/issues/41981 |
|
Any other feedback? |
| { | ||
| Debug.Assert(Sse2.IsSupported); | ||
|
|
||
| Vector128<short> mask = Sse2.CompareLessThan(sourceValue, s_mask_UInt16_0x20); // Space ' ', anything in the control characters range |
There was a problem hiding this comment.
Comment could add "anything above short.MaxValue but less than or equal char.MaxValue" since a signed short is used.
There was a problem hiding this comment.
I mentioned that below (in CreateAsciiMask), but sure, can add the comment here too.
| Vector128<sbyte> sourceValue = Sse2.LoadVector128(startingAddress); | ||
|
|
||
| Vector128<sbyte> mask = Sse2Helper.CreateAsciiMask(sourceValue); | ||
| int index = Sse2.MoveMask(mask.AsByte()); |
There was a problem hiding this comment.
Why is AsByte called (since we just initialized for sbyte)?
There was a problem hiding this comment.
This we can remove, since we can use the sbyte overload. Good catch :)
However, it doesn't have any impact on the disassembly (so there is literally zero perf impact of doing either).
...ystem.Text.Encodings.Web/src/System/Text/Encodings/Web/DefaultJavaScriptEncoderBasicLatin.cs
Show resolved
Hide resolved
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/Sse2Helper.cs
Show resolved
Hide resolved
…xed using Sse2 intrinsics. (dotnet#41933) * Optimize FindFirstCharToEncode for JavaScriptEncoder.Default and Relaxed using Sse2 intrinsics. * Create an Sse2Helper and improve perf of TextEncoder and AllowedCharactersBitmap * Loop unroll FindFirstCharacterToEncode * Improve code coverage. * Add more tests for surrogate pairs and fix call to WillEncode. * Address PR feedback - remove some code duplication. * Move DefaultJavaScriptEncoder to separate file and override EncodeUtf8 with better caching. * Add default replacement character as a test. * Address nits.
|
Did someone explicitly signoff before merging? |
|
FWIW I was just going to approve it before it was merged. |
…xed using Sse2 intrinsics. (dotnet/corefx#41933) * Optimize FindFirstCharToEncode for JavaScriptEncoder.Default and Relaxed using Sse2 intrinsics. * Create an Sse2Helper and improve perf of TextEncoder and AllowedCharactersBitmap * Loop unroll FindFirstCharacterToEncode * Improve code coverage. * Add more tests for surrogate pairs and fix call to WillEncode. * Address PR feedback - remove some code duplication. * Move DefaultJavaScriptEncoder to separate file and override EncodeUtf8 with better caching. * Add default replacement character as a test. * Address nits. Commit migrated from dotnet/corefx@c4b93b6
Follow-up to #41845 to optimize the built-in
JavascriptEncoderusing a similar approach.Also added some tests to improve code coverage.
Next Steps:
S.T.Jsoncc @GrabYourPitchforks, @gfoidl, @tarekgh, @tannergooding, @ViktorHofer, @steveharter