Optimize TextEncoder.FindFirstCharacterToEncodeUtf8#42177
Optimize TextEncoder.FindFirstCharacterToEncodeUtf8#42177gfoidl wants to merge 43 commits intodotnet:masterfrom gfoidl:TextEncoder_ImproveEscapingCheck
Conversation
…eEscapingCheck
…ith CreateEscapingMaskSse2 (renamed from CreateEscapingMask)
Somehow this reference was left there, it's not needed anymore.
System\Text\Encodings\Web\Ssse3Helper.cs(101,13): error SA1115: (NETCORE_ENGINEERING_TELEMETRY=Build) The parameter should begin on the line after the previous parameter.
This reverts commit a513077.
…into ImproveEscapingCheck
…h for actual initialization
For TextEncoder this method changes the contract (ref-assembly), because it was `protected`. In order to keep the contract, this method was moved to BitHelper.
| Vector128<sbyte> bitMaskLookup, | ||
| Vector128<sbyte> bitPosLookup, | ||
| Vector128<sbyte> nibbleMaskSByte, | ||
| Vector128<sbyte> nullMaskSByte) |
There was a problem hiding this comment.
I need this arguments in order to be able to hoist these vectors outside the loop (in TextEncoder) as the JIT won't do it here.
Codegen for DefaultJavaScriptEncoderBasicLatin.FindFirstCharacterToEncode is affected only in minimal way:
G_M60692_IG11: ;; bbWeight=4
C4C17A6F27 vmovdqu xmm4, xmmword ptr [r15]
C4C159636710 vpacksswb xmm4, xmm4, xmmword ptr [r15+16]
C5F828E8 vmovaps xmm5, xmm0
C5F828F1 vmovaps xmm6, xmm1
C5F828FA vmovaps xmm7, xmm2
C57828C3 vmovaps xmm8, xmm3
C5B172D404 vpsrld xmm9, xmm4, 4
C531DBCF vpand xmm9, xmm9, xmm7
C5D9DBE7 vpand xmm4, xmm4, xmm7
C4E25100E4 vpshufb xmm4, xmm5, xmm4
C4C24900E9 vpshufb xmm5, xmm6, xmm9
C5D1DBE4 vpand xmm4, xmm5, xmm4
C4C15964E8 vpcmpgtb xmm5, xmm4, xmm8
C5B964E4 vpcmpgtb xmm4, xmm8, xmm4
C5D1EBE4 vpor xmm4, xmm5, xmm4
C5F9D7C4 vpmovmskb eax, xmm4
85C0 test eax, eax
0F851B010000 jne G_M60692_IG17
4983C720 add r15, 32
4D3BFD cmp r15, r13
76A8 jbe SHORT G_M60692_IG11Those vmovaps aren't needed at all*...
In the benchmarks this is just above noise. For larger inputs it's not measurable, for smaller one (just so that vectorization kicks in):
| Method | TestStringData | Mean | Error | StdDev | Ratio |
|--------------- |--------------------- |----------:|----------:|----------:|------:|
| PR_42073 | (16, (...)dcsv) [26] | 8.911 ns | 0.0505 ns | 0.0447 ns | 1.00 |
| PR_TextEncoder | (16, (...)dcsv) [26] | 9.085 ns | 0.0178 ns | 0.0158 ns | 1.02 |
| | | | | | |
| PR_42073 | (9, -1, rddnegsne) | 10.245 ns | 0.0445 ns | 0.0372 ns | 1.00 |
| PR_TextEncoder | (9, -1, rddnegsne) | 10.656 ns | 0.0152 ns | 0.0135 ns | 1.04 |
I think this is acceptable (especially if the JIT will be fixed so it doesn't emit these moves...I believe there is already on issue out, but I don't find it rigth now (or is that issue only for scalar-registers?)).
If not code for CreateEscapingMask has to be duplicated.
* if the register allocator would make these actually zero-latency, then there's still the instructions that need to be fetched, decoded, ... and it makes the loop larger.
| // casted to signed byte. | ||
| int index = Sse2.MoveMask(sourceValue); | ||
|
|
||
| if (index == 0) |
There was a problem hiding this comment.
Flipped the if, as this results in better code-gen for the vectorized path (SSSE3).
| { | ||
| byte* p = (byte*)startingAddress; | ||
| if (DoesAsciiNeedEncoding(p[0])) goto Return; | ||
| if (DoesAsciiNeedEncoding(p[1])) goto Return1; |
There was a problem hiding this comment.
These multiple returns break the dependency-chain on idx -- and it's inspired from SpanHelpers<T>.IndexOf.
Although I have no strong opinion whether to keep it or not.
With SSSE3 available this won't be executed anyway, but I kept it for other platforms where SSSE3 isn't available.
| } | ||
| startingAddress = (sbyte*)ptr + idx; | ||
| } | ||
| while (utf8Text.Length - 16 >= idx); |
There was a problem hiding this comment.
utf8Text.Length - 16 is kept in a register 😉
|
|
||
| private readonly int[] _asciiNeedsEscaping = new int[0x80]; | ||
| private bool _isAsciiCacheInitialized; | ||
| private readonly bool[] _asciiNeedsEscaping = new bool[0x80]; |
There was a problem hiding this comment.
Changed to bool[] as I didn't see any benefit from using int[] with 1 or -1 (and forgot to ask in the review in that PR).
|
The |
This way (as this PR builds upon the other PRs, so I think it's easier -- although this PR is complete so far). |
src/System.Text.Encodings.Web/src/System/Text/Encodings/Web/TextEncoder.cs
Outdated
Show resolved
Hide resolved
Although it's a one-time initialization (per instance of TextEncoder), this can be done O(n) instead of O(2n) (where n = 128) and with calling WillEncode just once per ASCII-value instead of two times. Won't be a huge improvement, but the first call to FindFirstCharacterToEncodeUtf8 will benefit a bit from this change. Cf. https://github.com/dotnet/corefx/pull/42177/files#r339950018
I got trapped by corefx not zero-initializing locals...
|
@GrabYourPitchforks this PR is now ready for review. |
|
Ok, I'll recover a bit from your last PR and get to this next week. :) |
|
Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:
|
|
Just curious, what's the test strategy for this and #42073 - how do we cover the various paths given our x64 machines are presumably homogenous? Another refcount for https://github.com/dotnet/corefx/issues/36113 ? |
Ideally yes. |
This PR is based on #42073 (
that's why it's marked as draft, to see the changes after #42073 got merged: https://github.com/gfoidl/corefx/compare/ImproveEscapingCheck...gfoidl:TextEncoder_ImproveEscapingCheckEdit: #42073 is now merged, so this one is ready for review)Description
Only the ASCII-path was touched.
DoesAsciiNeedEncodingreduced to be a plain lookup -> removed the lazy-initialization for each value and perform the initialization at once at first use(see Optimize FindFirstCharToEncode for JavaScriptEncoder.Default and Relaxed using Sse2 intrinsics. #41933 (comment) and Optimize FindFirstCharToEncode for JavaScriptEncoder.Default using Ssse3 intrinsics #42073 (comment))
if SSSE3 is available, use the same bitmask-approach as in Optimize FindFirstCharToEncode for JavaScriptEncoder.Default using Ssse3 intrinsics #42073, with the difference that this bitmask is created at runtime, based on
WillEncodeBenchmarks
Benchmarks are based on the ones from #41845 (https://gist.github.com/ahsonkhan/c566f5e7d65c1fde5a83a67be290c4ee#file-test_escapingbenchmark-cs got adapted for this)
with SSSE3 enabled
with SSSE3 disabled
/cc: @ahsonkhan