This repository was archived by the owner on Jan 23, 2023. It is now read-only.
[WIP] Use SequenceEqual with Intrinsics to use both Vector sizes at once#22019
Closed
benaadams wants to merge 1 commit intodotnet:masterfrom
Closed
[WIP] Use SequenceEqual with Intrinsics to use both Vector sizes at once#22019benaadams wants to merge 1 commit intodotnet:masterfrom
benaadams wants to merge 1 commit intodotnet:masterfrom
Conversation
CarolEidt
reviewed
Jan 16, 2019
Member
Author
|
Not quite sure where I've broken it atm |
77d17ed to
37284d2
Compare
Member
|
Performance numbers to show that this actually makes a difference? |
37284d2 to
ee51327
Compare
Member
Author
Currently regresses 😢 |
Member
Author
|
Improves for large lengths > 512, but regresses for shorter lengths, which isn't what I was going for... |
9f69600 to
39ec7b2
Compare
jkotas
reviewed
Jan 20, 2019
| Unsafe.ReadUnaligned<Vector<byte>>(ref Unsafe.AddByteOffset(ref second, n)); | ||
| } | ||
| // Use Sse2.IsSupported check to remove this branch at Jit time if the CPU does not support it. | ||
| else if (!Avx2.IsSupported && Sse2.IsSupported) |
Member
There was a problem hiding this comment.
Is the redundant !Avx2.IsSupported needed here? I would expect that the if (Avx2.IsSupported) above will be allow JIT to dead-code eliminate the else part.
9e23426 to
07fa382
Compare
Member
Author
|
Tried lots of permutations nothing that's a significant and consistent win; will try another time. Has given me ideas on the other vectorized methods where there should be easier wins. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The .NET Vector type is always the largest supported size; so if Avx2 is enabled byte sequences have to be 32 or larger to benefit from vectorization.
Using the Intrinsics we can enable it using Sse2 for sizes 16 or larger and then switch to Avx2 for 32 or larger on Avx2 machines.
Also some tweaks for codegen compactness
The intrinsic variant is slightly better than the output the Jit generates as it drops a
vmovapsinstructionG_M17140_IG11: C4A179100409 vmovupd xmm0, xmmword ptr [rcx+r9] C4A179100C0A vmovupd xmm1, xmmword ptr [rdx+r9] - C5FC28D0 vmovaps ymm2, ymm0 - C5ED76D1 vpcmpeqd ymm2, ymm1 + C5F974C1 vpcmpeqb xmm0, xmm0, xmm1 - C5FDD7C2 vpmovmskb eax, ymm2 + C579D7D0 vpmovmskb r10d, xmm0 - 83F8FF cmp eax, -1 + 4183FAFF cmp r10d, -1 - 0F858E000000 jne G_M17103_IG14 + 752B jne SHORT G_M17140_IG14/cc @CarolEidt @fiigii @tannergooding