Merged
Tagging subscribers to this area: @tannergooding |
Member
Author
|
In theory, the following implementation should be faster:
private static Vector4 Lerp(Vector4 value1, Vector4 value2, float amount)
{
    // x86 with FMA
    Vector128<float> amountVec = Vector128.Create(amount);
    return Fma.MultiplyAdd(amountVec, value2.AsVector128(),
        Fma.MultiplyAddNegated(amountVec, value1.AsVector128(), value1.AsVector128())).AsVector4();
}
but only in some sort of fast-math mode. |
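For anyone wondering why this would need a fast-math mode: the FMA form computes `t * b + (a - t * a)`, which is algebraically the same as `a + (b - a) * t` but rounds at different points, so the two can differ in the last bits. Below is a minimal scalar sketch of that comparison, assuming `MathF.FusedMultiplyAdd` (.NET Core 3.0+) as a stand-in for the vector intrinsics. The class and method names are hypothetical and not part of this PR.

```csharp
using System;

public static class LerpSketch
{
    // Plain form: subtract, multiply, add, each rounded separately.
    public static float LerpPlain(float a, float b, float t) => a + (b - a) * t;

    // Fused form mirroring the intrinsic version above:
    // inner FMA is -(t * a) + a, outer FMA is t * b + inner.
    public static float LerpFused(float a, float b, float t) =>
        MathF.FusedMultiplyAdd(t, b, MathF.FusedMultiplyAdd(-t, a, a));

    public static void Main()
    {
        float a = 0.1f, b = 0.7f;
        // 0.25 steps are exactly representable, so the loop hits t = 1 exactly.
        for (float t = 0f; t <= 1f; t += 0.25f)
        {
            Console.WriteLine($"t={t}: plain={LerpPlain(a, b, t):R} fused={LerpFused(a, b, t):R}");
        }
    }
}
```

Any difference in the printed values comes from rounding only. A compiler that must preserve exact IEEE results cannot swap one form for the other on its own.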
src/libraries/System.Private.CoreLib/src/System/Numerics/Vector4.cs
Member
Author
|
Vector2:
Before:
vzeroupper
mov qword ptr [rsp+08H], rcx
mov qword ptr [rsp+10H], rdx
G_M38716_IG02:
vmovss xmm0, dword ptr [rsp+10H]
vmovss xmm1, dword ptr [rsp+08H]
vsubss xmm0, xmm0, xmm1
vmulss xmm0, xmm0, xmm2
vaddss xmm0, xmm0, xmm1
vmovss xmm1, dword ptr [rsp+14H]
vmovss xmm3, dword ptr [rsp+0CH]
vsubss xmm1, xmm1, xmm3
vmulss xmm1, xmm1, xmm2
vaddss xmm1, xmm1, xmm3
vxorps xmm2, xmm2
vmovss xmm2, xmm2, xmm1
vpslldq xmm2, 4
vmovss xmm2, xmm2, xmm0
vmovaps xmm0, xmm2
vmovd rax, xmm0
G_M38716_IG03:
ret
; Total bytes of code: 88
After:
push rax
vzeroupper
vmovd xmm0, rcx
vmovd xmm1, rdx
G_M37838_IG02:
vsubps xmm1, xmm0
vxorps xmm3, xmm3
vmovss xmm3, xmm3, xmm2
vpslldq xmm3, 4
vmovss xmm3, xmm3, xmm2
vmovaps xmm2, xmm3
vmulps xmm1, xmm2
vmovsd qword ptr [rsp], xmm1
vmovsd xmm1, qword ptr [rsp]
vaddps xmm0, xmm1
vmovd rax, xmm0
G_M37838_IG03:
add rsp, 8
ret
; Total bytes of code: 67 |
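To read the two listings: the "before" code does a scalar `vsubss`/`vmulss`/`vaddss` sequence per component, while the "after" code does one packed `vsubps`/`vmulps`/`vaddps` over the whole vector. A rough C# sketch of source shapes that would be consistent with that codegen follows. It is an illustration under that assumption, not the actual diff, and the class and method names are hypothetical.

```csharp
using System;
using System.Numerics;

public static class Vector2LerpSketch
{
    // Per-component form: one scalar subtract/multiply/add per element.
    public static Vector2 LerpPerComponent(Vector2 value1, Vector2 value2, float amount) =>
        new Vector2(
            value1.X + (value2.X - value1.X) * amount,
            value1.Y + (value2.Y - value1.Y) * amount);

    // Whole-vector form: uses the Vector2 operators, which the JIT can map to packed SIMD.
    public static Vector2 LerpVectorized(Vector2 value1, Vector2 value2, float amount) =>
        value1 + (value2 - value1) * amount;

    public static void Main()
    {
        var a = new Vector2(1f, 2f);
        var b = new Vector2(3f, 6f);
        Console.WriteLine(LerpPerComponent(a, b, 0.5f));
        Console.WriteLine(LerpVectorized(a, b, 0.5f));
    }
}
```

Both forms perform the same arithmetic per element, so they should produce identical results. Only the generated code differs.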
Member
|
Closing and reopening to retrigger the run against current master. It should be good to merge once tests pass. |
tannergooding
approved these changes
May 4, 2020
|
Member
Author
|
@tannergooding the failing job is a known issue: #35812 |
Member
|
Thanks! Merged. Updating |