Optimize Vector4.Lerp by EgorBo · Pull Request #35525 · dotnet/runtime

EgorBo · 2020-04-27T17:27:12Z

Before:

G_M10056_IG01:
       sub      rsp, 56
       vzeroupper 
       vmovaps  qword ptr [rsp+20H], xmm6
       vmovaps  qword ptr [rsp+10H], xmm7
       vmovaps  qword ptr [rsp], xmm8
G_M10056_IG02:
       vmovss   xmm0, dword ptr [rdx]
       vmovss   xmm1, dword ptr [rdx+4]
       vmovss   xmm2, dword ptr [rdx+8]
       vmovss   xmm4, dword ptr [rdx+12]
       vmovss   xmm5, dword ptr [r8]
       vmovss   xmm6, dword ptr [r8+4]
       vmovss   xmm7, dword ptr [r8+8]
       vmovss   xmm8, dword ptr [r8+12]
       vsubss   xmm5, xmm5, xmm0
       vmulss   xmm5, xmm5, xmm3
       vaddss   xmm0, xmm5, xmm0
       vsubss   xmm5, xmm6, xmm1
       vmulss   xmm5, xmm5, xmm3
       vaddss   xmm1, xmm5, xmm1
       vsubss   xmm5, xmm7, xmm2
       vmulss   xmm5, xmm5, xmm3
       vaddss   xmm2, xmm5, xmm2
       vsubss   xmm5, xmm8, xmm4
       vmulss   xmm3, xmm5, xmm3
       vaddss   xmm3, xmm3, xmm4
       vxorps   xmm4, xmm4
       vmovss   xmm4, xmm4, xmm3
       vpslldq  xmm4, 4
       vmovss   xmm4, xmm4, xmm2
       vpslldq  xmm4, 4
       vmovss   xmm4, xmm4, xmm1
       vpslldq  xmm4, 4
       vmovss   xmm4, xmm4, xmm0
       vmovaps  xmm0, xmm4
       vmovupd  xmmword ptr [rcx], xmm0
       mov      rax, rcx
G_M10056_IG03:
       vmovaps  xmm6, qword ptr [rsp+20H]
       vmovaps  xmm7, qword ptr [rsp+10H]
       vmovaps  xmm8, qword ptr [rsp]
       add      rsp, 56
       ret      
; Total bytes of code: 182

After:

       vzeroupper 
G_M18874_IG02:
       vmovupd  xmm0, xmmword ptr [r8]
       vmovupd  xmm1, xmmword ptr [rdx]
       vsubps   xmm0, xmm1
       vbroadcastss xmm3, xmm3
       vmulps   xmm0, xmm3
       vaddps   xmm0, xmm1, xmm0
       vmovupd  xmmword ptr [rcx], xmm0
       mov      rax, rcx
G_M18874_IG03:
       ret
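
For readers comparing the two listings: the page does not show the C# diff itself, so the following is only a sketch of the two source shapes that would plausibly produce this kind of code. `LerpSketch`, `LerpScalar` and `LerpVector` are hypothetical names, not the names used in the PR. The scalar form mirrors the per-component `vsubss`/`vmulss`/`vaddss` chain in "Before", and the whole-vector form mirrors the `vsubps`/`vmulps`/`vaddps` sequence in "After".

```csharp
using System;
using System.Numerics;

static class LerpSketch
{
    // Component-wise form (assumed shape of the old code):
    // one subtract, multiply and add per field, then a new Vector4 is assembled.
    public static Vector4 LerpScalar(Vector4 value1, Vector4 value2, float amount) =>
        new Vector4(
            value1.X + (value2.X - value1.X) * amount,
            value1.Y + (value2.Y - value1.Y) * amount,
            value1.Z + (value2.Z - value1.Z) * amount,
            value1.W + (value2.W - value1.W) * amount);

    // Whole-vector form (assumed shape of the new code):
    // the Vector4 operators let the JIT work on all four lanes at once.
    public static Vector4 LerpVector(Vector4 value1, Vector4 value2, float amount) =>
        value1 + (value2 - value1) * amount;

    static void Main()
    {
        var a = new Vector4(1, 2, 3, 4);
        var b = new Vector4(5, 6, 7, 8);
        // Both forms evaluate the same expression per lane, so they agree here.
        Console.WriteLine(LerpScalar(a, b, 0.5f) == LerpVector(a, b, 0.5f));
    }
}
```

Both methods compute `value1 + (value2 - value1) * amount` per lane; only the way the expression is written differs, which is what lets the JIT keep the data in one register.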

ghost · 2020-04-27T17:27:15Z

Tagging subscribers to this area: @tannergooding
Notify danmosemsft if you want to be subscribed.

EgorBo · 2020-04-27T17:27:56Z

In theory, the following implementation should be faster:

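private static Vector4 Lerp(Vector4 value1, Vector4 value2, float amount)
{
    // x86 with FMA (needs System.Runtime.Intrinsics and System.Runtime.Intrinsics.X86)
    Vector128<float> amountVec = Vector128.Create(amount); // broadcast amount to all lanes
    // inner: value1 - amount * value1; outer: amount * value2 + inner
    return Fma.MultiplyAdd(amountVec, value2.AsVector128(),
        Fma.MultiplyAddNegated(amountVec, value1.AsVector128(), value1.AsVector128())).AsVector4();
}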

but only in some sort of fast-math mode, because each fused multiply-add rounds once and can therefore change the result.
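
The two-FMA expression above can be unpacked as follows. This is a sketch that assumes `Fma.MultiplyAddNegated(a, b, c)` evaluates $-(a \cdot b) + c$ and `Fma.MultiplyAdd(a, b, c)` evaluates $a \cdot b + c$, with $t$ standing for the broadcast `amount` and $v_1$, $v_2$ for `value1` and `value2` (symbols introduced here for illustration):

```latex
\begin{aligned}
\text{inner}  &= -(t \cdot v_1) + v_1 = (1 - t)\, v_1 \\
\text{result} &= t \cdot v_2 + \text{inner} = (1 - t)\, v_1 + t\, v_2 \\
              &= v_1 + (v_2 - v_1)\, t
\end{aligned}
```

The equalities hold in real arithmetic. In floating point, each fused operation rounds once, while the subtract, multiply and add sequence rounds three times, so the two forms are not guaranteed to give bit-identical results.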

src/libraries/System.Private.CoreLib/src/System/Numerics/Vector4.cs

EgorBo · 2020-04-27T17:56:13Z

Vector2:

Before:

       vzeroupper 
       mov      qword ptr [rsp+08H], rcx
       mov      qword ptr [rsp+10H], rdx
G_M38716_IG02:
       vmovss   xmm0, dword ptr [rsp+10H]
       vmovss   xmm1, dword ptr [rsp+08H]
       vsubss   xmm0, xmm0, xmm1
       vmulss   xmm0, xmm0, xmm2
       vaddss   xmm0, xmm0, xmm1
       vmovss   xmm1, dword ptr [rsp+14H]
       vmovss   xmm3, dword ptr [rsp+0CH]
       vsubss   xmm1, xmm1, xmm3
       vmulss   xmm1, xmm1, xmm2
       vaddss   xmm1, xmm1, xmm3
       vxorps   xmm2, xmm2
       vmovss   xmm2, xmm2, xmm1
       vpslldq  xmm2, 4
       vmovss   xmm2, xmm2, xmm0
       vmovaps  xmm0, xmm2
       vmovd    rax, xmm0
G_M38716_IG03:
       ret      
; Total bytes of code: 88

After:

       push     rax
       vzeroupper 
       vmovd    xmm0, rcx
       vmovd    xmm1, rdx
G_M37838_IG02:
       vsubps   xmm1, xmm0
       vxorps   xmm3, xmm3
       vmovss   xmm3, xmm3, xmm2
       vpslldq  xmm3, 4
       vmovss   xmm3, xmm3, xmm2
       vmovaps  xmm2, xmm3
       vmulps   xmm1, xmm2
       vmovsd   qword ptr [rsp], xmm1
       vmovsd   xmm1, qword ptr [rsp]
       vaddps   xmm0, xmm1
       vmovd    rax, xmm0
G_M37838_IG03:
       add      rsp, 8
       ret      
; Total bytes of code: 67

src/libraries/System.Private.CoreLib/src/System/Numerics/Vector3.cs

tannergooding · 2020-05-04T15:45:34Z

Closing and reopening to retrigger the run against current master. It should be good to merge once tests pass.

ghost · 2020-05-04T15:49:46Z

Hello @tannergooding!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (`@msftbot`) and give me an instruction to get started! Learn more here.

EgorBo · 2020-05-04T20:57:59Z

@tannergooding the failing job is a known issue: #35812

tannergooding · 2020-05-04T21:07:26Z

Thanks! Merged.

Updating Vector2/3/4 to be consistent is just pending final approval here, at which point we can fix them up: #35529
That will also open things up to use System.Runtime.Intrinsics.Fma when available.
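
The consistency issue referenced above (#35529, titled "Vector2/4.Lerp do not always return value2 when amount is 1") can be illustrated with scalars. This is a minimal sketch, not code from the PR or the issue: `LerpEndpoint`, `LerpDelta` and `LerpWeighted` are hypothetical names, and the inputs are chosen so that `b - a` loses the small operand to rounding.

```csharp
using System;

static class LerpEndpoint
{
    // a + (b - a) * t: the form matching the vsubps/vmulps/vaddps sequence.
    public static float LerpDelta(float a, float b, float t) => a + (b - a) * t;

    // a * (1 - t) + b * t: the weighted form, which is what the two-FMA snippet evaluates.
    public static float LerpWeighted(float a, float b, float t) => a * (1 - t) + b * t;

    static void Main()
    {
        // 1e10f is exactly representable, but its neighbours are 1024 apart,
        // so (1f - 1e10f) rounds to -1e10f and the 1f is lost.
        float a = 1e10f, b = 1f, t = 1f;
        Console.WriteLine(LerpDelta(a, b, t));    // does not reproduce b when floats are evaluated in single precision
        Console.WriteLine(LerpWeighted(a, b, t)); // a * 0 + b * 1, which is b
    }
}
```

The weighted form returns `value2` exactly at `amount == 1` because the first term becomes zero, which is one reason the choice of formula matters beyond raw speed.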

Optimize Vector4.Lerp

c9b5579

EgorBo added the area-System.Numerics label Apr 27, 2020

EgorBo requested a review from tannergooding April 27, 2020 17:27

tannergooding reviewed Apr 27, 2020

src/libraries/System.Private.CoreLib/src/System/Numerics/Vector4.cs (outdated)

tannergooding reviewed Apr 27, 2020

src/libraries/System.Private.CoreLib/src/System/Numerics/Vector4.cs

Implement for Vector2 and Vector4

154b39d

Rollback Vector3

b4e33f6

tannergooding reviewed Apr 27, 2020

src/libraries/System.Private.CoreLib/src/System/Numerics/Vector3.cs

tannergooding mentioned this pull request Apr 27, 2020

Vector2/4.Lerp do not always return value2 when amount is 1 #35529

Closed

jaredpar mentioned this pull request Apr 27, 2020

OSX machines are de-provisioned during CI / PR runs leading to failures #34472

Closed

tannergooding closed this May 4, 2020

tannergooding reopened this May 4, 2020

tannergooding approved these changes May 4, 2020


tannergooding added the auto-merge label May 4, 2020

tannergooding merged commit 2848dbf into dotnet:master May 4, 2020

EgorBo deleted the vector4-lerp branch May 25, 2020 11:54

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020

tannergooding merged 3 commits into dotnet:master from EgorBo:vector4-lerp
