Skip to content

Conversation

@a74nh
Copy link
Contributor

@a74nh a74nh commented Sep 7, 2022

For every FMOV emitted by the hardware intrinsics that is currently marked as not skippable: if the src and dest registers are the same and the types match, then the instruction will have no effect and can be safely marked as skippable.

Fixes: #58954

@ghost ghost added community-contribution Indicates that the PR has been added by a community member area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Sep 7, 2022
@ghost
Copy link

ghost commented Sep 7, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

For every FMOV emitted by the hardware intrinsics that is currently marked as not skippable: if the src and dest registers are the same and the types match, then the instruction will have no effect and can be safely marked as skippable.

Author: a74nh
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@a74nh
Copy link
Contributor Author

a74nh commented Sep 7, 2022

static float Lerp(float v0, float v1, float t) =>
MathF.FusedMultiplyAdd(t, v1,
MathF.FusedMultiplyAdd(-t, v0, v0));

Before:

IN0009: 000000                    stp     fp, lr, [sp, #-0x10]!
IN000a: 000004                    mov     fp, sp
IN0001: 000008                    fmov    s16, s0
IN0002: 00000C                    fmov    s0, s0
IN0003: 000010                    fmov    s17, s2
IN0004: 000014                    fmsub   s0, s0, s17, s16
IN0005: 000018                    fmov    s0, s0
IN0006: 00001C                    fmov    s1, s1
IN0007: 000020                    fmov    s2, s2
IN0008: 000024                    fmadd   s0, s1, s2, s0
IN000b: 000028                    ldp     fp, lr, [sp], #0x10
IN000c: 00002C                    ret     lr

With patch:

IN0005: 000000                    stp     fp, lr, [sp, #-0x10]!
IN0006: 000004                    mov     fp, sp
IN0001: 000008                    fmov    s16, s0
IN0002: 00000C                    fmov    s17, s2
IN0003: 000010                    fmsub   s0, s0, s17, s16
IN0004: 000014                    fmadd   s0, s1, s2, s0
IN0007: 000018                    ldp     fp, lr, [sp], #0x10
IN0008: 00001C                    ret     lr

@a74nh
Copy link
Contributor Author

a74nh commented Sep 7, 2022

Another way would be to do the check inside emitIns_Mov and get rid of canSkip, but that'd be a little messier to get all the logic right.

Alternatively, avoid calling emitIns_Mov() when the check passes in hw intrinsic code.

@a74nh
Copy link
Contributor Author

a74nh commented Sep 7, 2022

@kunalspathak @EgorBo

Copy link
Contributor

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. Can you fix the format errors and I can retrigger the runtime CI? I don't think the failures are related to your changes.

For every FMOV emitted by the hardware intrinsics that is currently
marked as not skippable: if the src and dest registers are the same
and the types match, then the instruction will have no effect and
can be safely marked as skippable.
@a74nh a74nh force-pushed the a74nh_runtime_fmov branch from caa83db to 89de55a Compare September 8, 2022 08:44
@jakobbotsch
Copy link
Member

Also cc @tannergooding

assert(intrin.baseType == TYP_DOUBLE);
assert(GetEmitter()->IsMovInstruction(ins));
GetEmitter()->emitIns_Mov(ins, emitSize, targetReg, op1Reg, /* canSkip */ false, opt);
bool canSkip = ((targetReg == op1Reg) && (intrin.baseType == intrin.op1->gtType));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, this one should always be skippable since DuplicateToVector64(double) is a "nop" (due to double being 64-bits and already in a SIMD register).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to confirm what path can hit this, however. Is it a synthesized node in lowering or elsewhere?

We don't have a managed size equivalent here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to confirm what path can hit this, however. Is it a synthesized node in lowering or elsewhere?

This was the smallest example I found in spmi replay:
dump-225213.txt

(The Assert hit was just added to catch this)

@a74nh
Copy link
Contributor Author

a74nh commented Sep 9, 2022

All updated as suggested - which I think changed every line in the patch.

@a74nh a74nh force-pushed the a74nh_runtime_fmov branch from f6f18a6 to 80e7d14 Compare September 9, 2022 08:09
@kunalspathak kunalspathak merged commit 9b7883b into dotnet:main Sep 12, 2022
@a74nh a74nh deleted the a74nh_runtime_fmov branch September 13, 2022 08:55
@ghost ghost locked as resolved and limited conversation to collaborators Oct 13, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JIT: Redundant fmov's on arm64 for a simple function

4 participants