ARM64: Fix the alignment for Vector64 to 8 bytes #37649

Merged
kunalspathak merged 1 commit into dotnet:master from kunalspathak:alignment on Jun 9, 2020

@kunalspathak kunalspathak commented Jun 9, 2020

Fix the alignment for Vector64 on ARM64 to be 8 bytes instead of 16 bytes.

private static Vector128<int> int_Create(Vector64<int> lower, Vector64<int> upper)
{
    return Vector128.Create(lower, upper);
}

Before this change, the JIT generated the following code:

G_M60438_IG01:
        A9BD7BFD          stp     fp, lr, [sp,#-48]!
        910003FD          mov     fp, sp
        FD0017A0          str     d0, [fp,#40]
        FD000FA1          str     d1, [fp,#24]
                                                ;; bbWeight=1    PerfScore 3.50
G_M60438_IG02:
        FD4017B0          ldr     d16, [fp,#40]
        FD400FB1          ldr     d17, [fp,#24]
        4E083E20          umov    x0, v17.d[0]
        4E181C10          ins     v16.d[1], x0
        4EB01E00          mov     v0.16b, v16.16b
                                                ;; bbWeight=1    PerfScore 6.50
G_M60438_IG03:
        A8C37BFD          ldp     fp, lr, [sp],#48
        D65F03C0          ret     lr

Now we no longer allocate as much stack space (a 32-byte frame instead of 48 bytes):

G_M60438_IG01:
        A9BE7BFD          stp     fp, lr, [sp,#-32]!
        910003FD          mov     fp, sp
        FD000FA0          str     d0, [fp,#24]
        FD000BA1          str     d1, [fp,#16]
                                                ;; bbWeight=1    PerfScore 3.50
G_M60438_IG02:
        FD400FB0          ldr     d16, [fp,#24]
        FD400BB1          ldr     d17, [fp,#16]
        4E083E20          umov    x0, v17.d[0]
        4E181C10          ins     v16.d[1], x0
        4EB01E00          mov     v0.16b, v16.16b
                                                ;; bbWeight=1    PerfScore 6.50
G_M60438_IG03:
        A8C27BFD          ldp     fp, lr, [sp],#32
        D65F03C0          ret     lr
                                                ;; bbWeight=1    PerfScore 2.00
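
The frame-size difference can be read directly from the instruction encodings in the two listings. As a minimal sketch (the helper name `pairOffsetBytes` is made up for illustration and is not part of the JIT), the following C++ decodes the scaled `imm7` field of the 64-bit `stp`/`ldp` encodings and checks that the prolog went from a 48-byte to a 32-byte frame:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode the byte offset of a 64-bit STP/LDP
// (general-purpose register pair). The signed imm7 field sits in
// bits [21:15] and is scaled by 8 for the 64-bit variant.
constexpr int32_t pairOffsetBytes(uint32_t insn)
{
    // (field ^ 0x40) - 0x40 sign-extends a 7-bit value.
    return (static_cast<int32_t>(((insn >> 15) & 0x7F) ^ 0x40) - 0x40) * 8;
}

// Encodings copied from the listings in this PR description.
static_assert(pairOffsetBytes(0xA9BD7BFD) == -48, "before: stp fp, lr, [sp,#-48]!");
static_assert(pairOffsetBytes(0xA9BE7BFD) == -32, "after:  stp fp, lr, [sp,#-32]!");
static_assert(pairOffsetBytes(0xA8C37BFD) == 48, "before: ldp fp, lr, [sp],#48");
static_assert(pairOffsetBytes(0xA8C27BFD) == 32, "after:  ldp fp, lr, [sp],#32");
```

The checks run at compile time, so the file only builds if the decoded offsets match the disassembly text.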

Fixes: #37429

@Dotnet-GitSync-Bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Jun 9, 2020

-    return 16;
+    // preferred alignment for 64-bit vectors is 8-bytes.
+    // For everything else, 16-bytes.
+    return (size == 8) ? 8 : 16;

Review comment (Member):

For reference, the actual alignment of the ABI types is here: https://github.com/dotnet/runtime/blob/master/src/coreclr/src/vm/methodtablebuilder.cpp#L9541-L9580
We can't necessarily get this information from the VM because Vector2/Vector3/Vector4 and Vector<T> aren't classified as ABI types, but we still align them as such when they are on the stack.
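
To make the effect of the one-line change concrete, here is a minimal C++ sketch. Every name in it (`preferredSimdAlignment`, `roundUp`, `frameSizeForTwoVector64`, the `fixApplied` flag) is hypothetical, and the frame model is a deliberate simplification rather than the JIT's real layout algorithm: it assumes 16 bytes for the saved fp/lr pair plus one stack slot per spilled Vector64 argument, with each slot padded to its preferred alignment. Under those assumptions it reproduces the 48-byte and 32-byte frames shown in the PR description.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the helper touched by this PR: before the fix
// every SIMD local got 16-byte alignment; after it, 8-byte vectors get 8.
constexpr std::size_t preferredSimdAlignment(std::size_t size, bool fixApplied)
{
    return (fixApplied && size == 8) ? 8 : 16;
}

// Round value up to the next multiple of alignment.
constexpr std::size_t roundUp(std::size_t value, std::size_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

// Simplified frame model (an assumption, not the JIT's algorithm):
// 16 bytes for fp/lr, plus one padded slot per Vector64 argument,
// with the whole frame kept a multiple of 16 bytes.
constexpr std::size_t frameSizeForTwoVector64(bool fixApplied)
{
    return roundUp(16 + 2 * roundUp(8, preferredSimdAlignment(8, fixApplied)), 16);
}

static_assert(preferredSimdAlignment(16, true) == 16, "Vector128 is unaffected");
static_assert(frameSizeForTwoVector64(false) == 48, "matches stp ... [sp,#-48]!");
static_assert(frameSizeForTwoVector64(true) == 32, "matches stp ... [sp,#-32]!");
```

The model only accounts for the total frame size; it does not try to explain the exact slot offsets chosen by the JIT.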

@kunalspathak kunalspathak marked this pull request as ready for review June 9, 2020 18:24
@kunalspathak

@CarolEidt, @dotnet/jit-contrib

@tannergooding tannergooding left a comment

Looks correct to me and matches what I read from the ABI specs.

@CarolEidt CarolEidt left a comment

LGTM - thanks!

@kunalspathak kunalspathak merged commit 7bf37b0 into dotnet:master Jun 9, 2020
@kunalspathak kunalspathak deleted the alignment branch June 9, 2020 19:02
@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Linked issue: ARM64: Investigate why more stack space is allocated than needed and why they are not aligned