@amanasifkhalid amanasifkhalid commented Feb 12, 2024

Part of #94549. Implements the following encodings:

  • IF_SVE_FZ_2A
  • IF_SVE_HG_2A (SVE2)
  • IF_SVE_GZ_3A
  • IF_SVE_GV_3A
  • IF_SVE_GY_3B_D (SVE2)
  • IF_SVE_GY_3A (SVE2)
  • IF_SVE_DV_4A (SVE2)

cstool output:

sqcvtn        z0.h, { z2.s, z3.s }
sqcvtun       z6.h, { z14.s, z15.s }
uqcvtn        z14.h, { z30.s, z31.s }
bfmlalb       z0.s, z1.h, z0.h[0]
bfmlalt       z2.s, z3.h, z1.h[1]
bfmlslb       z4.s, z5.h, z2.h[2]
bfmlslt       z6.s, z7.h, z3.h[3]
fmlalb        z8.s, z9.h, z4.h[4]
fmlalt        z10.s, z11.h, z5.h[5]
fmlslb        z12.s, z13.h, z6.h[6]
fmlslt        z14.s, z15.h, z7.h[7]
fcmla z0.s, z1.s, z0.s[0], #0
fcmla z2.s, z3.s, z5.s[1], #90
fcmla z4.s, z5.s, z10.s[0], #180
fcmla z6.s, z7.s, z15.s[1], #270

JitDisasm output:

sqcvtn  z0.h, { z2.s, z3.s }
sqcvtun z6.h, { z14.s, z15.s }
uqcvtn  z14.h, { z30.s, z31.s }
bfmlalb z0.s, z1.h, z0.h[0]
bfmlalt z2.s, z3.h, z1.h[1]
bfmlslb z4.s, z5.h, z2.h[2]
bfmlslt z6.s, z7.h, z3.h[3]
fmlalb  z8.s, z9.h, z4.h[4]
fmlalt  z10.s, z11.h, z5.h[5]
fmlslb  z12.s, z13.h, z6.h[6]
fmlslt  z14.s, z15.h, z7.h[7]
fcmla   z0.s, z1.s, z0.s[0], #0
fcmla   z2.s, z3.s, z5.s[1], #90
fcmla   z4.s, z5.s, z10.s[0], #180
fcmla   z6.s, z7.s, z15.s[1], #270
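
The four fcmla lines above cover every rotation the instruction accepts (0, 90, 180 and 270 degrees). In the Arm encoding the rotation occupies a 2-bit field holding the angle divided by 90. A minimal sketch of that mapping follows; `encodeFcmlaRotation` is a hypothetical name used for illustration, not the helper this PR adds to the emitter.

```cpp
#include <cassert>

// Hypothetical helper (not the JIT's own): maps an fcmla rotation in degrees
// to the 2-bit rot field value, or returns -1 for an unsupported rotation.
int encodeFcmlaRotation(int degrees)
{
    if (degrees < 0 || degrees > 270 || (degrees % 90) != 0)
    {
        return -1; // only 0, 90, 180 and 270 are encodable
    }
    return degrees / 90; // 0 -> 0b00, 90 -> 0b01, 180 -> 0b10, 270 -> 0b11
}
```

Returning -1 keeps the sketch testable in isolation; emitter code would more likely assert on an invalid rotation than return an error value.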

@amanasifkhalid amanasifkhalid added the arm-sve Work related to arm64 SVE/SVE2 support label Feb 12, 2024
@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 12, 2024
@amanasifkhalid

cc @dotnet/arm64-contrib

@ghost ghost assigned amanasifkhalid Feb 12, 2024
ghost commented Feb 12, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.



  case INS_sve_bfmul:
-     assert(opt = INS_OPTS_SCALABLE_H);
+     assert(opt == INS_OPTS_SCALABLE_H);
good catch
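
For readers unfamiliar with this class of bug, here is a small sketch of why the single `=` matters. The `insOpts` enum below is a stand-in with made-up values, and `buggyCheck` / `fixedCheck` are hypothetical names that only mirror the two forms of the assert condition; none of this is the JIT's actual code.

```cpp
#include <cassert>

// Stand-in enum for illustration; the real insOpts has many more members
// and its numeric values are not reproduced here.
enum insOpts
{
    INS_OPTS_NONE,
    INS_OPTS_SCALABLE_B,
    INS_OPTS_SCALABLE_H,
    INS_OPTS_SCALABLE_S
};

// Mirrors `assert(opt = INS_OPTS_SCALABLE_H)`: the expression assigns to opt
// and then tests the assigned (nonzero) value, so it passes for any input
// and overwrites opt as a side effect.
bool buggyCheck(insOpts& opt)
{
    return (opt = INS_OPTS_SCALABLE_H) != 0;
}

// Mirrors `assert(opt == INS_OPTS_SCALABLE_H)`: a pure comparison.
bool fixedCheck(insOpts opt)
{
    return opt == INS_OPTS_SCALABLE_H;
}
```

With the buggy form, a caller passing `INS_OPTS_SCALABLE_S` still passes the check and leaves with `opt` silently changed to `INS_OPTS_SCALABLE_H`. Assuming the assert is compiled out in release builds, as the standard `assert` is under `NDEBUG`, the stray assignment would also only happen in builds with asserts enabled.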

//------------------------------------------------------------------------
// emitDispVectorRegPair: Display a pair of vector registers
//
void emitter::emitDispVectorRegPair(regNumber reg, insOpts opt)
can we reuse emitDispSveConsecutiveRegList instead?

Yes we can; updated.
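
To illustrate what a shared consecutive-register-list printer has to produce for operands such as `{ z2.s, z3.s }`, here is a rough sketch. `formatConsecutiveRegList` is a hypothetical function returning a string; it does not reproduce the signature or behavior of `emitDispSveConsecutiveRegList`, and the wrap-around at z31 is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical formatter (not the JIT's emitDispSveConsecutiveRegList):
// builds "{ zN.e, zN+1.e, ... }" for `count` consecutive SVE vector registers.
// Register numbers wrap modulo 32, which is an assumption of this sketch.
std::string formatConsecutiveRegList(unsigned firstReg, unsigned count, char elemSize)
{
    std::string out = "{ ";
    for (unsigned i = 0; i < count; i++)
    {
        if (i != 0)
        {
            out += ", ";
        }
        out += "z" + std::to_string((firstReg + i) % 32) + "." + elemSize;
    }
    out += " }";
    return out;
}
```

One function parameterized on the register count covers the pair case, which is why a separate pair-only display routine becomes redundant.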

@ryujit-bot
Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

@ryujit-bot
Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%

@ryujit-bot
Diff results for #98310

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%
realworld.run.linux.arm64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
realworld.run.osx.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

@a74nh a74nh left a comment

I'm happy with this now.

@TIHan TIHan left a comment

LGTM

Hopefully when capstone gets updated this month, we will be able to decode the unsupported ones.

@amanasifkhalid amanasifkhalid merged commit c7253b1 into dotnet:main Feb 15, 2024
@amanasifkhalid amanasifkhalid deleted the sve-fz-2a branch February 15, 2024 04:24
@github-actions github-actions bot locked and limited conversation to collaborators Mar 16, 2024