[Draft] Accelerate `Half` with FP16 ISA #122649

anthonycanino · 2025-12-18T20:45:27Z

Draft PR for in-progress work to accelerate System.Half with FP16 ISA.

Current work done:

Add a TYP_HALF to the .NET runtime, which is treated like a TYP_SIMDXX, but with some notable differences. Namely, a TYP_HALF is passed around via the xmm registers, and while it will pass a varTypeIsStruct test, it must be treated as a primitive in other places.
Accelerate System.Half operations with the TYP_HALF and some FP16 intrinsics. Not every possible function has been accelerated yet.

For discussion:

I have currently worked around some checks to make TYP_HALF behave like a struct and a primitive. It's very ad-hoc at the moment.
Much of the work to transform the named System.Half intrinsics into a sequence of intrinsic nodes is done in importercalls.cpp and might want to be moved up into some of the gtNewSimdXX nodes.

anthonycanino · 2025-12-18T20:50:41Z

@tannergooding @jakobbotsch please take a look when you get a chance.

jakobbotsch · 2026-01-05T09:32:12Z

src/coreclr/jit/codegencommon.cpp


 #if defined(TARGET_AMD64) && !defined(UNIX_AMD64_ABI)
-    assert(!varTypeIsStruct(treeNode));
+    assert(!varTypeIsStruct(treeNode) || treeNode->TypeGet() == TYP_HALF);


Suggested change

-    assert(!varTypeIsStruct(treeNode) || treeNode->TypeGet() == TYP_HALF);
+    assert(!varTypeIsStruct(treeNode) || treeNode->TypeIs(TYP_HALF));

How do SIMD types avoid hitting this assert?

I believe it's because on Windows a SIMD type would be passed/returned via a buffer, and so should have been transformed before reaching this point. We have avoided doing that with the Half type.

That's my understanding as well.

Notably, that is "incorrect": SIMD types are supposed to be returned in a register on Windows, and this is a known inconsistency we want to fix long term (#9578).

anthonycanino · 2026-01-06T12:47:42Z

@dotnet/intel @tannergooding may I get some high level feedback on the structure of the PR?

src/coreclr/jit/codegenxarch.cpp

tannergooding · 2026-01-06T16:34:34Z

src/coreclr/jit/compiler.cpp

+    if (!compOpportunisticallyDependsOn(InstructionSet_AVX10v1))
+    {
+        return false;
+    }


We need this last, not first, otherwise code gets tagged as benefiting from using AVX10v1 unnecessarily
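To illustrate the ordering concern, here is a minimal sketch with a mock compiler object. It assumes a simplified model in which an "opportunistic" ISA query records a dependency as soon as it is answered positively. `MockCompiler`, `canAccelerateIsaFirst`, and `canAccelerateIsaLast` are hypothetical names for illustration, not the JIT's actual code.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for the JIT compiler state; not the real Compiler class.
struct MockCompiler
{
    bool                  hwSupportsAvx10v1 = true;
    std::set<std::string> recordedIsas; // ISAs this method was tagged as using

    // Simplified model of an "opportunistic" query: when the ISA is available,
    // merely asking records a dependency on it.
    bool opportunisticallyDependsOn(const std::string& isa)
    {
        if (!hwSupportsAvx10v1)
        {
            return false;
        }
        recordedIsas.insert(isa);
        return true;
    }
};

// ISA query first: the dependency is recorded even if a later check bails out.
static bool canAccelerateIsaFirst(MockCompiler& comp, bool otherChecksPass)
{
    if (!comp.opportunisticallyDependsOn("AVX10v1"))
    {
        return false;
    }
    return otherChecksPass;
}

// ISA query last: nothing is recorded unless every other check already passed.
static bool canAccelerateIsaLast(MockCompiler& comp, bool otherChecksPass)
{
    if (!otherChecksPass)
    {
        return false;
    }
    return comp.opportunisticallyDependsOn("AVX10v1");
}
```

With `otherChecksPass == false`, both variants return false, but only the "first" variant leaves AVX10v1 in `recordedIsas`, which is the unnecessary tagging described above.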

src/coreclr/jit/compiler.cpp

tannergooding · 2026-01-06T16:47:37Z

src/coreclr/jit/emitxarch.cpp

-               // kmov instructions reach this path with EA_8BYTE size, even on x86
-               || IsKMOVInstruction(ins)


What's the reason for removing this part of the assert?

Think that was an error, will fix.

tannergooding · 2026-01-06T16:49:34Z

src/coreclr/jit/emitxarch.cpp


+        case INS_vmovsh:
+        {
+            hasSideEffect = false;


Doesn't this have a side effect of clearing the upper-bits?

That is, it always does DEST[MAXVL:128] := 0

You are correct, I will change.
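A small model of the register-to-register form may help show why a self-move is not a no-op. This is a sketch under stated assumptions: the register is modelled as 512 bits (32 lanes of 16 bits), masking is ignored, and `VecReg` and `modelVmovshRegReg` are hypothetical names rather than emitter code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 512-bit vector register as 32 lanes of 16 bits.
using VecReg = std::array<uint16_t, 32>;

// Models the unmasked register form: vmovsh dest, src1, src2.
static VecReg modelVmovshRegReg(const VecReg& src1, const VecReg& src2)
{
    VecReg dest{};     // value-initialized: every lane starts as zero
    dest[0] = src2[0]; // the low FP16 element comes from src2

    // Bits 127:16 (lanes 1..7) are copied from src1.
    for (std::size_t i = 1; i < 8; i++)
    {
        dest[i] = src1[i];
    }

    // Lanes 8..31 model DEST[MAXVL:128] := 0 and stay zero.
    return dest;
}
```

Under this model, moving a register onto itself still changes it whenever any bit above 128 was set, so the move cannot be elided as if it had no side effect.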

tannergooding · 2026-01-06T16:51:02Z

src/coreclr/jit/emitxarch.cpp


 #if defined(TARGET_AMD64)
        case INS_movsxd:
+        case INS_vmovsh:


This isn't TARGET_AMD64 exclusive as vmovsh is listed with V/V for support, so is valid for both 64 and 32-bit mode.

tannergooding · 2026-01-06T16:51:35Z

src/coreclr/jit/emitxarch.cpp

+            if (IsXMMReg(reg))
+            {
+                return emitXMMregName(reg);
+            }


This shouldn't be TARGET_AMD64 exclusive either.

tannergooding · 2026-01-06T16:52:44Z

src/coreclr/jit/emitxarch.cpp

    else if (code & 0xFF000000)
    {
-        if (size == EA_2BYTE)
+        if (size == EA_2BYTE && (ins != INS_vmovsh && ins != INS_vaddsh))


Can we just use && !IsSimdInstruction(ins)?

tannergooding · 2026-01-06T16:54:14Z

src/coreclr/jit/emitxarch.cpp

        case INS_movapd:
        case INS_movupd:
+        // todo-xarch-half: come back to fix
+        case INS_vmovsh:


Shouldn't this be grouped with vmovss and vmovsd? While we may not have exact numbers, I'd expect it to have identical perf/latency to those rather than the more general movaps and friends.

tannergooding · 2026-01-06T16:55:22Z

src/coreclr/jit/emitxarch.cpp

            float insLatency = insLatencyInfos[ins];

+            // todo-xarch-half: hacking an exit on the unhandled ins to make prototyping easier
+            if (ins == INS_vcvtss2sh || ins == INS_vcvtsh2ss || ins == INS_vaddsh || ins == INS_vsubsh ||


I think we want to put most of these with the v*ss and v*sd equivalents prior to merging this PR.

Yes, and for the above, I will get the proper numbers before putting the PR in non-draft.

tannergooding · 2026-01-06T16:59:38Z

src/coreclr/jit/gentree.cpp

+                // todo-half: this is only to create zero constant half nodes for use in instrincis, anything
+                // else will not work


Not sure I understand this comment.

Presumably we just need a FloatingPointUtils::convertDoubleToHalf(...) method which returns a float16_t type (these were added in C++23, which is newer than our baseline, so we'd just typedef uint16_t float16_t; for the time being).

We then set vecCon->gtSimdVal.f16[i] = cnsVal.
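As a rough sketch of what such a conversion could look like, the following turns a double into raw binary16 bits with round-to-nearest, ties-to-even. The name `convertDoubleToHalfSketch` is hypothetical, and this is not necessarily how the helper in the PR is implemented; only the `float16_t` typedef follows the suggestion above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef uint16_t float16_t; // raw IEEE 754 binary16 bits, as suggested in the review

// Hypothetical helper name; rounds to nearest, ties to even.
static float16_t convertDoubleToHalfSketch(double value)
{
    uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));

    const uint32_t sign = static_cast<uint32_t>(bits >> 48) & 0x8000u;
    const int      exp  = static_cast<int>((bits >> 52) & 0x7FF);
    const uint64_t mant = bits & 0x000FFFFFFFFFFFFFull;

    if (exp == 0x7FF)
    {
        // Infinity, or NaN (forced quiet so the payload cannot collapse to infinity).
        return static_cast<float16_t>(sign | 0x7C00u | (mant != 0 ? 0x0200u : 0u));
    }

    const int e = exp - 1023 + 15; // rebias the exponent for binary16
    if (e >= 31)
    {
        return static_cast<float16_t>(sign | 0x7C00u); // overflow to infinity
    }
    if (e < -10)
    {
        return static_cast<float16_t>(sign); // below half the smallest subnormal: signed zero
    }

    // Normal results keep the top 10 fraction bits; subnormal results shift the
    // full significand (implicit bit included) further right.
    const bool     subnormal = (e <= 0);
    const uint64_t sig       = subnormal ? (mant | (1ull << 52)) : mant;
    const int      shift     = subnormal ? (43 - e) : 42;

    uint32_t result = static_cast<uint32_t>(sig >> shift);
    if (!subnormal)
    {
        result |= static_cast<uint32_t>(e) << 10;
    }

    const uint64_t halfway = 1ull << (shift - 1);
    const uint64_t rest    = sig & ((1ull << shift) - 1);
    if (rest > halfway || (rest == halfway && (result & 1u) != 0))
    {
        result++; // a carry into the exponent field is the correct rounding here
    }
    return static_cast<float16_t>(sign | result);
}
```

For example, `convertDoubleToHalfSketch(1.0)` yields `0x3C00`, which is the value that would then be stored into the `f16` lane of the vector constant.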

tannergooding · 2026-01-06T17:04:24Z

src/coreclr/jit/gentree.h

+            {
+                if (arg->IsCnsFltOrDbl())
+                {
+                    simdVal.f16[argIdx] = static_cast<uint16_t>(arg->AsDblCon()->DconValue());


This looks incorrect as it does a double->uint16_t cast, when we rather need double->float16_t
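To make the problem concrete, the sketch below decodes raw binary16 bits back to a double, so the result of the integer cast can be read as the Half it would actually represent. `halfBitsToDouble` is a hypothetical helper for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decodes raw binary16 bits to a double (hypothetical helper for illustration).
static double halfBitsToDouble(uint16_t bits)
{
    const int    exp  = (bits >> 10) & 0x1F;
    const int    mant = bits & 0x3FF;
    const double sign = (bits & 0x8000) ? -1.0 : 1.0;

    if (exp == 0x1F)
    {
        return mant ? std::numeric_limits<double>::quiet_NaN()
                    : sign * std::numeric_limits<double>::infinity();
    }
    if (exp == 0)
    {
        return sign * std::ldexp(static_cast<double>(mant), -24); // zero or subnormal
    }
    return sign * std::ldexp(static_cast<double>(mant + 1024), exp - 25);
}
```

The bits for 1.0 in binary16 are `0x3C00`, but `static_cast<uint16_t>(1.0)` produces the integer 1, which decodes to the smallest subnormal (2^-24) rather than 1.0. Any fractional value below 1 truncates to 0, and negative or out-of-range doubles make the cast undefined behavior in C++.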

tannergooding · 2026-01-06T17:07:37Z

src/coreclr/jit/hwintrinsiccodegenxarch.cpp

                    }
                }
-                else if (node->TypeIs(TYP_VOID))
+                else if (node->TypeIs(TYP_VOID) || node->TypeIs(TYP_INT))


What's the reason for this change?

Think it was also a bug, I have removed.

src/coreclr/jit/importer.cpp

tannergooding · 2026-01-06T17:10:08Z

src/coreclr/jit/importer.cpp

+                if (sizeBytes < getMinVectorByteLength())
                {
-                    *pSimdBaseJitType = simdBaseType;
+                    // The struct itself is accelerated, in this case, it is `Half`.


Add an assert(sizeBytes == 2) in case we add other sizes in the future?

tannergooding · 2026-01-06T17:12:45Z

src/coreclr/jit/importercalls.cpp

+                break;
+            }
+
+            case NI_System_Half_op_Increment:


Some of these, like Increment/Decrement, could be merged as well using lookupHalfIntrinsic
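One way to picture the merge is a single lookup that maps each named intrinsic to a scalar operation plus a flag for an implicit constant operand, so Increment and Decrement share the Addition and Subtraction path. This is only a sketch: the enums, `HalfLowering`, and `lookupHalfLoweringSketch` are hypothetical and do not reflect the real signature of lookupHalfIntrinsic.

```cpp
#include <cassert>

// Hypothetical enums; the real NI_System_Half_* values live in the JIT.
enum HalfNamedIntrinsic
{
    Half_op_Addition,
    Half_op_Subtraction,
    Half_op_Increment,
    Half_op_Decrement,
    Half_Unknown
};

enum HalfScalarOp
{
    Op_None,
    Op_AddScalar,
    Op_SubtractScalar
};

struct HalfLowering
{
    HalfScalarOp op;          // scalar operation to emit
    bool         implicitOne; // true when the second operand is the constant 1.0
};

// Hypothetical lookup: Increment/Decrement reuse Addition/Subtraction with an implicit 1.0.
static HalfLowering lookupHalfLoweringSketch(HalfNamedIntrinsic ni)
{
    switch (ni)
    {
        case Half_op_Addition:
            return {Op_AddScalar, false};
        case Half_op_Increment:
            return {Op_AddScalar, true};
        case Half_op_Subtraction:
            return {Op_SubtractScalar, false};
        case Half_op_Decrement:
            return {Op_SubtractScalar, true};
        default:
            return {Op_None, false};
    }
}
```

The importer would then need only one code path that builds the operation node and, when `implicitOne` is set, supplies a constant 1.0 as the second operand.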

tannergooding · 2026-01-06T17:20:58Z

src/coreclr/jit/instr.cpp

+    if (srcSize == 2)
+        return INS_vmovsh;


General convention is to have braces, particularly if it is part of an if/else chain:

Suggested change

-    if (srcSize == 2)
-        return INS_vmovsh;
+    if (srcSize == 2)
+    {
+        return INS_vmovsh;
+    }

tannergooding · 2026-01-06T17:23:15Z

src/coreclr/jit/lower.cpp

+    // if (node->TypeGet() == TYP_HALF)
+    //{
+    //     return false;
+    // }


tannergooding · 2026-01-06T17:24:23Z

src/coreclr/jit/lsrabuild.cpp

+                    case TYP_HALF:
+#ifdef TARGET_X86
+                        useCandidates = RBM_FLOATRET;
+#else
+                    useCandidates = RBM_FLOATRET.GetFloatRegSet();
+#endif
+                        break;


This looks to be identical to the TYP_FLOAT path and can be collapsed to share it:

Suggested change

-                    case TYP_HALF:
-#ifdef TARGET_X86
-                        useCandidates = RBM_FLOATRET;
-#else
-                    useCandidates = RBM_FLOATRET.GetFloatRegSet();
-#endif
-                        break;
+                    case TYP_HALF:

tannergooding · 2026-01-06T17:24:59Z

src/coreclr/jit/lsrabuild.cpp

                        // We ONLY want the valid double register in the RBM_DOUBLERET mask.
 #ifdef TARGET_AMD64
                        useCandidates = (RBM_DOUBLERET & RBM_ALLDOUBLE).GetFloatRegSet();
 #else
                    useCandidates = (RBM_DOUBLERET & RBM_ALLDOUBLE).GetFloatRegSet();
 #endif // TARGET_AMD64


Not related to this PR, but these two paths are the same.

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 18, 2025

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Dec 18, 2025

jakobbotsch reviewed Jan 5, 2026

anthonycanino force-pushed the half-xmm-struct-abi branch from 3b8abaa to f633726 Compare January 5, 2026 19:52

tannergooding reviewed Jan 6, 2026

anthonycanino added 29 commits January 13, 2026 13:06

Rewrote the half importation to follow the standard hwintrinsic pattern.

8a22001

Add Half comparison intrinsics.

682a2b9

Add inc and dec.

b916907

Add sqrt, min, max.

6972a44

Add round and fma.

b58a5d6

Formatting.

f868b20

Edits.

aa0395b

Rework to treat TYP_HALF as a struct but not TYP_STRUCT.

ea83676

Struct promotion

a35fe42

Typo.

46115c6

Adding get intrinsics.

d051fe3

wasm fix

ec24fa1

Bug fix.

c3fa2eb

Bug fix.

b0c01b1

Bug fix.

938b9bf

Bug fix.

0c67154

Bug fix.

05f5534

Bug fix.

ffd9254

Jit Format.

3a95a4e

(WIP) fixes.

7a00f23

Reverting change.

e8c07a9

More review fixes.

eeb14ee

Added FloatingPointUtils::convertDoubleToFloat16.

11bd60f

Missing file.

d69d37a

Fix.

65a3227

Adding round related intrinsics.

2e01f73

Add Ceiling, Floor, and Truncate optimization.

ad7eb33

Adding perfscore numbers.

84cde35

Number fixes.

4235f30

anthonycanino force-pushed the half-xmm-struct-abi branch from 3537f96 to 4235f30 Compare January 13, 2026 21:39
