Skip to content

Conversation

@anthonycanino
Copy link
Contributor

Draft PR for in-progress work to accelerate System.Half with FP16 ISA.

Current work done:

  1. Add a TYP_HALF to the .NET runtime, which is treated like a TYP_SIMDXX, but with some notable differences. Namely, a TYP_HALF is passed around via the xmm registers, and while it will pass a varTypeIsStruct test, it must be treated as a primitive in other places.

  2. Accelerate System.Half operations with the TYP_HALF and some FP16 intrinsics. Not every possible function has been accelerated yet.

For discussion:

  1. I have currently worked around some checks to make TYP_HALF behave like a struct and a primitive. It's very ad-hoc at the moment.

  2. Much of the work to transform the named System.Half intrinsics into a sequence of intrinsic nodes is done in importcall.cpp and might want to be moved up into some of the gtNewSimdXX nodes.

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 18, 2025
@anthonycanino
Copy link
Contributor Author

@tannergooding @jakobbotsch please take a look when you get a chance.


#if defined(TARGET_AMD64) && !defined(UNIX_AMD64_ABI)
assert(!varTypeIsStruct(treeNode));
assert(!varTypeIsStruct(treeNode) || treeNode->TypeGet() == TYP_HALF);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert(!varTypeIsStruct(treeNode) || treeNode->TypeGet() == TYP_HALF);
assert(!varTypeIsStruct(treeNode) || treeNode->TypeIs(TYP_HALF));

How do SIMD types avoid hitting this assert?

Copy link
Contributor Author

@anthonycanino anthonycanino Jan 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe its due to on windows the SIMD type would be passed/returned via a buffer, which we have avoided doing with the half type and should have been transformed prior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's my understanding as well.

Notably that is "incorrect" as SIMD types are supposed to be returned in register on Windows and this is a known inconsistency we want to fix long term (#9578)

@anthonycanino
Copy link
Contributor Author

@dotnet/intel @tannergooding may I get some high level feedback on the structure of the PR?

Comment on lines 578 to 581
if (!compOpportunisticallyDependsOn(InstructionSet_AVX10v1))
{
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this last, not first, otherwise code gets tagged as benefiting from using AVX10v1 unnecessarily

Comment on lines -5393 to -5394
// kmov instructions reach this path with EA_8BYTE size, even on x86
|| IsKMOVInstruction(ins)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for removing this part of the assert?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think that was an error, will fix.


case INS_vmovsh:
{
hasSideEffect = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this have a side effect of clearing the upper-bits?

That is, it always does DEST[MAXVL:128] := 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct, I will change.


#if defined(TARGET_AMD64)
case INS_movsxd:
case INS_vmovsh:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't TARGET_AMD64 exclusive as vmovsh is listed with V/V for support, so is valid for both 64 and 32-bit mode.

Comment on lines +11819 to +11822
if (IsXMMReg(reg))
{
return emitXMMregName(reg);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be TARGET_AMD64 exclusive either.

else if (code & 0xFF000000)
{
if (size == EA_2BYTE)
if (size == EA_2BYTE && (ins != INS_vmovsh && ins != INS_vaddsh))
Copy link
Member

@tannergooding tannergooding Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use && !IsSimdInstruction(ins)?

case INS_movapd:
case INS_movupd:
// todo-xarch-half: come back to fix
case INS_vmovsh:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be grouped with vmovss and vmovsd? While we may not have exact numbers, I'd expect it to have identical perf/latency to those rather than the more general movaps and friends.

float insLatency = insLatencyInfos[ins];

// todo-xarch-half: hacking an exit on the unhandled ins to make prototyping easier
if (ins == INS_vcvtss2sh || ins == INS_vcvtsh2ss || ins == INS_vaddsh || ins == INS_vsubsh ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to put most of these with the v*ss and v*sd equivalents prior to mergine this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and for the above, I will get the proper numbers before putting the PR in non-draft.

Comment on lines 23081 to 23082
// todo-half: this is only to create zero constant half nodes for use in instrincis, anything
// else will not work
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand this comment.

Presumably we just need a FloatingPointUtils::convertDoubleToHalf(...) method which returns a float16_t type (these were added in C++23, which is newer than our baseline, so we'd just typedef uint16_t float16_t; for the time being).

We then vecCon->gtSimdVal.f16[i] = cnsVal

{
if (arg->IsCnsFltOrDbl())
{
simdVal.f16[argIdx] = static_cast<uint16_t>(arg->AsDblCon()->DconValue());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks incorrect as it does a double->uint16_t cast, when we rather need double->float16_t

}
}
else if (node->TypeIs(TYP_VOID))
else if (node->TypeIs(TYP_VOID) || node->TypeIs(TYP_INT))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think it was also a bug, I have removed.

if (sizeBytes < getMinVectorByteLength())
{
*pSimdBaseJitType = simdBaseType;
// The struct itself is accelerated, in this case, it is `Half`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an assert(sizeBytes == 2) in case we add other sizes in the future?

break;
}

case NI_System_Half_op_Increment:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these, like Increment/Decrement, could be merged as well using lookupHalfIntrinsic

Comment on lines 2224 to 2229
if (srcSize == 2)
return INS_vmovsh;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General convention is to have braces, particularly if it is part of an if/else chain:

Suggested change
if (srcSize == 2)
return INS_vmovsh;
if (srcSize == 2)
{
return INS_vmovsh;
}

Comment on lines 9781 to 9784
// if (node->TypeGet() == TYP_HALF)
//{
// return false;
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead code?

Comment on lines 4387 to 4393
case TYP_HALF:
#ifdef TARGET_X86
useCandidates = RBM_FLOATRET;
#else
useCandidates = RBM_FLOATRET.GetFloatRegSet();
#endif
break;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be identical to the TYP_FLOAT path and can be collapsed to share it:

Suggested change
case TYP_HALF:
#ifdef TARGET_X86
useCandidates = RBM_FLOATRET;
#else
useCandidates = RBM_FLOATRET.GetFloatRegSet();
#endif
break;
case TYP_HALF:

Comment on lines 4402 to 4401
// We ONLY want the valid double register in the RBM_DOUBLERET mask.
#ifdef TARGET_AMD64
useCandidates = (RBM_DOUBLERET & RBM_ALLDOUBLE).GetFloatRegSet();
#else
useCandidates = (RBM_DOUBLERET & RBM_ALLDOUBLE).GetFloatRegSet();
#endif // TARGET_AMD64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to this PR, but these two paths are the same

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants