
JIT: Accelerate floating->long casts on x86 #125180

Open
saucecontrol wants to merge 2 commits into dotnet:main from saucecontrol:lng2flt6

@saucecontrol (Member) commented Mar 4, 2026

This adds floating->long/ulong cast codegen for AVX-512 and AVX10.2 on x86. With this, all non-overflow casts are now hardware accelerated. This is the last bit pulled from #116805.

Typical Diff (double->long AVX-512):

-       sub      esp, 8
-       vzeroupper 
-       vmovsd   xmm0, qword ptr [esp+0x0C]
-       sub      esp, 8
-       ; npt arg push 0
-       ; npt arg push 1
-       vmovsd   qword ptr [esp], xmm0
-       call     CORINFO_HELP_DBL2LNG
-       ; gcr arg pop 2
+       vmovsd   xmm0, qword ptr [esp+0x04]
+       vcmpordsd k1, xmm0, xmm0
+       vcvttpd2qq xmm1 {k1}{z}, xmm0
+       vcmpge_oqsd k1, xmm0, qword ptr [@RWD00]
+       vpcmpeqd xmm0, xmm0, xmm0
+       vpsrlq   xmm1 {k1}, xmm0, 1
+       vmovd    eax, xmm1
+       vpextrd  edx, xmm1, 1
-       add      esp, 8
        ret      8

+RWD00  	dq	43E0000000000000h
 
-; Total bytes of code 31
+; Total bytes of code 54
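The RWD00 constant 43E0000000000000h is the IEEE 754 bit pattern of 2^63. The small C sketch below checks that, and also that the adjacent double just below it (2^63 - 1024) still fits in a long, which is why a single ">= 2^63" compare is enough to detect positive overflow. The helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a 64-bit integer (no numeric conversion). */
static uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Inverse of double_bits. */
static double bits_double(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* 2^63, converted exactly from an integer (powers of two are representable). */
static double two_pow_63(void)
{
    return (double)(UINT64_C(1) << 63);
}

/* The adjacent double below 2^63: for a positive finite double, decrementing
   the bit pattern yields the next smaller value (here 2^63 - 1024). */
static double below_two_pow_63(void)
{
    return bits_double(double_bits(two_pow_63()) - 1);
}
```

Because every double below 2^63 is at most 2^63 - 1024, no double lands between long.MaxValue and 2^63, so the compare against RWD00 cleanly separates in-range from overflowing inputs.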

Full Diffs

Breakdown of the double->long asm:

; load the scalar double
vmovsd   xmm0, qword ptr [esp+0x04]

; set the low bit of k1 if the scalar value is not NaN
vcmpordsd k1, xmm0, xmm0

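; convert with truncation under the k1 mask: if the mask bit is clear (the input is NaN), the result is zeroed
; inputs outside the long range convert to 0x8000000000000000 (long.MinValue), which is already the saturated result for negative overflow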
vcvttpd2qq xmm1 {k1}{z}, xmm0

; set the low bit of k1 if the input is greater than or equal to 2^63 (the smallest double greater than long.MaxValue); the ordered compare is false for NaN
vcmpge_oqsd k1, xmm0, qword ptr [@RWD00]

; set all bits of xmm0 to 1
vpcmpeqd xmm0, xmm0, xmm0

; if the low bit of k1 is set (positive overflow), merge xmm0 >>> 1 (0x7FFFFFFFFFFFFFFF, long.MaxValue) into the result; otherwise keep the conversion result
vpsrlq   xmm1 {k1}, xmm0, 1

; extract the two 32-bit halves of the long result
vmovd    eax, xmm1
vpextrd  edx, xmm1, 1
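To make the masking explicit, here is a scalar C model of the sequence above, one statement per instruction. It is a sketch under two assumptions: that only lane 0 matters, and that vcvttpd2qq returns the "integer indefinite" value 0x8000000000000000 for NaN and out-of-range inputs, as Intel documents for that instruction. The function names are hypothetical and do not correspond to anything in the JIT.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* vcvttpd2qq result for NaN or out-of-range input ("integer indefinite"). */
#define INTEGER_INDEFINITE UINT64_C(0x8000000000000000)

/* 2^63, the value stored at RWD00. */
#define TWO_POW_63 9223372036854775808.0

/* Scalar model of lane 0 of vcvttpd2qq: truncating double -> int64. */
static uint64_t cvtt_qq(double x)
{
    if (x >= -TWO_POW_63 && x < TWO_POW_63)   /* false for NaN */
        return (uint64_t)(int64_t)x;          /* C casts truncate toward zero */
    return INTEGER_INDEFINITE;
}

/* Models the AVX-512 double->long sequence; the result is returned in the
   edx:eax halves the x86 calling convention uses for a 64-bit value. */
static void dbl2lng_model(double x, uint32_t *eax, uint32_t *edx)
{
    /* vcmpordsd k1, xmm0, xmm0: mask bit set unless x is NaN */
    int k1 = !isnan(x);

    /* vcvttpd2qq xmm1 {k1}{z}, xmm0: zero-masking turns NaN into 0 */
    uint64_t xmm1 = k1 ? cvtt_qq(x) : 0;

    /* vcmpge_oqsd k1, xmm0, [RWD00]: quiet compare, set only for x >= 2^63 */
    k1 = isgreaterequal(x, TWO_POW_63);

    /* vpcmpeqd xmm0, xmm0, xmm0: all bits set */
    uint64_t ones = ~UINT64_C(0);

    /* vpsrlq xmm1 {k1}, xmm0, 1: merge-masking writes 0x7FFFFFFFFFFFFFFF
       only on positive overflow; otherwise xmm1 keeps the conversion */
    if (k1)
        xmm1 = ones >> 1;

    /* vmovd eax, xmm1 / vpextrd edx, xmm1, 1 */
    *eax = (uint32_t)xmm1;
    *edx = (uint32_t)(xmm1 >> 32);
}

/* Convenience wrapper: reassemble edx:eax into an int64_t. */
static int64_t dbl2lng(double x)
{
    uint32_t lo, hi;
    dbl2lng_model(x, &lo, &hi);
    uint64_t u = ((uint64_t)hi << 32) | lo;
    int64_t r;
    memcpy(&r, &u, sizeof r);   /* bit copy avoids implementation-defined conversion */
    return r;
}
```

Under these assumptions, dbl2lng(NAN) is 0, dbl2lng(1e300) is INT64_MAX, and dbl2lng(-1e300) is INT64_MIN. Negative overflow needs no separate fixup because the indefinite value is already long.MinValue; only the positive side needs the compare and masked shift.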

Copilot AI review requested due to automatic review settings March 4, 2026 15:43
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 4, 2026
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 4, 2026
@dotnet-policy-service (Contributor) commented

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot AI (Contributor) left a comment

Pull request overview

This PR extends x86 JIT codegen to hardware-accelerate non-overflow floating→long/ulong casts using AVX-512 and AVX10.2, completing the remaining cast-acceleration work pulled from #116805.

Changes:

  • Teach cast helper selection to allow floating↔long casts to stay intrinsic-based on x86 when AVX-512 is available.
  • Add/extend x86 long decomposition logic to generate AVX-512/AVX10.2 sequences for floating→long/ulong and long→floating casts.
  • Introduce a new AVX-512 scalar compare-mask intrinsic and wire it up for immediate bounds + containment.
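The ulong direction has different saturation bounds from long. A minimal C reference model follows, assuming the saturating semantics .NET uses for floating-to-integer casts (NaN and negative values map to 0, values at or above 2^64 map to ulong.MaxValue). It describes only the expected results, not the AVX-512 or AVX10.2 instruction sequence the PR emits, and the function name is hypothetical.

```c
#include <math.h>
#include <stdint.h>

/* Reference model of a saturating double -> uint64 conversion. This encodes
   assumed target semantics, not the PR's codegen. */
static uint64_t dbl2ulng_saturating(double x)
{
    const double two64 = 18446744073709551616.0; /* 2^64, exactly representable */

    if (isnan(x) || x <= -1.0)
        return 0;                 /* NaN, and values whose truncation is negative */
    if (x >= two64)
        return UINT64_MAX;        /* positive overflow saturates */
    return (uint64_t)x;           /* in range (incl. -1 < x < 0): truncates toward zero */
}
```

The largest in-range input is 2^64 - 2048, the adjacent double below 2^64, so a single ">= 2^64" test is enough on the upper side, mirroring the 2^63 bound used for long.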

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

  • src/coreclr/jit/lowerxarch.cpp: Refactors vector constant construction and adds containment support for the new AVX-512 scalar compare-mask intrinsic.
  • src/coreclr/jit/hwintrinsicxarch.cpp: Adds immediate upper-bound handling for the new AVX-512 scalar compare-mask intrinsic.
  • src/coreclr/jit/hwintrinsiclistxarch.h: Introduces AVX512.CompareScalarMask as a new intrinsic mapping to vcmpss/vcmpsd with IMM.
  • src/coreclr/jit/flowgraph.cpp: Updates helper-requirement logic so x86 floating↔long casts can avoid helper calls when AVX-512 is available.
  • src/coreclr/jit/decomposelongs.cpp: Implements the AVX-512/AVX10.2-based lowering/decomposition sequences for floating↔long/ulong on x86.
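AVX512.CompareScalarMask passes the vcmpsd/vcmpss predicate as an immediate. In the VEX/EVEX encodings the predicate occupies imm8 bits 4:0, giving 32 predicates (0 to 31); that this is the bound the hwintrinsicxarch.cpp change enforces is an assumption. The C sketch below models the two predicates used in the double->long sequence: ORD_Q (0x07, emitted as vcmpordsd) and GE_OQ (0x1D, emitted as vcmpge_oqsd). It uses the C99 quiet comparison macro isgreaterequal so that, like an _OQ predicate, NaN yields false without a signaling compare. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Predicate immediates for vcmpsd (same values as the _CMP_* macros in <immintrin.h>). */
enum { CMP_ORD_Q = 0x07, CMP_GE_OQ = 0x1D, CMP_MAX_IMM = 0x1F };

/* Hypothetical model of the mask bit produced by "vcmpsd k, a, b, imm" for the
   two predicates used above; other predicates are deliberately not modeled. */
static bool cmp_sd_mask_bit(double a, double b, int imm)
{
    assert(imm >= 0 && imm <= CMP_MAX_IMM);  /* 5-bit predicate field */
    switch (imm) {
    case CMP_ORD_Q:
        return !isnan(a) && !isnan(b);       /* ordered: neither operand is NaN */
    case CMP_GE_OQ:
        return isgreaterequal(a, b);         /* quiet compare: false if either is NaN */
    default:
        assert(!"predicate not modeled");
        return false;
    }
}
```

In the asm breakdown, the first compare is cmp_sd_mask_bit(x, x, CMP_ORD_Q) and the second is cmp_sd_mask_bit(x, 2^63, CMP_GE_OQ); both are false for NaN, which is what lets the sequence zero NaN without disturbing the overflow fixup.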

Copilot AI review requested due to automatic review settings March 4, 2026 16:08
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@saucecontrol saucecontrol marked this pull request as ready for review March 4, 2026 19:31
Copilot AI review requested due to automatic review settings March 4, 2026 19:31
@saucecontrol (Member, author) commented

@dotnet/jit-contrib this is ready for review

diffs

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
