JIT/x64: Route FMA embedded-rounding fallback through genFmaIntrinsic #128855
Co-authored-by: tannergooding <10487869+tannergooding@users.noreply.github.com>
/azp run runtime-coreclr jitstress2-jitstressregs
Azure Pipelines successfully started running 1 pipeline(s).
Pull request overview
This PR fixes an x64 JIT assertion in the AVX-512 FMA embedded-rounding non-immediate fallback path by ensuring codegen routes through the existing FMA helper that performs operand swapping / form selection, instead of emitting a fixed instruction form that can clobber a live source when registers alias under stress.
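To make the aliasing concrete, here is a toy C++ model of the fixed 213 sequence the old fallback emitted. It is a hypothetical sketch, not JIT code: `RegFile`, `movIsSafe`, and `emitFixed213` are illustrative names, registers are slots in a small array of doubles, and `std::fma` stands in for the hardware instruction. `movIsSafe` mirrors the `(op3Reg != targetReg) || (op1Reg == targetReg)` assertion this PR addresses.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical register-file model (not JIT code). Each slot is one register.
using RegFile = std::array<double, 4>;

// Mirrors the assert: moving op1 into target is only safe if target does not
// hold op3, or if op1 already lives in target (so no move is emitted).
bool movIsSafe(int target, int op1, int op3) {
    return (op3 != target) || (op1 == target);
}

// Models the fixed sequence for target = op1 * op2 + op3:
//   mov        target, op1
//   vfmadd213  target, op2, op3   ; target = op2 * target + op3
void emitFixed213(RegFile& r, int target, int op1, int op2, int op3) {
    if (target != op1) {
        r[target] = r[op1];  // overwrites op3 (or op2) if target aliases it
    }
    r[target] = std::fma(r[op2], r[target], r[op3]);
}
```

With op1 = 2.0, op2 = 3.0, op3 = 10.0 and the target register aliased to op3, the leading move overwrites the addend and the model produces 3 * 2 + 2 = 8 rather than 16; with a fresh target register it produces 16.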
Changes:
- Route the AVX-512 FMA embedded-rounding jump-table fallback through `genFmaIntrinsic(...)` instead of directly emitting the 213 form.
- Make `genFmaIntrinsic` usable from the jump-table emission by moving responsibility for `genConsumeMultiOpOperands`/`genProduceReg` to callers (and updating the normal AVX-family path accordingly).
- Add defensive assertions in the embedded-rounding FMA fallback that operands are not contained.
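The consume/produce refactor can be pictured with a small call-logging model. This is an assumed simplification: `FakeCodeGen` and its methods are hypothetical stand-ins for `genConsumeMultiOpOperands`, `genProduceReg`, and the swap+emit part of `genFmaIntrinsic`, and the loop over the four AVX-512 static rounding modes stands in for the jump-table cases. It only illustrates that, once bracketing moves to the callers, the emit helper can run once per case while operands are consumed and the result produced exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical call-logging model of the refactor; none of these names are JIT APIs.
struct FakeCodeGen {
    std::vector<std::string> log;

    void consumeOperands() { log.push_back("consume"); }  // stands in for genConsumeMultiOpOperands
    void produceReg() { log.push_back("produce"); }       // stands in for genProduceReg

    // After the refactor the helper only picks a form and emits. It does no
    // consume/produce bookkeeping, so calling it several times is harmless.
    void emitFma(const std::string& rounding) { log.push_back("vfmadd " + rounding); }

    // Normal AVX-family path: the caller now brackets the helper itself.
    void avxFamilyPath() {
        consumeOperands();
        emitFma("mxcsr");  // no embedded rounding: MXCSR.RC applies
        produceReg();
    }

    // Non-immediate rounding fallback: one emission per rounding-mode case,
    // bracketed once by the outer embedded-rounding code.
    void jumpTableFallback() {
        const std::vector<std::string> modes = {"{rn-sae}", "{rd-sae}", "{ru-sae}", "{rz-sae}"};
        consumeOperands();
        for (const std::string& mode : modes) {
            emitFma(mode);
        }
        produceReg();
    }

    // Number of log entries equal to `entry`.
    long count(const std::string& entry) const {
        return static_cast<long>(std::count(log.begin(), log.end(), entry));
    }
};
```

Had consume/produce stayed inside the helper, `jumpTableFallback` would log them once per rounding mode rather than once per node.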
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
tannergooding left a comment
CC @dotnet/jit-contrib, @EgorBo for secondary review. This resolves a jitstress2-jitstressregs failure where the NI_AVX512 FMA intrinsics with embedded rounding were not handling register preferencing correctly. The fix reuses the existing genFmaIntrinsic path.
AVX-512 FMA intrinsics with a non-immediate embedded-rounding operand hit `Assertion failed '(op3Reg != targetReg) || (op1Reg == targetReg)'` in `genHWIntrinsic_R_R_R_RM` under `jitstress2_jitstressregs0x80` (recurrence of #128544). The FMA case in `genNonTableDrivenHWIntrinsicsJumpTableFallback` emitted the 213 form directly, skipping the operand swapping and target-register preferencing that `genFmaIntrinsic` performs. When LSRA preferenced `op2`/`op3` onto `targetReg`, the leading `mov targetReg, op1Reg` clobbered a live source, which is exactly the alias the assert guards against.

Changes (`src/coreclr/jit/hwintrinsiccodegenxarch.cpp`):

- Make `genFmaIntrinsic` re-entrant from a jump-table lambda. Lifted `genConsumeMultiOpOperands`/`genProduceReg` out of `genFmaIntrinsic` into its caller in `genAvxFamilyIntrinsic`. The remaining swap+emit logic depends only on register numbers and emits a single instruction per call, so it is safe to invoke once per rounding-mode case.
- Route the fallback through `genFmaIntrinsic`. Replaced the manual `genHWIntrinsic_R_R_R_RM(ins, attr, targetReg, op1Reg, op2Reg, op3, …)` in the `NI_AVX512_FusedMultiply*` arm of `genNonTableDrivenHWIntrinsicsJumpTableFallback` with `genFmaIntrinsic(node, newInstOptions)`. The outer embedded-rounding path already brackets this fallback with consume/produce, so no extra calls are needed inside the lambda. The asserts that `op1`/`op2`/`op3` are not contained are kept as defensive checks of the embedded-rounding R-R-R invariant.

The 132/213/231 form selection in the non-contained branch of `genFmaIntrinsic` now also applies on the non-immediate-rounding path, eliminating the source-register clobber.
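A minimal sketch of the form-selection idea for the register-only case, assuming the x86 operand order `dst, src2, src3`, where 213 computes `dst = src2 * dst + src3` and 231 computes `dst = src2 * src3 + dst`. It is not the JIT's `genFmaIntrinsic`; `RegFile`, `execFma`, and `emitFmaSelected` are hypothetical names, and `std::fma` stands in for the instruction. Because a register-only 132 form (`dst = dst * src3 + src2`) is interchangeable with 213 by swapping the last two operands, the sketch omits it.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical register-file model; each slot is one register.
using RegFile = std::array<double, 4>;

enum class FmaForm { F213, F231 };

// x86 semantics with operands (dst, src2, src3):
//   213: dst = src2 * dst  + src3
//   231: dst = src2 * src3 + dst
void execFma(RegFile& r, FmaForm form, int dst, int src2, int src3) {
    switch (form) {
        case FmaForm::F213: r[dst] = std::fma(r[src2], r[dst], r[src3]); break;
        case FmaForm::F231: r[dst] = std::fma(r[src2], r[src3], r[dst]); break;
    }
}

// Computes target = op1 * op2 + op3 without clobbering a live source.
void emitFmaSelected(RegFile& r, int target, int op1, int op2, int op3) {
    if (target == op3) {
        // The addend already sits in target: 231 accumulates in place, no move.
        execFma(r, FmaForm::F231, target, op1, op2);
    } else if (target == op2) {
        // Multiplication commutes, so op2 can play the "dst" multiplicand in 213.
        execFma(r, FmaForm::F213, target, op1, op3);
    } else {
        // target is op1 or holds no source, so copying op1 in is safe.
        if (target != op1) {
            r[target] = r[op1];
        }
        execFma(r, FmaForm::F213, target, op2, op3);
    }
}
```

Each branch avoids the `mov targetReg, op1Reg` whenever `targetReg` already holds a source: 231 when the target is the addend, 213 with the multiplicands swapped when it is `op2`, and the plain move-then-213 sequence only when the target is `op1` or a fresh register.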