JIT/x64: Route FMA embedded-rounding fallback through genFmaIntrinsic #128855

Merged
tannergooding merged 3 commits into main from copilot/fix-avx512-fma-register-assertion on Jun 2, 2026

Copilot AI commented Jun 1, 2026

Contributor

AVX-512 FMA intrinsics with a non-immediate embedded-rounding operand hit Assertion failed '(op3Reg != targetReg) || (op1Reg == targetReg)' in genHWIntrinsic_R_R_R_RM under jitstress2_jitstressregs0x80 (recurrence of #128544). The FMA case in genNonTableDrivenHWIntrinsicsJumpTableFallback emitted the 213 form directly, skipping the operand swapping / target-register preferencing that genFmaIntrinsic performs. When LSRA preferenced op2/op3 onto targetReg, the leading mov targetReg, op1Reg clobbered a live source — exactly the alias the assert guards against.
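The clobber described above can be reproduced with a toy model. The sketch below is not JIT code: RegFile, emitNaive213, and expectedFma are hypothetical names, and the register file is just an array of doubles. It only assumes the documented 213 semantics (dst = src2 * dst + src3) and a leading mov of op1 into the target.

```cpp
#include <array>
#include <cassert>

// Hypothetical toy register file; indices stand in for physical registers.
using RegFile = std::array<double, 4>;

// Models "mov target, op1" followed by the 213 form
// (target = op2 * target + op3) with no alias handling at all.
// The register file is taken by value so each call starts from a clean state.
inline double emitNaive213(RegFile regs, int target, int op1, int op2, int op3)
{
    if (target != op1)
    {
        // Leading mov: overwrites whatever value was live in target.
        regs[target] = regs[op1];
    }
    // 213 form: the destination doubles as the second multiplicand.
    regs[target] = regs[op2] * regs[target] + regs[op3];
    return regs[target];
}

// Reference result computed from the untouched operand values.
inline double expectedFma(const RegFile& regs, int op1, int op2, int op3)
{
    return regs[op1] * regs[op2] + regs[op3];
}
```

With registers holding {2, 3, 5, 0} and operands in registers 0, 1, and 2, a target of register 3 yields 11 as expected, while a target of register 2 (aliasing op3) yields 8, because the mov replaces the addend before the multiply-add reads it.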

Changes (src/coreclr/jit/hwintrinsiccodegenxarch.cpp)

  • Make genFmaIntrinsic re-entrant from a jump-table lambda. Lifted genConsumeMultiOpOperands / genProduceReg out of genFmaIntrinsic into its caller in genAvxFamilyIntrinsic. The remaining swap+emit is deterministic on register numbers and emits a single instruction per call, so it is safe to invoke once per rounding-mode case.
  • Route the FMA fallback through genFmaIntrinsic. Replaced the manual genHWIntrinsic_R_R_R_RM(ins, attr, targetReg, op1Reg, op2Reg, op3, …) in the NI_AVX512_FusedMultiply* arm of genNonTableDrivenHWIntrinsicsJumpTableFallback with genFmaIntrinsic(node, newInstOptions). The outer embedded-rounding path already brackets this fallback with consume/produce, so no extra calls are needed inside the lambda. Asserts that op1/op2/op3 are non-contained are kept as defensive checks against the embedded-rounding R-R-R invariant.
auto emitSwCase = [&](int8_t i) {
    insOpts newInstOptions = AddEmbRoundingMode(instOptions, i);
    genFmaIntrinsic(node, newInstOptions);  // was: genHWIntrinsic_R_R_R_RM(ins, attr, targetReg, op1Reg, op2Reg, op3, newInstOptions)
};

The 132/213/231-form selection in genFmaIntrinsic's no-contained branch now applies on the non-immediate-rounding path as well, eliminating the source-register clobber.
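As a rough illustration of why form selection removes the clobber, the following sketch picks a form based on which operand already lives in the target register. It is an assumption-laden model, not the logic in genFmaIntrinsic: FmaForm, emitFma, and emitAliasAwareFma are hypothetical names, and only the documented 132/213/231 operand semantics are taken as given.

```cpp
#include <array>
#include <cassert>

// Hypothetical toy register file; indices stand in for physical registers.
using RegFile = std::array<double, 4>;

enum class FmaForm { F132, F213, F231 };

// Documented operand semantics; dst is both a source and the destination.
inline void emitFma(RegFile& regs, FmaForm form, int dst, int src2, int src3)
{
    switch (form)
    {
        case FmaForm::F132: regs[dst] = regs[dst] * regs[src3] + regs[src2]; break;
        case FmaForm::F213: regs[dst] = regs[src2] * regs[dst] + regs[src3]; break;
        case FmaForm::F231: regs[dst] = regs[src2] * regs[src3] + regs[dst]; break;
    }
}

// Computes op1 * op2 + op3 into target without destroying a live source.
// F132 is modeled for completeness and is not selected in this sketch.
inline double emitAliasAwareFma(RegFile regs, int target, int op1, int op2, int op3)
{
    if (target == op3)
    {
        // The addend already sits in target: accumulate in place.
        emitFma(regs, FmaForm::F231, target, op1, op2);
    }
    else if (target == op2)
    {
        // Multiplication commutes, so target can serve as a multiplicand.
        emitFma(regs, FmaForm::F213, target, op1, op3);
    }
    else
    {
        // target aliases neither op2 nor op3, so the mov is harmless.
        if (target != op1)
        {
            regs[target] = regs[op1];
        }
        emitFma(regs, FmaForm::F213, target, op2, op3);
    }
    return regs[target];
}
```

In this model every choice of target register, including ones that alias op2 or op3, produces the same result as the unaliased case.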

Copilot AI review requested due to automatic review settings June 1, 2026 15:05
Co-authored-by: tannergooding <10487869+tannergooding@users.noreply.github.com>
Copilot AI requested review from Copilot and removed request for Copilot June 1, 2026 15:15
@tannergooding tannergooding marked this pull request as ready for review June 1, 2026 15:16
Copilot AI review requested due to automatic review settings June 1, 2026 15:16
Copilot AI changed the title from "[WIP] Fix JIT assertion failure in AVX-512 FMA instructions" to "JIT/x64: Route FMA embedded-rounding fallback through genFmaIntrinsic" Jun 1, 2026
@tannergooding

Member

/azp run runtime-coreclr jitstress2-jitstressregs

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI left a comment

Contributor

Pull request overview

This PR fixes an x64 JIT assertion in the AVX-512 FMA embedded-rounding non-immediate fallback path by ensuring codegen routes through the existing FMA helper that performs operand swapping / form selection, instead of emitting a fixed instruction form that can clobber a live source when registers alias under stress.

Changes:

  • Route the AVX-512 FMA embedded-rounding jump-table fallback through genFmaIntrinsic(...) instead of directly emitting the 213 form.
  • Make genFmaIntrinsic usable from the jump-table emission by moving genConsumeMultiOpOperands / genProduceReg responsibility to callers (and updating the normal AVX-family path accordingly).
  • Add defensive assertions in the embedded-rounding FMA fallback that operands are not contained.
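The division of responsibility described in these bullets can be sketched as follows. This is a simplified model rather than the JIT's implementation: CodegenTrace, emitRoundingCases, and genEmbeddedRoundingFma are hypothetical names, and the count of four rounding modes is an assumption made for illustration. The point is only that the caller consumes and produces once, while the per-case lambda emits one instruction for each rounding mode.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical record of what a codegen path did; not a JIT data structure.
struct CodegenTrace
{
    int                 consumeCalls = 0;
    int                 produceCalls = 0;
    std::vector<int8_t> emittedModes;
};

// Stand-in for a jump-table fallback: one switch case per rounding mode.
// Four modes are assumed here purely for illustration.
inline void emitRoundingCases(const std::function<void(int8_t)>& emitSwCase)
{
    for (int8_t mode = 0; mode < 4; mode++)
    {
        emitSwCase(mode);
    }
}

inline CodegenTrace genEmbeddedRoundingFma()
{
    CodegenTrace trace;

    // The caller consumes the operands exactly once, before the jump table.
    trace.consumeCalls++;

    // Each case only emits; it neither consumes operands nor produces the result.
    emitRoundingCases([&](int8_t mode) { trace.emittedModes.push_back(mode); });

    // The caller produces the result register exactly once, after the jump table.
    trace.produceCalls++;
    return trace;
}
```

If the emit helper also consumed or produced, this model would record four consume and produce calls instead of one, which is the situation the refactoring avoids.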

Comment thread src/coreclr/jit/hwintrinsiccodegenxarch.cpp
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 1, 2026 15:35

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

@github-actions github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 1, 2026
@dotnet-policy-service

Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tannergooding tannergooding requested a review from EgorBo June 1, 2026 19:54

@tannergooding tannergooding left a comment

Member

CC. @dotnet/jit-contrib, @EgorBo for secondary review. This resolves a jitstress2-jitstressregs failure where NI_AVX512 with embedded rounding was not handling the register preferencing correctly. The bug is resolved by reusing the existing genFmaIntrinsic path.

@tannergooding tannergooding merged commit 9d05a92 into main Jun 2, 2026
137 of 140 checks passed
@tannergooding tannergooding deleted the copilot/fix-avx512-fma-register-assertion branch June 2, 2026 00:01
Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Development

Successfully merging this pull request may close these issues.

[ci-scan] Test failure: Avx512F.FusedMultiplyAddNegated register assertion in jitstress2-jitstressregs

4 participants