Fix for arm64 outerloop jitstress failures in FMA by dhartglassMSFT · Pull Request #126434 · dotnet/runtime

dhartglassMSFT · 2026-04-01T22:36:02Z

Previous codegen for FMLA, when supplied with a non-constant operand, looked something like:

mov     v0.8b, v8.8b // fill in RMW operand
br      x20
fmla    v0.2s, v9.2s, v10.s[0]
b       G_M19624_IG23
fmla    v0.2s, v9.2s, v10.s[1]
b       G_M19624_IG23
// and so on for each of the immediates

With the recent refactor that moved the mov-emit-for-RMW down into the emit routines, the mov is instead emitted inside each arm of the jump table:

br      x20
mov     v0.8b, v8.8b // fill in RMW operand
fmla    v0.2s, v9.2s, v10.s[0]
b       G_M19624_IG24
mov     v0.8b, v8.8b // fill in RMW operand
fmla    v0.2s, v9.2s, v10.s[1]
b       G_M19624_IG24
// etc...

The bug happens because the jump table builder assumed each case would contain only one instruction (the fmla), when each case now contains two (the mov and the fmla).
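To make the size mismatch concrete, here is a minimal sketch of the offset arithmetic. It is a simplified model, not the JIT's actual code: it assumes each case is its body followed by one `b`, and the names `CaseOffset`, `LandingSlot`, and `CheckMismatchModel` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model, not the JIT's actual code. Assumptions: every A64
// instruction is 4 bytes, and each jump-table case is its body followed by
// one "b end" instruction.
constexpr uint32_t kInstrBytes = 4;

// Byte offset, from the first case, of the case selected by `imm` when every
// case body holds `bodyInstrs` instructions.
constexpr uint32_t CaseOffset(uint32_t imm, uint32_t bodyInstrs)
{
    return imm * (bodyInstrs + 1) * kInstrBytes;
}

// Position inside an emitted case (0 = its first instruction) that the
// computed branch lands on when the builder assumed `assumedBody`
// instructions per case but `emittedBody` were actually emitted.
constexpr uint32_t LandingSlot(uint32_t imm, uint32_t assumedBody, uint32_t emittedBody)
{
    return (CaseOffset(imm, assumedBody) % CaseOffset(1, emittedBody)) / kInstrBytes;
}

// With matching counts, every immediate lands on the first instruction.
static_assert(LandingSlot(1, 2, 2) == 0, "matching sizes land on the mov");
// Assumed 1 (fmla) but emitted 2 (mov + fmla): imm == 1 jumps 8 bytes and
// lands on slot 2 of case 0, which is its trailing "b".
static_assert(LandingSlot(1, 1, 2) == 2, "mismatch lands on the branch");

inline void CheckMismatchModel()
{
    assert(CaseOffset(1, 1) == 8);  // where the computed branch goes
    assert(CaseOffset(1, 2) == 12); // where case 1 really starts
}
```

In this model, an immediate of 1 branches into the tail of case 0 rather than to the start of case 1, so the intended fmla would never execute.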

@a74nh Another option to fix this is to undo the refactor just for this case, and not duplicate the mov in each branch. This would yield slightly smaller code size. I think it shouldn't matter much overall because the size difference is small, and the non-constant-operand case is rare. Let me know if we should go this route instead.
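As a rough illustration of the size difference, the sketch below counts only the instructions shown in the listings above (the shared `br` and address setup are the same in both layouts and are left out). The function names are hypothetical and the case count is a free parameter.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kInstrBytes = 4; // fixed A64 instruction width

// Hoisted layout: one shared mov, then (fmla + b) per case.
constexpr uint32_t HoistedBytes(uint32_t numCases)
{
    return (1 + 2 * numCases) * kInstrBytes;
}

// Duplicated layout: (mov + fmla + b) per case.
constexpr uint32_t PerCaseBytes(uint32_t numCases)
{
    return 3 * numCases * kInstrBytes;
}

// Extra bytes paid for duplicating the mov; needs at least one case so the
// unsigned subtraction cannot wrap.
inline uint32_t ExtraBytes(uint32_t numCases)
{
    assert(numCases >= 1);
    return PerCaseBytes(numCases) - HoistedBytes(numCases);
}

static_assert(HoistedBytes(4) == 36, "1 mov + 4 * (fmla + b)");
static_assert(PerCaseBytes(4) == 48, "4 * (mov + fmla + b)");
```

Under these assumptions the duplicated layout costs one extra 4-byte mov for every case after the first, for example 12 bytes with four cases.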

fixes #126379

dotnet-policy-service · 2026-04-01T22:37:11Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

Fixes ARM64 JIT codegen for certain HWIntrinsics that use a non-constant immediate (requiring a jump table) and have RMW semantics (or otherwise emit an extra mov), ensuring the jump table entry-size calculation matches the actual number of emitted instructions.

Changes:

- Pass an explicit numInstrs to HWIntrinsicImmOpHelper for RMW SIMDByIndexedElement (4-operand) and RMW shift-by-immediate intrinsics to account for the extra mov that may be emitted.
- Add debug assertions that the 2-operand and 3-operand SIMDByIndexedElement immediate cases are not treated as RMW.

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

a74nh · 2026-04-02T07:38:15Z

> @a74nh Another option to fix this is to undo the refactor just for this case, and not duplicate the mov in each branch. This would yield slightly smaller code size. I think it shouldn't matter much overall because the size difference is small, and the non-constant-operand case is rare. Let me know if we should go this route instead.

As long as we don't change the code in the emitter, then I'm happy.

If you undid the code in these two codegen functions, then there would be a mov added if required, and then the call to the emitter would be
emitIns_R_R_R_R_I(ins, emitSize, targetReg, targetReg, op2Reg, op3Reg, elementIndex, opt);
which means the emitter wouldn't emit a movprfx.
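The idea can be sketched with a toy emitter. Everything here (`ToyEmitter`, `EmitRmw`, `EmitCasesWithHoistedMov`) is hypothetical and only models the behavior described above: when the target is passed as both destination and first source, the emit routine has nothing to copy, so each case body stays at one instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an emitter; all names here are hypothetical.
struct ToyEmitter
{
    std::vector<std::string> out;

    // Read-modify-write emit: copies op1 into target first, but only when
    // they are different registers.
    void EmitRmw(const std::string& ins, int targetReg, int op1Reg)
    {
        if (targetReg != op1Reg)
        {
            out.push_back("mov");
        }
        out.push_back(ins);
    }
};

// Codegen-side sketch: place the mov once (if required), then emit every
// case with targetReg as both destination and first source.
// Returns the number of instructions in each case body.
inline std::size_t EmitCasesWithHoistedMov(ToyEmitter& e, int targetReg, int op1Reg, int numCases)
{
    assert(numCases > 0);
    if (targetReg != op1Reg)
    {
        e.out.push_back("mov"); // shared by all cases
    }
    std::size_t perCase = 0;
    for (int i = 0; i < numCases; i++)
    {
        const std::size_t before = e.out.size();
        e.EmitRmw("fmla", targetReg, targetReg); // no mov: target == op1
        perCase = e.out.size() - before;
        e.out.push_back("b");
    }
    return perCase;
}
```

With two cases and distinct registers, this toy produces `mov, fmla, b, fmla, b`, matching the shape of the original listing.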

I'm happy with doing it that way.

Yes, it's only two functions (and they can't get inlined), but we should set a precedent in case we need more in the future.

dhartglassMSFT · 2026-04-06T19:11:44Z

/azp run runtime-coreclr jitstress2-jitstressregs

azure-pipelines · 2026-04-06T19:12:02Z

Azure Pipelines successfully started running 1 pipeline(s).

dhartglassMSFT · 2026-04-07T01:33:51Z

jitstress regs failures are infrastructure failures

osx-arm64 stress passed, at least

initial impl

e37cbb1

dhartglassMSFT requested review from AndyAyersMS, a74nh and Copilot April 1, 2026 22:36

github-actions bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 1, 2026

dotnet-policy-service bot assigned dhartglassMSFT Apr 1, 2026

Copilot started reviewing on behalf of dhartglassMSFT April 1, 2026 22:37

Copilot AI reviewed Apr 1, 2026

jitformat

833a03b

a74nh reviewed Apr 2, 2026

src/coreclr/jit/hwintrinsiccodegenarm64.cpp

AndyAyersMS approved these changes Apr 6, 2026

dhartglassMSFT mentioned this pull request Apr 6, 2026

arm64: Refactor mov/movprfx for embedded masked operations #126398

Open

dhartglassMSFT merged commit 95f94b6 into dotnet:main Apr 7, 2026
146 of 147 checks passed

dotnet-maestro bot mentioned this pull request Apr 8, 2026

[main] Source code updates from dotnet/runtime dotnet/dotnet#5926

Merged

github-actions bot mentioned this pull request Apr 9, 2026

[release-notes] .NET 11 Preview 3 jeffhandley/dotnet-core-release-notes#4

Open

dhartglassMSFT merged 2 commits into dotnet:main from dhartglassMSFT:fix_126379

4 participants
