Fix for arm64 outerloop jitstress failures in FMA#126434
Fix for arm64 outerloop jitstress failures in FMA#126434dhartglassMSFT merged 2 commits intodotnet:mainfrom
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
There was a problem hiding this comment.
Pull request overview
Fixes ARM64 JIT codegen for certain HWIntrinsics that use a non-constant immediate (requiring a jump table) and have RMW semantics (or otherwise emit an extra mov), ensuring the jump table entry-size calculation matches the actual number of emitted instructions.
Changes:
- Pass an explicit
numInstrstoHWIntrinsicImmOpHelperfor RMWSIMDByIndexedElement(4-operand) and RMW shift-by-immediate intrinsics to account for the extramovthat may be emitted. - Add debug assertions that the 2-operand and 3-operand
SIMDByIndexedElementimmediate cases are not treated as RMW.
As long as we don't change the code in the emitter, then I'm happy. If you undid the code in these two codegen functions, then there would be a I'm happy with doing it that way. Yes, it's only two functions (and they can't get inlined), but we should set a precedence if we need more in the future. |
|
/azp run runtime-coreclr jitstress2-jitstressregs |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
jitstress regs failures are infrastructure failures osx-arm64 stress passed, at least |
Previous codegen for FMLA, when supplied with a non-constant operand, looked something like: ``` mov v0.8b, v8.8b // fill in RMW operand br x20 fmla v0.2s, v9.2s, v10.s[0] b G_M19624_IG23 fmla v0.2s, v9.2s, v10.s[1] b G_M19624_IG23 // and so on for each of the immediates ``` With the recent refactor to move the mov-emit-for-RMW down into the emit routines, the mov is emitted instead when codegen happens for each arm of the jump table: ``` br x20 mov v0.8b, v8.8b // fill in RMW operand fmla v0.2s, v9.2s, v10.s[0] b G_M19624_IG24 mov v0.8b, v8.8b // fill in RMW operand fmla v0.2s, v9.2s, v10.s[1] b G_M19624_IG24 // etc... ``` Bug happens because the jump table builder thought each case would contain only 1 instruction, not two. fixes dotnet#126379
Previous codegen for FMLA, when supplied with a non-constant operand, looked something like:
With the recent refactor to move the mov-emit-for-RMW down into the emit routines, the mov is emitted instead when codegen happens for each arm of the jump table:
Bug happens because the jump table builder thought each case would contain only 1 instruction, not two.
@a74nh Another option to fix this is to undo the refactor just for this case, and not duplicate the mov in each branch. This would yield slightly smaller codesize. I think it shouldn't matter much overall because the size difference is small, and the non-constant-operand case is rare. Let me know if we should go this route instead
fixes #126379