JIT: Suppress emitting same-reg zero extending move#22454
JIT: Suppress emitting same-reg zero extending move#22454AndyAyersMS merged 2 commits intodotnet:masterfrom
Conversation
Add a peephole optimization to suppress emitting zero extending moves if the previous instruction has already done a suitable zero extension. Only implemented for x64 currently. Closes #21923
|
@BruceForstall PTAL Jit diffs shows: sample diff (on example from #21923) G_M29305_IG02:
mov rax, bword ptr [rdx]
mov edx, dword ptr [rdx+8]
- mov edx, edx
shl rdx, 2
cmp rdx, 0xD1FFAB1E
ja SHORT G_M29305_IG04 |
|
Probably fixes https://github.com/dotnet/coreclr/issues/17963 also? |
|
Although that might need a double look back :-/ |
|
We'd get 2 out of 3, not too bad. movzx r8d,word ptr [rax]
add rax,2
mov r8d,r8d ; pointless movLooking back further is certainly an option. The first prototype (which was almost certainly a bit too aggressive) was getting about 4x as many cases -- it assumed the upstream producer would properly zero extend. |
|
Thanks, Andy. |
src/jit/codegenxarch.cpp
Outdated
| #endif | ||
| { | ||
| #ifdef _TARGET_X64_ | ||
| #ifdef _TARGET_AMD64_ |
There was a problem hiding this comment.
This looks unrelated; separate PR?
There was a problem hiding this comment.
Removed it from this PR.
src/jit/codegenxarch.cpp
Outdated
| #ifdef _TARGET_64BIT_ | ||
| case GenIntCastDesc::ZERO_EXTEND_INT: | ||
| // We can skip emitting this zero extending move if the previous instruction zero extended implicitly | ||
| if ((srcReg == dstReg) && (emit->emitCurIGinsCnt > 0) && compiler->opts.OptimizationEnabled()) |
There was a problem hiding this comment.
I'm not fond of leaking emitLastIns out of the emitter. Maybe this should call a function like emitIsLastInsCall(), say emitIsLastInsZeroExtendingWrite(srcReg). Then this code could be:
canSkip = (srcReg == dstReg) && compiler->opts.OptimizationEnabled() && emit->emitIsLastInsZeroExtendingWrite(srcReg);
There was a problem hiding this comment.
emitAreUpper32bitZero might be better, it doesn't matter how they ended up being zero, just that they are.
There was a problem hiding this comment.
Agree -- I'll go with that.
src/jit/emitxarch.cpp
Outdated
| return ((CodeGenInterface::instInfo[ins] & INS_Flags_IsDstSrcSrcAVXInstruction) != 0) && IsAVXInstruction(ins); | ||
| } | ||
|
|
||
| bool emitter::doesZeroExtendingWrite(instrDesc* id, regNumber reg) |
src/jit/codegenxarch.cpp
Outdated
| #endif | ||
| { | ||
| #ifdef _TARGET_X64_ | ||
| #ifdef _TARGET_AMD64_ |
There was a problem hiding this comment.
Forgot that was in there.
src/jit/emitxarch.cpp
Outdated
| case IF_RWR_ARD: | ||
|
|
||
| // Can't rely on a "small" movsx as we will over-extend to 8 bytes | ||
| return (id->idIns() != INS_movsx) && (id->idReg1() == reg) && (id->idOpSize() != EA_8BYTE); |
There was a problem hiding this comment.
The size check doesn't seem right - x64 only zero extends 32 bit register writes, mov al, 42 does not zero extend. So the check should be == EA_4BYTE, though that will probably require special casing for movzx where size has a different meaning.
There was a problem hiding this comment.
Good point.
For movzx you mean allow both 4 and 8 byte sizes...?
There was a problem hiding this comment.
For movzx and movsx the size indicates the source operand size - 1 or 2 bytes - rather than the destination operand size. So movzx always zeroes out the upper 32 bit, no matter the size.
It's a bit unfortunate. I'm considering changing movzx/movsx to movzxb/movzxw/movsxb/movsxw in the future, that might simplify things and perhaps allow more code to be shared with ARM.
|
Updated. Generates same diffs as the first version. |
|
OSX jenkins hangup, retrying @dotnet-bot retest OSX10.12 x64 Checked Innerloop Build and Test |
…2454) Add a peephole optimization to suppress emitting zero extending moves if the previous instruction has already done a suitable zero extension. Only implemented for x64 currently. Closes dotnet/coreclr#21923 Commit migrated from dotnet/coreclr@d5f638a
Add a peephole optimization to suppress emitting zero extending moves
if the previous instruction has already done a suitable zero extension.
Only implemented for x64 currently.
Closes #21923