-
Notifications
You must be signed in to change notification settings - Fork 5.3k
arm64: Replace RSH/RSZ -> CAST nodes with clearing register #121007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
arm64: Replace RSH/RSZ -> CAST nodes with clearing register #121007
Conversation
|
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
|
Please can I have a review? @dotnet/arm64-contrib @EgorBo |
|
cc: @a74nh @JulieLeeMSFT |
|
/azp run Fuzzlyn |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Looks like Fuzzlyn got stuck? |
|
Could someone run fuzzlyn again on this please. Previous one got cancelled by the CI, I think. |
src/coreclr/jit/lower.cpp
Outdated
| if (!cast->isContained() && !cast->IsRegOptional() && !cast->gtOverflow() && | ||
| // Smaller CastOp is most likely an IND(X) node which is lowered to a zero-extend load | ||
| cast->CastOp()->TypeIs(TYP_LONG, TYP_INT)) | ||
| // Try to recognize right shift with a CAST node that is equivilent to mov #0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimization should be useful for all platforms. Why restrict it to Arm64?
This would also likely be more beneficial if implemented in morph, where it could enable further downstream optimizations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimization should be useful for all platforms. Why restrict it to Arm64?
This would also likely be more beneficial if implemented in morph, where it could enable further downstream optimizations.
Agreed. There is nothing fundamentally architecture specific here, just replacing an overflowing shift with zero.
My only concern would be if for some reason the casts weren't being introduced until after all the morph passes. But, I don't think that's going to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always have it in both places, but I do think that morph is the more meaningful location here.
A more comprehensive version of this is #122533, which also handles other optimizations but is also doing it in lowering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked out the work in #122533 and ran my tests in this PR and they don't pass. I think both of these PRs are doing different optimisations. The Fix section in the other PR doesn't describe the situation I'm trying to optimise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved the optimisation into morph.
In lowering change a down cast and right shift into a mov w0, wzr if the shift amount is constant and greater & equal to the size of the downcast type. e.g.
assembly changes from
to