Skip to content

Conversation

@jonathandavies-arm
Copy link
Contributor

@jonathandavies-arm jonathandavies-arm commented Oct 23, 2025

In lowering change a down cast and right shift into a mov w0, wzr if the shift amount is constant and greater & equal to the size of the downcast type. e.g.

static int CastASR8_byte_int(byte x)
{
    //ARM64-FULL-LINE: mov {{w[0-9]+}}, wzr
    return x >> 8;
}

assembly changes from

uxtb    w0, w0
asr     w0, w0, #8

to

mov     w0, wzr

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 23, 2025
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 23, 2025
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jonathandavies-arm
Copy link
Contributor Author

Please can I have a review? @dotnet/arm64-contrib @EgorBo

@SwapnilGaikwad
Copy link
Contributor

cc: @a74nh @JulieLeeMSFT

@EgorBo
Copy link
Member

EgorBo commented Dec 3, 2025

/azp run Fuzzlyn

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@a74nh
Copy link
Contributor

a74nh commented Dec 9, 2025

/azp run Fuzzlyn

Looks like Fuzzlyn got stuck?

@a74nh
Copy link
Contributor

a74nh commented Dec 17, 2025

Could someone run fuzzlyn again on this please. Previous one got cancelled by the CI, I think.

if (!cast->isContained() && !cast->IsRegOptional() && !cast->gtOverflow() &&
// Smaller CastOp is most likely an IND(X) node which is lowered to a zero-extend load
cast->CastOp()->TypeIs(TYP_LONG, TYP_INT))
// Try to recognize right shift with a CAST node that is equivilent to mov #0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optimization should be useful for all platforms. Why restrict it to Arm64?

This would also likely be more beneficial if implemented in morph, where it could enable further downstream optimizations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optimization should be useful for all platforms. Why restrict it to Arm64?

This would also likely be more beneficial if implemented in morph, where it could enable further downstream optimizations.

Agreed. There is nothing fundamentally architecture specific here, just replacing an overflowing shift with zero.

My only concern would be if for some reason the casts weren't being introduced until after all the morph passes. But, I don't think that's going to happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can always have it in both places, but I do think that morph is the more meaningful location here.

A more comprehensive version of this is #122533, which also handles other optimizations but is also doing it in lowering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked out the work in #122533 and ran my tests in this PR and they don't pass. I think both of these PRs are doing different optimisations. The Fix section in the other PR doesn't describe the situation I'm trying to optimise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the optimisation into morph.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants