Skip to content

Conversation

@Lotendan
Copy link
Contributor

Extend the "skip-test" optimization to all instructions that modify the same flags as "TEST" x86 instruction

Originally wanted to only cover the "POPCNT" case, but actually we can go further than that and cover more instructions.

Fixes #118811

Using the following code:

static void M1(int value)
{
    if (int.PopCount(value) > 0) 
        throw null!;
}

This fix reduces this assembly:

popcnt   edi, edi
test     edi, edi
jg       SHORT G_M9581_IG04

to this assembly:

popcnt    edi, edi
jg       SHORT G_M000_IG04

Note: this is not exactly as was intended in the issue, where a jne was expected. But that should be good enough.
Note2: this does not affect tnzcnt or lzcnt instructions as these instructions do not modify the flags.

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 19, 2025
@Lotendan
Copy link
Contributor Author

@dotnet-policy-service agree

@JulieLeeMSFT
Copy link
Member

@kg, PTAL.

@kg
Copy link
Member

kg commented Nov 11, 2025

This LGTM but will need to pass CI before it can be merged. Let me know if you need help figuring out why the runtime tests lanes are red.

@saucecontrol
Copy link
Member

@Lotendan to elaborate on the above, technically, this code:

popcnt    edi, edi
jg       SHORT G_M000_IG04

is an invalid replacement for

popcnt   edi, edi
test     edi, edi
jg       SHORT G_M000_IG04

because the latter sets SF explicitly before checking it with jg, while the former clears the flag unconditionally and still checks its value.

In the case of popcnt, the result can never be negative, so the value of SF would be zero regardless of whether it was unconditionally cleared or set according to the result value. That means the invalid optimization won't actually break in this case, but it could with other instructions that get caught by the same logic.

If you instead introduce an assertion (see assertionprop.cpp) that the value is never negative, JIT can transform the compare from > 0 to != 0, which results in this valid code:

popcnt   edi, edi
jne      SHORT G_M000_IG04

Ex: https://godbolt.org/z/Mzn4Kbdas

@Lotendan
Copy link
Contributor Author

Thank you for the hint about the assertions. Very helpful as I'm totally new to this code base.
As far as I understand your suggestion it means we need two steps:

  1. skip emitting a test instruction. If not done with flags, is that acceptable to remain conservative and "hardcode" the instruction? e.g. if POPCNT.
  2. use assertions to maybe modify the GenTree and transform a >0 into a !=0 in case we know the result is always positive. I have browsed assertionprop.cpp quickly and intend to use optAssertionProp. Any hint about where this code best belongs?

@saucecontrol
Copy link
Member

saucecontrol commented Nov 13, 2025

If you add the assertion, you don't have to worry about removing the test -- that will happen automagically, because JIT will see that the value can't be negative and that > 0 is the same as the simpler != 0. And since popcnt sets ZF, JIT already knows the flags are correctly set for jne.

This is actually already handled for explicit calls to Popcnt.PopCount as shown here: https://godbolt.org/z/3GEhaen5E

I believe all you need to do is duplicate this range assertion:

case NI_AVX2_LeadingZeroCount:
case NI_AVX2_TrailingZeroCount:
case NI_AVX2_X64_LeadingZeroCount:
case NI_AVX2_X64_TrailingZeroCount:
case NI_X86Base_PopCount:
case NI_X86Base_X64_PopCount:
// Note: No advantage in using a precise range for IntegralRange.
// Example: IntCns = 42 gives [0..127] with a non -precise range, [42,42] with a precise range.
return {SymbolicIntegerValue::Zero, SymbolicIntegerValue::ByteMax};

for NI_PRIMITIVE_PopCount (and LeadingZeroCount + TrailingZeroCount)

@saucecontrol
Copy link
Member

saucecontrol commented Nov 14, 2025

@Lotendan I gave this a closer look today, and I have a couple of corrections

  1. Apparently, I hallucinated the optimization that would convert > 0 to != 0 with a range assertion. The reason Popcnt.PopCount emits optimal code today is that it has an unsigned return type. [unsigned] > 0 is optimized to != 0 here:

else if (cmp->IsUnsigned())
{
if ((oper == GT_LE) || (oper == GT_GT))
{
if (op2Value == 0)
{
// IL doesn't have a cne instruction so compilers use cgt.un instead. The JIT
// recognizes certain patterns that involve GT_NE (e.g (x & 4) != 0) and fails
// if GT_GT is used instead. Transform (x GT_GT.unsigned 0) into (x GT_NE 0)
// and (x GT_LE.unsigned 0) into (x GT_EQ 0). The later case is rare, it sometimes
// occurs as a result of branch inversion.
oper = (oper == GT_LE) ? GT_EQ : GT_NE;
cmp->gtFlags &= ~GTF_UNSIGNED;
}

  1. The range assertion I pointed out above already applies to NI_PRIMITIVE_PopCount because it is converted to NI_X86Base_PopCount (on xarch) on import. You can see that in action with:
static void M2(int value)
{
    if (int.PopCount(value) < 0) 
        throw null!;
}

which already compiles to just

C:M2(int) (FullOpts):
       ret  

because JIT is already seeing that the value can't be negative.

So, if you were to add the missing > 0 to != 0 optimization for known non-negative signed values (it could be added in morph as above), that should result in more globally useful improvements.

@Lotendan
Copy link
Contributor Author

Lotendan commented Nov 15, 2025

Your intuition was good and gave me the right tools to progress on this bugfix.
At this code location, for a signed integer the expression cmp->IsUnsigned() evaluates to false.
However, if the compare is a signed compare and the operand is known to be never negative, as is the case with POPCNT, then we can also trigger the optimization.
With this done I have compiled these 3 functions below:

public static void M1(int value)
{
    if (int.PopCount(value) > 0) 
        throw null!;
}
public static void M2(uint value)
{
    if (uint.PopCount(value) > 0)
        throw null!;
}
public static void M3(int value)
{
    if (Popcnt.PopCount((uint) value) > 0) 
        throw null!;
}

And they all end up as:

IN0001: 000003 popcnt   ecx, ecx
IN0002: 000007 jne      SHORT G_M1455_IG04

Replacing with LeadingZeroCount everywhere leads to:

IN0001: 000003 lzcnt    ecx, ecx
IN0002: 000007 jne      SHORT G_M1455_IG04

Same for TrailingZeroCount

IN0001: 000003 tzcnt    ecx, ecx
IN0002: 000007 jne      SHORT G_M28251_IG04

@saucecontrol
Copy link
Member

Nice! I suspected this might catch Span.Length > 0 as well, and the diffs show a lot of hits for that on Arm64. ex:

-            cmp     w1, #0
-            bgt     G_M13911_IG05
+            cbnz    w1, G_M13911_IG05

@kg sorry to step on your toes on this one. Can you give it another look?

@kg
Copy link
Member

kg commented Nov 17, 2025

The new changes LGTM. Thanks for providing guidance @saucecontrol , I appreciate having someone who knows the JIT well chime in.

@Lotendan
Copy link
Contributor Author

@kg what are the next steps then to merge this PR? :)

@kg
Copy link
Member

kg commented Nov 19, 2025

Now that checks have passed I'm going to get the OK to merge it since I'm new to the JIT. I'm not sure how long that will take, thanks for your patience!

@kg
Copy link
Member

kg commented Nov 20, 2025

@dotnet/jit-contrib PTAL

@Lotendan
Copy link
Contributor Author

Thanks, I have applied the suggestions.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this improvement.

@adamperlin
Copy link
Contributor

@Lotendan can you update this branch and re-run CI?

@Lotendan
Copy link
Contributor Author

Rebased this branch on the main branch :)

@adamperlin
Copy link
Contributor

@Lotendan I think we are good to merge this! You can do the honors if you'd like! Thank you for this contribution!

@Lotendan
Copy link
Contributor Author

Thanks @adamperlin ! I don’t have merge permissions in this repo, so feel free to merge it when convenient ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redundant test after PopCount

6 participants