The IR snippet in https://godbolt.org/z/j9h98rcKT produces incorrect results with `-mcpu=neoverse-v2`, but works correctly with plain NEON. The major difference appears to be that `llvm.experimental.cttz.elts.i64.v4i1` is lowered to SVE instructions, so I suspect this lowering does not match the definition of the intrinsic.
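For reference, here is a minimal sketch of what I understand the intrinsic is supposed to compute, based on my reading of the LangRef: the number of zero elements before the first non-zero element, counting from element 0, and the element count if the whole vector is zero. The function name `cttz_elts` and the parameter name `is_zero_poison` are illustrative only, and poison is modelled as an exception. This is not the actual lowering, only a model to compare the NEON and SVE outputs against.

```python
def cttz_elts(mask, is_zero_poison=False):
    """Reference model (assumed semantics) of llvm.experimental.cttz.elts.

    mask: sequence of 0/1 values, element 0 first (e.g. a <4 x i1> vector).
    """
    for index, element in enumerate(mask):
        if element:
            # First non-zero element: every element before it was zero.
            return index
    if is_zero_poison:
        # The intrinsic would return poison here; modelled as an error.
        raise ValueError("all-zero input with is_zero_poison set")
    # All elements are zero: the result is the number of elements.
    return len(mask)


if __name__ == "__main__":
    for example in ([0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]):
        print(example, cttz_elts(example))
```

Under this model, a correct lowering for a `<4 x i1>` input can never return a value greater than 4.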
To reproduce end-to-end, check out aac5f40ab2fe91418e8727d4276bdcb5b08e1a70 from
and build with `-mcpu=neoverse-v2`, then run `SingleSource/UnitTests/Vectorizer/early-exit`.
This should print a mismatch:
Checking early_exit_find_step_1
Miscompare for interleave-forced: 4 != 8