MAINT: Use fused-multiply-add for complex numbers calculations#26956
MAINT: Use fused-multiply-add for complex numbers calculations#26956yairchu wants to merge 3 commits into
Conversation
|
Looking at the CI it appears that on some environments Would it be appropriate to skip the new test on such platforms? If so, what is the best way of doing that? |
I think that'd be the appropriate action to take. The baseline build of NumPy targets the minimum supported processor, so may not have FMA instructions. We detect if we can run a more optimal version and dynamically dispatch it. If no hardware instruction is available, you might be able to use |
I don't know. Since this seems to be distinct from capabilities (if the test fails, the SIMD code is used and uses FMA), so it is a compiler decision to not fuse it for speed probably. We should be able to ensure FMA is used if SIMD uses it, I guess speed may be mostly irrelevant (the slow loop is relatively rarely never used), but dunno. |
|
You're right that no complete cross-platform promise is gained by best effort to use As for the option of using SIMD code for scalar cases - wouldn't that make things slower? |
My understanding was that
It might be slower, but I think computing the value for a scalar would hardly show up on perf let alone be a bottleneck for a program (unless someone is looping over an array, in which case I argue they are using array processing incorrectly). |
If that's the case then perhaps those CI tests failed due to the scalar version being more accurate than the SIMD one! From @seberg's comment I infer that whether
Looking at the test output to figure out which is Would it generally be a good idea for |
63fc730 to
b04949e
Compare
Thanks, @r-devulap. My assumption was that it would fall back to a naive implementation, but this indicates otherwise.
Given what @r-devulap has said, I think they're both correct. I'm guessing using |
|
@yairchu could you try changing some of the remaining failures on 32-bit from |
…ltiplication The test would only fail under the following conditions: * replace the added assert_array_almost_equal_nulp with assert_equal * compile numpy with "-ffp-contract=off" flags for clang (or equiavlent) * which was the case for numpy<2 on macOS
…6940 and and numpy#26740) * numpy#26940 already fixed in same PR by nomemoverlap fix (failure depended on existence of two problems) * For numpy#26740 only consistency within numpy is fixed. Regular complex in Python may still have different results Currently, fused-multipliy-add is typically used everywhere in numpy, depending on the compiler. But some compilers may choose not to use fma, as is the case in numpy-1.26.4 (and any <2 I believe) on macOS on ARM. This commit uses the fma functions from math.h instead of relying on the compiler to decide to use them.
@Mousius done! The currently failing test appears to be an unrelated CI failure for |
|
@yairchu the 32-bit test still fails. |
|
This seems to have stranded, so I'm going to close it. We can re-open if you still want to work on this @yairchu |
|
@jorenham as this is fixing a real bug, I would still want to work on this, but unfortunately not in the near future. Perhaps I could look again in a few months and understand what was blocking this fix and address it. |
I'm glad to hear that; reopened |
Background:
This PR makes basic calculations for complex numbers always use fused-multiply-add, using the
fmafunctions from<math.h>, and it also resolves an additional minor off-by-one bug which contributed to the inconsistency #26940