Describe the issue:
A simple program that calls np.square twice on the exact same input vector gets different results from the two consecutive calls.
IIUC (and I'm not sure I do), this happens because:
- The first result vector allocated by np.square ends up sitting close to the input vector allocated briefly before it, which makes it fail the is_mem_overlap check and fall back to CDOUBLE_square's loop_scalar.
  - loop_scalar is plain C code, which may or may not use fused multiply-add (FMA) depending on compiler versions or options.
- The SIMD code is used to produce the second result, and it does use fused multiply-add regardless of the compiler.
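The "sits close" part of the first bullet can be modeled with a simple interval test. This is a hypothetical Python model, not numpy's code: ranges_overlap, its conservative flag and the byte addresses are illustrative, and whether numpy's is_mem_overlap really treats touching buffers as overlapping is an assumption worth checking against the C source. The idea is that a check which also rejects buffers that merely touch would send an output allocated right after the input to the scalar fallback, even though no element is shared.

```python
def ranges_overlap(src_start: int, src_nbytes: int,
                   dst_start: int, dst_nbytes: int,
                   conservative: bool = True) -> bool:
    """Hypothetical model of a buffer-overlap test on byte addresses.

    Ranges are half-open: [start, start + nbytes).
    With conservative=True, buffers that merely touch (one ends exactly
    where the other starts) are also reported as overlapping.
    Identical ranges are treated as a safe in-place operation.
    """
    src_end = src_start + src_nbytes
    dst_end = dst_start + dst_nbytes
    if src_start == dst_start and src_end == dst_end:
        # Exact in-place operation: fine for an elementwise loop.
        return False
    if conservative:
        # Strict comparisons: touching ranges count as overlapping.
        return not (src_start > dst_end or dst_start > src_end)
    # Exact half-open test: touching ranges do not overlap.
    return not (src_start >= dst_end or dst_start >= src_end)


# Three complex128 elements occupy 3 * 16 = 48 bytes.
NBYTES = 48
print(ranges_overlap(0x1000, NBYTES, 0x1030, NBYTES))                      # adjacent
print(ranges_overlap(0x1000, NBYTES, 0x1030, NBYTES, conservative=False))  # adjacent
print(ranges_overlap(0x1000, NBYTES, 0x2000, NBYTES))                      # far apart
```

With an input at 0x1000 and an output at 0x1030, immediately after it, only the conservative variant reports an overlap, which is the kind of "close but not overlapping" placement described above.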
Reproduce the code example:
import numpy as np

vec = np.array([-5.171866611150749e-07 + 2.5618634555957426e-07j, 0, 0])


def compute():
    return np.square(vec)


# Two calls on the same, unchanged input.
first_res = compute()
second_res = compute()

# Exact (not approximate) comparison of the two results.
print(
    "Results are consistent."
    if (first_res == second_res).all()
    else "INCONSISTENT!"
)
print("Difference:", second_res - first_res)
Error message:
INCONSISTENT!
Difference: [2.5243549e-29+0.j 0.0000000e+00+0.j 0.0000000e+00+0.j]
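The reported difference matches one unit in the last place (ULP) of the real part of the result (2**-95 at this magnitude), which is the size of discrepancy expected when one path rounds re*re and im*im separately and the other uses a fused multiply-add. The sketch below checks that arithmetic in pure Python. It emulates FMA with exact rational arithmetic via fractions.Fraction, since math.fma only exists from Python 3.13. The helper names are hypothetical, and the operation order in square_fused is an assumption about what the SIMD kernel does, not taken from numpy's source, so it may or may not reproduce the exact 1-ULP shift.

```python
import math
from fractions import Fraction


def fma_emulated(x: float, y: float, z: float) -> float:
    """x*y + z with a single rounding, emulating a hardware FMA.

    Fraction(float) is exact and float(Fraction) rounds correctly,
    so rounding happens only once, in the final conversion.
    """
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def square_unfused(re: float, im: float) -> complex:
    # Each product is rounded to double before the subtraction/addition.
    return complex(re * re - im * im, re * im + im * re)


def square_fused(re: float, im: float) -> complex:
    # Assumed FMA ordering: re*re - round(im*im), rounded once at the end.
    return complex(fma_emulated(re, re, -(im * im)),
                   fma_emulated(re, im, im * re))


# Nonzero input element from the reproducer.
re, im = -5.171866611150749e-07, 2.5618634555957426e-07
unfused = square_unfused(re, im)
fused = square_fused(re, im)

# Spacing of doubles at the real part's magnitude: 2**-95, about 2.5243549e-29.
print("ulp of real part:", math.ulp(unfused.real))
print("fused - unfused: ", fused - unfused)
```

The two results can differ by at most a couple of ULPs, so a mismatch of exactly one ULP in the real part and none in the zero elements is consistent with the FMA explanation.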
Python and NumPy Versions:
1.26.4
3.12.4 (main, Jun 6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)]
Runtime Environment:
[{'numpy_version': '1.26.4',
'python': '3.12.4 (main, Jun 6 2024, 18:26:44) [Clang 15.0.0 '
'(clang-1500.3.9.4)]',
'uname': uname_result(system='Darwin', node='Sounds-MacBook-Pro.local', release='23.4.0', version='Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000', machine='arm64')},
{'simd_extensions': {'baseline': ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD'],
'found': ['ASIMDHP'],
'not_found': ['ASIMDFHM']}}]
Context for the issue:
- While the bug is dormant in numpy 2.0.0, I suspect it may reappear.
- Perhaps my example, or something similar, could be added as a test to verify that it stays fixed.
- I suspect this issue affects more operations. IIRC I originally found it with complex vector multiplication but minimized the example to this one.
- For details of my investigation, see https://github.com/yairchu/numpy-floats-bug
- I failed to build numpy from source matching the pip-installed version, so I analyzed what's going on by reading the assembly (my first time reading ARM assembly!).
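Following the suggestion to turn the example into a test, a minimal pytest-style sketch might look like this. The test name, the eight repetitions and the extra out= buffer are my assumptions, not anything from numpy's test suite; the intent is simply to produce several result buffers at different addresses and require that they all match exactly. It should pass on a build where every code path rounds identically, and may fail on an affected build like the one above.

```python
import numpy as np

VEC = np.array([-5.171866611150749e-07 + 2.5618634555957426e-07j, 0, 0])


def test_square_complex_is_consistent():
    # Several fresh calls: each result may land at a different distance
    # from VEC, which (per the analysis above) can change the inner loop used.
    results = [np.square(VEC) for _ in range(8)]
    # An explicitly preallocated output buffer gives yet another placement.
    results.append(np.square(VEC, out=np.empty_like(VEC)))
    for res in results[1:]:
        # Exact equality, no tolerance: a one-ULP FMA difference must fail.
        np.testing.assert_array_equal(res, results[0])
```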