BUG: Compiler-options-dependent bug in np.square for complex numbers affecting numpy-1.26.4 on macOS on ARM #26940

### Describe the issue:

A simple program that calls `np.square` twice on the exact same complex input vector gets different results from the two consecutive calls.

IIUC (which I'm not sure about) this happens because:

* The first result vector allocated for `np.square` ends up close to the input vector, which was allocated shortly before it. This makes it fail the `is_mem_overlap` check, so the call falls back to `loop_scalar` in `CDOUBLE_square`
* `loop_scalar` is plain C code, which may or may not use fused multiply-add (FMA) depending on the compiler version and options
* The SIMD code is used to produce the second result, and it uses fused multiply-add regardless of the compiler
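To illustrate why FMA alone could explain a one-bit difference, here is a minimal pure-Python sketch of the real part of a complex square, `re*re - im*im`, computed with and without a fused step. The helper names (`fma_emulated`, `square_real_plain`, `square_real_fused`) are made up for this example, and the particular FMA arrangement is an assumption on my part; I have not confirmed that it matches what NumPy's SIMD kernel actually emits.

```python
import math
from fractions import Fraction


def fma_emulated(x: float, y: float, z: float) -> float:
    """Emulate fused multiply-add: compute x*y + z exactly, then round once."""
    # Fraction represents every finite double exactly, so the product and the
    # sum carry no rounding error; float() does the single final rounding.
    return float(Fraction(x) * Fraction(y) + Fraction(z))


def square_real_plain(a: float, b: float) -> float:
    """Real part of (a + bj)**2 with each operation rounded separately."""
    return a * a - b * b


def square_real_fused(a: float, b: float) -> float:
    """Real part of (a + bj)**2 with a*a fused into the subtraction.

    ASSUMPTION: this arrangement (round b*b first, then fuse a*a) is only one
    plausible way a compiler or SIMD kernel might use FMA here.
    """
    return fma_emulated(a, a, -(b * b))


# Real and imaginary parts of the first element of `vec` in the report.
a, b = -5.171866611150749e-07, 2.5618634555957426e-07

plain = square_real_plain(a, b)
fused = square_real_fused(a, b)

print("plain:", repr(plain))
print("fused:", repr(fused))
print("difference:", fused - plain)
print("one ulp at this magnitude:", math.ulp(plain))
```

For values of this magnitude one ulp is 2**-95, roughly 2.524e-29, which is the same size as the difference reported below. That fits a single extra rounding step in one of the two code paths.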

### Reproduce the code example:

```python
import numpy as np

vec = np.array([-5.171866611150749e-07 + 2.5618634555957426e-07j, 0, 0])

def compute():
    return np.square(vec)

first_res = compute()
second_res = compute()

print(
    "Results are consistent."
    if (first_res == second_res).all()
    else "INCONSISTENT!"
)
print("Difference:", second_res - first_res)
```


### Error message:

```shell
INCONSISTENT!
Difference: [2.5243549e-29+0.j 0.0000000e+00+0.j 0.0000000e+00+0.j]
```


### Python and NumPy Versions:

1.26.4
3.12.4 (main, Jun  6 2024, 18:26:44) [Clang 15.0.0 (clang-1500.3.9.4)]

### Runtime Environment:

[{'numpy_version': '1.26.4',
  'python': '3.12.4 (main, Jun  6 2024, 18:26:44) [Clang 15.0.0 '
            '(clang-1500.3.9.4)]',
  'uname': uname_result(system='Darwin', node='Sounds-MacBook-Pro.local', release='23.4.0', version='Darwin Kernel Version 23.4.0: Fri Mar 15 00:10:42 PDT 2024; root:xnu-10063.101.17~1/RELEASE_ARM64_T6000', machine='arm64')},
 {'simd_extensions': {'baseline': ['NEON', 'NEON_FP16', 'NEON_VFPV4', 'ASIMD'],
                      'found': ['ASIMDHP'],
                      'not_found': ['ASIMDFHM']}}]

### Context for the issue:

* Although the bug is dormant in numpy-2.0.0, I suspect it may reappear
  * Perhaps my example or something similar could be added as a test to verify it stays fixed
* I suspect this issue affects more operations. IIRC I originally found it with complex vector multiplication, but minimized the example to this one.
* For details of my investigation see https://github.com/yairchu/numpy-floats-bug
  * I failed to build numpy from source in a way that matches the pip-installed version, so I analyzed what's going on by looking at the assembly (my first time reading ARM assembly!)
