
ENH: add C99 Annex G recoveries for complex multiplication and division following CPython#30806

Draft
ikrommyd wants to merge 11 commits into numpy:main from ikrommyd:complex-div-and-mul-annex-g

Conversation

@ikrommyd
Contributor

@ikrommyd ikrommyd commented Feb 8, 2026

Adds C99 Annex G recoveries to complex multiplication and division that would otherwise result in nan + nanj
Closes #30805
Closes #26560
Closes #17425
Closes #30470

See relevant CPython PRs that did these updates for CPython:
python/cpython#120287 and python/cpython#119457

Because multiplication also has a SIMD implementation, we have to adapt that as well; otherwise you'd get different answers depending on whether your numpy build has SIMD enabled, or on whether you are working with complex64/complex128 vs complex256, since I only see SIMD implementations for fp32 and fp64.
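For readers unfamiliar with the recovery, here is a rough Python sketch of the multiplication recovery from the C99 Annex G (G.5.1) example code, which is the same algorithm the referenced CPython PRs implement in C. The function name and structure are illustrative only, not the actual code in this PR:

```python
import math

def mul_with_recovery(a, b, c, d):
    """(a + bj) * (c + dj) with the C99 Annex G nan+nanj recovery.

    Illustrative sketch; the real change lives in numpy's C ufunc loops.
    """
    ac, bd, ad, bc = a * c, b * d, a * d, b * c
    x = ac - bd  # naive real part
    y = ad + bc  # naive imaginary part
    if math.isnan(x) and math.isnan(y):  # the unlikely recovery branch
        recalc = False
        if math.isinf(a) or math.isinf(b):  # first operand is infinite
            # "Box" the infinity: keep its direction, drop the magnitude.
            a = math.copysign(1.0 if math.isinf(a) else 0.0, a)
            b = math.copysign(1.0 if math.isinf(b) else 0.0, b)
            if math.isnan(c):
                c = math.copysign(0.0, c)
            if math.isnan(d):
                d = math.copysign(0.0, d)
            recalc = True
        if math.isinf(c) or math.isinf(d):  # second operand is infinite
            c = math.copysign(1.0 if math.isinf(c) else 0.0, c)
            d = math.copysign(1.0 if math.isinf(d) else 0.0, d)
            if math.isnan(a):
                a = math.copysign(0.0, a)
            if math.isnan(b):
                b = math.copysign(0.0, b)
            recalc = True
        if not recalc and (math.isinf(ac) or math.isinf(bd)
                           or math.isinf(ad) or math.isinf(bc)):
            # A partial product overflowed to infinity: zero out any NaNs.
            a = math.copysign(0.0, a) if math.isnan(a) else a
            b = math.copysign(0.0, b) if math.isnan(b) else b
            c = math.copysign(0.0, c) if math.isnan(c) else c
            d = math.copysign(0.0, d) if math.isnan(d) else d
            recalc = True
        if recalc:
            x = math.inf * (a * c - b * d)
            y = math.inf * (a * d + b * c)
    return (x, y)
```

For example, (1e300 + 1j) * (nan + infj), which is nan + nanj under the naive formula, recovers to -inf + infj.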

@ikrommyd
Contributor Author

ikrommyd commented Feb 8, 2026

Opening it as draft for starters because there are decisions to be made from the maintainers here and it also definitely needs benchmarking since we're touching ufunc loops.

I'd also appreciate some help on the SIMD implementations, as I am completely clueless about SIMD beyond knowing what it is, so I asked Claude for help on that part :). The rest is all implemented manually, although there may be typos, because parsing .c.src files with my eyes is a bit difficult.

@charris
Member

charris commented Feb 9, 2026

@r-devulap for the SIMD.

@ikrommyd
Contributor Author

ikrommyd commented Feb 9, 2026

What I'm debating with myself here is whether performance or "correctness" is more worth it, meaning compliance with the C99 Annex G standard and consistency with CPython, because the extra if branch in the ufunc loops (marked as NPY_UNLIKELY) will have a performance cost. If maximum performance is the target, then being inconsistent with CPython and C99 Annex G may be just fine, I guess. In that case the issues this PR fixes should be closed as "won't fix", since I don't think you can have it both ways.

@ikrommyd
Contributor Author

ikrommyd commented Feb 9, 2026

The more I think about this, and about what we'd have to do to make multiplication consistent with other operations that are effectively multiplication, like square and power(x, 2), the more I believe it's best to only do this for division and ignore multiplication and powers altogether. I think the performance hit that multiplications would take is probably not worth it. Happy to discuss this in a triage or community meeting though.
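For reference, the division recovery is structured similarly. Below is an illustrative Python sketch following the C99 Annex G example code, with hypothetical names; it omits the scalbn() scaling that the real implementation uses to keep c*c + d*d from overflowing, and it hoists the zero-denominator case because Python raises ZeroDivisionError where C would just produce NaNs:

```python
import math

def div_with_recovery(a, b, c, d):
    """(a + bj) / (c + dj) with the C99 Annex G recoveries.

    Sketch only: no scaling against overflow in the denominator, and the
    zero-denominator case is handled up front (Python raises on x / 0.0).
    """
    denom = c * c + d * d
    if denom == 0.0:
        if math.isnan(a) and math.isnan(b):
            return (math.nan, math.nan)
        # nonzero / zero: a signed infinity
        return (math.copysign(math.inf, c) * a, math.copysign(math.inf, c) * b)
    x = (a * c + b * d) / denom  # naive real part
    y = (b * c - a * d) / denom  # naive imaginary part
    if math.isnan(x) and math.isnan(y):  # the unlikely recovery branch
        if (math.isinf(a) or math.isinf(b)) and math.isfinite(c) and math.isfinite(d):
            # infinite / finite: infinity in the right direction
            a = math.copysign(1.0 if math.isinf(a) else 0.0, a)
            b = math.copysign(1.0 if math.isinf(b) else 0.0, b)
            x = math.inf * (a * c + b * d)
            y = math.inf * (b * c - a * d)
        elif (math.isinf(c) or math.isinf(d)) and math.isfinite(a) and math.isfinite(b):
            # finite / infinite: a signed zero
            c = math.copysign(1.0 if math.isinf(c) else 0.0, c)
            d = math.copysign(1.0 if math.isinf(d) else 0.0, d)
            x = 0.0 * (a * c + b * d)
            y = 0.0 * (b * c - a * d)
    return (x, y)
```

With these recoveries, (1 + 1j) / (inf + infj) yields 0 instead of nan + nanj, and (inf + nanj) / (2**1000 + 2**-1000j) yields inf - infj.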

@ngoldbaum
Member

ngoldbaum commented Feb 10, 2026

@WarrenWeckesser @skirpichev ping in case you're interested in this, since you commented on related issues.

@skirpichev

I realize that this is a draft, but for the final version I suggest splitting the division and multiplication parts.

What i'm debating with myself here is whether performance or "correctness" and compliance with the C99 Annex G standard and consistency with CPython is more worth it because the extra if branch in the ufunc loops (marked as NPY_UNLIKELY) will have a performance cost.

For both multiplication and division this will cost an additional isnan() check in the "normal" case. I would appreciate benchmarks that show its effect. This is something that was missing in my PR (#26595).

@ikrommyd
Contributor Author

ikrommyd commented Feb 11, 2026

Yeah, I have done some local benchmarks and have also used the existing ufunc benchmarks in the benchmarks folder. I want to write a proper asv benchmark; for local benchmarks I've just been using a single script. For multiplication, the extra isnan check is comparable in cost to the actual multiplication, because multiplication is very fast, and I've seen a significant loss of performance for multiplication even in the nominal case. For division, because it's a more complicated operation, the loss isn't that bad in the nominal case; it can be like 5%, maybe 10%. However, all that changes if you actually have many values to correct in an array. I've been trying different percentages of "bad" (nan + nanj) values to correct. For multiplication, if for example 50% of your array is bad, the performance hit is massive. I will return here once I have a proper asv benchmark and post results for both my x86 Linux and arm64 Mac machines.

PS. For some existing ufunc benchmarks, I'm somehow getting a 20% improvement between this PR branch and main (consistent across both my machines). I can't explain that yet.

@ngoldbaum
Member

The asv ufunc benchmarks are noisy. If you compare the same commit against itself, you'll see random improvements and performance decreases.

@ikrommyd
Contributor Author

ikrommyd commented Feb 11, 2026

When I compare main to main or the PR branch to itself, asv doesn't report "significant performance changes" though. When I compare the PR branch with main, it does.

Anyway, until I have an appropriate new benchmark, I've been using this somewhat naive script:
https://gist.github.com/ikrommyd/102de766bd88b551cb7dc489bb10d9be (courtesy of Claude, because I'm very lazy when my programming speed is limited by my typing speed).
It makes arrays with different percentages of "bad" numbers to correct and runs the affected ufuncs for different array sizes.
Here are the results for numpy compiled with spin build --clean -- -Dcpu-baseline=none -Dcpu-dispatch=none with Python 3.14.3.
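For the record, a minimal sketch of that kind of benchmark (this is not the actual gist; the function name, sizes, and the choice of inf + nanj as the "bad" value are illustrative):

```python
import time
import numpy as np

def bench_multiply(n=100_000, bad_fraction=0.05, repeats=5):
    """Time np.multiply on complex arrays where a given fraction of the
    elements would produce nan + nanj under the naive complex product."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    b = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    # inf + nanj times a finite value is nan + nanj without the recovery,
    # so these elements exercise the recovery path.
    a[: int(n * bad_fraction)] = complex(np.inf, np.nan)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        np.multiply(a, b)
        best = min(best, time.perf_counter() - t0)
    return best
```

Running this for each array size and bad fraction of interest on both branches and comparing the minima gives tables like the ones below.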
MAIN:

======================================================================
  complex128
======================================================================

  [clean]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.430      5.070      3.866      3.641      3.098     10.431     10.353      4.053
    10,000        (100x)        1.935      1.991      0.670      0.755      0.444      3.086      3.164      0.431
    100,000       (10x)         4.871      4.069      0.737      0.714      0.427      3.065      3.131      0.458
    1,000,000     (1x)          5.130      4.678      0.773      0.698      0.445      3.067      3.106      0.442

  [1% bad]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.565      4.968      3.886      3.942      3.095     10.426     10.330      4.178
    10,000        (100x)        1.955      2.037      0.707      0.743      0.441      3.142      3.245      0.447
    100,000       (10x)         4.990      4.131      0.742      0.697      0.429      3.061      3.213      0.432
    1,000,000     (1x)          5.141      4.620      0.699      0.699      0.434      3.119      3.207      0.424

  [5% bad]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.309      5.028      4.453      4.188      3.592     11.000     10.970      4.657
    10,000        (100x)        1.926      2.109      0.663      0.725      0.427      3.202      3.203      0.436
    100,000       (10x)         4.956      4.192      0.764      0.723      0.430      3.183      3.185      0.441
    1,000,000     (1x)          5.149      4.670      0.699      0.691      0.432      3.106      3.184      0.413

  [10% bad]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.307      4.957      4.428      4.472      3.580     10.760     10.639      4.665
    10,000        (100x)        1.941      2.170      0.695      0.736      0.439      3.149      3.232      0.451
    100,000       (10x)         4.962      4.279      0.739      0.721      0.428      3.171      3.244      0.436
    1,000,000     (1x)          5.171      4.712      0.703      0.703      0.433      3.159      3.216      0.463

  [25% bad]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.562      5.220      4.431      4.496      3.611     10.703     10.806      4.610
    10,000        (100x)        1.921      2.391      0.676      0.759      0.430      3.310      3.311      0.453
    100,000       (10x)         4.688      4.310      0.767      0.724      0.414      3.276      3.429      0.487
    1,000,000     (1x)          5.194      4.804      0.739      0.697      0.426      3.311      3.441      0.436

  [50% bad]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.464      6.723      4.418      4.456      3.612     10.960     10.844      4.600
    10,000        (100x)        1.912      2.656      0.715      0.759      0.449      3.326      3.334      0.438
    100,000       (10x)         5.046      4.464      0.789      0.725      0.394      3.411      3.552      0.457
    1,000,000     (1x)          5.302      5.051      0.739      0.699      0.440      3.423      3.568      0.439

vs PR branch:

======================================================================
  complex128
======================================================================

  [clean]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.571      5.871      3.968      3.978      3.430     11.109     11.038      4.422
    10,000        (100x)        1.937      2.993      0.920      0.922      0.760      3.865      3.947      0.811
    100,000       (10x)         4.732      3.946      0.910      0.886      0.742      3.719      3.839      0.788
    1,000,000     (1x)          5.105      4.611      0.899      0.879      0.748      3.758      3.850      0.742

  [1% bad]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.436      5.881      4.079      4.096      3.493     11.396     11.204      4.549
    10,000        (100x)        1.941      3.003      0.948      1.013      0.848      4.119      4.147      0.856
    100,000       (10x)         4.856      3.881      1.014      0.985      0.879      3.989      4.115      0.902
    1,000,000     (1x)          5.406      4.793      1.131      1.037      0.896      4.059      4.131      0.909

  [5% bad]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.512      6.013      4.631      4.566      4.035     11.901     11.902      5.011
    10,000        (100x)        1.978      3.104      1.014      1.022      0.845      4.108      4.160      0.853
    100,000       (10x)         6.094      4.849      1.202      1.113      0.943      4.354      4.453      0.992
    1,000,000     (1x)          6.473      5.464      1.730      1.462      1.237      4.457      4.595      1.249

  [10% bad]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.611      6.019      4.784      4.663      4.123     12.028     11.850      5.107
    10,000        (100x)        2.016      3.142      1.086      1.027      0.829      4.144      4.197      0.883
    100,000       (10x)         7.541      5.990      2.102      1.533      1.220      4.811      4.804      1.223
    1,000,000     (1x)          7.810      6.320      2.567      2.029      1.797      4.990      4.988      1.798

  [25% bad]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      5.721      6.279      5.006      4.868      4.272     12.447     12.462      5.206
    10,000        (100x)        2.138      3.372      1.347      1.229      0.977      4.311      4.380      0.986
    100,000       (10x)        11.681      8.649      5.040      3.452      3.081      6.682      6.685      3.065
    1,000,000     (1x)         11.924      8.886      5.339      3.806      3.493      6.915      6.893      3.497

  [50% bad]
    n             (x)          divide reciprocal   multiply        a*a     square      power  float_pow         **
    100           (10000x)      6.007      7.976      5.310      5.031      4.310     12.333     12.155      5.255
    10,000        (100x)        2.305      3.600      1.890      1.443      1.111      4.618      4.754      1.120
    100,000       (10x)        14.957     12.396      9.054      6.859      6.275     10.364     10.428      6.283
    1,000,000     (1x)         15.017     12.586      9.228      7.047      6.506     10.496     10.514      6.500

This is on my AMD Ryzen 7 7800X3D.

@skirpichev

skirpichev commented Feb 11, 2026

Results look noisy when I try to compare the "clean" cases. Sometimes (in the same column) the PR is even faster (e.g. divide). I suggest using pyperf for benchmarks.

@ikrommyd
Contributor Author

ikrommyd commented Feb 11, 2026

Results look noisy when I try to compare the "clean" cases. Sometimes (in the same column) the PR is even faster (e.g. divide). I suggest using pyperf for benchmarks.

Yeah, I need to do better benchmarking, but division sometimes being faster is also something I saw with asv. Anyway, I think it's evident that multiplication takes a big hit, especially if you have a lot of bad numbers to correct.

@ikrommyd
Contributor Author

ikrommyd commented Feb 11, 2026

Follow up from the community meeting

-Ofast and -ffast-math disable these recoveries for C++; otherwise they are there (this may be implementation dependent though).

❯ g++ main.cpp -o main -O0; ./main
(1+1j)/(inf+infj) = (0,0)
(inf+nanj)/(2^1000+2^-1000j) = (inf,-inf)
(1e300+1j)*(nan+infj) = (-inf,inf)

❯ g++ main.cpp -o main -O1; ./main
(1+1j)/(inf+infj) = (0,0)
(inf+nanj)/(2^1000+2^-1000j) = (inf,-inf)
(1e300+1j)*(nan+infj) = (-inf,inf)

❯ g++ main.cpp -o main -O2; ./main
(1+1j)/(inf+infj) = (0,0)
(inf+nanj)/(2^1000+2^-1000j) = (inf,-inf)
(1e300+1j)*(nan+infj) = (-inf,inf)

❯ g++ main.cpp -o main -O3; ./main
(1+1j)/(inf+infj) = (0,0)
(inf+nanj)/(2^1000+2^-1000j) = (inf,-inf)
(1e300+1j)*(nan+infj) = (-inf,inf)

❯ g++ main.cpp -o main -Ofast; ./main
(1+1j)/(inf+infj) = (nan,nan)
(inf+nanj)/(2^1000+2^-1000j) = (nan,nan)
(1e300+1j)*(nan+infj) = (nan,nan)

❯ g++ main.cpp -o main -ffast-math; ./main
(1+1j)/(inf+infj) = (-nan,-nan)
(inf+nanj)/(2^1000+2^-1000j) = (nan,nan)
(1e300+1j)*(nan+infj) = (nan,nan)

main.cpp:
#include <complex>
#include <cmath>
#include <iostream>
#include <limits>

int main() {
    double inf = std::numeric_limits<double>::infinity();
    double mynan = std::numeric_limits<double>::quiet_NaN();

    std::complex<double> a(1, 1);
    std::complex<double> b(inf, inf);
    std::cout << "(1+1j)/(inf+infj) = " << a/b << std::endl;

    std::complex<double> d(inf, mynan);
    std::complex<double> e(std::pow(2,1000), std::pow(2,-1000));
    std::cout << "(inf+nanj)/(2^1000+2^-1000j) = " << d/e << std::endl;

    std::complex<double> z(1e300, 1);
    std::complex<double> c(mynan, inf);
    std::cout << "(1e300+1j)*(nan+infj) = " << z*c << std::endl;

    return 0;
}

cc @seberg @mattip

EDIT(seberg): Godbolt link: https://godbolt.org/z/5dvqqYeWh

@seberg
Member

seberg commented Feb 11, 2026

Yeah I need to do better benchmarking but division being better sometimes is also something that I saw with asv.

Yeah, the benchmarks can change rather randomly unfortunately. It might be a good indicator if the change is similar across float and double for example.

If C++ implements these, I am tempted to take a small performance hit for it (although there have been very few real complaints about it). It seems like one of those things that is nice to get right, but in practice almost nobody notices :).

@ikrommyd
Contributor Author

ikrommyd commented Feb 11, 2026

Yeah, I'd think what we actually want here is a slim-to-none performance hit in the nominal case (no nan + nanj to correct) and a small performance hit in a realistic case where some values are nan + nanj and need correction (I don't know what that realistic case might be; 1% of entries? 5%?).

@skirpichev

Yeah I'd think what we actually want here is a slim to none performance hit only in the nominal case (no nan + nanj to correct)

But for multiplication I see a difference of up to 16% in the "clean" case. That looks like too much to me for one isnan() check vs 6 arithmetic ops.

-Ofast and -ffast-math disable these recoveries for C++

Not a surprise; those options exist to break the standard.

In clang, the implementation of complex multiplication and division is controlled by -fcomplex-arithmetic; IIRC, -ffast-math implies something like -fcomplex-arithmetic=basic, i.e. primitive algorithms (no Smith's division, for example) and no recovery of NaNs.
