ENH: add C99 Annex G recoveries for complex multiplication and division following CPython#30806
ikrommyd wants to merge 11 commits into numpy:main from …
Conversation
Opening it as a draft for starters because there are decisions to be made by the maintainers here, and it also definitely needs benchmarking since we're touching ufunc loops. I'd also appreciate some help on the SIMD implementations, as I am completely clueless about SIMD apart from knowing what it is, so I asked Claude for help on that part :). The rest is all implemented manually, although there may be typos because parsing …
@r-devulap for the SIMD.
What I'm debating with myself here is whether performance or "correctness" (compliance with the C99 Annex G standard and consistency with CPython) is more worth it, because of the extra if branch in the ufunc loops (marked as …
The more I think about this, and about what we have to do to get multiplication consistent with other things that are effectively multiplication, like …
@WarrenWeckesser @skirpichev ping in case you're interested in this, since you commented on related issues.
I realize that this is a draft, but for the final version I suggest splitting the division and multiplication parts.
For both multiplication and division it will cost an additional isnan() check in the "normal" case. I would appreciate benchmarks that show its effect. This is something that was missing in my PR (#26595).
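For context, the recovery being discussed follows the sample `_Cmultd` code in C99 Annex G (which the CPython PRs linked below also adopted): do the naive multiply first, and only if both result components come out NaN, re-derive signed infinities. A rough Python port of that logic (the helper name `annex_g_mul` is mine, not from this PR, and numpy's actual loops are C):

```python
import math

def annex_g_mul(a, b, c, d):
    """(a+bj)*(c+dj) with C99 Annex G style recovery of nan+nanj results."""
    ac, bd, ad, bc = a * c, b * d, a * d, b * c
    x = ac - bd
    y = ad + bc
    if math.isnan(x) and math.isnan(y):  # the extra check on the fast path
        recalc = False
        if math.isinf(a) or math.isinf(b):  # first operand is infinite
            # "box" the infinity and change nans in the other operand to 0
            a = math.copysign(1.0 if math.isinf(a) else 0.0, a)
            b = math.copysign(1.0 if math.isinf(b) else 0.0, b)
            if math.isnan(c): c = math.copysign(0.0, c)
            if math.isnan(d): d = math.copysign(0.0, d)
            recalc = True
        if math.isinf(c) or math.isinf(d):  # second operand is infinite
            c = math.copysign(1.0 if math.isinf(c) else 0.0, c)
            d = math.copysign(1.0 if math.isinf(d) else 0.0, d)
            if math.isnan(a): a = math.copysign(0.0, a)
            if math.isnan(b): b = math.copysign(0.0, b)
            recalc = True
        if not recalc and (math.isinf(ac) or math.isinf(bd)
                           or math.isinf(ad) or math.isinf(bc)):
            # recover infinities from overflow of a partial product
            if math.isnan(a): a = math.copysign(0.0, a)
            if math.isnan(b): b = math.copysign(0.0, b)
            if math.isnan(c): c = math.copysign(0.0, c)
            if math.isnan(d): d = math.copysign(0.0, d)
            recalc = True
        if recalc:
            x = math.inf * (a * c - b * d)
            y = math.inf * (a * d + b * c)
    return complex(x, y)
```

With this, `(1e300+1j)*(nan+infj)` recovers to `(-inf+infj)` instead of `(nan+nanj)`, matching the C++ output shown later in the thread.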
Yeah, I have done some local benchmarks and have also used the existing ufunc benchmarks. For some of the existing ufunc benchmarks, I'm magically getting a 20% improvement between this PR branch and main somehow (consistent across both my machines). I can't explain that yet.
The asv ufunc benchmarks are noisy. If you compare the same commit against itself, you'll see random improvements and performance decreases.
When I compare main to main or PR with PR, I see similar noise. Anyways, until I have an appropriate new benchmark, I've been using this admittedly naive script to compare main vs the PR branch. This is on my AMD Ryzen 7 7800X3D.
Results look noisy when I'm trying to compare "clean" cases. Sometimes (in the same column) the PR is even faster (e.g. divide). I suggest using pyperf for benchmarks.
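As a rough illustration of what such a microbenchmark measures (pyperf is the better tool for stable numbers; this is only a stdlib `timeit` sketch, with toy pure-Python stand-ins for the ufunc inner loop, so the absolute numbers mean nothing for the C loops):

```python
import math
import timeit

def mul_plain(a, b, c, d):
    # naive complex multiply: (a+bj)*(c+dj)
    return (a * c - b * d, a * d + b * c)

def mul_checked(a, b, c, d):
    # same multiply, plus the extra isnan() branch the PR adds
    x, y = a * c - b * d, a * d + b * c
    if math.isnan(x) and math.isnan(y):
        pass  # Annex G recovery would run here; never taken for clean inputs
    return (x, y)

t_plain = timeit.timeit(lambda: mul_plain(1.5, 2.5, 3.5, 4.5), number=200_000)
t_checked = timeit.timeit(lambda: mul_checked(1.5, 2.5, 3.5, 4.5), number=200_000)
print(f"plain:   {t_plain:.4f}s")
print(f"checked: {t_checked:.4f}s")
```

The interesting comparison is the "clean" case (branch never taken) versus inputs where some fraction of elements actually needs recovery.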
Yeah, I need to do better benchmarking, but division being better sometimes is also something that I saw with asv. Anyways, I think it's evident that multiplication takes a big hit, especially if you have a lot of bad numbers to correct.
Follow-up from the community meeting:

```shell
numpy on complex-div-and-mul-annex-g [?] is v2.5.0.dev0 via pyenv 3.14.3
❯ g++ main.cpp -o main -O0; ./main
(1+1j)/(inf+infj) = (0,0)
(inf+nanj)/(2^1000+2^-1000j) = (inf,-inf)
(1e300+1j)*(nan+infj) = (-inf,inf)
❯ g++ main.cpp -o main -O1; ./main
(1+1j)/(inf+infj) = (0,0)
(inf+nanj)/(2^1000+2^-1000j) = (inf,-inf)
(1e300+1j)*(nan+infj) = (-inf,inf)
❯ g++ main.cpp -o main -O2; ./main
(1+1j)/(inf+infj) = (0,0)
(inf+nanj)/(2^1000+2^-1000j) = (inf,-inf)
(1e300+1j)*(nan+infj) = (-inf,inf)
❯ g++ main.cpp -o main -O3; ./main
(1+1j)/(inf+infj) = (0,0)
(inf+nanj)/(2^1000+2^-1000j) = (inf,-inf)
(1e300+1j)*(nan+infj) = (-inf,inf)
❯ g++ main.cpp -o main -Ofast; ./main
(1+1j)/(inf+infj) = (nan,nan)
(inf+nanj)/(2^1000+2^-1000j) = (nan,nan)
(1e300+1j)*(nan+infj) = (nan,nan)
❯ g++ main.cpp -o main -ffast-math; ./main
(1+1j)/(inf+infj) = (-nan,-nan)
(inf+nanj)/(2^1000+2^-1000j) = (nan,nan)
(1e300+1j)*(nan+infj) = (nan,nan)
```

`main.cpp`:

```cpp
#include <complex>
#include <cmath>
#include <iostream>
#include <limits>

int main() {
    double inf = std::numeric_limits<double>::infinity();
    double mynan = std::numeric_limits<double>::quiet_NaN();
    std::complex<double> a(1, 1);
    std::complex<double> b(inf, inf);
    std::cout << "(1+1j)/(inf+infj) = " << a/b << std::endl;
    std::complex<double> d(inf, mynan);
    std::complex<double> e(std::pow(2, 1000), std::pow(2, -1000));
    std::cout << "(inf+nanj)/(2^1000+2^-1000j) = " << d/e << std::endl;
    std::complex<double> z(1e300, 1);
    std::complex<double> c(mynan, inf);
    std::cout << "(1e300+1j)*(nan+infj) = " << z*c << std::endl;
    return 0;
}
```

EDIT(seberg): Godbolt link: https://godbolt.org/z/5dvqqYeWh
Yeah, the benchmarks can change rather randomly unfortunately. It might be a good indicator if the change is similar across float and double, for example. If C++ implements these, I am tempted to take a small performance hit for it (although there have been very few real complaints about it). It seems like one of those things that is nice to get right, but in practice almost nobody notices :).
Yeah, I'd think what we actually want here is a slim-to-none performance hit in the nominal case (no nan + nanj to correct) and a small performance hit in a realistic case where some values are nan + nanj and need correction (I don't know what that realistic case might be. 1% of entries? 5% of entries?)
But for multiplication I see a difference of up to 16% in the "clean" case. That looks like too much to me for one isnan() check vs 6 arithmetic ops.
Not a surprise; these options are meant to break the standard. In clang, the implementation of complex multiplication and division is controlled by …
Adds C99 Annex G recoveries to complex multiplication and division that would otherwise result in nan + nanj
Closes #30805
Closes #26560
Closes #17425
Closes #30470
See relevant CPython PRs that did these updates for CPython:
python/cpython#120287 and python/cpython#119457
Because multiplication also has a SIMD implementation, we have to adapt that too; otherwise you'd get different answers depending on whether you have a SIMD-compiled numpy or not, or on whether you are working with complex64/complex128 vs complex256, since I only see SIMD implementations for fp32 and fp64.
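For division, the Annex G recovery is modeled on the standard's sample `_Cdivd` code. A simplified Python sketch is below; note that numpy's actual C loops use a scaled (Smith-style) division to avoid overflow in the denominator, which this naive `c*c + d*d` form does not, so only the recovery branches are representative. The name `annex_g_div` is illustrative:

```python
import math

def annex_g_div(a, b, c, d):
    """(a+bj)/(c+dj) with C99 Annex G style recovery of nan+nanj results."""
    denom = c * c + d * d  # naive; real implementations scale to avoid overflow
    x = (a * c + b * d) / denom
    y = (b * c - a * d) / denom
    if math.isnan(x) and math.isnan(y):
        if denom == 0.0 and (not math.isnan(a) or not math.isnan(b)):
            # nonzero / zero -> properly signed infinity
            x = math.copysign(math.inf, c) * a
            y = math.copysign(math.inf, c) * b
        elif (math.isinf(a) or math.isinf(b)) and math.isfinite(c) and math.isfinite(d):
            # infinite numerator / finite denominator -> infinity
            a = math.copysign(1.0 if math.isinf(a) else 0.0, a)
            b = math.copysign(1.0 if math.isinf(b) else 0.0, b)
            x = math.inf * (a * c + b * d)
            y = math.inf * (b * c - a * d)
        elif (math.isinf(c) or math.isinf(d)) and math.isfinite(a) and math.isfinite(b):
            # finite numerator / infinite denominator -> zero
            c = math.copysign(1.0 if math.isinf(c) else 0.0, c)
            d = math.copysign(1.0 if math.isinf(d) else 0.0, d)
            x = 0.0 * (a * c + b * d)
            y = 0.0 * (b * c - a * d)
    return complex(x, y)
```

This reproduces the non-fast-math C++ results from the thread: `(1+1j)/(inf+infj)` recovers to `(0+0j)` and `(inf+nanj)/(2^1000+2^-1000j)` to `(inf-infj)`, instead of `nan+nanj`.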