Conversation
The use of the macro _NPY_CLIP results in multiple re-evaluations of the input arguments. Thus for floating point types, the check of NaNs is performed multiple times. This manifests itself as a slowness on Win32 builds. See numpy#18673.
|
The code here looks completely equivalent to the old one? What do you mean by multiple re-evaluations? If there is different on windows, it seems like weird MSVC optimization fluctuations?! A speed problem on windows is usually (or was, it might be fixed), is that MSVC is unable to spit out reasonable machine code for |
|
That said, if such a small manual macro expansion makes a big difference on windows, I guess we might as well put it in. Could you try to run the benchmarks? Hopefully it would work to go into the |
Run the following code through the C pre-processor: #define _NPY_CLIP(x, min, max) _NPY_FLOAT_MIN(_NPY_FLOAT_MAX((x), (min)), (max))
#define _NPY_FLOAT_MIN(a, b) (npy_isnan(a) ? (a) : PyArray_MIN(a, b))
#define _NPY_FLOAT_MAX(a, b) (npy_isnan(a) ? (a) : PyArray_MAX(a, b))
float func1(float x, float min, float max)
{
return _NPY_CLIP(x, min, max);
}
float func2(float x, float min, float max)
{
float t = x;
t = _NPY_FLOAT_MAX(t, min);
t = _NPY_FLOAT_MIN(t, max);
return t;
}You will get: float func1(float x, float min, float max)
{
return (npy_isnan((npy_isnan((x)) ? ((x)) : PyArray_MAX((x), (min)))) ? ((npy_isnan((x)) ? ((x)) : PyArray_MAX((x), (min)))) : PyArray_MIN((npy_isnan((x)) ? ((x)) : PyArray_MAX((x), (min))), (max)));
}
float func2(float x, float min, float max)
{
float t = x;
t = (npy_isnan(t) ? (t) : PyArray_MAX(t, min));
t = (npy_isnan(t) ? (t) : PyArray_MIN(t, max));
return t;
}
|
|
If I run the test script in #18673 (comment) i.e. np.clip(a, mn, mx) on Windows no longer runs slower than np.clip(np.clip(a, None, mx), mn, None)) on this PR.
Sorry, I am not familiar with the Using the following gave the same error. |
|
Sorry, something went wrong editing that line: EDIT: ARRRrrg, I had grepped for |
|
But yeah, seems reasonable. I missed that it gets copied unnecessarily for the |
seberg
left a comment
There was a problem hiding this comment.
OK, nevermind the discussion. The change looks good to me, and while the macro was there for a reason, the snippet is so small that it doesn't matter IMO.
I guess you could put that multi-line statement into the macro itself but I am not even sure it would be nicer (at least as long as all the paths are right next to each other).
If you are up for it, it would be great to add a benchmark to avoid a performance regression (and it would be generally nice to have). Probably as part of bench_ufunc.py. But it doesn't seem like we have a whole lot similar ones yet.
|
Actually, gh-19713 will conflict with this, and should also fix it. So it may be we just skip this and go straight there. Thanks though! |
|
Ah yes, C++ should help. Guess this PR came a few months too late... I will wait for that PR to get merged. I came across this blog which says that i.e. When both compiled with VS2019, |
With vs2017 since 1.19.x, previously vs2015. |
|
Thanks @pijyoi . |
The use of the macro _NPY_CLIP results in multiple re-evaluations of the
input arguments. Thus for floating point types, the check of NaNs is
performed multiple times.
This manifests itself as a slowness on Win32 builds. See #18673.