ENH: Increase numpy.average performance mentioned in #5507.#5551
ENH: Increase numpy.average performance mentioned in #5507.#5551minhlongdo wants to merge 6 commits intonumpy:masterfrom minhlongdo:master
Conversation
numpy/lib/function_base.py
Outdated
There was a problem hiding this comment.
Maybe use not X instead of X is False ?
There was a problem hiding this comment.
Will do, is there any technical difference between X is False and not X that I am not aware of?
There was a problem hiding this comment.
They are not the same. In Python is checks to see if the two names point to the same object ("object identity"). So although using is works fine here, it wouldn't in other cases because it requires the LHS to be exactly the object False. Using not X will coerce X to a bool, which is in general what you want.
a = False
a is False
Out[2]: True
b = 0
b is False
Out[4]: False
not b
Out[5]: True
I would also argue that is False or is True isn't really idiomatic.
There was a problem hiding this comment.
Thank you for the explaination.
|
Based on the discussion on #5525, you may want to take a look at the current implementation of
|
|
@jaimefrio Would it be ok if I replace The performance of |
There was a problem hiding this comment.
Someone else used a function that might be good here, result_type. So maybe something like
dt = np.result_type(a, 0.0)
There was a problem hiding this comment.
I agree that before a = a.astype('float64') using dt = np.result_type(a, 0.0) would be appropriate. However, as @jaimefrio mentioned before astype requires an intermediate array for the conversion. So I was thinking of using a = np.multiply(1.0, a).
So the fix would look like the following:
if issubclass(a.dtype.type, (integer, bool)):
a = np.multiply(1.0, a)
Otherwise I could do it the alternative way using result_type, then the fix would look like this:
if issubclass(a.dtype.type, (integer, bool)):
dt = np.result_type(a, 0.0)
a = a.astype(dt)
There was a problem hiding this comment.
Don't use a multiplication for this.
I think @jaimefrio was suggesting that you select the dtype via dt = np.result_dtype(a, 0.0) and then later on in the function, specify that dtype in the ufunc call. So down below it has something like avg = np.multiply(a, wtg).sum(axis)/scl the suggestion is to change this to avg = np.multiply(a, wtg, dtype=dt).sum(axis)/scl and, of course, whatever other changes are needed to ensure the dtype handling is still or becomes correct.
Doing it this way means that you don't ever have to have a full copy of a around since the ufunc machinery will convert a for you using a reasonably sized buffer.
There was a problem hiding this comment.
Thinking somemore, this is probably a good use cars for einsum. It can do the necessary sum product without any temporaries. The only possible drawback is that we would then miss out on the pair wise summation that np.add does these days. Seems like a good trade off to me. The fastest implementation for the 1d case is probably via vdot, too FWIW.
|
☔ The latest upstream changes (presumably #7382) made this pull request unmergeable. Please resolve the merge conflicts. |
|
Closing in favor of #7382, which seems to supersede this. |
This numpy.inexact to determine if it is numpy.float64. If the data type not numpy.float64 then it uses astype() to convert to numpy.float64.