
ENH: Increase numpy.average performance mentioned in #5507 (#5551)

Closed
minhlongdo wants to merge 6 commits into numpy:master from minhlongdo:master

Conversation

@minhlongdo

This checks against numpy.inexact to determine whether the data type is numpy.float64. If the data type is not numpy.float64, it uses astype() to convert it to numpy.float64.

Contributor


Maybe use not X instead of X is False ?

Author


Will do. Is there any technical difference between X is False and not X that I am not aware of?

Contributor


They are not the same. In Python, is checks whether the two names point to the same object ("object identity"). So although using is works fine here, it wouldn't in other cases, because it requires the LHS to be exactly the object False. Using not X will coerce X to a bool, which is in general what you want.

a = False

a is False
Out[2]: True

b = 0 

b is False
Out[4]: False

not b
Out[5]: True

I would also argue that is False or is True isn't really idiomatic.

Author


Thank you for the explanation.

@jaimefrio
Member

Based on the discussion on #5525, you may want to take a look at the current implementation of np.mean here, and try to replicate what's there. The two main differences I see are:

  1. Explicitly list the types you want upcasted: your current implementation will convert arrays of objects, which could hold e.g. an arbitrary precision type.
  2. Rather than making a copy of the array with the new dtype, use it in the call to the ufunc, which is going to save you an intermediate array and is likely to be faster.
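The two suggestions above can be sketched together. This is a minimal illustration (the function name is mine, not from the PR): only integer and boolean dtypes are listed for upcasting, and the target dtype is passed to the reduction instead of copying the array first.

```python
import numpy as np

def average_upcast_sketch(a):
    # Hypothetical sketch of the two suggestions above: upcast only
    # integer and boolean inputs; object arrays (which could hold an
    # arbitrary-precision type) are left untouched.
    a = np.asanyarray(a)
    dtype = None
    if issubclass(a.dtype.type, (np.integer, np.bool_)):
        dtype = np.float64  # explicit upcast target
    # Passing dtype= to the reduction makes the ufunc machinery cast
    # through its internal buffers, so no intermediate float64 copy
    # of a is created.
    return np.add.reduce(a, axis=None, dtype=dtype) / a.size

print(average_upcast_sketch(np.array([1, 2, 3, 4])))  # 2.5
```

An object array such as np.array([10**100, 2], dtype=object) would pass through this check unchanged, which is the point of listing the types explicitly.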

@minhlongdo
Author

@jaimefrio Would it be ok if I replaced a = a.astype('float64') with a = np.multiply(1.0, a)? I chose np.multiply because it was faster than np.add and it also does not produce an intermediate array.

The performance of np.multiply and np.add on a 5000 x 5000 array:
np.multiply: 1.03e-01 sec
np.add: 1.17e-01 sec
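A comparison along these lines can be reproduced with a micro-benchmark sketch like the following. The timings are machine-dependent, and a smaller array than the quoted 5000 x 5000 one is used here, so the absolute numbers will differ.

```python
import timeit
import numpy as np

# Integer input, as in the np.average upcasting discussion.
a = np.ones((1000, 1000), dtype=np.int64)

t_mul = timeit.timeit(lambda: np.multiply(1.0, a), number=10)
t_add = timeit.timeit(lambda: np.add(0.0, a), number=10)
print("np.multiply: %.2e sec" % t_mul)
print("np.add:      %.2e sec" % t_add)

# Either ufunc upcasts the integer input to float64.
assert np.multiply(1.0, a).dtype == np.float64
```

Note that both variants still materialize a new float64 array as their result; the later suggestion of passing dtype= directly to the ufunc call avoids that.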

Member


Someone else used a function that might be good here, result_type. So maybe something like

dt = np.result_type(a, 0.0)
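The appeal of result_type here is that it applies NumPy's promotion rules between the array's dtype and the scalar 0.0, so floating-point inputs keep their precision while integers are promoted. A quick illustration:

```python
import numpy as np

# Promotion against the float scalar 0.0: integers go to float64,
# but float32 input stays float32.
print(np.result_type(np.arange(3, dtype=np.int64), 0.0))    # float64
print(np.result_type(np.arange(3, dtype=np.float32), 0.0))  # float32
```

This avoids hard-coding 'float64' and unnecessarily upcasting float32 input.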

Author


I agree that before a = a.astype('float64'), using dt = np.result_type(a, 0.0) would be appropriate. However, as @jaimefrio mentioned before, astype requires an intermediate array for the conversion, so I was thinking of using a = np.multiply(1.0, a).
So the fix would look like the following:

if issubclass(a.dtype.type, (integer, bool_)):  # bool_, not Python bool, matches boolean dtypes
    a = np.multiply(1.0, a)

Otherwise I could do it the alternative way using result_type, then the fix would look like this:

if issubclass(a.dtype.type, (integer, bool_)):
    dt = np.result_type(a, 0.0)
    a = a.astype(dt)

Contributor


Don't use a multiplication for this.

I think @jaimefrio was suggesting that you select the dtype via dt = np.result_type(a, 0.0) and then, later on in the function, specify that dtype in the ufunc call. So down below, where it has something like avg = np.multiply(a, wtg).sum(axis)/scl, the suggestion is to change this to avg = np.multiply(a, wtg, dtype=dt).sum(axis)/scl and, of course, make whatever other changes are needed to ensure the dtype handling is still, or becomes, correct.

Doing it this way means that you don't ever have to have a full copy of a around since the ufunc machinery will convert a for you using a reasonably sized buffer.
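A standalone sketch of this idea, with a and wtg as stand-ins for the array and weights inside np.average:

```python
import numpy as np

a = np.arange(6, dtype=np.int32)
wtg = np.array([1, 2, 3, 1, 2, 3], dtype=np.int32)

dt = np.result_type(a, 0.0)  # float64 for integer input
# dtype=dt makes the ufunc cast a and wtg through its internal
# buffers, so no full float64 copy of a is ever materialized.
avg = np.multiply(a, wtg, dtype=dt).sum() / wtg.sum()
print(avg)  # 34/12, approximately 2.8333
```

Contrast this with a = a.astype(dt) followed by np.multiply(a, wtg), which allocates a full converted copy of a before the multiplication even starts.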

Contributor


Thinking some more, this is probably a good use case for einsum. It can do the necessary sum product without any temporaries. The only possible drawback is that we would then miss out on the pairwise summation that np.add does these days. That seems like a good trade-off to me. FWIW, the fastest implementation for the 1d case is probably via vdot.
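For the 1d case, the einsum idea looks like this. It computes the sum product in one pass with no a * w temporary (but, as noted above, without pairwise summation, so it can be slightly less accurate for long float arrays).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([4.0, 3.0, 2.0, 1.0])

# Weighted sum without a temporary a * w array.
num = np.einsum('i,i->', a, w)
avg = num / w.sum()
print(avg)  # 2.0

# np.vdot computes the same sum product and is often fastest for 1d.
assert np.isclose(np.vdot(a, w), num)
```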

@homu
Contributor

homu commented Mar 7, 2016

☔ The latest upstream changes (presumably #7382) made this pull request unmergeable. Please resolve the merge conflicts.

@eric-wieser
Member

Closing in favor of #7382, which seems to supersede this.


8 participants