
ENH: Increase numpy.average performance mentioned in #5507 (#5551)

Closed
minhlongdo wants to merge 6 commits into numpy:master from minhlongdo:master

Conversation

@minhlongdo

This checks against numpy.inexact to determine whether the data type is numpy.float64. If the data type is not numpy.float64, it uses astype() to convert it to numpy.float64.

Contributor


Maybe use not X instead of X is False ?

Author


Will do. Is there any technical difference between X is False and not X that I am not aware of?

Contributor


They are not the same. In Python, is checks whether the two names point to the same object ("object identity"). So although using is works fine here, it wouldn't in other cases, because it requires the LHS to be exactly the object False. Using not X will coerce X to a bool, which is in general what you want.

a = False

a is False
Out[2]: True

b = 0 

b is False
Out[4]: False

not b
Out[5]: True

I would also argue that is False or is True isn't really idiomatic.

Author


Thank you for the explanation.

@jaimefrio
Member

Based on the discussion on #5525, you may want to take a look at the current implementation of np.mean here, and try to replicate what's there. The two main differences I see are:

  1. Explicitly list the types you want upcasted: your current implementation will convert arrays of objects, which could hold e.g. an arbitrary precision type.
  2. Rather than making a copy of the array with the new dtype, use it in the call to the ufunc, which is going to save you an intermediate array and is likely to be faster.
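The two suggestions above can be sketched together. This is a minimal illustration (the function name is mine, not from the PR): only integer and boolean dtypes are listed for upcasting, and the target dtype is passed to the reduction instead of copying the array first.

```python
import numpy as np

def average_upcast_sketch(a):
    # Hypothetical sketch of the two suggestions above: upcast only
    # integer and boolean inputs; object arrays (which could hold an
    # arbitrary-precision type) are left untouched.
    a = np.asanyarray(a)
    dtype = None
    if issubclass(a.dtype.type, (np.integer, np.bool_)):
        dtype = np.float64  # explicit upcast target
    # Passing dtype= to the reduction makes the ufunc machinery cast
    # through its internal buffers, so no intermediate float64 copy
    # of a is created.
    return np.add.reduce(a, axis=None, dtype=dtype) / a.size

print(average_upcast_sketch(np.array([1, 2, 3, 4])))  # 2.5
```

An object array such as np.array([10**100, 2], dtype=object) would pass through this check unchanged, which is the point of listing the types explicitly.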

@minhlongdo
Author

@jaimefrio Would it be ok if I replaced a = a.astype('float64') with a = np.multiply(1.0, a)? I chose np.multiply because it was faster than np.add and it also does not produce an intermediate array.

The performance of np.multiply and np.add on a 5000 x 5000 array:
np.multiply: 1.03e-01 sec
np.add: 1.17e-01 sec
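A comparison along these lines can be reproduced with a micro-benchmark sketch like the following. The timings are machine-dependent, and a smaller array than the quoted 5000 x 5000 one is used here, so the absolute numbers will differ.

```python
import timeit
import numpy as np

# Integer input, as in the np.average upcasting discussion.
a = np.ones((1000, 1000), dtype=np.int64)

t_mul = timeit.timeit(lambda: np.multiply(1.0, a), number=10)
t_add = timeit.timeit(lambda: np.add(0.0, a), number=10)
print("np.multiply: %.2e sec" % t_mul)
print("np.add:      %.2e sec" % t_add)

# Either ufunc upcasts the integer input to float64.
assert np.multiply(1.0, a).dtype == np.float64
```

Note that both variants still materialize a new float64 array as their result; the later suggestion of passing dtype= directly to the ufunc call avoids that.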

Member


Someone else used a function that might be good here, result_type. So maybe something like

dt = np.result_type(a, 0.0)
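The appeal of result_type here is that it applies NumPy's promotion rules between the array's dtype and the scalar 0.0, so floating-point inputs keep their precision while integers are promoted. A quick illustration:

```python
import numpy as np

# Promotion against the float scalar 0.0: integers go to float64,
# but float32 input stays float32.
print(np.result_type(np.arange(3, dtype=np.int64), 0.0))    # float64
print(np.result_type(np.arange(3, dtype=np.float32), 0.0))  # float32
```

This avoids hard-coding 'float64' and unnecessarily upcasting float32 input.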

Author


I agree that before a = a.astype('float64'), using dt = np.result_type(a, 0.0) would be appropriate. However, as @jaimefrio mentioned before, astype requires an intermediate array for the conversion, so I was thinking of using a = np.multiply(1.0, a).
So the fix would look like the following:

if issubclass(a.dtype.type, (integer, bool_)):  # bool_, not Python bool, matches boolean dtypes
    a = np.multiply(1.0, a)

Otherwise I could do it the alternative way using result_type, then the fix would look like this:

if issubclass(a.dtype.type, (integer, bool_)):
    dt = np.result_type(a, 0.0)
    a = a.astype(dt)

Contributor


Don't use a multiplication for this.

I think @jaimefrio was suggesting that you select the dtype via dt = np.result_type(a, 0.0) and then, later on in the function, specify that dtype in the ufunc call. So down below, where it has something like avg = np.multiply(a, wtg).sum(axis)/scl, the suggestion is to change this to avg = np.multiply(a, wtg, dtype=dt).sum(axis)/scl and, of course, make whatever other changes are needed to ensure the dtype handling is still, or becomes, correct.

Doing it this way means that you don't ever have to have a full copy of a around since the ufunc machinery will convert a for you using a reasonably sized buffer.
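A standalone sketch of this idea, with a and wtg as stand-ins for the array and weights inside np.average:

```python
import numpy as np

a = np.arange(6, dtype=np.int32)
wtg = np.array([1, 2, 3, 1, 2, 3], dtype=np.int32)

dt = np.result_type(a, 0.0)  # float64 for integer input
# dtype=dt makes the ufunc cast a and wtg through its internal
# buffers, so no full float64 copy of a is ever materialized.
avg = np.multiply(a, wtg, dtype=dt).sum() / wtg.sum()
print(avg)  # 34/12, approximately 2.8333
```

Contrast this with a = a.astype(dt) followed by np.multiply(a, wtg), which allocates a full converted copy of a before the multiplication even starts.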

Contributor


Thinking some more, this is probably a good use case for einsum. It can do the necessary sum product without any temporaries. The only possible drawback is that we would then miss out on the pairwise summation that np.add does these days. That seems like a good trade-off to me. FWIW, the fastest implementation for the 1d case is probably via vdot.
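For the 1d case, the einsum idea looks like this. It computes the sum product in one pass with no a * w temporary (but, as noted above, without pairwise summation, so it can be slightly less accurate for long float arrays).

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([4.0, 3.0, 2.0, 1.0])

# Weighted sum without a temporary a * w array.
num = np.einsum('i,i->', a, w)
avg = num / w.sum()
print(avg)  # 2.0

# np.vdot computes the same sum product and is often fastest for 1d.
assert np.isclose(np.vdot(a, w), num)
```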

@homu
Contributor

homu commented Mar 7, 2016

☔ The latest upstream changes (presumably #7382) made this pull request unmergeable. Please resolve the merge conflicts.

@eric-wieser
Member

Closing in favor of #7382, which seems to supersede this.


8 participants