
ENH: __array_function__ updates for NumPy 1.17.0#12830

Merged
mattip merged 1 commit into numpy:master from shoyer:array-func-np117
Jan 23, 2019

Conversation

@shoyer
Member

@shoyer shoyer commented Jan 22, 2019

  • Always enable __array_function__ overrides.
  • Remove special cases for Python 2 compatibility.
  • Document these changes in 1.17.0-notes.rst.

It will be good to see ASV numbers to understand the performance implications
of these changes. If need be, we can speed up NumPy functions internally by
using non-dispatched functions (with .__wrapped__).

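For context, a minimal sketch of the NEP 18 protocol that these overrides enable (the `LoggingArray` class and its behavior are illustrative, not part of this PR):

```python
import numpy as np

# Illustrative duck array that intercepts NumPy functions via the
# __array_function__ protocol (NEP 18), enabled by default since 1.17.
class LoggingArray:
    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap LoggingArray arguments, then delegate to the plain
        # NumPy implementation.
        args = [a.data if isinstance(a, LoggingArray) else a for a in args]
        print(f"intercepted {func.__name__}")
        return func(*args, **kwargs)

# Dispatches to the override: prints "intercepted count_nonzero", returns 2.
print(np.count_nonzero(LoggingArray([0, 1, 2])))
```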
Contributor

@tylerjereddy tylerjereddy left a comment


CI related changes LGTM

@mattip mattip merged commit f8058c8 into numpy:master Jan 23, 2019
@mattip
Member

mattip commented Jan 23, 2019

Thanks @shoyer

@shoyer shoyer deleted the array-func-np117 branch January 23, 2019 07:03
@rgommers rgommers added this to the 1.17.0 release milestone Jun 28, 2019
@rgommers
Member

rgommers commented Jun 28, 2019

It will be good to see ASV numbers to understand the performance implications
of these changes. If need be, we can speed up NumPy functions internally by
using non-dispatched functions (with .__wrapped__).

@shoyer I'm just looking at the benchmarks, and noticed that this PR introduced fairly large regressions. Both in the override benchmarks and in the count_nonzero benchmark below, the overhead is about 7 µs:

[screenshot: ASV benchmark graph showing the count_nonzero regression]

That's more than I expected (it's up to 9x in the benchmarks, see https://pv.github.io/numpy-bench/#regressions?sort=3&dir=desc). Did you or anyone else look into this after the merge of this PR at any point?

@shoyer
Member Author

shoyer commented Jun 28, 2019

@rgommers That is indeed way more of a slow-down than I expected. But it looks like the actual slow-down from this PR for time_count_nonzero is much smaller, only about 2x and only for the smallest size arrays (100 elements):

[screenshot: ASV benchmark graph for time_count_nonzero]

That said, 7us of overhead still seems like a lot. Running this on my machine, I see about 400 ns of overhead from dispatching. That's in approximately the expected range. There's a 2x slowdown, but there's definitely no nested dispatch going on:

In [25]: %timeit np.count_nonzero._implementation(x)
421 ns ± 37.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [26]: %timeit np.count_nonzero(x)
853 ns ± 17.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
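The comparison above can be reproduced with the stdlib `timeit` module (array contents and loop counts here are arbitrary; `_implementation` is the attribute the overrides machinery attaches to each dispatched function):

```python
import timeit
import numpy as np

x = np.linspace(0, 1, num=10)

# Full call: goes through the __array_function__ dispatch layer.
dispatched = timeit.timeit(lambda: np.count_nonzero(x), number=100_000)

# Direct call to the undecorated implementation, skipping dispatch.
direct = timeit.timeit(lambda: np.count_nonzero._implementation(x),
                       number=100_000)

print(f"dispatched: {dispatched:.4f}s  direct: {direct:.4f}s")
```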

@shoyer
Member Author

shoyer commented Jun 28, 2019

The slow-down for the override benchmarks is to be expected from turning on overrides by default. Before this PR, those benchmarks weren't actually measuring anything at all.
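A minimal way to see that the override benchmarks now exercise real dispatch (the `DuckArray` class is illustrative, not the actual benchmark code):

```python
import numpy as np

# With overrides always enabled, any object defining __array_function__
# is consulted before NumPy's own implementation runs.
class DuckArray:
    def __array_function__(self, func, types, args, kwargs):
        return "overridden"

print(np.mean(DuckArray()))  # prints "overridden"
```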

@rgommers
Member

rgommers commented Jun 28, 2019

Thanks @shoyer

That said, 7us of overhead still seems like a lot. Running this on my machine, I see about 400 ns of overhead from dispatching. That's in approximately the expected range.

Yes, 400 ns is about the number I had in mind. Something odd about that benchmarking setup perhaps.

In [1]: np.__version__                                                                         
Out[1]: '1.17.0.dev0+ad84b69'

In [2]: x = np.linspace(0, 1, num=10)                                                          

In [3]: %timeit np.count_nonzero._implementation(x)                                            
300 ns ± 1.52 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit np.count_nonzero(x)                                                            
623 ns ± 1.37 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [1]: np.__version__                                                                         
Out[1]: '1.16.2'

In [2]: x = np.linspace(0, 1, num=10)                                                          

In [3]: %timeit np.count_nonzero(x)                                                            
280 ns ± 0.863 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [1]: np.__version__                                                                         
Out[1]: '1.15.4'

In [2]: x = np.linspace(0, 1, num=10)                                                          

In [3]: %timeit np.count_nonzero(x) 
295 ns ± 8.36 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


# And for reference a do-nothing function with the same signature as count_nonzero
In [9]: def f2(x, axis=None): pass                                                             

In [10]: %timeit f2(x)                                                                         
81.4 ns ± 3.05 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

# And for another comparison, overhead of a ufunc call (~400ns)
In [32]: x = np.linspace(0, 1, num=1)                                                          

In [33]: %timeit np.abs(x)                                                                     
415 ns ± 10.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [34]: %timeit np.positive(x)                                                                
387 ns ± 1.35 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Okay, so we're all good here :)

