ENH: performance improvement to ediff1d by mattharrigan · Pull Request #1 · mattharrigan/numpy

mattharrigan · 2016-10-19T00:42:56Z

Eliminate a copy operation when to_begin or to_end is given. Also
use ravel instead of flatiter which is much faster.

Closes numpy#8175

Benchmark:
python -m timeit --setup="import numpy as np;x=np.arange(1e7)" "np.ediff1d(x, 0)"

The new version is about 5x faster on my machine.
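For readers who want to reproduce the measurement from Python rather than the shell, here is a minimal sketch. It assumes a smaller array than the 1e7 elements above so that it runs quickly, and it also illustrates the flatiter versus ravel difference mentioned in the description. The variable names are illustrative only, and timings will differ between machines and NumPy versions.

```python
import timeit

import numpy as np

# Smaller than the 1e7 elements used in the PR so the sketch runs quickly.
x = np.arange(1e6)

# The second positional argument of np.ediff1d is to_end, so the
# benchmark's np.ediff1d(x, 0) appends a single 0 to the differences.
assert np.array_equal(np.ediff1d(x, 0), np.ediff1d(x, to_end=0))

# Slicing a flatiter copies the data; slicing the result of ravel() on a
# contiguous array gives a view on the original buffer.
flat_slice = x.flat[1:]
ravel_slice = x.ravel()[1:]
print("flat slice shares memory: ", np.shares_memory(x, flat_slice))
print("ravel slice shares memory:", np.shares_memory(x, ravel_slice))

# Time the same call as the command-line benchmark.
n_calls = 10
seconds = timeit.timeit(lambda: np.ediff1d(x, 0), number=n_calls)
print(f"np.ediff1d(x, 0): {seconds / n_calls * 1e3:.2f} ms per call")
```

The absolute numbers printed here are not comparable to the 5x figure above, which compares two versions of the function on one machine.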

shoyer · 2016-10-19T01:21:46Z

I've actually never used this function before. What are the use cases for np.ediff1d versus np.diff?

An alternative approach might be to squeeze its functionality (adding a start or end value) into np.diff.

mattharrigan · 2016-10-19T02:15:29Z

Buried in some performance critical code I needed to compute an elementwise difference of a 1d array and prepend the result with a value, basically exactly what ediff1d does. Adding beginning and ending values might make sense for diff, but I have not needed that specific functionality. Diff seems far more general, including multidimensional arrays and higher derivatives.

While I have the performance itch, diff could be modified to a single pass algorithm instead of recursive with one pass for each n. Adding beginning and ending arrays could be done in conjunction. But that's a much bigger effort.
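To make the comparison between the two functions concrete, here is a small sketch using made-up sample values. It shows that np.ediff1d with to_begin matches np.diff plus a manual prepend for 1-D input, and that np.ediff1d flattens multidimensional input while np.diff works along an axis.

```python
import numpy as np

a = np.array([1, 2, 4, 7, 11])

# ediff1d: consecutive differences with a value prepended in one call.
with_ediff1d = np.ediff1d(a, to_begin=0)

# The same result with diff needs a separate concatenate step.
with_diff = np.concatenate(([0], np.diff(a)))

print(with_ediff1d)
print(with_diff)

# For 2-D input the two functions diverge: ediff1d flattens first,
# while diff takes differences along the last axis by default.
m = np.array([[1, 2], [4, 7]])
flattened_diffs = np.ediff1d(m)
per_row_diffs = np.diff(m)
print(flattened_diffs)
print(per_row_diffs)
```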

shoyer · 2016-10-19T02:19:11Z

+        to_end = np.asanyarray(to_end).ravel()
+
+    # do the calculation in place and copy to_begin and to_end
+    result = np.empty(l + len(to_begin) + len(to_end), dtype=ary.dtype)


In theory this could be a performance regression due to the extra copy if neither to_begin nor to_end is used.

mattharrigan: If they aren't used, then there is nothing to copy and they are basically no-ops. Correct?

shoyer: Good point, you need to allocate the result array anyway.
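The preallocation approach discussed in this thread can be sketched roughly as follows. This is a reconstruction from the diff fragments quoted here, not the PR's actual code; the function name `ediff1d_sketch` and the local variable names are hypothetical. When neither to_begin nor to_end is given, the two slice assignments are empty and the subtraction writes straight into the single allocated result.

```python
import numpy as np


def ediff1d_sketch(ary, to_end=None, to_begin=None):
    """Hypothetical sketch of a preallocating ediff1d; not NumPy's code."""
    ary = np.asanyarray(ary).ravel()

    # Treat a missing to_begin/to_end as an empty array.
    if to_begin is None:
        to_begin = np.empty(0, dtype=ary.dtype)
    else:
        to_begin = np.asanyarray(to_begin).ravel()
    if to_end is None:
        to_end = np.empty(0, dtype=ary.dtype)
    else:
        to_end = np.asanyarray(to_end).ravel()

    # Number of differences, clamped so empty input still works.
    n_diff = max(len(ary) - 1, 0)
    n_begin = len(to_begin)

    # Allocate the output once and fill each section in place.
    result = np.empty(n_begin + n_diff + len(to_end), dtype=ary.dtype)
    result[:n_begin] = to_begin
    result[n_begin + n_diff:] = to_end
    np.subtract(ary[1:], ary[:-1], out=result[n_begin:n_begin + n_diff])
    return result


sample = [1, 2, 4, 7]
sketch_out = ediff1d_sketch(sample, to_end=[9, 9], to_begin=0)
print(sketch_out)
```

Compared with computing the differences into a temporary array and then concatenating, this version writes every element of the output exactly once.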

shoyer · 2016-10-19T02:21:29Z

+    l = len(ary) - 1
+    if l < 0:
+        # force length to be non negative, match previous API
+        # should this be an warning or deprecated?


If anyone cares, we could deprecate this. But I think it's probably not worth the trouble -- I would sooner deprecate the entire function.
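The edge case behind this clamp is input with fewer than two elements, where `len(ary) - 1` would be -1 or 0. A short sketch of the behavior being preserved, using illustrative sample inputs:

```python
import numpy as np

empty_input = np.asarray([], dtype=float)
single_input = np.asarray([5.0])

for ary in (empty_input, single_input):
    raw_length = len(ary) - 1          # -1 for empty input, 0 for one element
    clamped_length = max(raw_length, 0)
    out = np.ediff1d(ary)
    print(len(ary), raw_length, clamped_length, out.size)

empty_result = np.ediff1d(empty_input)
single_result = np.ediff1d(single_input)
```

In both cases np.ediff1d returns an empty array rather than raising, which is the behavior the clamp keeps.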

shoyer · 2016-10-19T02:22:27Z

OK, this seems reasonable enough to me.

The function is certainly a very weird fit for arraysetops, though.

mattharrigan · 2016-10-19T12:12:30Z

Originally, arraysetops used ediff1d; see https://github.com/numpy/numpy/blob/669969980843dc207db170d99fa0884594c6bc7e/numpy/lib/arraysetops.py#L70

Now it looks like diff is used in its place.

mattharrigan · 2016-10-19T12:13:16Z

can/should I merge? Not sure of the process.

shoyer · 2016-10-19T15:01:30Z

My only concern here is that the existing tests for ediff1d feel a little sparse -- they don't even check to_begin or to_end with non-empty input. So maybe add a few test cases, just so we can be confident in the refactor.

can/should I merge? Not sure of the process.

You actually opened a pull request against the wrong repository :). I only found this because I followed the link from your issue. Please reopen against the master branch of numpy/numpy by clicking "New pull request" on the main numpy repo. (Technically, you can do whatever you want in your own fork, but nobody else is going to see it.)
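As an illustration of the extra coverage requested in the review comment above, a test along these lines would exercise to_begin and to_end with non-empty input. This is a sketch only: the test name and sample values are hypothetical and are not the cases added later in this PR.

```python
import numpy as np
from numpy.testing import assert_array_equal


def test_ediff1d_to_begin_to_end_nonempty():
    # Hypothetical test name and values, for illustration only.
    ary = np.array([1, 3, 6, 10])

    # Plain differences, no padding.
    assert_array_equal(np.ediff1d(ary), [2, 3, 4])

    # Scalar and sequence values for to_begin and to_end.
    assert_array_equal(np.ediff1d(ary, to_begin=0), [0, 2, 3, 4])
    assert_array_equal(np.ediff1d(ary, to_end=[8, 9]), [2, 3, 4, 8, 9])
    assert_array_equal(
        np.ediff1d(ary, to_end=9, to_begin=[-1, 0]),
        [-1, 0, 2, 3, 4, 9],
    )

    # Multidimensional input is flattened before differencing.
    two_d = np.array([[1, 2], [4, 7]])
    assert_array_equal(np.ediff1d(two_d, to_begin=0), [0, 1, 2, 3])


test_ediff1d_to_begin_to_end_nonempty()
```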

mattharrigan · 2016-10-19T15:38:02Z

Oops, sorry about opening the pull request against the wrong repo. I'll add some more tests and then reopen.

ENH: performance improvement to ediff1d

16a98c0

Eliminate a copy operation when to_begin or to_end is given. Also use ravel instead of flatiter which is much faster.

shoyer reviewed Oct 19, 2016


TST: Added cases for better coverage

1bb4329

mattharrigan closed this Oct 20, 2016

mattharrigan pushed a commit that referenced this pull request Nov 8, 2016

Merge pull request #1 from embray/asarray

f2c818a

Adds a regression test that demonstrates the issue.

mattharrigan wants to merge 2 commits into master from ediff1d-performance
