ENH: performance improvement to ediff1d#1
Conversation
Eliminate a copy operation when to_begin or to_end is given. Also use ravel instead of flatiter which is much faster.
|
I've actually never used this function before. What are the use cases for An alternative approach might be to squeeze its functionality (adding a start or end value) into |
|
Buried in some performance critical code I needed to compute an elementwise difference of a 1d array and prepend the result with a value, basically exactly what ediff1d does. Adding beginning and ending values might make sense for diff, but I have not needed that specific functionality. Diff seems far more general, including multidimensional arrays and higher derivatives. While I have the performance itch, diff could be modified to a single pass algorithm instead of recursive with one pass for each n. Adding beginning and ending arrays could be done in conjunction. But that's a much bigger effort. |
| to_end = np.asanyarray(to_end).ravel() | ||
|
|
||
| # do the calculation in place and copy to_begin and to_end | ||
| result = np.empty(l + len(to_begin) + len(to_end), dtype=ary.dtype) |
There was a problem hiding this comment.
In theory this could be a performance regression due to the extra copy if neither to_begin nor to_end is used.
There was a problem hiding this comment.
If they aren't used then there is nothing to copy and they are basically no ops. Correct?
There was a problem hiding this comment.
Good point, you need to allocate the result array anyways.
| l = len(ary) - 1 | ||
| if l < 0: | ||
| # force length to be non negative, match previous API | ||
| # should this be an warning or deprecated? |
There was a problem hiding this comment.
If anyone cares, we could deprecate this. But I think it's probably not worth the trouble -- I would sooner deprecate the entire function.
|
OK, this seems reasonable enough to me. The function is certainly a very weird fit for |
|
originally arraysetops used ediff1d, see https://github.com/numpy/numpy/blob/669969980843dc207db170d99fa0884594c6bc7e/numpy/lib/arraysetops.py#L70 Now it looks like diff is used in its place. |
|
can/should I merge? Not sure of the process. |
|
My only concern here is that the existing tests for
You actually opened a pull request against the wrong repository :). I only found this because I followed the link from your issue. Please reopen against the master branch of |
|
Oops, sorry about pulling to the wrong repo. I'll add some more tests and then reopen |
Adds a regression test that demonstrates the issue.
Eliminate a copy operation when to_begin or to_end is given. Also
use ravel instead of flatiter which is much faster.
Closes numpy#8175
Benchmark:
python -m timeit --setup="import numpy as np;x=np.arange(1e7)" "np.ediff1d(x, 0)"
new version is about 5x faster on my machine