MAINT: Rewrite shape normalization in pad function#11966
MAINT: Rewrite shape normalization in pad function#11966mattip merged 4 commits intonumpy:masterfrom
Conversation
numpy/lib/arraypad.py
Outdated
There was a problem hiding this comment.
I think these lines are the important part of the patch
|
I'm with mhvk here - this function doesn't seem important enough to justify a constant-time micro-optimization. Let's just fix the int/intp bug, and remove the weird list comprehensions that are simpler as a call to |
|
Removing the optimizations results (923ce12) slows down the benchmarks noticeably compared to (88c01b8). At this rate I would say the extra code paths for size == 1 and size == 2 may be worth it, considering that those are the most likely input. But if your experience tells you otherwise I'll gladly defer to your decision. :) In the meantime I'll write some unit tests for New Benchmarks |
I wonder whether |
numpy/lib/arraypad.py
Outdated
There was a problem hiding this comment.
Would be great to add a failing unit tests for this. Here's one that fails:
# pad the non-empty axis so that no memory allocation is needed
assert_equal(np.pad(np.zeros((1, 0)), [(0, 2**31), (0, 0)], 'constant').shape, (2**31+1, 0))
assert_equal(np.pad(np.zeros((1, 0)), [(0, 2**32), (0, 0)], 'constant').shape, (2**32+1, 0))There was a problem hiding this comment.
Would be great to add a failing unit tests for this.
You mean marked with pytest's xfail? Considering the explanation in #12023 (comment): how would I mark a unit test as xfailing only on Windows Systems?
There was a problem hiding this comment.
I think what I meant by this, is test the above since they no longer fail with this patch. More rigorous I suppose would be to use np.iinfo(np.intp).max instead of 2**32
There was a problem hiding this comment.
Wouldn't it be simpler to test _as_pairs directly if the unit test in question would be introduced alongside this anyway?
E.g.:
_as_pairs([np.iinfo(np.intp).max], 2, as_index=True)
# -> passes
_as_pairs([np.iinfo(np.intp).max + 1], 2, as_index=True)
# -> raises "ValueError: index can't contain negative values"I'm still not sure what this test is supposed to accomplish. You want to document the behavior for the edge case around the maximum of intp, correct?
Also
np.pad(np.zeros((1, 0)), [(0, np.iinfo(np.intp).max), (0, 0)], 'constant')will raise a different error which may not be what you intended...
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
There was a problem hiding this comment.
@mattip, @mhvk Before you consider merging this I just want to highlight this thread in case you missed it. From what I gathered @eric-wieser suggested this as an optional improvement. However I still don't know how to proceed.
Of course I'm always happy to add the test as a separate PR if / when this thread is resolved.
This rewrite joins the previous functions _normalize_shape and _validate_length as an intro to larger changes suggested in numpygh-11126.
|
Some comments & observations:
I hope that is enough to for you to decide which version to prefer. Obviously I'm still prefering the faster version which I tried to improve. I hope this doesn't seem too strong-headed. In any case I'll defer to your final judgment. ;) |
Test coverage of _as_pairs is 100%.
|
Very interesting issue, CI passing but local runtests failing: Looking into this. Probably unrelated to this particular PR. EDIT: in-place build works fine, looks like a |
Addresses reviews: - The test "x.ndim < 3" now covers both optimizations as it should. - Extend comments and improve formatting as well.
Here you go : New benchmarksI made a few temporary additions to the benchmarks to hopefully cover more use cases: Results using a single core while keeping the OS as idle as possible: |
|
@mhvk you also looked at this, any last comments? |
Also improve description of the rounding behavior when using `as_index=True`.
|
No, no last comments. Thanks, @lagru! |
|
Thanks @lagru and the reviewers |
|
What Numpy version will this patch be in? 1.16.0? This broke something in scikit-image/scikit-image#3551 and affects Astropy testing, so I am trying to figure out what is compatible with what in the test matrix. Thanks. |
|
Yes, seems to be in 1.16.0 can you open an issue? |
I am not sure what issue to open. Seems to be between |
|
@pllim OK, nevermind. If you scikit-image fixed it, I think lets just not worry about it. This was usage of a private internal function, so removing is not really a bug or anything. That said, if this was a real issue, we can work around it of course... |
|
@seberg , just to clarify, I am not involved with scikit-image dev, I am just a user in this case. Yes, I agree that it is not a bug. |
Suggested in #11358 (comment).
This rewrite joins the previous functions
_normalize_shapeand_validate_lengthas an intro to larger changes suggested in gh-11126. This function accelerates the benchmarks fornp.padby a factor of0.57 to 0.90.
cc @eric-wieser
Benchmarks