MAINT: Reduce allocation size of empty (0 size) arrays to 1 byte#21477
MAINT: Reduce allocation size of empty (0 size) arrays to 1 byte#21477seberg merged 9 commits intonumpy:mainfrom
Conversation
| * Note: always sync this with calls to PyDataMem_UserFREE | ||
| */ | ||
| if (nbytes == 0) { | ||
| nbytes = descr->elsize ? descr->elsize : 1; |
There was a problem hiding this comment.
Can we ensure all the strides are set to zero to avoid any concerns of undefined behavior?
There was a problem hiding this comment.
OK, but array_strides_set will allow resetting the strides to any values.
There was a problem hiding this comment.
I have the following behavior when a 0-size array is reshaped in NumPy 1.23:
>>> import numpy as np
>>> a = np.ones((2, 0, 3))
>>> a.strides
(0, 0, 0)
>>> b = a.reshape((3, 0, 2))
>>> b.strides
(16, 16, 8)
Do you think reshape should be also fixed to return b with strides (0, 0, 0) here, @eric-wieser @mattip ? Thanks!
ref. cupy/cupy#6807
There was a problem hiding this comment.
That does look like a bug. Please open a new issue for this, the PR here was merged and further discussion will get lost.
|
I am wondering if we should worry about limiting the total size also in this case. But I cannot think of any place where it would actually be used (i.e. we assume that a subset of axis cannot overflow when combined). So, I would feel a bit safer to keep that check unmodified, but I am OK to not worry about it now. |
which ones? |
|
Sorry, the current loop ignores 0s, so if you create an array of shape |
I restored this behaviour. There were a few places in the code that assumed |
I'd hoped this was the type of thing that |
|
Thanks for restoring the max-size thing! Yeah, |
…th `size == 0`. In order to avoid undefined behavior, we need to ensure the generated strides are all zero.
90ab6b9 to
0f8bb86
Compare
a69de90 to
0f8bb86
Compare
92bbe08 to
e598b11
Compare
|
@mattip I just realized, I had that EDIT: I may just push that update here later though. |
We have to reset all strides in the case of a subarray dtype which may have caused the result array to not be 1-D. Since the logic is a bit complicated now, restore the logic to only do this when it really is necessary.
|
I think the windows error path is which gives "RuntimeWarning: invalid value encountered in add". After |
|
Ah, this is probably an invalid access in our sum implementation. The reduce path would be taken here with numpy/numpy/core/src/umath/loops_arithm_fp.dispatch.c.src Lines 523 to 532 in 62293d4 That path clearly always loads one data-point from the output (even for EDIT: Maybe all reduce checks should have an |
|
So for the windows thing, I think the correct fix (which is a bug-fix in any case), is to never do an empty reduce-loop. There are two ways to do that:
Since I do not think complex iteration paths can generate length 0 inner-loop calls, both seem very feasible to me, and I think I will make a PR for it. But I would also be happy with the second approach. |
|
I think this PR is growing too large. The change to prevent ufuncs looping when |
|
After a one-line check for 0-length loops in |
|
If we fix it in the ufunc setup code, I think we should add an assert to |
Is there precedence for this in the code? The macros are used like
I originally did, but it turned out to be simpler than I thought, so I am not sure it is necessary as a separate PR. |
Ah, I had not thought about it, but this should work (just tested briefly and seems good): Maybe a comment makes sense, that it uses the One thing here is that we still need to figure out the EDIT: Well, unfortunatley, the assert is hit quickly even with the fix, so we probably need a few more smaller tweaks in the ufunc machinery to do this. |
|
#21499 was merged, Is there any outstanding review for this PR (pushing on it a bit since it removes the annoying "uh oh" output when testing allocation policy changes)? |
|
@mattip unfortunatley, I don't trust the arr = np.ones(1, dtype="i8")
arr = np.broadcast_to(arr, 10)
arr.dtype = "i1"
print(arr) # oopsUnfortunately, I don't have a clear idea of how to fix that. Right now, I mainly see the option of making a "deep clean" here and rethinking the code almost as a whole, and I didn't try to think it through yet. |
|
I pushed a fix which will allow a view of an empty array ( |
| /* resize on last axis only */ | ||
| int axis = PyArray_NDIM(self) - 1; | ||
| if (PyArray_DIMS(self)[axis] != 1 && | ||
| PyArray_SIZE(self) != 0 && |
There was a problem hiding this comment.
Hah, this should work, yeah. So the one thing is that if the stride was previously 0, in theory it should remain 0 in this case (but gets replaced with elsize). I think I am OK with ignoring the potential UB problem that may follow that for now.
If we really want to fight it, I guess we would have to add asserts into array_dealloc to flush out places that need adjustment.
@eric-wieser how worried are you actually about the UB? If not very, I am tempted to suggest accepting that this path might lead to this UB.
There was a problem hiding this comment.
@mattip if you really care about the prod, you cold order it to the end, since it will be hit very rarely then?
But honestly, this is still a Python "call", so I really doubt it matters at all.
|
OK, I am going to put this in a bit optimistically, since I think Matti was hoping to get this into the next release. If we got holes in this logic, it seems OK to fix it in bug-fix releases as well. But if Eric (or anyone else) is a bit worried about the changes, we can revert again as well. However, right now, I also think that downstream testing might be good here, and merging will get us at least a bit of testing. |
|
🚀 |
Since numpy/numpy#21477 the strides of a 0-sized array are 0 instead of the itemsize. This causes `array_core()` to erroneously attempt to index the array with numeric indices instead of empty slices. Since this is a special case (empty arrays), a simple solution seems to be for `array_core()` to return the array itself when the array is 0-sized. In order to maintain consistency of the `array_core()` function between the simulator and hardware targets, we make this change to both implementations.
Since numpy/numpy#21477 the strides of a 0-sized array are 0 instead of the itemsize. This causes `array_core()` to erroneously attempt to index the array with numeric indices instead of empty slices. Since this is a special case (empty arrays), a simple solution seems to be for `array_core()` to return the array itself when the array is 0-sized. In order to maintain consistency of the `array_core()` function between the simulator and hardware targets, we make this change to both implementations.
Since numpy/numpy#21477 the strides of a 0-sized array are 0 instead of the itemsize. This causes `array_core()` to erroneously attempt to index the array with numeric indices instead of empty slices. Since this is a special case (empty arrays), a simple solution seems to be for `array_core()` to return the array itself when the array is 0-sized. In order to maintain consistency of the `array_core()` function between the simulator and hardware targets, we make this change to both implementations.
Since numpy/numpy#21477 the strides of a 0-sized array are 0 instead of the itemsize. This causes `array_core()` to erroneously attempt to index the array with numeric indices instead of empty slices. Since this is a special case (empty arrays), a simple solution seems to be for `array_core()` to return the array itself when the array is 0-sized. In order to maintain consistency of the `array_core()` function between the simulator and hardware targets, we make this change to both implementations.
In NumPy 1.23, the strides of empty arrays are 0 instead of the item size, due to numpy/numpy#21477 - however, `np.empty_like` seems to create non-zero-strided arrays from a zero-strided empty array, and copying to the host from a device array with zero strides fails a compatibility check in Numba. This commit works around the issue by calling `copy_to_host()` with no arguments, allowing Numba to create an array on the host that is compatible with the device array - the resulting implementation is functionally equivalent and slightly simpler, so I believe this change could remain permanant rather than requiring a revert later.
In NumPy 1.23, the strides of empty arrays are 0 instead of the item size, due to numpy/numpy#21477 - however, `np.empty_like` seems to create non-zero-strided arrays from a zero-strided empty array, and copying to the host from a device array with zero strides fails a compatibility check in Numba. This commit works around the issue by calling `copy_to_host()` with no arguments, allowing Numba to create an array on the host that is compatible with the device array - the resulting implementation is functionally equivalent and slightly simpler, so I believe this change could remain permanant rather than requiring a revert later.
…7089) In NumPy 1.23, the strides of empty arrays are 0 instead of the item size, due to numpy/numpy#21477 - however, `np.empty_like` seems to create non-zero-strided arrays from a zero-strided empty array, and copying to the host from a device array with zero strides fails a compatibility check in Numba. This commit works around the issue by calling `copy_to_host()` with no arguments, allowing Numba to create an array on the host that is compatible with the device array - the resulting implementation is functionally equivalent and slightly simpler, so I believe this change could remain permanant rather than requiring a revert later.
…ask#7089) In NumPy 1.23, the strides of empty arrays are 0 instead of the item size, due to numpy/numpy#21477 - however, `np.empty_like` seems to create non-zero-strided arrays from a zero-strided empty array, and copying to the host from a device array with zero strides fails a compatibility check in Numba. This commit works around the issue by calling `copy_to_host()` with no arguments, allowing Numba to create an array on the host that is compatible with the device array - the resulting implementation is functionally equivalent and slightly simpler, so I believe this change could remain permanant rather than requiring a revert later.
Restart #15788 from @eric-wieser.