ENH: Adding Object dtype to einsum#18053
Conversation
|
Also, |
| @@ -0,0 +1,309 @@ | |||
| import itertools | |||
There was a problem hiding this comment.
Most of the test case are just copied from test_einsum.py, which I think is unnecessary, you can just add object type to test_einsum.py.
There was a problem hiding this comment.
Ah, that's another issue I got stuck on.
The tests in check_einsum_sums do somethig like:
a = np.arange(1, dtype='object')
np.sum(a, axis=-1).astype('object'))
which raises _`AttributeError: int' object has no attribute 'astype'
I guess a object array with size =1 gets converted to a regular python int at some point in np.sum()
So, I needed a test to remove the .astype
There was a problem hiding this comment.
You could always add a special case within the test_einsum.py tests for when dtype=object, to avoid those troublesome tests.
There was a problem hiding this comment.
You could always add a special case within the test_einsum.py tests for when dtype=object, to avoid those troublesome tests.
Done
| if(PyErr_Occurred()){ | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
I don't think calling PyErr_Occurred is safe from within NPY_BEGIN_THREADS_THRESHOLDED, but hopefully someone who remembers more about the threading (AKA releasing the GIL) can review to confirm or refute.
I suspect the real error here is that NPY_BEGIN_THREADS_THRESHOLDED is not safe to call at all when handling object arrays.
There was a problem hiding this comment.
I didn't consider this.
Perhaps calling sop can be split into two functions based on type_num. one function releases the GIL were the other does not, and checks for PyError_Occured.
There was a problem hiding this comment.
See also gh-18450 for how the error should be checked here.
There was a problem hiding this comment.
I think the problem of API calls inside the NPY_BEGIN_THREADS macros is resolved: it will only release the GIL and check error conditions if the needs_api is False.
|
I'm just here to say that this enhancement would be extremely useful to me. Please keep at it! |
|
Following @eric-wieser 's comment on Which just combines I can get to work on drafting some tests for this |
|
I am happy with adding it (probably a few places that use the other should be using that anyway), although I am not sure that here relying on I personally would like if we returned |
|
I'm here again to say that I would absolutely love to have this functionality in numpy. Is there anything I could do to help out? |
|
@ketch, I think this PR is pretty much stalled at this point. It could probably be picked up and start a new one with some inspiration from here (in that case preferably with the commits as basis for attribution unless). |
Implementing Erics suggestions Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
e3813b7 to
31e2176
Compare
|
Also closes #23492 |
|
@mattip did you fix this up more generally? Can we fix Can also do push it off, but I hadn't even realized that there was this place that has another version of custom zero filling... |
|
|
6317a13 to
d1306cb
Compare
|
I removed the WIP and added the ENH label. Does this need a release note? |
|
I think its worth an improvement note, has been asked for often enough! |
|
In the top comment, @iamsoto mentioned:
It turns out both were wrong: the correct value is |
| ``np.einsum`` now accepts arrays with ``object`` dtype | ||
| ------------------------------------------------------ | ||
| The code path will call python operators on object dtype arrays, much | ||
| like ``np.dot`` and ``np.matmul``. |
There was a problem hiding this comment.
There might be an assumption that it is OK to start with int(0), that I am not sure is shared by all code paths (e.g. dot does not). I can live this this, though, even if it might be considered an issue.
Anyway, have to review the rest carefully, but not today.
|
I think this is ready for review. |
seberg
left a comment
There was a problem hiding this comment.
The GIL releasing doesn't seem right. Otherwise, it would be good to clean up the function name duplication a bit mostly I think.
| NPY_BEGIN_THREADS_THRESHOLDED(shape[2] * shape[1] * shape[0]); | ||
| for (coords[1] = shape[2]; coords[1] > 0; --coords[1]) { | ||
| for (coords[0] = shape[1]; coords[0] > 0; --coords[0]) { | ||
| sop(2, ptrs[0], strides[0], shape[0]); |
There was a problem hiding this comment.
| sop(2, ptrs[0], strides[0], shape[0]); | |
| sop(2, ptrs[0], strides[0], shape[0]); |
if really would be nicer if sop would just have a return value IMO. Then we could avoid the whole PyErr_Occurred() thing.
| * loop provided by the iterator is in Fortran-order. | ||
| */ | ||
| int needs_api = NpyIter_IterationNeedsAPI(iter); | ||
| NPY_BEGIN_THREADS_THRESHOLDED(shape[2] * shape[1] * shape[0]); |
There was a problem hiding this comment.
NpyIter_IterationNeedsAPI is OK for now. But we have to use it for more than the error guard. Can't release the GIL (and have to restore, although that may not need a conditional, since the macro's should come with one).
There was a problem hiding this comment.
(It would be nice to have a test with large enough inputs to make sure the GIL release path would be taken.)
There was a problem hiding this comment.
I copy-pasted the pattern from elsewhere. I will check for needs_api before calling NPY_BEGINS_THREADS_THRESHOLDED everywhere in the file. Indeed, NPY_END_THREADS does not need a guard, it checks whether NPY_BEGINS_THREADS_THRESHOLDED actually released the GIL before re-acquiring it.
| * object_sum_of_products_outstride0_one, | ||
| * object_sum_of_products_outstride0_two, | ||
| * object_sum_of_products_outstride0_three, | ||
| * object_sum_of_products_outstride0_any# |
There was a problem hiding this comment.
There are a lot of templating here. Notes:
- Not all of these are ever used (The code coverage may actual points these out correctly)
- The function picking the specialization accepts
NULLas "not implemented" in some but not all places (which could be extended) - I don't mind aliasing some, but its all in one file how about just doing
#define ... object_sum_of_products_any? and avoid the templating?
f7c9f71 to
962120b
Compare
|
reguide_check crashed but it passes locally. |
| } | ||
| Py_SETREF(prod, PyNumber_Multiply(prod, curr)); | ||
| if (!prod) { | ||
| return; |
There was a problem hiding this comment.
I would have thought this is covered by a test I added with NULL pointers?
There was a problem hiding this comment.
This would be multiplication raising. string objects would do?
There was a problem hiding this comment.
Hmmm, but NULL filled would be None, so it should be covered, weird, but OK.
| PyObject *sum = PyNumber_Add(*(PyObject **)dataptr[nop], prod); | ||
| Py_DECREF(prod); | ||
| if (!sum) { | ||
| return; |
There was a problem hiding this comment.
I guess I can get this line covered with an object that returns itself on multiplication and raises on addition?
|
The coverage is just weird, but probably largely due to the odd templating and the function being instantiated way too many times. It seems overall OK, though. We do start the reduction with I am going to merge this, there are two cleanups that really should happen:
Tests seem fine, and it is nice that there is the Thanks @iamsoto and especially Matti for the many follow-ups! |
|
@seberg |
Hi Numpy,
Following issue #17837 I'm attempting to add support for the Object dtype to einsum.
Currently, I'm encountering an issue with scalar types.
If we take, for example
This doesn't work because einsum produces
Falsewere dot producesNone. You can view this scenario in action innumpy/core/tests/test_einsum_object.py line 245.test_einsum.py line 539. I'm not sure how to resolve this.This is my most ambitious contribution to Numpy to date and I very much appreciate any and all advice, and your patience! Thank you!