ENH: Adding Object dtype to einsum by i-am-b-soto · Pull Request #18053 · numpy/numpy

i-am-b-soto · 2020-12-22T00:39:47Z

Hi Numpy,

Following issue #17837 I'm attempting to add support for the Object dtype to einsum.

Currently, I'm encountering an issue with scalar types.
If we take, for example

a = np.arange(1, dtype="object")
assert_equal(np.einsum("i,i", a[1:], a[:-1], optimize=do_opt), np.dot(a[1:], a[:-1]))

This doesn't work because einsum produces False where dot produces None. You can view this scenario in action in ~~numpy/core/tests/test_einsum_object.py line 245.~~ test_einsum.py line 539. I'm not sure how to resolve this.

This is my most ambitious contribution to Numpy to date and I very much appreciate any and all advice, and your patience! Thank you!

i-am-b-soto · 2020-12-22T00:40:25Z

Also,
Right now there is just one function used to handle all the optimizations of einsum (it's based on _sum_of_products_any). If I can get this to pass the tests and get a green light from other members of the team, I can add more of the optimizations.

numpy/core/src/multiarray/einsum_sumprod.c.src

Qiyu8 · 2020-12-22T03:39:55Z

numpy/core/tests/test_einsum_object.py

@@ -0,0 +1,309 @@
+import itertools


Most of the test cases are just copied from test_einsum.py, which I think is unnecessary; you can just add the object type to test_einsum.py.

Ah, that's another issue I got stuck on.

The tests in check_einsum_sums do something like:

a = np.arange(1, dtype='object')
np.sum(a, axis=-1).astype('object')

which raises `AttributeError: 'int' object has no attribute 'astype'`.
I guess an object array with size 1 gets converted to a regular Python int at some point in np.sum().
So I needed a test that removes the .astype call.
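A minimal sketch of the behaviour described here, assuming a NumPy installation. Wrapping the result with np.asarray is only one possible workaround and is not necessarily what the tests ended up doing.

```python
import numpy as np

a = np.arange(1, dtype="object")   # one-element object array holding the int 0
total = np.sum(a, axis=-1)         # full reduction over the only axis

# The reduction hands back the stored Python object rather than a NumPy
# scalar, so there is no .astype method available on the result.
print(type(total).__name__, hasattr(total, "astype"))

# Wrapping the result gives a 0-d object array again if an ndarray is needed.
wrapped = np.asarray(total, dtype="object")
print(wrapped.dtype, wrapped.shape)
```

Numeric dtypes return a NumPy scalar from the same call, which is why the shared test helper only trips over the object case.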

You could always add a special case within the test_einsum.py tests for when dtype=object, to avoid those troublesome tests.

Done

eric-wieser · 2020-12-26T23:16:46Z

numpy/core/src/multiarray/einsum.c.src

+        if(PyErr_Occurred()){
+            return -1;
+        }


I don't think calling PyErr_Occurred is safe from within NPY_BEGIN_THREADS_THRESHOLDED, but hopefully someone who remembers more about the threading (AKA releasing the GIL) can review to confirm or refute.

I suspect the real error here is that NPY_BEGIN_THREADS_THRESHOLDED is not safe to call at all when handling object arrays.

I didn't consider this.
Perhaps calling sop can be split into two functions based on type_num: one function releases the GIL, where the other does not and checks PyErr_Occurred.

See also gh-18450 for how the error should be checked here.

I think the problem of API calls inside the NPY_BEGIN_THREADS macros is resolved: it will only release the GIL and check error conditions if the needs_api is False.

numpy/core/src/multiarray/einsum_sumprod.c.src

ketch · 2021-03-28T09:53:18Z

I'm just here to say that this enhancement would be extremely useful to me. Please keep at it!

i-am-b-soto · 2021-03-30T05:23:34Z

Following @eric-wieser's comment on PyErr_Occurred not being allowed inside NPY_BEGIN_THREADS_THRESHOLDED, how does the team feel about defining a new macro?
Something like:

#define NPY_BEGIN_THREADS_DESCR_THRESHOLDED(loop_size, dtype) do { \
    if ((loop_size) > 500 && \
            !PyDataType_FLAGCHK((dtype), NPY_NEEDS_PYAPI)) { \
        _save = PyEval_SaveThread(); \
    } \
} while (0)

This just combines NPY_BEGIN_THREADS_DESCR with NPY_BEGIN_THREADS_THRESHOLDED,
and would then be called inside einsum.c.src like:

NPY_BEGIN_THREADS_DESCR_THRESHOLDED(shape[1] * shape[0], NpyIter_GetDescrArray(iter))

I can get to work on drafting some tests for this.

seberg · 2021-03-30T15:16:11Z

I am happy with adding it (probably a few places that use the other should be using that anyway), although I am not sure that relying on int needs_api = NpyIter_IterationNeedsAPI(iter) here isn't just as good.

I personally would like it if we returned -1 on error from the sop (but that would also mean adding return 0; to all others).

ketch · 2021-09-13T13:31:27Z

I'm here again to say that I would absolutely love to have this functionality in numpy. Is there anything I could do to help out?

seberg · 2021-09-13T15:21:10Z

@ketch, I think this PR is pretty much stalled at this point. Someone could probably pick it up and start a new one with some inspiration from here (in that case, preferably with these commits as the basis, for attribution).

Implementing Erics suggestions Co-authored-by: Eric Wieser <wieser.eric@gmail.com>

mattip · 2023-03-30T08:30:20Z

Also closes #23492

seberg · 2023-03-30T08:39:01Z

@mattip did you fix this up more generally?

Can we fix PyArray_AssignZero instead since it seems to be a private function to begin with? The concept of zero filling is used in some other places. This is OK for now, but we need to find a more generic solution eventually (this doesn't generalize to more DTypes). I think np.zeros actually uses int(0) and then uses force-casting, which may be the more generic solution here (with a fast-path that we can keep to assume that non-reference object memset is fine, but that may also not generalize perfectly).

We can also push it off, but I hadn't even realized that there was this place that has another version of custom zero filling...

mattip · 2023-03-30T09:36:09Z

PyArray_AssignZero is currently used in exactly one place (in einsum). There is a comment at the top of reduction.h that seems to be left over from a previous refactoring. I will:

- refactor the function to handle all dtypes, including user-defined ones
- use it in PyArray_MatrixProduct2

mattip · 2023-03-30T13:43:36Z

I removed the WIP and added the ENH label. Does this need a release note?

seberg · 2023-03-30T13:47:48Z

I think it's worth an improvement note; it has been asked for often enough!

mattip · 2023-03-30T13:50:29Z

In the top comment, @iamsoto mentioned:

> This doesn't work because einsum produces False where dot produces None

It turns out both were wrong: the correct value is 0 in both cases. This was due to PyArray_AssignZero assigning False for object dtype, and dot improperly using memset for object dtype arrays.
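As a rough illustration of why 0 is the expected answer, here is a pure-Python model of a sum of products whose accumulator starts at the integer 0. The function name `object_sum_of_products` is made up for this sketch and is not NumPy's implementation.

```python
from fractions import Fraction
from functools import reduce
import operator


def object_sum_of_products(*operands):
    """Multiply element-wise across the operands, then add each product to a running total."""
    total = 0  # the accumulator starts as the Python integer 0
    for items in zip(*operands):
        total = total + reduce(operator.mul, items)
    return total


# With empty operands nothing is ever added, so the initial 0 is returned.
empty = object_sum_of_products([], [])
print(repr(empty))

# Non-empty operands pick up whatever type the Python operators produce.
print(object_sum_of_products([Fraction(1, 2), Fraction(1, 3)], [2, 3]))
```

Under this model, neither False nor None can come out of an empty reduction: the only value available is the starting 0.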

seberg · 2023-03-30T14:31:18Z

doc/release/upcoming_changes/18053.new_feature.rst

+``np.einsum`` now accepts arrays with ``object`` dtype
+------------------------------------------------------
+The code path will call python operators on object dtype arrays, much
+like ``np.dot`` and ``np.matmul``.


There might be an assumption that it is OK to start with int(0), which I am not sure is shared by all code paths (e.g. dot does not). I can live with this, though, even if it might be considered an issue.

Anyway, have to review the rest carefully, but not today.
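A small usage sketch for the release note, assuming NumPy is installed. The `try`/`except` is there because NumPy versions without this change are assumed to reject object dtype in einsum with a TypeError; that is an assumption, not something shown in this thread.

```python
from fractions import Fraction

import numpy as np

a = np.array([Fraction(1, 2), Fraction(3, 4)], dtype=object)
b = np.array([4, 8], dtype=object)

# np.dot on object arrays is the behaviour the release note compares against.
expected = np.dot(a, b)

try:
    result = np.einsum("i,i", a, b)
except TypeError:
    # Assumption: NumPy versions without this change raise TypeError here.
    result = None

print(expected, result)
```

Where einsum accepts the inputs, the two results should agree, since both paths rely on the Python `*` and `+` operators of the stored objects.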

mattip · 2023-04-07T08:15:01Z

I think this is ready for review.

seberg

The GIL releasing doesn't seem right. Otherwise, I think it would mostly be good to clean up the function name duplication a bit.

seberg · 2023-04-11T11:44:07Z

numpy/core/src/multiarray/einsum.c.src

    NPY_BEGIN_THREADS_THRESHOLDED(shape[2] * shape[1] * shape[0]);
    for (coords[1] = shape[2]; coords[1] > 0; --coords[1]) {
        for (coords[0] = shape[1]; coords[0] > 0; --coords[0]) {
            sop(2, ptrs[0], strides[0], shape[0]);


Suggested change

sop(2, ptrs[0], strides[0], shape[0]);

sop(2, ptrs[0], strides[0], shape[0]);

It really would be nicer if sop would just have a return value IMO. Then we could avoid the whole PyErr_Occurred() thing.

I opened #23671 for this

seberg · 2023-04-11T11:45:50Z

numpy/core/src/multiarray/einsum.c.src

     * loop provided by the iterator is in Fortran-order.
     */
+    int needs_api = NpyIter_IterationNeedsAPI(iter);
    NPY_BEGIN_THREADS_THRESHOLDED(shape[2] * shape[1] * shape[0]);


NpyIter_IterationNeedsAPI is OK for now. But we have to use it for more than the error guard: we can't release the GIL when the API is needed (and have to restore it, although that may not need a conditional, since the macros should come with one).

(It would be nice to have a test with large enough inputs to make sure the GIL release path would be taken.)

I copy-pasted the pattern from elsewhere. I will check for needs_api before calling NPY_BEGIN_THREADS_THRESHOLDED everywhere in the file. Indeed, NPY_END_THREADS does not need a guard: it checks whether NPY_BEGIN_THREADS_THRESHOLDED actually released the GIL before re-acquiring it.
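The guard logic discussed here can be modelled in a few lines of Python. This is only a toy sketch: `GilModel` and its method names are hypothetical stand-ins for the C macros and the `_save` variable, and the 500 threshold is taken from the macro quoted earlier in the thread.

```python
THRESHOLD = 500  # loop-size cutoff from the macro quoted earlier in the thread


class GilModel:
    """Toy stand-in for the `_save` thread state handled by the C macros."""

    def __init__(self):
        self.saved = None  # None means the GIL is still held

    def begin_threads_thresholded(self, loop_size, needs_api):
        # Release only for large loops that never call into the Python C-API.
        if loop_size > THRESHOLD and not needs_api:
            self.saved = "thread-state"

    def end_threads(self):
        # Safe to call unconditionally: restore only if something was saved.
        if self.saved is not None:
            self.saved = None


model = GilModel()
model.begin_threads_thresholded(1000, needs_api=True)   # object dtype: keep the GIL
print(model.saved)
model.end_threads()
```

The point of the model is that only the "begin" side needs the needs_api check; the "end" side is a no-op whenever nothing was released.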

numpy/core/src/multiarray/einsum.c.src

seberg · 2023-04-11T11:53:23Z

numpy/core/src/multiarray/einsum_sumprod.c.src

+ *  object_sum_of_products_outstride0_one,
+ *  object_sum_of_products_outstride0_two,
+ *  object_sum_of_products_outstride0_three,
+ *  object_sum_of_products_outstride0_any#


There is a lot of templating here. Notes:

- Not all of these are ever used (the code coverage may actually point these out correctly).
- The function picking the specialization accepts NULL as "not implemented" in some but not all places (which could be extended).
- I don't mind aliasing some, but it's all in one file; how about just doing #define ... object_sum_of_products_any and avoiding the templating?

numpy/core/src/multiarray/einsum_sumprod.c.src

mattip · 2023-04-27T21:34:27Z

refguide_check crashed but it passes locally.

mattip · 2023-04-27T21:40:27Z

numpy/core/src/multiarray/einsum_sumprod.c.src

+            }
+            Py_SETREF(prod, PyNumber_Multiply(prod, curr));
+            if (!prod) {
+                return;


I would have thought this is covered by a test I added with NULL pointers?

This would be multiplication raising. string objects would do?

Hmmm, but NULL filled would be None, so it should be covered, weird, but OK.
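For reference, a short check of the two candidates mentioned in this exchange, assuming NumPy is installed. The helper `multiplication_raises` is made up for this sketch.

```python
import operator

import numpy as np

# An object array from np.empty reads back as None in every slot.
blank = np.empty(3, dtype=object)
print(blank)


def multiplication_raises(x, y):
    """Return True when x * y raises TypeError."""
    try:
        operator.mul(x, y)
    except TypeError:
        return True
    return False


print(multiplication_raises(blank[0], blank[1]))  # None * None
print(multiplication_raises("ab", "cd"))          # str * str
```

Either input would reach the failing-multiplication branch in a Python-level product, which is what the coverage question is about.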

mattip · 2023-04-27T21:41:55Z

numpy/core/src/multiarray/einsum_sumprod.c.src

+        PyObject *sum = PyNumber_Add(*(PyObject **)dataptr[nop], prod);
+        Py_DECREF(prod);
+        if (!sum) {
+            return;


I guess I can get this line covered with an object that returns itself on multiplication and raises on addition?
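One way such an object could look, as a hedged sketch. The class name `MulOkAddFails` and the helper `multiply_then_add` are hypothetical; the helper only mirrors the multiply-then-add order of the diff and is not the test that was eventually added.

```python
class MulOkAddFails:
    """Hypothetical test helper: survives multiplication, fails on addition."""

    def __mul__(self, other):
        return self

    __rmul__ = __mul__

    def __add__(self, other):
        raise ValueError("addition is not supported")

    __radd__ = __add__


def multiply_then_add(out, x, y):
    """Model of the two steps in the diff: prod = x * y, then out + prod."""
    prod = x * y
    return out + prod


try:
    multiply_then_add(0, MulOkAddFails(), MulOkAddFails())
    raised = False
except ValueError:
    raised = True
print(raised)
```

Because the accumulator is a plain int, Python falls back to the object's `__radd__`, so the addition branch is the one that fails.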

seberg · 2023-04-28T09:49:35Z

The coverage is just weird, but probably largely due to the odd templating and the function being instantiated way too many times. It seems overall OK, though.

We do start the reduction with int(0) for object, which I think should be OK for most practical purposes (and if anyone wants to change it, it should be plausible to do).

I am going to merge this, there are two cleanups that really should happen:

- The loop duplication is just a mess; maybe just being better about NULL meaning fall-through would be good (let's not care much about a few misses for object loops).
- Error returns relying on PyErr_Occurred() is something I spent a lot of work on getting rid of, and I don't like this making almost a step backwards (before, it didn't matter).

Tests seem fine, and it is nice that there is the dot fix here! So let's give this a shot.

Thanks @iamsoto and especially Matti for the many follow-ups!

i-am-b-soto · 2023-05-03T21:41:02Z

@seberg
I'm sorry, I haven't been able to look at this for some time. Really happy you guys got it figured out!!

xref numpy#18053

github-actions bot added the 25 - WIP label Dec 22, 2020

eric-wieser reviewed Dec 22, 2020

numpy/core/src/multiarray/einsum_sumprod.c.src (outdated)

eric-wieser reviewed Dec 22, 2020

numpy/core/src/multiarray/einsum_sumprod.c.src

eric-wieser reviewed Dec 22, 2020

numpy/core/src/multiarray/einsum_sumprod.c.src (outdated)

Qiyu8 reviewed Dec 22, 2020

eric-wieser reviewed Dec 26, 2020

Base automatically changed from master to main March 4, 2021 02:05

InessaPawson added the 52 - Inactive Pending author response label Aug 8, 2022

i-am-b-soto and others added 8 commits March 29, 2023 14:06

WIP: Adding Object dtype to einsum

4168b17

PR_fixes_1

53f7b55

Update numpy/core/src/multiarray/einsum_sumprod.c.src

20ea97f

Implementing Erics suggestions Co-authored-by: Eric Wieser <wieser.eric@gmail.com>

BUG: if dtype is object, do outbuf[:] = 0 rather than memset

5e6e586

BUG: only check PyErr_Occurred if Python C-API is needed

18de0a5

BUG: work around shortcoming in PyArray_AssignZero for object dtype

e800d9e

TST: fix tests for einsum returning a python object

2538eb7

TST: add test for 5e6e586 (issue numpygh-23492)

31e2176

mattip mentioned this pull request Mar 30, 2023

ENH: Add 'out' in tensordot #15186

Closed

mattip force-pushed the adding_object_to_einsum branch from e3813b7 to 31e2176 Compare March 30, 2023 08:29

MAINT: expand PyArray_AssignZero to handle object dtype

d1306cb

mattip force-pushed the adding_object_to_einsum branch from 6317a13 to d1306cb Compare March 30, 2023 13:03

mattip removed the 52 - Inactive Pending author response label Mar 30, 2023

mattip added 01 - Enhancement and removed 25 - WIP labels Mar 30, 2023

seberg changed the title ~~WIP: Adding Object dtype to einsum~~ ENH: Adding Object dtype to einsum Mar 30, 2023

DOC: add a release note

1f1a7ed

seberg reviewed Mar 30, 2023

seberg reviewed Apr 11, 2023

mattip mentioned this pull request Apr 27, 2023

ENH: update einsum som_of_product functions to return a value instead of void #23671

Open

fixes from review

962120b

mattip force-pushed the adding_object_to_einsum branch from f7c9f71 to 962120b Compare April 27, 2023 20:32

mattip reviewed Apr 27, 2023

seberg merged commit 94e5723 into numpy:main Apr 28, 2023

This was referenced Apr 28, 2023

ENH: Einsum does not support object dtype #17837

Closed

BUG: dot can create a 0d array with dtype=object that can assert in ufuncs #23492

Closed

BvB93 added a commit to BvB93/numpy that referenced this pull request May 8, 2023

TYP: Let np.einsum accept object dtypes

cc99207

xref numpy#18053
