ENH: Create helper for conversion to arrays #24214

Merged: ngoldbaum merged 4 commits into numpy:main from seberg:creation-helper, Jan 22, 2024

Conversation

@seberg (Member) commented Jul 19, 2023

Had to give this a start to be able to experiment. It does solve at least some problems (e.g. the np.select() use-case, and it looks like it would fix and/or simplify the linalg code).

The main thing is that we often convert multiple arrays, and then want something like:

  • a, b, c = np.asarray(a), np.asarray(b), np.asarray(c) followed by dtype = np.result_type(a, b, c), often with an additional 0.0. That is a pretty common pattern. NEP 50 makes it tricky if you wish to support weak scalars, because you need to call result_type with the scalar information intact (before the asarray, or manually undone afterwards). (For example, this happens in np.select().)
    • Maybe libraries can disregard that, but for NumPy it's hard, since you want many functions to be ufunc-like, so if ufuncs do it, others probably should be able to/try to.
  • Maybe mainly internally we sometimes need the __array_wrap__ for subclass support, although it seems consistently used for single input functions (outside ufuncs).
    • Additional motif: A clean wrapping pattern seems required for making scalar logic sane.
  • It may be useful to know if an input was a scalar or not (as opposed to a 0-D array). You need to ask NumPy about it, but it is discovered at conversion time. So it is information that is weird to tag onto an array (probably?), but figuring it out later would require duplicating work.
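A minimal runnable sketch of the ordering problem from the first bullet (plain NumPy only, no new API assumed): the same result_type call gives a different answer once the Python scalar has been converted to an array.

```python
import numpy as np

# Under NEP 50 rules, result_type must see the Python scalar itself;
# after conversion to an array the "weak scalar" information is gone
# and promotion differs.
a = np.float16([1.0, 2.0])
b = 3.5  # Python float: a weak scalar under NEP 50

dt_before = np.result_type(a, b)
# b converted first (made 1-d so legacy value-based casting does not
# kick in either): the float64 dtype now wins the promotion.
dt_after = np.result_type(a, np.asarray([b]))
```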

First, I wanted to just make a single function. But there are multiple things above that may be interesting to return, and e.g. result_type() might error and I am not sure we always need it.

So, is it insane to create a rich object (rather than a complicated return value and many kwargs):

# better names of course, currently in `np.core._multiarray_umath.ArrayCreationHelper`
arrays = creation_helper(*inputs)

# e.g. a linspace could do something like this:
dtype = arrays.common_dtype(ensure_inexact=True)

arrays.arrays  # arrays as tuple (probably allow arrays[0])

# Apply the typical wrapping if needed
# (should cache wrap, or fetch it early on rather, maybe just
# make it a method which applies the wrapping `arrays.wrap(result)`).
if arrays.wrap is not None:
    return arrays.wrap(result)
return result 

(I can see a, b, info = helper(a, b) as a nice pattern; OTOH it makes including a casting kwarg heavy, since you somewhat need to include it in the first step.)
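To make the idea concrete, here is a pure-Python emulation of such a rich object. The class and method names are made up for illustration; they mirror the sketch above, not an actual NumPy API.

```python
import numpy as np

class ConverterSketch:
    """Toy stand-in for the proposed helper (illustrative only)."""

    def __init__(self, *inputs):
        self._inputs = inputs
        # converted arrays, available as a tuple
        self.arrays = tuple(np.asarray(x) for x in inputs)

    def common_dtype(self, ensure_inexact=False):
        # Promote using the *original* objects so Python-scalar
        # information stays intact; folding in 0.0 emulates the
        # common result_type(..., 0.0) pattern.
        args = self._inputs + ((0.0,) if ensure_inexact else ())
        return np.result_type(*args)

    def wrap(self, result):
        # Apply __array_wrap__ of the first subclass input;
        # for plain ndarrays this is a no-op.
        for obj in self._inputs:
            if isinstance(obj, np.ndarray) and type(obj) is not np.ndarray:
                return obj.__array_wrap__(result)
        return result

conv = ConverterSketch(np.int32([1, 2]), np.int32([3, 4]))
dt = conv.common_dtype(ensure_inexact=True)  # int32 + 0.0 -> float64
```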

Further things might include:

  • Casting to a given or the common dtype. (maybe arrays.to_common_dtype(ensure_inexact=True))
  • I think @mdhaber was hoping to include broadcasting. Not sure, since it is one call away, but also seems fine to add.

@seberg seberg changed the title ENH: Create helper for conversion to arrays DISCUSS,ENH: Create helper for conversion to arrays Jul 19, 2023
@mhvk (Contributor) commented Jul 19, 2023

I like this idea. Following the broadcasting suggestion, it may be nice to try to be reasonably compatible with what ufuncs do generally. I guess that might include an optional out argument (and corresponding out attribute) or perhaps instead a method output_arrays(out=None) that would create the array with the right dtype and subclass, or check if it is OK if an array is given. Taking this perhaps too far, one could similarly have an axis argument/method as well (which would just use normalize_axis on the broadcasted shape under the hood).

p.s. Might it be possible to factor out the ufunc machinery (rather than write this from scratch)? Or indeed use something like this inside the ufuncs?

@mdhaber (Contributor) commented Jul 19, 2023

I think @mdhaber was hoping to include broadcasting. Not sure, since it is one call away, but also seems fine to add.

That would be nice. It seems like a common enough follow-up step that it would be a convenient addition to this convenience function.

So, is it insane to create a rich object...

We do this frequently in SciPy now, so not insane. Seems unusual for NumPy, but I wouldn't complain.

  1. Idea: maybe instead of an extra parameter ensure_inexact=True, could a dtype argument accept classes like np.inexact, if that makes sense?

  2. I'm not used to reading C. Would it be easy to add a docstring or the equivalent sequence of existing NumPy commands so I can check my understanding of what is going on?

  3. We had also discussed the possibility of adding axis to be ignored when broadcasting for use in, say, multi-sample reduction statistics. For example, in ttest_ind, it is sensible for x.shape == (5,) and y.shape == (3, 4) with axis=-1; when the arrays are broadcasted, the shapes become (3, 5) and (3, 4). I implemented this already in SciPy, but should I open an issue about it being moved upstream?
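The behaviour described in point 3 can be sketched as follows (hypothetical helper name, not SciPy's actual implementation): broadcast everything except the given axis, which each array keeps as its own.

```python
import numpy as np

def broadcast_ignoring_axis(x, y, axis=-1):
    # Hypothetical helper: broadcast two arrays while excluding
    # `axis` (assumed valid for both inputs) from the comparison.
    x, y = np.asarray(x), np.asarray(y)
    # move the ignored axis to the end of each array
    x = np.moveaxis(x, axis, -1)
    y = np.moveaxis(y, axis, -1)
    # broadcast only the leading ("batch") dimensions
    batch = np.broadcast_shapes(x.shape[:-1], y.shape[:-1])
    x = np.broadcast_to(x, batch + x.shape[-1:])
    y = np.broadcast_to(y, batch + y.shape[-1:])
    return x, y

# The ttest_ind example from above: (5,) and (3, 4) with axis=-1
x, y = broadcast_ignoring_axis(np.ones(5), np.ones((3, 4)), axis=-1)
```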

Thanks for this!

@seberg (Member, Author) commented Jul 20, 2023

Yeah, agree that the ufunc machinery should share as much as possible, although I was hoping to mainly consolidate the Python paths first. The reason is that the array-wrap implementation in ufuncs is messy and buggy (there is an issue), so fixing that seems like a messy change that is easier done on its own.

I think a method for output array prep could make sense, adding it to the initial call seems like it would overload meaning (we don't know the common dtype is right for one thing).

Idea: maybe instead of an extra parameter ensure_inexact=True, could extra_dtype accept classes like np.inexact, if that makes sense?

Right, that is how it is implemented. Just dtype=np.inexact doesn't quite work yet, and I thought the result_type(..., 0.) pattern is so common that it might actually make sense to have it explicitly.
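For reference, the result_type(..., 0.) idiom mentioned here looks like this in plain NumPy; it promotes integer inputs to a floating dtype while leaving inexact dtypes alone.

```python
import numpy as np

# Folding a Python float into the promotion forces an inexact result:
dt_int = np.result_type(np.int64, 0.0)    # integers promote to float64
dt_flt = np.result_type(np.float32, 0.0)  # float32 is preserved
```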

  1. [discussed] adding axis to be ignored when broadcasting for use in, say, multi-sample reduction statistics [...]

Ah, right! Yeah, maybe make it its own issue; it seems to me we should start with integrating that into np.broadcast_arrays first. Admittedly, I think that may require reimplementing the whole function (maybe for the better).

@rgommers (Member) commented:

@seberg from your description it is not entirely clear to me if you mean for this to be a public or private function? With the namespace we discussed somewhere else, np.lib.array_utils I believe, we should get a place to put utilities like this - but I can't tell from the description yet if this would be useful more widely.

Since it was mentioned above, we recently came across the need for normalize_axis_index in SciPy - that's also a useful utility, which is not public right now.

@seberg (Member, Author) commented Jul 20, 2023

The first issue I want to solve would mean private is fine, but yes, I suspect we should just make it public, and if we are unsure about some methods/attributes give them an _ prefix (which would mean putting it into the np.lib.array_utils namespace, or a different name if anyone suggests a better one soon).
(And @mhvk's interest makes me lean more towards public, at least if I can get some of what @mdhaber wants too.)

@mhvk (Contributor) commented Jul 20, 2023

Agreed that the output makes more sense as a separate method, rather than on input, especially since in most cases something has to be generated.

Also, 👍 to making normalize_axis_index a publicly accessible tool (we already import it...); probably all of shape_base can go into the same collection too.

p.s. A better name than creation_helper would still be good. Will try to think a bit more.

p.p.s. You'll not be surprised to hear that I think that by default subclasses should be passed through... But it may be good to think a bit more generally about what to do with array mimics: the benefit of getting result_type depends only on having a dtype, not on being an ndarray (subclass) (and similarly for a possible broadcast shape). Though the easiest route for that is perhaps just to let this be overridden by __array_function__.

@mdhaber (Contributor) commented Aug 12, 2023

I just ran into this little gem, which might have been avoided with use of a helper like this.

np.float16([2]) < np.float32(2.0009766)  # array([False])
np.float16([2]) < np.float32([2.0009766])  # array([ True])

@seberg (Member, Author) commented Aug 12, 2023

You might think that helps, but it probably wouldn't :). This is NEP 50 stuff.

@mdhaber (Contributor) commented Aug 12, 2023

If before comparison the float16 is promoted to the result type or if the float32 is broadcasted to the result shape, the comparison is done correctly. So I think the helper would help with those things.

@seberg (Member, Author) commented Aug 12, 2023

@mdhaber no, np.result_type() would do the same down-promotion in the above case until you adopt NEP 50.

@mdhaber (Contributor) commented Aug 12, 2023

I always pass the dtypes into result_type instead of the arrays.

import numpy as np
x = np.float16([2])
y = np.float32(2.0009766)
print(x < y)
x1, y1 = np.broadcast_arrays(x, y)
print(x1 < y1)
dtype = np.result_type(x.dtype, y.dtype)
print(x.astype(dtype) < y.astype(dtype))

which prints:

[False]
[ True]
[ True]

@seberg (Member, Author) commented Aug 12, 2023

Right, if you pass the dtypes you get the right thing, but that is the same as adopting NEP 50.

@seberg (Member, Author) commented Nov 3, 2023

I was trying to get back to this a bit, but am having difficulty formulating quite what we need/want here. I would prefer not to go the feature-creep route for starters.

Things that I would like:

  • Allow converting multiple arrays at once more conveniently (easy).
  • The result should have common_dtype() as it is pretty useful especially with NEP 50.
  • The result should have a wrap() to apply __array_wrap__() when it makes sense. 1
  • We could have more options:
    • shape (the broadcast shape)
    • A broadcast() method to modify the arrays (probably needs a local implementation for starters, which then might be adoptable elsewhere). Broadcasting could be a kwarg also.

There are some options, which would probably fit as kwargs 2 :

  • subok=True (to set it to False)
  • broadcasting could be a kwarg.

Another API question I worry about more: it would be nice to provide a clean way to preserve Python scalars. This is to allow later normal promotion to use NEP 50 rules.
Due to the isclose and similar use-cases, it also seems to me that in many cases it would make sense for broadcasting to not modify scalars (or 0-D objects).
For that, it would make sense to either have a kwarg preserve_scalars=True or an attribute arrays_or_scalars?

Maybe one solution is to make it:

conv = array_converter(*arrays)
dt = conv.common_dtype()  # always applicable, may do initial conversion to arrays
conv.shape  # could do this to get the broadcast shape, but could also defer.
conv.was_pyscalar  # maybe.  Or an enum with `0-D, scalar, python scalar`.

arrays = conv.arrays(subok=True, allow_scalars=True, broadcast=True, dtype=dt)

# Wrap result (if wanted); see footnote 2.
return conv.wrap(result)

Any thoughts? Implementing any of this isn't super hard; getting more clarity on where to start without feature creep is the hard part for me.


I think I like that last thought as API (initialization takes no kwargs, but fetching the arrays is a function and not an attribute/property). But it isn't quite clear to me, especially how to do scalar preservation.

Footnotes

  1. We currently mainly/only seem to use wrap if subok=False, so the details here are a bit tricky (should you ever call it when you already got a subclass?).
    However, one reason I would like a clear entry-point is that it would help a lot thinking about ENH: No longer auto-convert array scalars to numpy scalars in ufuncs (and elsewhere?) #24897, since it could take a kwarg like preserve_scalars=True.

  2. kwargs are slow for classes due to tp_new pre Python 3.13, but let's not worry about it

stop = asanyarray(stop) * 1.0
conv = _array_converter(start, stop)
start, stop = conv.as_arrays(subok=True)
dt = conv.result_type(ensure_inexact=True)
@seberg (Member, Author) commented Nov 23, 2023:

Iterated a bit; this is still a bit rough (I would like to consolidate __array_wrap__ a bit at least).

This does define conv.wrap() as the one true way to call __array_wrap__, but for NEP 50, the interesting part is really this part.
(I have also moved the cast to float into the subtract via dtype=, which isn't strictly necessary.)

The point is that this means np.linspace(np.float32(3), 10) returns float32. (Yes, this still needs tests, but I thought I would note it, as this pattern would hopefully be useful in a few places that are currently subtly broken in the same way.)

@seberg (Member, Author) commented:

@mhvk beyond cleanup, part of my thought for consolidating is that it should help with gh-24897. Because wrapping is a reasonable place to decide scalar vs. array logic.

@mhvk (Contributor) left a comment:

@seberg - now had a better look. I think this is really nice and it should be both relatively straightforward to extend and to use some of it inside the ufuncs.

One suggestion: it might be nice if conv could be indexed like a tuple. That should then return the "anyarray" version (in which case the default subok=False for as_arrays() makes sense; without this, I feel subok=True would be a more logical default).

If not doing that, I would suggest replacing ary, = conv.as_arrays() with ary = conv.as_arrays()[0] for clarity (the trailing comma before the equals sign is easy to miss...).

Py_ssize_t narrs_ssize_t = (args == NULL) ? 0 : PyTuple_GET_SIZE(args);
int narrs = (int)narrs_ssize_t;
/* Limit to 64 for now to use in a few places. */
if (narrs_ssize_t > 64) {
@mhvk (Contributor) commented:

More logical to use NPY_MAXARGS?

item->DType = NPY_DTYPE(PyArray_DESCR(item->array));
Py_INCREF(item->DType);

if (!npy_mark_tmp_array_if_pyscalar(
@mhvk (Contributor) commented:

I think I'd swap the branches to avoid the !

Py_INCREF(item->descr);
}
else {
/* just clear the flag again */
@mhvk (Contributor) commented:

Well, swap only if this is really not needed! This would seem to preclude using it later, is that intended? Or is ->was_scalar exactly the same (i.e., python scalars only)?

@seberg (Member, Author) commented:

I also leave descr empty. So yes, it is just the same. I just wanted to go through that, because that is where the other paths go through.

res_item = item->object;
}
else {
res_item = (PyObject *)item->array;
@mhvk (Contributor) commented:

Probably premature optimization, but it would seem logical not to even create the array from a python scalar unless it is actually needed.

@seberg (Member, Author) commented:

Yes, but that requires splitting up the creation, so I indeed thought it premature.

This is one of the things that we could do in the ufunc if and only if we delete the "promotion state" (to allow setting it to warn for NumPy 2).

creation_helper_dealloc(PyArrayCreationHelperObject *self)
{
creation_item *item = self->items;
for (int i = 0; i < self->narrs; i++) {
@mhvk (Contributor) commented:

you need an item++ here too! (or do self->items[i])

@seberg (Member, Author) commented:

Ouch. Luckily, this is probably the reason the sanitizer job failed...

creation_helper_as_arrays(PyArrayCreationHelperObject *self,
PyObject *const *args, Py_ssize_t len_args, PyObject *kwnames)
{
npy_bool subok = NPY_FALSE;
@mhvk (Contributor) commented:

I think this will replace more asanyarray than asarray, so maybe make the default NPY_TRUE? In general, subclasses should be OK...

@seberg (Member, Author) commented:

OK, let me change it. In practice it isn't a big change. For a bit I was thinking "why asanyarray at all?!"
I am not actually sure it is more useful for most places, but since conv.wrap() is (normally) a no-op for those now, I also don't really care to prefer it.

The point of asanyarray() is allowing functions to work through composition of other functions/methods working out, when simple wrapping might not propagate info well.
In practice, I do wonder how often that is actually well advised (compared to using __array_function__ and dealing with it manually). The point being: if the class is not trivial enough for just wrapping, maybe it should use __array_function__. OTOH, I can see a small amount of pragmatism in it, so I'm fine with preferring it.


"""
conv = _array_converter(ary)
ary, = conv.as_arrays(subok=True)
@mhvk (Contributor) commented:

Maybe ary = conv.as_arrays(subok=True)[0].ravel().

It does feel one should just be able to index conv (effectively making it a bit like a subclass of tuple).
Then it would be ary = conv[0].ravel().

# matrices have to be transposed first, because they collapse dimensions!
out_arr = transpose(buff, buff_permute)
return res.__array_wrap__(out_arr)
res = transpose(buff, buff_permute)
@mhvk (Contributor) commented:

For subclasses other than matrix, this would be an API change if .transpose() is overridden. That seems rather unlikely (and wrong-headed), so I'm fine with making the change regardless...

@seberg (Member, Author) left a comment:

I have addressed all the comments and changed the default for subok (I was torn on that initially, but I made my peace with it).

I allow indexing now (which also allows a, b = conv unpacking), although calling the method is actually a tiny bit faster (the reason is that unpacking checks conv[length] to trigger the IndexError).
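The extra conv[length] probe during unpacking can be demonstrated with a toy sequence (a hypothetical class, not the converter itself): iterating via the old __getitem__ protocol calls one index past the end to raise the IndexError that stops iteration.

```python
class Seq:
    """Toy sequence: supports iteration only via __getitem__."""

    def __init__(self, items):
        self._items = items
        self.probes = 0  # count __getitem__ calls

    def __getitem__(self, i):
        self.probes += 1
        return self._items[i]  # IndexError past the end stops iteration

s = Seq((1, 2))
a, b = s  # unpacking probes indices 0, 1, and 2 (the IndexError)
```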


I have a start on reorganizing the ufuncs to share some infrastructure for array-wrap at least. But I decided that shouldn't stop us here, since it mostly means moving the find_wrap function out to share it, and the ufunc changes are not entirely trivial.


}
else {
/*
* `__array_wrap__` expects an array argument, so we ensure that.
@seberg (Member, Author) commented:

Well, I think that scalars remain common in some code paths (for the time being at least). But I rephrased the comment.


@seberg seberg marked this pull request as ready for review November 28, 2023 14:12
@mhvk (Contributor) left a comment:

Looks mostly great now! Just a few remaining comments.

One comment: with indexing giving any array, I think it is actually OK to have subok=False as the default for as_arrays(). Though I am also completely happy to keep it as you have it now: in principle, subclasses should behave like regular arrays, so it should be fine to keep them (and it is nice if the default corresponds to the least amount of work, which is the case for subok=True since you are not viewing as a regular array).

}
}

Py_INCREF(self);
@mhvk (Contributor) commented:

Are you sure you need the INCREF here? PyObject_NewVar returns a new reference already (which you DECREF in the fail path).

@seberg (Member, Author) commented:

Right, thanks! Must have been some remnant of some debugging confusion...

return NULL;
}

/* When we only have scalars, the default is to retur an array: */
@mhvk (Contributor) commented:

retur -> return

@mhvk (Contributor) commented Nov 28, 2023

p.s. On the asanyarray, beyond the rationale that subclasses should behave the same way and thus not be converted (of course, they don't always do...), it is certainly nice if generally they get passed on, since then __array_ufunc__ coverage often is good enough for functions to work. For Quantity at least, there are still a lot of functions where we do not have to override; see https://github.com/astropy/astropy/blob/main/astropy/units/quantity_helper/function_helpers.py (the SUBCLASS_SAFE_FUNCTIONS set; as you'll notice, there are also a lot that do need overrides, which means a very long list of rather boilerplate code that is little fun to write, so all the better if some functions don't need it!)

@seberg (Member, Author) left a comment:

Yeah, about SUBCLASS_SAFE_FUNCTIONS: it would be great if we could formalize this in some form and document the pattern. But the best would be to generalize it to non-subclasses somehow, which might be just calling func._implementation() rather than ndarray.__array_function__.

with the indexing giving any array, I think it is actually OK to have subok=False as default for

I agree more with the latter part, unless we also had asanyarray: align the default and the short-hand. Plus, if you want a real nudge towards subok, having the opposite default is no good.

The main reason for not doing it was maybe always matrices (and probably historic). But we should hopefully worry less and less about those.

Also added a start for documentation.


@seberg seberg changed the title DISCUSS,ENH: Create helper for conversion to arrays ENH: Create helper for conversion to arrays Nov 29, 2023
@seberg seberg added this to the 2.0.0 release milestone Nov 29, 2023
@ngoldbaum (Member) left a comment:

I looked this over and spotted a few minor things but no showstoppers.

* This means it was:
* - Recognized as a scalar by a/the dtype. This can be DType specific,
* for example a tuple may be a scalar, but only for structured dtypes.
* - Anything not recognized a scalar (mapped to a DType) but also not
@ngoldbaum (Member) commented:

Suggested change:
- * - Anything not recognized a scalar (mapped to a DType) but also not
+ * - Anything not recognized as an instance of a DType's scalar type but also not

I got tripped up on the original wording. Take it or leave it, but if you leave it, it should be "recognized as a scalar".

Helper to convert one or more objects to arrays. Integrates machinery
to deal with the ``result_type`` and ``__array_wrap__``.

The reason for this is that e.g. ``result_type`` needs to covert to arrays
@ngoldbaum (Member) commented:

Suggested change:
- The reason for this is that e.g. ``result_type`` needs to covert to arrays
+ The reason for this is that e.g. ``result_type`` needs to convert to arrays

@mhvk (Contributor) left a comment:

I think what's left is only typos and some suggestions for the new docstrings!

wrap(arr, /, to_scalar=None)

Call ``__array_wrap__`` on the input ``arr``. When ``arr`` is already
the correct subclass, this may be a no-op.
@mhvk (Contributor) commented:

I think one should be explicit about what is done: "this may be a no-op" -> "no wrapping is done". Or maybe rewrite the note:

Call ``__array_wrap__`` on ``arr`` if ``arr`` is not the same subclass
as the input the ``__array_wrap__`` method was retrieved from.

add_newdoc('numpy._core._multiarray_umath', '_array_converter', ('scalar_input',
"""
A tuple which is ``True`` if the original inputs were scalars and
``False`` otherwise. Scalars are all objects which are coerced to a
@mhvk (Contributor) commented:

How about

A tuple which indicates for each input whether it was a scalar that
was coerced to a 0-D array (and was not already an array or something
converted via a protocol like ``__array__()``).

``False`` otherwise. Scalars are all objects which are coerced to a
0-D array but are not already arrays or converted via one of the protocols.

.. note::
@mhvk (Contributor) commented:

My sense would be to delete this note: it probably adds more confusion than it solves...

----------
arr : ndarray
The object to be wrapped. Ideally, this is always an ndarray
or subclass, although NumPy scalars are accepted as of now by
@mhvk (Contributor) commented:

How about

The object to be wrapped. Normally an ndarray or subclass,
although for backward compatibility NumPy scalars are also accepted
(these will be converted to a NumPy array before being passed on to
the ``__array_wrap__`` method).

When ``True`` will convert a 0-d array to a scalar via ``result[()]``
(with a fast-path for non-subclasses). If ``False`` the result should
be an array-like (as ``__array_wrap__`` is free to return a non-array).
By default (``None``), a scalar is returned if all inputs are scalar.
@mhvk (Contributor) commented:

are scalar -> were scalar

PyObject *obj = self->items[i].object;
if (PyArray_CheckExact(obj)) {
if (wrap == NULL || priority < 0) {
Py_INCREF(Py_None);
@mhvk (Contributor) commented:

Strictly, you don't need to set the wrap here, given the if (wrap == NULL) clause at the end. (same for scalar)

@mhvk (Contributor) commented Nov 30, 2023

p.s. Probably time for some squashing of commits too...

@seberg (Member, Author) commented Nov 30, 2023

I don't want to jinx this because of it, but the [()] logic for subclasses doesn't work for the current memmap (which incorrectly always does it).
Now, that is maybe never even used right now, but it makes me wonder if we need another approach, which might mean passing scalar=True to __array_wrap__!?

@mhvk (Contributor) commented Nov 30, 2023

I'm a bit confused: why would it not work for memmap? It doesn't hurt to do [()] twice, does it? (I do agree we can and perhaps should adjust memmap.__array_wrap__ to not create scalars itself, but that seems independent of this PR).

That said, you are right that in principle it would be better to give __array_wrap__ a chance to react to whether or not a scalar is expected, instead of using a heuristic as you have now. And I guess 2.0 is the right time for such an API change. Though on the other hand, __array_wrap__ still is somewhat out-of-date API and generally classes have to be able to deal with [()] anyway. So, perhaps we do not need to worry?
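A hedged sketch of what such an API change could look like for a subclass (the return_scalar keyword follows this discussion; the class itself is illustrative, not NumPy code):

```python
import numpy as np

class WrappingArray(np.ndarray):
    """Illustrative subclass with an __array_wrap__ that is told
    explicitly whether a scalar result is expected."""

    def __array_wrap__(self, arr, context=None, return_scalar=False):
        out = np.asarray(arr).view(WrappingArray)
        if return_scalar:
            # the caller signalled that all inputs were scalars
            return out[()]  # 0-d array -> scalar
        return out

base = np.arange(3.0).view(WrappingArray)
wrapped = base.__array_wrap__(np.array([1.0, 2.0]))
scalar = base.__array_wrap__(np.array(2.0), None, True)
```

Here the heuristic is replaced by an explicit flag, so a class like memmap would no longer need to guess whether to collapse 0-d results.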

@ngoldbaum (Member) commented:
see #25278, which was reported with impressive timing for this PR

@mhvk (Contributor) commented Dec 19, 2023

I'm still not quite sold on return_scalar, but I'm also realizing I've lost what little overview I had of the problems that are caused by the various steps. This makes me wonder if it would be better to have a PR first that creates a helper that simply keeps the status quo, i.e., makes no attempt to be clever on wrapping, just reproducing the __array_wrap__ calls that are present now.

seberg added a commit that referenced this pull request Jan 20, 2024
…r`` (#25409)

This reorganizes how array-wrap is called. I think it might very mildly change the semantics for reductions (and for negative priorities).

Overall, it now passes a new return_scalar=False/True when calling __array_wrap__ and deprecates any array-wrap which does not accept arr, context, return_scalar.

I have not integrated it yet, but half the reason for the reorganization is to integrate/reuse it in the array_converter helper PR (gh-24214), which stumbled over trying to make the scalar handling sane.

Forcing downstream to add return_scalar=False to the signature is a bit annoying, but e.g. our memory maps currently try to guess at it, which seems bad. I am hoping this can be part of making the scalar vs. array return more sane.

But, maybe mainly, I hope it consolidates things (together with gh-24124 mainly; if ufuncs were the only complex place we used this, it wouldn't matter much).

---

* API: Reorganize `__array_wrap__` and add `return_scalar=False`

This also deprecates any `__array_wrap__` which does not accept
`context` and `return_scalar`.

* BUG: Fix niche bug in rounding.

* MAINT: Adjust __array_wrap__ in code and tests (also deprecation test)

* MAINT: Use/move the simplest C-wrapping also

* DOC: Update doc and add release note

* STY: Make linter happy in old tests

* MAINT: Silence GCC warning (value cannot really be used uninitialized)

* MAINT: Small style fixes

* BUG: Fix reference leak in ufunc array-wrapping

This probably doesn't fix the 32bit issue unfortunately, only the windows ones...

* BUG: Fix leak for result arrays in all ufunc calls

* Ensure we try passing context and address related smaller review comments

* Ensure we don't try `context=None` and expand code comment

* Rely on return_scalar always being right (and style nit)

* Remove outdated comments as per review

* Let's just undo force-wrap for now for reductions (its a change...)

* ENH: Chain the original error when the deprecationwarning is raised

Doing this due to gh-25635 since it is super confusing with the
bad retrying...

* BUG,MAINT: Address Marten's review comments

This helper allows dealing with result_type and array wrapping
without explicitly holding on to the original objects and
especially potentially converting them many times.
This doesn't quite fix all places, but it does most (most prominently
linalg also does some quite awkward wrapping).
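For illustration, here is a minimal sketch of the `return_scalar`-aware `__array_wrap__` signature this change introduces; the subclass name and its wrapping policy are hypothetical, not taken from NumPy itself:

```python
import numpy as np

class WrappedArray(np.ndarray):
    # New-style signature: NumPy tells the subclass whether a plain
    # ndarray would have returned a scalar, instead of the subclass
    # guessing from 0-d-ness (as memmap used to).
    def __array_wrap__(self, arr, context=None, return_scalar=False):
        if return_scalar:
            return arr[()]  # hand back a plain NumPy scalar
        return arr.view(WrappedArray)

a = np.arange(3.0).view(WrappedArray)
res = a + 1  # the ufunc result is passed back through __array_wrap__
assert isinstance(res, WrappedArray)
```

Because the new arguments have defaults, the same definition also keeps working on older NumPy versions that pass fewer arguments.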
@seberg
Member Author

seberg commented Jan 20, 2024

@mhvk and @ngoldbaum, I merged main (i.e. the array-wrap helper) and applied all the remaining doc suggestions (thanks, they were great!). Then squashed it down to three commits.

I think this should be pretty much ready now, since IIRC it was relatively settled. But happy to do another round if things come up of course.

Contributor

@mhvk mhvk left a comment

This looks good! Looking at it again, I mainly wonder about methods and arguments that are defined but not actually used. Arguably, we should not have those without a clear use case... But as long as this is private, perhaps it is OK?

its memory, in which case you can traverse ``a.base`` for a memory handler.
""")

add_newdoc('numpy._core._multiarray_umath', '_array_converter',
Contributor

I wondered a bit about whether this needs to be private, as it seems generally useful. But I guess this allows possible changes before making it public?

result = wrap(result)
return result
conv = _array_converter(obj)
# As this already tried the method, subok is maybe quite reasonable here:
Contributor

Maybe add # but this follows what was done before. TODO: revisit this. Or something else that makes perhaps clearer that it would be good to make this change (even if best not to do so in this PR!).

@@ -0,0 +1,471 @@
/*
* This file defines an _array_converter object used internally to NumPy
Contributor

"used internally in Numpy to [deal ...]"

item->DType = NPY_DTYPE(PyArray_DESCR(item->array));
Py_INCREF(item->DType);

if (npy_mark_tmp_array_if_pyscalar(
Contributor

Could this be if (item->scalar_input && npy_mark...)? If so, maybe combine with l.93-95:

if (item->scalar_input) {
    if (npy_mark...) {
        ...
    }
    else {
        ...
    }
}
else {  /* non-scalar input */
    self->flags &= ~(NPY_CH_ALL_PYSCALARS | NPY_CH_ALL_SCALARS);
}

Member Author

Yes, that is correct, and I adopted it, but not as suggested here. The pretty-code annoyance is that item->descr needs to be dealt with, so whatever you do, you have some duplication.

But I agree, it isn't super nice to split this from L93, and maybe easier to grok if it is clear that npy_mark_tmp_array_if_pyscalar does indeed require scalar_input.

float, or complex.
""")

add_newdoc('numpy._core._multiarray_umath', '_array_converter', ('scalar_input',
Contributor

I think scalar_input is not used anywhere in the code. Do you have cases in mind where it would be useful? It sort-of feels like it should be, but if there is no use, should it be exposed? (Fine to leave in, just asking.)

Also, right now it is a property. Might it make sense to make it a method (with no arguments), so that one could extend it if needed? (E.g., to ask not just for scalar, but python scalar inputs.)

But perhaps all this just argues for keeping _array_converter private for now, so we can see what is best!

Member

The linter says this line is one character too long

Member Author

I didn't think we cared, but fair, pushed the fix (included it in the previous commit).

return NULL;
}

/* When we only have scalars, the default is to return an array: */
Contributor

I was again a bit confused about this, maybe it helps to say "We follow the default setting for as_arrays of CONVERT_IF_NO_ARRAY"

----------
subok : True or False, optional
Whether array subclasses are preserved.
pyscalars : {"convert", "preserve", "convert_if_no_array"}, optional
Contributor

This argument is not used anywhere, so a bit of the same question as for scalar_input above. In particular, I wonder if it wouldn't be handier to pass this on input (perhaps still allowing to override it here). But without a use case, probably not worth overthinking it now!

@seberg
Member Author

seberg commented Jan 21, 2024

Addressed the smaller comments, thanks.

The API issues are not very clear to me. I really like this type of feature, but it is quite fuzzy to me what it should look like and which details are useful (enough).

That is why I opted for private for now: I am not comfortable with saying that this API seems good (enough) unless others give very clear +1's.

Generally, I would like this or something similar to be public, because I think it can be generally useful and also specifically because I think it is more useful with NEP 50 if downstream wants to be pedantic.

From a maintenance-burden perspective I am not super worried even if we deprecate it again soonish; we would name-squat np.lib.array_utils.array_converter, though.


If we keep things private, I somewhat like just keeping those odd features. I can see scalar_input being useful at least in niches like a Python searchsorted:

conv = array_converter(haystack, needle)
haystack, needle = conv

# The needle decides whether a scalar result makes sense:
return conv.wrap(result, to_scalar=conv.scalar_input[1])

I am not sure that we will use it (in fact I doubt it in the near future), but I also like the idea of "advertising" that this is possible.
(Part of why it is there was that it is very easy, and I also thought it may be useful for the python scalar preservation, which the conversion already does now and seems to work well.)

If we lean towards public API, I fully agree to ditch them all until it is very clear we or downstream use them. Those odd features shouldn't stop us from making this public!

Contributor

@mhvk mhvk left a comment

OK, code looks good now! On the API, I'm fine with keeping it private for now and seeing what ends up being useful and what not. I'll approve now, but won't merge to give @ngoldbaum a chance to look.

p.s. Note that the linter failed.

@ngoldbaum
Member

I'll look at this tomorrow.

@seberg
Member Author

seberg commented Jan 22, 2024

FWIW, I rebased my try-0d-preservation branch and ended up using those features (not that I am sure alternatives wouldn't be better). It actually seems surprisingly close to doing something reasonable.

Member

@ngoldbaum ngoldbaum left a comment

I looked over the C code again and didn't spot any issues. Please push a fixup for the linter then this should be good to go.

@ngoldbaum
Member

I figure it's better to keep the linter happy so people don't get surprise linter corrections on future PRs. Thanks!

@ngoldbaum ngoldbaum merged commit 55afe35 into numpy:main Jan 22, 2024
