ENH: Create helper for conversion to arrays #24214

Merged: ngoldbaum merged 4 commits into numpy:main from seberg:creation-helper, Jan 22, 2024

Conversation

@seberg (Member) commented Jul 19, 2023

Had to give this a start to be able to experiment. It does solve at least some problems (e.g. the np.select() use-case, and it looks like it would fix and/or simplify the linalg code).

The main thing is that we often convert multiple arrays, and then want something like:

  • a, b, c = np.asarray(a), np.asarray(b), np.asarray(c) followed by dtype = np.result_type(a, b, c), often with an additional 0.0. That is a pretty common pattern. NEP 50 makes it tricky if you wish to support weak scalars, because you need to call result_type with the scalar information intact (before the asarray, or manually undone afterwards). (For example, this happens in np.select().)
    • Maybe libraries can disregard that, but for NumPy it's hard, since you want many functions to be ufunc-like, so if ufuncs do it, others probably should be able to/try to.
  • Maybe mainly internally we sometimes need the __array_wrap__ for subclass support, although it seems consistently used for single input functions (outside ufuncs).
    • Additional motif: A clean wrapping pattern seems required for making scalar logic sane.
  • It may be useful to know if an input was a scalar or not (as opposed to a 0-D array). You need to ask NumPy about it, but it is discovered at conversion time. So it is information that is weird to tag onto an array (probably?), but figuring it out later would require duplicating work.
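A minimal runnable sketch of the ordering problem from the first bullet (plain NumPy only, no new API assumed): the same result_type call gives a different answer once the Python scalar has been converted to an array.

```python
import numpy as np

# Under NEP 50 rules, result_type must see the Python scalar itself;
# after conversion to an array the "weak scalar" information is gone
# and promotion differs.
a = np.float16([1.0, 2.0])
b = 3.5  # Python float: a weak scalar under NEP 50

dt_before = np.result_type(a, b)
# b converted first (made 1-d so legacy value-based casting does not
# kick in either): the float64 dtype now wins the promotion.
dt_after = np.result_type(a, np.asarray([b]))
```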

First, I wanted to just make a single function. But there are multiple things above that may be interesting to return, and e.g. result_type() might error and I am not sure we always need it.

So, is it insane to create a rich object (rather than a complicated return value and many kwargs):

# better names of course, currently in `np.core._multiarray_umath.ArrayCreationHelper`
arrays = creation_helper(*inputs)

# e.g. a linspace could do something like this:
dtype = arrays.common_dtype(ensure_inexact=True)

arrays.arrays  # arrays as tuple (probably allow arrays[0])

# Apply the typical wrapping if needed
# (should cache wrap, or fetch it early on rather, maybe just
# make it a method which applies the wrapping `arrays.wrap(result)`).
if arrays.wrap is not None:
    return arrays.wrap(result)
return result 

(I can see a, b, info = helper(a, b) as a nice pattern; OTOH it makes including a casting kwarg heavy, since you somewhat need to include it in the first step.)
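To make the idea concrete, here is a pure-Python emulation of such a rich object. The class and method names are made up for illustration; they mirror the sketch above, not an actual NumPy API.

```python
import numpy as np

class ConverterSketch:
    """Toy stand-in for the proposed helper (illustrative only)."""

    def __init__(self, *inputs):
        self._inputs = inputs
        # converted arrays, available as a tuple
        self.arrays = tuple(np.asarray(x) for x in inputs)

    def common_dtype(self, ensure_inexact=False):
        # Promote using the *original* objects so Python-scalar
        # information stays intact; folding in 0.0 emulates the
        # common result_type(..., 0.0) pattern.
        args = self._inputs + ((0.0,) if ensure_inexact else ())
        return np.result_type(*args)

    def wrap(self, result):
        # Apply __array_wrap__ of the first subclass input;
        # for plain ndarrays this is a no-op.
        for obj in self._inputs:
            if isinstance(obj, np.ndarray) and type(obj) is not np.ndarray:
                return obj.__array_wrap__(result)
        return result

conv = ConverterSketch(np.int32([1, 2]), np.int32([3, 4]))
dt = conv.common_dtype(ensure_inexact=True)  # int32 + 0.0 -> float64
```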

Further things might include:

  • Casting to a given or the common dtype. (maybe arrays.to_common_dtype(ensure_inexact=True))
  • I think @mdhaber was hoping to include broadcasting. Not sure, since it is one call away, but also seems fine to add.

@seberg seberg changed the title ENH: Create helper for conversion to arrays DISCUSS,ENH: Create helper for conversion to arrays Jul 19, 2023
@mhvk (Contributor) commented Jul 19, 2023

I like this idea. Following the broadcasting suggestion, it may be nice to try to be reasonably compatible with what ufuncs do generally. I guess that might include an optional out argument (and corresponding out attribute) or perhaps instead a method output_arrays(out=None) that would create the array with the right dtype and subclass, or check if it is OK if an array is given. Taking this perhaps too far, one could similarly have an axis argument/method as well (which would just use normalize_axis on the broadcasted shape under the hood).

p.s. Might it be possible to factor out the ufunc machinery (rather than write this from scratch)? Or indeed use something like this inside the ufuncs?

@mdhaber (Contributor) commented Jul 19, 2023

I think @mdhaber was hoping to include broadcasting. Not sure, since it is one call away, but also seems fine to add.

That would be nice. It seems like a common enough follow-up step that it would be a convenient addition to this convenience function.

So, is it insane to create a rich object...

We do this frequently in SciPy now, so not insane. Seems unusual for NumPy, but I wouldn't complain.

  1. Idea: maybe instead of an extra parameter ensure_inexact=True, could a dtype argument accept classes like np.inexact, if that makes sense?

  2. I'm not used to reading C. Would it be easy to add a docstring or the equivalent sequence of existing NumPy commands so I can check my understanding of what is going on?

  3. We had also discussed the possibility of adding axis to be ignored when broadcasting for use in, say, multi-sample reduction statistics. For example, in ttest_ind, it is sensible for x.shape == (5,) and y.shape == (3, 4) with axis=-1; when the arrays are broadcasted, the shapes become (3, 5) and (3, 4). I implemented this already in SciPy, but should I open an issue about it being moved upstream?
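The behaviour described in point 3 can be sketched as follows (hypothetical helper name, not SciPy's actual implementation): broadcast everything except the given axis, which each array keeps as its own.

```python
import numpy as np

def broadcast_ignoring_axis(x, y, axis=-1):
    # Hypothetical helper: broadcast two arrays while excluding
    # `axis` (assumed valid for both inputs) from the comparison.
    x, y = np.asarray(x), np.asarray(y)
    # move the ignored axis to the end of each array
    x = np.moveaxis(x, axis, -1)
    y = np.moveaxis(y, axis, -1)
    # broadcast only the leading ("batch") dimensions
    batch = np.broadcast_shapes(x.shape[:-1], y.shape[:-1])
    x = np.broadcast_to(x, batch + x.shape[-1:])
    y = np.broadcast_to(y, batch + y.shape[-1:])
    return x, y

# The ttest_ind example from above: (5,) and (3, 4) with axis=-1
x, y = broadcast_ignoring_axis(np.ones(5), np.ones((3, 4)), axis=-1)
```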

Thanks for this!

@seberg (Member, Author) commented Jul 20, 2023

Yeah, agree that the ufunc machinery should share as much as possible, although I was hoping to mainly consolidate the Python paths first. The reason is that the array-wrap implementation in ufuncs is messy and buggy (there is an issue), so fixing that seems like a messy change that is easier done on its own.

I think a method for output array prep could make sense, adding it to the initial call seems like it would overload meaning (we don't know the common dtype is right for one thing).

Idea: maybe instead of an extra parameter ensure_inexact=True, could extra_dtype accept classes like np.inexact, if that makes sense?

Right, that is how it is implemented. Just dtype=np.inexact doesn't quite work yet, and I thought the result_type(..., 0.) pattern is so common that it might actually make sense to have it explicitly.
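For reference, the result_type(..., 0.) idiom mentioned here looks like this in plain NumPy; it promotes integer inputs to a floating dtype while leaving inexact dtypes alone.

```python
import numpy as np

# Folding a Python float into the promotion forces an inexact result:
dt_int = np.result_type(np.int64, 0.0)    # integers promote to float64
dt_flt = np.result_type(np.float32, 0.0)  # float32 is preserved
```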

  1. [discussed] adding axis to be ignored when broadcasting for use in, say, multi-sample reduction statistics [...]

Ah, right! Yeah, maybe make it its own issue; it seems to me we should start with integrating that into np.broadcast_arrays first. Admittedly, I think that may require reimplementing the whole function (maybe for the better).

@rgommers (Member) commented:

@seberg from your description it is not entirely clear to me if you mean for this to be a public or private function? With the namespace we discussed somewhere else, np.lib.array_utils I believe, we should get a place to put utilities like this - but I can't tell from the description yet if this would be useful more widely.

Since it was mentioned above, we recently came across the need for normalize_axis_index in SciPy - that's also a useful utility, which is not public right now.

@seberg (Member, Author) commented Jul 20, 2023

The first issue I want to solve would mean private is fine, but yes, I suspect we should just make it public, and if we are unsure about some methods/attributes give them an _ prefix (which would mean putting it into the np.lib.array_utils namespace, or a different name if anyone suggests a better one soon).
(And @mhvk's interest makes me lean more towards public, at least if I can get some of what @mdhaber wants too.)

@mhvk (Contributor) commented Jul 20, 2023

Agreed that the output makes more sense as a separate method, rather than on input, especially since in most cases something has to be generated.

Also, 👍 to making normalize_axis_index a publicly accessible tool (we already import it...); probably all of shape_base can go into the same collection too.

p.s. A better name than creation_helper would still be good. Will try to think a bit more.

p.p.s. You'll not be surprised to hear that I think that by default subclasses should be passed through... But it may be good to think a bit more generally about what to do with array mimics: the benefit of getting result_type depends only on having a dtype, not on being an ndarray (subclass) (and similarly for a possible broadcast shape). Though the easiest route for that is perhaps just to let this be overridden by __array_function__.

@mdhaber (Contributor) commented Aug 12, 2023

I just ran into this little gem, which might have been avoided with use of a helper like this.

np.float16([2]) < np.float32(2.0009766)  # array([False])
np.float16([2]) < np.float32([2.0009766])  # array([ True])

@seberg (Member, Author) commented Aug 12, 2023

You might think that helps, but it probably wouldn't :). This is NEP 50 stuff.

@mdhaber (Contributor) commented Aug 12, 2023

If before comparison the float16 is promoted to the result type or if the float32 is broadcasted to the result shape, the comparison is done correctly. So I think the helper would help with those things.

@seberg (Member, Author) commented Aug 12, 2023

@mdhaber no, np.result_type() would do the same down-promotion in the above case until you adopt NEP 50.

@mdhaber (Contributor) commented Aug 12, 2023

I always pass the dtypes into result_type instead of the arrays.

import numpy as np
x = np.float16([2])
y = np.float32(2.0009766)
print(x < y)
x1, y1 = np.broadcast_arrays(x, y)
print(x1 < y1)
dtype = np.result_type(x.dtype, y.dtype)
print(x.astype(dtype) < y.astype(dtype))

which prints:

[False]
[ True]
[ True]

@seberg (Member, Author) commented Aug 12, 2023

Right, if you pass the dtypes you get the right thing, but that is the same as adopting NEP 50.

@seberg (Member, Author) commented Nov 3, 2023

I was trying to get back to this a bit, but am having difficulty formulating quite what we need/want here. I would prefer not to go the feature-creep route for starters.

Things that I would like:

  • Allow converting multiple arrays at once more conveniently (easy).
  • The result should have common_dtype() as it is pretty useful especially with NEP 50.
  • The result should have a wrap() to apply __array_wrap__() when it makes sense. 1
  • We could have more options:
    • shape (the broadcast shape)
    • A broadcast() method to modify the arrays (probably needs a local implementation for starters, which then might be adoptable elsewhere). Broadcasting could be a kwarg also.

There are some options, which would probably fit as kwargs 2 :

  • subok=True (to set it to False)
  • broadcasting could be a kwarg.

Another API question I worry about more: it would be nice to provide a clean way to preserve Python scalars. This is to allow later normal promotion to use NEP 50 rules.
Due to the isclose and similar use-cases, it also seems to me that in many cases it would make sense for broadcasting to not modify scalars (or 0-D objects).
For that, it would make sense to either have a kwarg preserve_scalars=True or an attribute arrays_or_scalars?

Maybe one solution is to make it:

conv = array_converter(*arrays)
dt = conv.common_dtype()  # always applicable, may do initial conversion to arrays
conv.shape  # could do this to get the broadcast shape, but could also defer.
conv.was_pyscalar  # maybe.  Or an enum with `0-D, scalar, python scalar`.

arrays = conv.arrays(subok=True, allow_scalars=True, broadcast=True, dtype=dt)

# Wrap result (if wanted); see footnote 2.
return conv.wrap(result)

Any thoughts? Implementing any of this isn't super hard; getting more clarity on where to start without feature creep is the hard part for me.


I think I like that last thought as API (initialization takes no kwargs, but fetching the arrays is a function and not an attribute/property). But it isn't quite clear to me, especially how to do scalar preservation.

Footnotes

  1. We currently mainly/only seem to use wrap if subok=False, so the details here are a bit tricky (should you ever call it when you already got a subclass?).
    However, one reason I would like a clear entry-point is that it would help a lot thinking about ENH: No longer auto-convert array scalars to numpy scalars in ufuncs (and elsewhere?) #24897, since it could take a kwarg like preserve_scalars=True.

  2. kwargs are slow for classes due to tp_new pre Python 3.13, but let's not worry about it

stop = asanyarray(stop) * 1.0
conv = _array_converter(start, stop)
start, stop = conv.as_arrays(subok=True)
dt = conv.result_type(ensure_inexact=True)
@seberg (Member, Author) commented Nov 23, 2023:

Iterated a bit; this is still a bit rough (I would like to consolidate __array_wrap__ a bit at least).

This does define conv.wrap() as the one true way to call __array_wrap__, but for NEP 50, the interesting part is really this part.
(I have also moved the cast to float into the subtract via dtype=, which isn't strictly necessary.)

The point is that this means np.linspace(np.float32(3), 10) returns float32. (Yes, this still needs tests, but I thought I would note it, as this pattern would hopefully be useful in a few places that are currently subtly broken in the same way.)

@seberg (Member, Author) commented:

@mhvk beyond cleanup, part of my thought for consolidating is that it should help with gh-24897. Because wrapping is a reasonable place to decide scalar vs. array logic.

@mhvk (Contributor) left a comment:

@seberg - now had a better look. I think this is really nice and it should be both relatively straightforward to extend and to use some of it inside the ufuncs.

One suggestion: it might be nice if conv could be indexed like a tuple. That should then return the "anyarray" version (in which case the default subok=False for as_arrays() makes sense; without this, I feel subok=True would be a more logical default).

If not doing that, I would suggest replacing ary, = conv.as_arrays() with ary = conv.as_arrays()[0] for clarity (the trailing comma before the equals sign is easy to miss...).

Py_ssize_t narrs_ssize_t = (args == NULL) ? 0 : PyTuple_GET_SIZE(args);
int narrs = (int)narrs_ssize_t;
/* Limit to 64 for now to use in a few places. */
if (narrs_ssize_t > 64) {
@mhvk (Contributor) commented:

More logical to use NPY_MAXARGS?

item->DType = NPY_DTYPE(PyArray_DESCR(item->array));
Py_INCREF(item->DType);

if (!npy_mark_tmp_array_if_pyscalar(
@mhvk (Contributor) commented:

I think I'd swap the branches to avoid the !

Py_INCREF(item->descr);
}
else {
/* just clear the flag again */
@mhvk (Contributor) commented:

Well, swap only if this is really not needed! This would seem to preclude using it later, is that intended? Or is ->was_scalar exactly the same (i.e., python scalars only)?

@seberg (Member, Author) commented:

I also leave descr empty. So yes, it is just the same. I just wanted to go through that, because that is where the other paths go through.

res_item = item->object;
}
else {
res_item = (PyObject *)item->array;
@mhvk (Contributor) commented:

Probably premature optimization, but it would seem logical not to even create the array from a python scalar unless it is actually needed.

@seberg (Member, Author) commented:

Yes, but that requires splitting up the creation, so I indeed thought it premature.

This is one of the things that we could do in the ufunc if and only if we delete the "promotion state" (to allow setting it to warn for NumPy 2).

creation_helper_dealloc(PyArrayCreationHelperObject *self)
{
creation_item *item = self->items;
for (int i = 0; i < self->narrs; i++) {
@mhvk (Contributor) commented:

you need an item++ here too! (or do self->items[i])

@seberg (Member, Author) commented:

Ouch. Luckily, this is probably the reason the sanitizer job failed...

creation_helper_as_arrays(PyArrayCreationHelperObject *self,
PyObject *const *args, Py_ssize_t len_args, PyObject *kwnames)
{
npy_bool subok = NPY_FALSE;
@mhvk (Contributor) commented:

I think this will replace more asanyarray than asarray, so maybe make the default NPY_TRUE? In general, subclasses should be OK...

@seberg (Member, Author) commented:

OK, let me change it. In practice it isn't a big change. For a bit I was thinking "why asanyarray at all?!"
I am not actually sure it is more useful for most places, but since conv.wrap() is (normally) a no-op for those now, I also don't really care to prefer it.

The point of asanyarray() is allowing functions to work through composition of other functions/methods working out, when simple wrapping might not propagate info well.
In practice, I do wonder how often that is actually well advised (compared to using __array_function__ and dealing with it manually). The point being: if the class is not trivial enough for just wrapping, maybe it should use __array_function__. OTOH, I can see a small amount of pragmatism in it, so I'm fine with preferring it.


"""
conv = _array_converter(ary)
ary, = conv.as_arrays(subok=True)
@mhvk (Contributor) commented:

Maybe ary = conv.as_arrays(subok=True)[0].ravel().

It does feel one should just be able to index conv (effectively making it a bit like a subclass of tuple).
Then it would be ary = conv[0].ravel().

# matrices have to be transposed first, because they collapse dimensions!
out_arr = transpose(buff, buff_permute)
return res.__array_wrap__(out_arr)
res = transpose(buff, buff_permute)
@mhvk (Contributor) commented:

For subclasses other than matrix, this would be an API change if .transpose() is overridden. That seems rather unlikely (and wrong-headed), so I'm fine with making the change regardless...

@seberg (Member, Author) left a comment:

I have addressed all the comments and changed the default for subok (I was torn on that initially, but I made my peace with it).

I allow indexing now (which also allows a, b = conv unpacking), although calling the method is actually a tiny bit faster (the reason is that unpacking checks conv[length] to trigger the IndexError).
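The extra conv[length] probe during unpacking can be demonstrated with a toy sequence (a hypothetical class, not the converter itself): iterating via the old __getitem__ protocol calls one index past the end to raise the IndexError that stops iteration.

```python
class Seq:
    """Toy sequence: supports iteration only via __getitem__."""

    def __init__(self, items):
        self._items = items
        self.probes = 0  # count __getitem__ calls

    def __getitem__(self, i):
        self.probes += 1
        return self._items[i]  # IndexError past the end stops iteration

s = Seq((1, 2))
a, b = s  # unpacking probes indices 0, 1, and 2 (the IndexError)
```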


I have a start on reorganizing the ufuncs to share some infrastructure for array-wrap at least. But I decided that shouldn't stop us here, since it mostly means moving the find_wrap function out to share it, and the ufunc changes are not entirely trivial.


}
else {
/*
* `__array_wrap__` expects an array argument, so we ensure that.
@seberg (Member, Author) commented:

Well, I think that scalars remain common in some code paths (for the time being at least). But I rephrased the comment.


@seberg seberg marked this pull request as ready for review November 28, 2023 14:12
@mhvk (Contributor) left a comment:

Looks mostly great now! Just a few remaining comments.

One comment: with indexing giving any array, I think it is actually OK to have subok=False as the default for as_arrays(). Though I am also completely happy to keep it as you have it now: in principle, subclasses should behave like regular arrays, so it should be fine to keep them (and it is nice if the default corresponds to the least amount of work, which is the case for subok=True since you are not viewing as a regular array).

}
}

Py_INCREF(self);
@mhvk (Contributor) commented:

Are you sure you need the INCREF here? PyObject_NewVar returns a new reference already (which you DECREF in the fail path).

@seberg (Member, Author) commented:

Right, thanks! Must have been some remnant of some debugging confusion...

return NULL;
}

/* When we only have scalars, the default is to retur an array: */
@mhvk (Contributor) commented:

retur -> return

@mhvk (Contributor) commented Nov 28, 2023

p.s. On the asanyarray, beyond the rationale that subclasses should behave the same way and thus not be converted (of course, they don't always do...), it is certainly nice if generally they get passed on, since then __array_ufunc__ coverage often is good enough for functions to work. For Quantity at least, there are still a lot of functions where we do not have to override; see https://github.com/astropy/astropy/blob/main/astropy/units/quantity_helper/function_helpers.py (the SUBCLASS_SAFE_FUNCTIONS set; as you'll notice, there are also a lot that do need overrides, which means a very long list of rather boilerplate code that is little fun to write, so all the better if some functions don't need it!)

@seberg (Member, Author) left a comment:

Yeah, about SUBCLASS_SAFE_FUNCTIONS: it would be great if we could formalize this in some form and document the pattern. But the best would be to generalize it to non-subclasses somehow, which might be just calling func._implementation() rather than ndarray.__array_function__.

with the indexing giving any array, I think it is actually OK to have subok=False as default for

I agree more with the latter part, unless we also had asanyarray: align the default and the short-hand. Plus, if you want a real nudge towards subok, having the opposite default is no good.

The main reason for not doing it was maybe always matrices (and probably historic). But we should hopefully worry less and less about those.

Also added a start for documentation.


@seberg seberg changed the title DISCUSS,ENH: Create helper for conversion to arrays ENH: Create helper for conversion to arrays Nov 29, 2023
@seberg seberg added this to the 2.0.0 release milestone Nov 29, 2023
@ngoldbaum (Member) left a comment:

I looked this over and spotted a few minor things but no showstoppers.

* This means it was:
* - Recognized as a scalar by a/the dtype. This can be DType specific,
* for example a tuple may be a scalar, but only for structured dtypes.
* - Anything not recognized a scalar (mapped to a DType) but also not
@ngoldbaum (Member) commented:

Suggested change:
- * - Anything not recognized a scalar (mapped to a DType) but also not
+ * - Anything not recognized as an instance of a DType's scalar type but also not

I got tripped up on the original wording. Take it or leave it, but if you leave it, it should be "recognized as a scalar".

Helper to convert one or more objects to arrays. Integrates machinery
to deal with the ``result_type`` and ``__array_wrap__``.

The reason for this is that e.g. ``result_type`` needs to covert to arrays
@ngoldbaum (Member) commented:

Suggested change:
- The reason for this is that e.g. ``result_type`` needs to covert to arrays
+ The reason for this is that e.g. ``result_type`` needs to convert to arrays

@mhvk (Contributor) left a comment:

I think what's left is only typos and some suggestions for the new docstrings!

wrap(arr, /, to_scalar=None)

Call ``__array_wrap__`` on the input ``arr``. When ``arr`` is already
the correct subclass, this may be a no-op.
@mhvk (Contributor) commented:

I think one should be explicit about what is done: "this may be a no-op" -> "no wrapping is done". Or maybe rewrite the note:

Call ``__array_wrap__`` on ``arr`` if ``arr`` is not the same subclass
as the input the ``__array_wrap__`` method was retrieved from.

add_newdoc('numpy._core._multiarray_umath', '_array_converter', ('scalar_input',
"""
A tuple which is ``True`` if the original inputs were scalars and
``False`` otherwise. Scalars are all objects which are coerced to a
@mhvk (Contributor) commented:

How about

A tuple which indicates for each input whether it was a scalar that
was coerced to a 0-D array (and was not already an array or something
converted via a protocol like ``__array__()``).

``False`` otherwise. Scalars are all objects which are coerced to a
0-D array but are not already arrays or converted via one of the protocols.

.. note::
@mhvk (Contributor) commented:

My sense would be to delete this note: it probably adds more confusion than it solves...

----------
arr : ndarray
The object to be wrapped. Ideally, this is always an ndarray
or subclass, although NumPy scalars are accepted as of now by
@mhvk (Contributor) commented:

How about

The object to be wrapped. Normally an ndarray or subclass,
although for backward compatibility NumPy scalars are also accepted
(these will be converted to a NumPy array before being passed on to
the ``__array_wrap__`` method).

When ``True`` will convert a 0-d array to a scalar via ``result[()]``
(with a fast-path for non-subclasses). If ``False`` the result should
be an array-like (as ``__array_wrap__`` is free to return a non-array).
By default (``None``), a scalar is returned if all inputs are scalar.
@mhvk (Contributor) commented:

are scalar -> were scalar

PyObject *obj = self->items[i].object;
if (PyArray_CheckExact(obj)) {
if (wrap == NULL || priority < 0) {
Py_INCREF(Py_None);
@mhvk (Contributor) commented:

Strictly, you don't need to set the wrap here, given the if (wrap == NULL) clause at the end. (same for scalar)

@mhvk (Contributor) commented Nov 30, 2023

p.s. Probably time for some squashing of commits too...

@seberg (Member, Author) commented Nov 30, 2023

I don't want to jinx this because of it, but the [()] logic for subclasses doesn't work for the current memmap (which incorrectly always does it).
Now, that is maybe never even used right now, but it makes me wonder if we need another approach, which might mean passing scalar=True to __array_wrap__!?

@mhvk (Contributor) commented Nov 30, 2023

I'm a bit confused: why would it not work for memmap? It doesn't hurt to do [()] twice, does it? (I do agree we can and perhaps should adjust memmap.__array_wrap__ to not create scalars itself, but that seems independent of this PR).

That said, you are right that in principle it would be better to give __array_wrap__ a chance to react to whether or not a scalar is expected, instead of using a heuristic as you have now. And I guess 2.0 is the right time for such an API change. Though on the other hand, __array_wrap__ still is somewhat out-of-date API and generally classes have to be able to deal with [()] anyway. So, perhaps we do not need to worry?
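A hedged sketch of what such an API change could look like for a subclass (the return_scalar keyword follows this discussion; the class itself is illustrative, not NumPy code):

```python
import numpy as np

class WrappingArray(np.ndarray):
    """Illustrative subclass with an __array_wrap__ that is told
    explicitly whether a scalar result is expected."""

    def __array_wrap__(self, arr, context=None, return_scalar=False):
        out = np.asarray(arr).view(WrappingArray)
        if return_scalar:
            # the caller signalled that all inputs were scalars
            return out[()]  # 0-d array -> scalar
        return out

base = np.arange(3.0).view(WrappingArray)
wrapped = base.__array_wrap__(np.array([1.0, 2.0]))
scalar = base.__array_wrap__(np.array(2.0), None, True)
```

Here the heuristic is replaced by an explicit flag, so a class like memmap would no longer need to guess whether to collapse 0-d results.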

@ngoldbaum (Member) commented:
see #25278, which was reported with impressive timing for this PR

@mhvk (Contributor) commented Dec 19, 2023

I'm still not quite sold on return_scalar, but I'm also realizing I've lost what little overview I had of the problems that are caused by the various steps. This makes me wonder if it would be better to have a PR first that creates a helper that simply keeps the status quo, i.e., makes no attempt to be clever on wrapping, just reproducing the __array_wrap__ calls that are present now.

seberg added a commit that referenced this pull request Jan 20, 2024
…r`` (#25409)

This reorganizes how array-wrap is called. I think it might very mildly change the semantics for reductions (and for negative priorities).

Overall, it now passes a new return_scalar=False/True when calling __array_wrap__ and deprecates any array-wrap which does not accept arr, context, return_scalar.

I have not integrated it yet, but half the reason for the reorganization is to integrate/reuse it in the array_converter helper PR (gh-24214), which stumbled over trying to make the scalar handling sane.

Forcing downstream to add return_scalar=False to the signature is a bit annoying, but e.g. our memory maps currently try to guess at it, which seems bad. I am hoping this can be part of making the scalar vs. array return more sane.

But, maybe mainly, I hope it consolidates things (together with gh-24124 mainly; if ufuncs were the only complex place we used this, it wouldn't matter much).

---

* API: Reorganize `__array_wrap__` and add `return_scalar=False`

This also deprecates any `__array_wrap__` which does not accept
`context` and `return_scalar`.

* BUG: Fix niche bug in rounding.

* MAINT: Adjust __array_wrap__ in code and tests (also deprecation test)

* MAINT: Use/move the simplest C-wrapping also

* DOC: Update doc and add release note

* STY: Make linter happy in old tests

* MAINT: Silence GCC warning (value cannot really be used uninitialized)

* MAINT: Small style fixes

* BUG: Fix reference leak in ufunc array-wrapping

This probably doesn't fix the 32bit issue unfortunately, only the windows ones...

* BUG: Fix leak for result arrays in all ufunc calls

* Ensure we try passing context and address related smaller review comments

* Ensure we don't try `context=None` and expand code comment

* Rely on return_scalar always being right (and style nit)

* Remove outdated comments as per review

* Let's just undo force-wrap for now for reductions (its a change...)

* ENH: Chain the original error when the deprecationwarning is raised

Doing this due to gh-25635 since it is super confusing with the
bad retrying...

* BUG,MAINT: Address Marten's review comments

This helper allows dealing with result_type and array wrapping
without explicitly holding on to the original objects and
especially potentially converting them many times.
This doesn't quite fix all places, but it does most (most prominently
linalg also does some quite awkward wrapping).
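For illustration, here is a minimal sketch of the `return_scalar`-aware `__array_wrap__` signature this change introduces; the subclass name and its wrapping policy are hypothetical, not taken from NumPy itself:

```python
import numpy as np

class WrappedArray(np.ndarray):
    # New-style signature: NumPy tells the subclass whether a plain
    # ndarray would have returned a scalar, instead of the subclass
    # guessing from 0-d-ness (as memmap used to).
    def __array_wrap__(self, arr, context=None, return_scalar=False):
        if return_scalar:
            return arr[()]  # hand back a plain NumPy scalar
        return arr.view(WrappedArray)

a = np.arange(3.0).view(WrappedArray)
res = a + 1  # the ufunc result is passed back through __array_wrap__
assert isinstance(res, WrappedArray)
```

Because the new arguments have defaults, the same definition also keeps working on older NumPy versions that pass fewer arguments.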
@seberg
Member Author

seberg commented Jan 20, 2024

@mhvk and @ngoldbaum, I merged main (i.e. the array-wrap helper) and applied all the remaining doc suggestions (thanks, they were great!). Then squashed it down to three commits.

I think this should be pretty much ready now, since IIRC it was relatively settled. But happy to do another round if things come up of course.

Contributor

@mhvk mhvk left a comment

This looks good! Looking at it again, I mainly wonder about methods and arguments that are defined but not actually used. Arguably, we should not have those without a clear use case... But as long as this is private, perhaps it is OK?

its memory, in which case you can traverse ``a.base`` for a memory handler.
""")

add_newdoc('numpy._core._multiarray_umath', '_array_converter',
Contributor

I wondered a bit about whether this needs to be private, as it seems generally useful. But I guess this allows possible changes before making it public?

result = wrap(result)
return result
conv = _array_converter(obj)
# As this already tried the method, subok is maybe quite reasonable here:
Contributor

Maybe add # but this follows what was done before. TODO: revisit this. Or something else that makes perhaps clearer that it would be good to make this change (even if best not to do so in this PR!).

@@ -0,0 +1,471 @@
/*
* This file defines an _array_converter object used internally to NumPy
Contributor

"used internally in Numpy to [deal ...]"

item->DType = NPY_DTYPE(PyArray_DESCR(item->array));
Py_INCREF(item->DType);

if (npy_mark_tmp_array_if_pyscalar(
Contributor

Could this be if (item->scalar_input && npy_mark...)? If so, maybe combine with l.93-95:

if (item->scalar_input) {
    if (npy_mark...) {
        ...
    }
    else {
        ...
    }
}
else {  /* non-scalar input */
    self->flags &= ~(NPY_CH_ALL_PYSCALARS | NPY_CH_ALL_SCALARS);
}

Member Author

Yes, that is correct, and I adopted it, but not as suggested here. The pretty-code annoyance is that item->descr needs to be dealt with, so whatever you do, you have some duplication.

But I agree, it isn't super nice to split this from L93, and maybe easier to grok if it is clear that npy_mark_tmp_array_if_pyscalar does indeed require scalar_input.

float, or complex.
""")

add_newdoc('numpy._core._multiarray_umath', '_array_converter', ('scalar_input',
Contributor

I think scalar_input is not used anywhere in the code. Do you have cases in mind where it would be useful? It sort-of feels like it should be, but if there is no use, should it be exposed? (Fine to leave in, just asking.)

Also, right now it is a property. Might it make sense to make it a method (with no arguments), so that one could extend it if needed? (E.g., to ask not just for scalar, but python scalar inputs.)

But perhaps all this just argues for keeping _array_converter private for now, so we can see what is best!

Member

The linter says this line is one character too long

Member Author

I didn't think we cared, but fair, pushed the fix (included it in the previous commit).

return NULL;
}

/* When we only have scalars, the default is to return an array: */
Contributor

I was again a bit confused about this, maybe it helps to say "We follow the default setting for as_arrays of CONVERT_IF_NO_ARRAY"

----------
subok : True or False, optional
Whether array subclasses are preserved.
pyscalars : {"convert", "preserve", "convert_if_no_array"}, optional
Contributor

This argument is not used anywhere, so a bit of the same question as for scalar_input above. In particular, I wonder if it wouldn't be handier to pass this on input (perhaps still allowing to override it here). But without a use case, probably not worth overthinking it now!

@seberg
Member Author

seberg commented Jan 21, 2024

Addressed the smaller comments, thanks.

The API issues are not very clear to me. I really like this type of feature, but it is quite fuzzy to me what it should look like and which details are useful (enough).

That is why I opted for private for now: I am not comfortable with saying that this API seems good (enough) unless others give very clear +1's.

Generally, I would like this or something similar to be public, because I think it can be generally useful and also specifically because I think it is more useful with NEP 50 if downstream wants to be pedantic.

From a maintenance-burden perspective I am not super worried even if we deprecate it again soonish; we would name-squat np.lib.array_utils.array_converter, though.


If we keep things private, I somewhat like just keeping those odd features. I can see scalar_input being useful at least in niches like a Python searchsorted:

conv = array_converter(haystack, needle)
haystack, needle = conv

# The needle decides whether a scalar result makes sense:
return conv.wrap(result, to_scalar=conv.scalar_input[1])

I am not sure that we will use it (in fact I doubt it in the near future), but I also like the idea of "advertising" that this is possible.
(Part of why it is there was that it is very easy, and I also thought it may be useful for the python scalar preservation, which the conversion already does now and seems to work well.)

If we lean towards public API, I fully agree to ditch them all until it is very clear we or downstream use them. Those odd features shouldn't stop us from making this public!

Contributor

@mhvk mhvk left a comment

OK, code looks good now! On the API, I'm fine with keeping it private for now and seeing what ends up being useful and what not. I'll approve now, but won't merge to give @ngoldbaum a chance to look.

p.s. Note that the linter failed.

@ngoldbaum
Member

I'll look at this tomorrow.

@seberg
Member Author

seberg commented Jan 22, 2024

FWIW, I rebased my try-0d-preservation branch and ended up using those features (not that I am sure alternatives wouldn't be better). It actually seems surprisingly close to doing something reasonable.

Member

@ngoldbaum ngoldbaum left a comment

I looked over the C code again and didn't spot any issues. Please push a fixup for the linter then this should be good to go.

@ngoldbaum
Member

I figure it's better to keep the linter happy so people don't get surprise linter corrections on future PRs. Thanks!

@ngoldbaum ngoldbaum merged commit 55afe35 into numpy:main Jan 22, 2024
