Skip to content

Conversation

@person142
Copy link
Member

As discussed in #16759, the new DType classes provide a good path forward for making ndarray generic over DType. Update NEP 42 to discuss those applications in more detail.

I'm sure more discussion will be needed here, but this is my attempt to get the ball rolling.

Ping @BvB93 @seberg.

As discussed in numpy#16759, the new
DType classes provide a good path forward for making `ndarray` generic
over DType. Update NEP 42 to discuss those applications in more
detail.
avoiding crowding the ``np`` namespace as discussed above. For a user
defined DType::

class UserDtype(dtype): ...
Copy link
Member

@BvB93 BvB93 Oct 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

Suggested change
class UserDtype(dtype): ...
class UserScalar: ...
class UserDtype(np.dtype[UserScalar]): ...

or if we want the scalar type to be variable:

Suggested change
class UserDtype(dtype): ...
T = typing.TypeVar("T")
class UserDtype(np.dtype[T]): ...

EDIT1: Made np.generic superclass of UserScalar and use it as bound for the typevar.
EDIT2: Reverted the previous edit (see #17447 (comment)).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure-in the NEP there is this line:

The notation works equally well with built-in and user-defined DTypes

but I'm not actually sure how the __class_getitem__ would work with a user-defined scalar type. I have more been expecting that people would define new DType classes directly (maybe not even defining a corresponding scalar type!), which is why I emphasized the DType class here and left the scalars out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@person142 it will work for DTypes requesting it (not necessarily for all DTypes). And in that case it will work by NumPy having global known_dtype mapping basically. The reason we need something in any case, is to support:

np.array([user_scalar])

without explicitly passing a dtype=, which I would like to support.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. Maybe I should include both examples then.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update here c564fcb#diff-89eb8c11ba758f8d3a296f44372c3254R293 for user-defined scalar types. AFAICT supporting np.dtype[UserScalar] mapping to UserDtype will require adding a single overload inside the NumPy stubs, which seems ok since we'll already have the known_dtype mapping. LMK if you see a way around that though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn’t it already a subclass in the example? (I meant for it to be.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was specifically talking about the suggestion in #17447 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does say "generic" currently. Further down, I say "Scalars will be expected to derive from a NumPy scalar", but honestly, I am not sure we need to stick with it. I do not see a huge advantage, because I do not believe in "inheriting" the dtype attributes. When you inherit it, you mix up contra and covariance for a container and AFAIK that just doesn't work.

The reason for saying "from a NumPy scalar", is because I do not like np.generic. It comes with expectations on memory layout on the NumPy side and quite frankly I don't want to use it. When we reach this stage, I would rather consider creating a np.generic baseclass without all of those expectations.

Does subclassing even help with typing? It seems like there is a solution that will always work and it also seems to me like users will always have to define this once for runtime and once for static typing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or from a different angle: Does subclassing really matter, as opposed to what is pretty much equivalent to an ABC you can register with?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or from a different angle: Does subclassing really matter, as opposed to what is pretty much equivalent to an ABC you can register with?

For static typing the only important thing is that there is some common baseclass or protocol for automatically annotating the likes of np.array([user_scalar]).
Exactly how this is accomplished doesn't really matter (well, like you said, it might matter a lot during runtime...):

  • Directly subclassing np.generic.
  • Adding np.generic as a superclass only during type checking.
  • Adding a new ABC (which may or may not be available at runtime).
  • Adding a new protocol (which, again, may or may not be available at runtime).
  • etc...

When we reach this stage, I would rather consider creating a np.generic baseclass without all of those expectations.

+1 to this suggestion!

and is inspired by and potentially useful for type hinting.
to get the DType class corresponding to a scalar type. (Note that with
the acceptance of PEP 585 [2]_ there is precedent for
``np.dtype[np.int64]`` referring to a *subclass* of ``np.dtype``.) The
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
``np.dtype[np.int64]`` referring to a *subclass* of ``np.dtype``.) The
``np.dtype[np.int64]`` referring to a *subtype* of ``np.dtype``.) The

e.g. list[int] is a subtype of list[object] but not a subclass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I meant subclass here, but it has apparently been too long since I have read PEP 585-it only mentions

>>> isinstance(list[str], types.GenericAlias)
True

i.e. the proxy classes for list[int] do not subclass list. I'll remove this line.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess both is correct here, I am actually not sure which one is more precise with respect to how "type" is used in typing :).

The notation works equally well with built-in and user-defined DTypes
and is inspired by and potentially useful for type hinting.
to get the DType class corresponding to a scalar type. (Note that with
the acceptance of PEP 585 [2]_ there is precedent for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
the acceptance of PEP 585 [2]_ there is precedent for
the acceptance of :PEP:`585` there is precedent for

Minor.

This getter eliminates the need to create an explicit name for every
DType, crowding the ``np`` namespace; the getter itself signifies the type.
DType, crowding the ``np`` namespace; the getter itself signifies the
type. It also opens up the possibility of making ``np.ndarray``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type. It also opens up the possibility of making ``np.ndarray``
type. It also opens the possibility of making ``np.ndarray``

Minor.

DType, crowding the ``np`` namespace; the getter itself signifies the type.
DType, crowding the ``np`` namespace; the getter itself signifies the
type. It also opens up the possibility of making ``np.ndarray``
generic over DType class via annotations like::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
generic over DType class via annotations like::
generic over DType class using annotations like::

Minor. Google style guide says of "via": "Don't use."

Comment on lines 289 to 290
avoiding crowding the ``np`` namespace as discussed above. For a user
defined DType::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
avoiding crowding the ``np`` namespace as discussed above. For a user
defined DType::
avoiding crowding the ``np`` namespace as discussed above. For a
user-defined DType::

Minor.


class UserDtype(dtype): ...

one can simply do ``np.ndarray[UserDtype]``, keeping annotations
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
one can simply do ``np.ndarray[UserDtype]``, keeping annotations
one can use ``np.ndarray[UserDtype]``, keeping annotations

Minor. Google style guide recommends against "simply."

class UserDtype(dtype): ...

one can simply do ``np.ndarray[UserDtype]``, keeping annotations
concise in that case without introducing any extra boilerplate in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
concise in that case without introducing any extra boilerplate in
concise in that case without introducing boilerplate in

Minor.

to return a ``uint8`` or ``float32`` array respectively. This is
further described in the documentation for :func:`numpy.result_type`.
.. [2] https://www.python.org/dev/peps/pep-0585/
Copy link
Contributor

@bjnath bjnath Oct 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.. [2] https://www.python.org/dev/peps/pep-0585/

No longer needed with change proposed in https://github.com/numpy/numpy/pull/17447/files#r499271335


Since getter calls won't be needed often, this is unlikely to be burdensome.
Classes can also offer concise alternatives.
in ``numpy.typing``, thus keeping annotations concise but still
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point including it in numpy.typing, might have to do this sooner rather than later to make solving the pickling issue simpler.

Also clean up the language a bit.
@seberg seberg merged commit 789b217 into numpy:master Oct 6, 2020
@seberg
Copy link
Member

seberg commented Oct 6, 2020

Thanks @person142 and @BvB93 , I don't feel like worrying too much about the generic when later text says something more about it. It gives the right idea.

@person142 person142 deleted the nep-42-typing branch October 7, 2020 04:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants