Skip to content

Conversation

@BvB93
Copy link
Member

@BvB93 BvB93 commented Oct 8, 2020

An issue which has plagued the typing numpy for some time are objects whose value or, even worse,
whose type is platform-specific. The most notorious example here is np.int_, the latter being an
alias for either np.int32 (on windows for example) or np.int64.

It is desirable to have some sort of mechanism for annotating these platform-specifics; in order to
accomplish this the PR does two things:

  • It adds a script which dynamically generates a .pyi file. This script is run at the very end of
    setup_package(), ensuring that a fresh .pyi file is generated after every numpy installation
    (similar to np.version).
  • It adds a (second) new submodule to np.typing with valid character codes for each np.generic
    subclass, the latter also containing a number of platform specifics in the case of np.number
    subclasses.

Pinging @person142 as this would help with a few issues encountered in #16759.

EDIT: See #17843 for an alternative approach using a mypy plugin.

@rgommers
Copy link
Member

rgommers commented Oct 8, 2020

Thanks @BvB93, this sounds like a good idea.

the latter also containing a number of platform specifics in the case of np.number subclasses.

I'm actually failing to spot the platform-specific bit. E.g. a common type of dynamic check like is_32_bit = np.intp(0).itemsize < 8. What am I overlooking?

@BvB93
Copy link
Member Author

BvB93 commented Oct 8, 2020

I'm actually failing to spot the platform-specific bit. E.g. a common type of dynamic check like is_32_bit = np.intp(0).itemsize < 8. What am I overlooking?

They keyword here is "dynamic". The example you provided works just fine during runtime, but for static
type checking it has to be, well, defined statically. The latter is somewhat tricky if its type (or value)
is platform specific, hence this PR.

As a side note, the CI failures suggest that there are still some problems with the current implementation;
I'll take a closer look at it tomorrow.

@BvB93 BvB93 marked this pull request as draft October 9, 2020 14:04
@BvB93
Copy link
Member Author

BvB93 commented Oct 9, 2020

Converting this into a draft for now, as there are still some issues getting the tests running properly.

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be worth a detailed explanation somewhat about exactly what is going on with the aliases:

  • char through longlong are the native types used inside numpy corresponding to C types
  • int8 through int64 are both:
    • aliases for the above types
    • the default name of all but one of these types (intc or longlong depending on platform)
  • the typing stubs are written as if the sized types are the native ones
  • these files generate aliases in the reverse direction
  • the reverse aliases are not quite accurate - they suggest that two of intc, int_, longlong are identical, when in fact there are distinct scalar types

@BvB93
Copy link
Member Author

BvB93 commented Oct 9, 2020

I think it would be worth a detailed explanation somewhat about exactly what is going on with the aliases:

Good point, I'll update the PR description in a bit.

the typing stubs are written as if the sized types are the native ones

To be fair, this is consist with how the number subclasses are treated during runtime
(at least on the python side), where their C names are treated as aliases for their corresponding
sized type rather than the other way around.

In [2]: np.byte
Out[2]: numpy.int8

the reverse aliases are not quite accurate - they suggest that two of intc, int_, longlong are identical, when in fact there are distinct scalar types

I'm not quite sure how you arrived at this conclusion, as can be seen below the only two integer types
which are considered identical by definition are intp and int0.

https://github.com/numpy/numpy/blob/e6a5514cae950f9336866ac39b4880f8b80bcce0/tools/generate_type_hints.py#L138-L143

Running the script locally yields the following output (MacOS):

byte = int8
short = int16
intc = int32
intp = int0 = int64
int_ = int64
longlong = Any

@eric-wieser
Copy link
Member

Running the script locally yields the following output (MacOS):

byte = int8
short = int16
intc = int32
intp = int0 = int64
int_ = int64
longlong = Any

Ah, I wasn't aware that longlong = Any was the result of this script.

@BvB93
Copy link
Member Author

BvB93 commented Oct 9, 2020

Ah, I wasn't aware that longlong = Any was the result of this script.

Yes, Any is used as a fallback value here in case the aliased typed is still unannotated (or if it isn't an alias at all!).

@BvB93
Copy link
Member Author

BvB93 commented Oct 21, 2020

So I'm having some trouble to get this working with the likes of pip install -V . (as is done in the tests).

The part I'm struggling with is accessing the content of np.core ("Importing the numpy C-extensions failed")
in order to resolve the aliases of all the np.generic subclasses. I suspect that I'll have to hook into np.distutils
somewhere to get this working, but I'm not familiar enough with the whole setup process to narrow it down any further.

Could anyone provide some more insight here?

@BvB93
Copy link
Member Author

BvB93 commented Oct 29, 2020

Xref #16548 as this issue describes more or less what this PR is doing.

@BvB93
Copy link
Member Author

BvB93 commented Nov 2, 2020

Say, aren't the numerical scalars in ctypes always of the same size as their np.number-based counterpart?
If so, then the problem with importing from np.core (#17514 (comment)) can be circumvented by grabbing the scalar sizes from ctypes instead of numpy.

Any thoughts?

Comment on lines +6 to +7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this file is replaced by the generated one at installation, what is the point in this version?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its purpose is to provide runtime-accessible values so we can just export them to numpy/typing/__init__.pyi.
This way we don't have to worry about whether typing.TYPE_CHECKING is True or False.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd also like to note that the script run during the numpy installation generates
_platform_aliases.pyi, not _platform_aliases.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you could draw attention to the fact that the generated file has a different name in a comment.

@BvB93 BvB93 marked this pull request as ready for review November 3, 2020 19:07
@BvB93 BvB93 added this to the 1.20.0 release milestone Nov 11, 2020
Bas van Beek added 19 commits November 15, 2020 20:32
`ctypes` scalars are now used for inferring the size of their `np.number`-based counterpart
mypy has issues recognizing an object as valid type aliases when using the likes of `clongfloat = longcomplex = clongdouble = Any`
Should be more consist with the `half, single, double, ...` series
…f `uint`; set the return type `signedinteger[Any]`
Use the return precision of either `self` or `int_`, depending on which one is larger
…cript

* Clarified docstrings
* Clarified comments
* Use a `NamedTuple` as values in the `TO_BE_ANNOTATED` dictionary
@BvB93
Copy link
Member Author

BvB93 commented Nov 15, 2020

Rebased to get rid of a merge conflict.

@hoefling
Copy link

Just out of curiosity - wouldn't it be easier to implement a small mypy plugin that transforms some types depending on the current platform?

@BvB93
Copy link
Member Author

BvB93 commented Nov 16, 2020

Just out of curiosity - wouldn't it be easier to implement a small mypy plugin that transforms some types depending on the current platform?

In principle we could do this with the help of get_type_analyze_hook if we structure the types/type-aliases as following.
This does mean the users will have to enable the plugin in their mypy config files though.

from typing import Any, Annotated, Union, Literal
import numpy as np
import numpy.typing as npt

_Int64CodesBase = Literal['i8', ...]
_LongLongCodes = Literal['q', ...]
_IntPCodes = Literal['p', ...]

# A plugin can append the union with charcodes from `np.longlong`, `np.intp`, etc
_Int64Codes = Annotated[Union[_Int64CodesBase], "_Int64Codes"]

# A plugin can transform this into `intp = signedinteger[npt._64Bit]`
class intp(np.singedinteger[Any]): ...  

Personally I'd okay with either option, though going the plugin route does mean we'll have expose it to public API.
I suspect the latter will be inevitable however, at last at some point in the future.

@BvB93
Copy link
Member Author

BvB93 commented Nov 25, 2020

After a bit of trial and error I managed to get the plugin-based approach working.
The respective PR is up at #17843.

@BvB93
Copy link
Member Author

BvB93 commented Dec 22, 2020

Closed in favor of #17843.

@BvB93 BvB93 closed this Dec 22, 2020
@BvB93 BvB93 deleted the dynamic branch January 29, 2021 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants