ENH: expose cython isinstance checks for np.bool_, np.integer_, np.floating#16363
ENH: expose cython isinstance checks for np.bool_, np.integer_, np.floating#16363jbrockmendel wants to merge 7 commits intonumpy:masterfrom
Conversation
numpy/__init__.pxd
Outdated
There was a problem hiding this comment.
Does cython not just let you run this directly?
There was a problem hiding this comment.
it does, but it doesnt optimize it into a C call
There was a problem hiding this comment.
Is the optimization even worth it? Isn't isinstance a thin wrapper around PyType_Check anyway?
Would it be worth filing a issue against cython requesting for it to optimize this type of thing automatically?
These just seem like weird helper functions to add to me, and I the only justification I can think of would be upstream deficiencies - in which case I'd like to see links to open cython issues in our source code.
(This complaint applies equally to https://github.com/numpy/numpy/pull/16266/files#r429937827 from the previous patch, although the rest of it seemed fine)
There was a problem hiding this comment.
There is a difference, PyObject_TypeCheck does not support metaclasses/__subclasscheck__ or __subclasshook__, so for this type of function it may actually make a real difference.
There was a problem hiding this comment.
... except that the function calls isinstance(obj, bool) in the first half
There was a problem hiding this comment.
So to summarize so far:
- this optimization is worth it, but maybe belongs upsrtream in Cython, which doesn't seem easy to do
- there was a suggestion to put these into a namespace, which seems non-trivial
which leaves us with accepting these functions as work-arounds, much like the python work-arounds we have in other places. Perhaps we can mark them as such with some comments and accept this as-is?
There was a problem hiding this comment.
this optimization is worth it, but maybe belongs upsrtream in Cython, which doesn't seem easy to do
not sure i understand this. My plan is that if/when this is merged, I'll make a PR with cython to make the same edits to their numpy/__init__.pxd file
There was a problem hiding this comment.
These functions to two things: First they do some optimizations that cython should do in isinstance(), but fails to do. Second, they make writing isinstance(value, (python_type, numpy_type)) slight shorter. I think what Matti is saying that the second point may not be worth it. But the first point is worth it until cython is capable of parsing the full isinstance check into something faster (or maybe providing an isinstancefast/hastype check with that signature). Once Cython has such a new feature, these functions could probably be considered deprecated.
EDIT: To be clear, that is my current opinion as wel.
There was a problem hiding this comment.
Cython does translate isinstance(value, (python_type, numpy_type)) into the corresponding C-API calls, except for bool, as @jbrockmendel pointed out. Calling isinstance(obj, (float, floating)) should be just fine.
|
CI failure looks like a mingw installation problem, presumably unrelated? |
d2bc708 to
d83d59a
Compare
|
@mattip thoughts on how to get travis to green? |
|
Don't worry about that particular run, its notorious. I restarted it, but chances are it will just fail again. |
|
if we're not worrying about travis, anything else here thats actionable? |
|
@seberg gentle ping |
|
OK, so I think the main reason this has stalled, is that these functions are clearly useful and simple. But also feel a bit out of place in some sense (and I suppose even slightly annoying in case we ever feel like updating any of the definitions). Partially because they are so simple probably. I am +0 to putting it in right now. A quick show of hands (or just note that nobody is against it), would be helpful! |
|
I would be more positive on this if the docstrings referenced an issue in Cython. Or a one-sentence summary of the long discussion of the problem in the docstring and why Cython cannot solve this cleanly. |
Good idea, will update.
I'm not clear on what you're looking for here. If/when this is merged, I'll make a PR on cython porting the same functions to their numpy/init.pxd file, but my understanding is that that file is going away in favor of just having that live in numpy. |
Your docstrings say "Cython equivalent of So the follow up question is "why can't cython fix [optimizing it into a C call]?". |
numpy/__init__.pxd
Outdated
There was a problem hiding this comment.
| Cython equivalent of `isinstance(val, (float, np.float_))` | |
| Cython equivalent of `isinstance(val, (float, np.floating))` |
numpy/__init__.pxd
Outdated
There was a problem hiding this comment.
| Cython equivalent of `isinstance(val, (complex, np.complex_))` | |
| Cython equivalent of `isinstance(val, (complex, np.complexfloating))` |
|
Updated per requests |
Obvious follow-up question: how can we / why can't we make |
I don't know/understand the numpy internals well enough to answer that. |
|
When you say "exposed as a C type", do you mean
|
|
It sounds like all you need is ctypedef class numpy.integer [object PyObject, type PyIntegerArrType_Type]:
passand similar for the other types |
|
What happens if you also omit the |
sorry, im not clear on what you have in mind. weve gone through a lot of iterations |
|
Co-authored-by: Eric Wieser <wieser.eric@gmail.com>
Shouldn't |
There's no reason for it to be. At worse cython pays a one-time cost of calling |
no dice, see CI logs from most recent run. At this point my feeling is that we have a working implementation and should go with that. If someone wants to try to figure out another solution more power to them, but it shouldn't be me. |
numpy/__init__.pxd
Outdated
| /* NumPy API declarations from "numpy/__init__.pxd" */ | ||
| """ | ||
|
|
||
| cdef extern from "numpy/scalartypes.h": |
There was a problem hiding this comment.
CI failure is that this header doesn't exist.
There was a problem hiding this comment.
| cdef extern from "numpy/scalartypes.h": | |
| cdef extern from *: |
That's fair. @scoder, would you have any interest in doing this? Otherwise, I can have a go |
|
The last change looks good to me, with the tweak by @eric-wieser. Could you apply that and let CI try again? |
|
Also worth remembering: before merging, whatever change we decide on needs to be duplicated to |
|
I went ahead and added definitions for all the abstract types in #17150. Assuming that gets merged, would you be interested in adding the concrete ones @jbrockmendel? |
Neat! Maybe? I'm not sure what the "concrete ones" are in this context. Since the CI says that |
Simple: it doesn't. cython emits code in the module startup like: As it turns out, |
The ones you can actually create instances of, rather than the abstract base classes. |
|
Since #17150 is merged, are you happy to close this @jbrockmendel? |
|
I think the is_integer_object check is worthwhile, as it is distinct from just the isinstance check. I'll run a performance check on the others and see if it makes a difference. |
| ----- | ||
| This does not counts np.timedelta64 objects as integers. | ||
| """ | ||
| return (not PyBool_Check(obj) and PyArray_IsIntegerScalar(obj) |
There was a problem hiding this comment.
Note the definition of this function is just:
numpy/numpy/core/include/numpy/ndarrayobject.h
Lines 51 to 52 in 6ee4917
aka isinstance(obj, (int, np.integer))
There was a problem hiding this comment.
yah, its excluding bool and td64 that makes this useful
There was a problem hiding this comment.
Sure, but after that change it no longer uses any numpy C API functions, so it probably doesn't belong in numpy/__init__.pxd any more.
|
OK, ill close this and can revisit if performance discrepancies popup. |
Follows #16266 to upstream a few more cython-isinstance checks from pandas.