Skip to content

ENH: expose cython isinstance checks for np.bool_, np.integer_, np.floating#16363

Closed
jbrockmendel wants to merge 7 commits intonumpy:masterfrom
jbrockmendel:cyapi2
Closed

ENH: expose cython isinstance checks for np.bool_, np.integer_, np.floating#16363
jbrockmendel wants to merge 7 commits intonumpy:masterfrom
jbrockmendel:cyapi2

Conversation

@jbrockmendel
Copy link
Contributor

Follows #16266 to upstream a few more cython-isinstance checks from pandas.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does cython not just let you run this directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does, but it doesnt optimize it into a C call

Copy link
Member

@eric-wieser eric-wieser May 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the optimization even worth it? Isn't isinstance a thin wrapper around PyType_Check anyway?

Would it be worth filing a issue against cython requesting for it to optimize this type of thing automatically?

These just seem like weird helper functions to add to me, and I the only justification I can think of would be upstream deficiencies - in which case I'd like to see links to open cython issues in our source code.

(This complaint applies equally to https://github.com/numpy/numpy/pull/16266/files#r429937827 from the previous patch, although the rest of it seemed fine)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a difference, PyObject_TypeCheck does not support metaclasses/__subclasscheck__ or __subclasshook__, so for this type of function it may actually make a real difference.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... except that the function calls isinstance(obj, bool) in the first half

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIU

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to summarize so far:

  • this optimization is worth it, but maybe belongs upsrtream in Cython, which doesn't seem easy to do
  • there was a suggestion to put these into a namespace, which seems non-trivial

which leaves us with accepting these functions as work-arounds, much like the python work-arounds we have in other places. Perhaps we can mark them as such with some comments and accept this as-is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this optimization is worth it, but maybe belongs upsrtream in Cython, which doesn't seem easy to do

not sure i understand this. My plan is that if/when this is merged, I'll make a PR with cython to make the same edits to their numpy/__init__.pxd file

Copy link
Member

@seberg seberg Jun 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These functions to two things: First they do some optimizations that cython should do in isinstance(), but fails to do. Second, they make writing isinstance(value, (python_type, numpy_type)) slight shorter. I think what Matti is saying that the second point may not be worth it. But the first point is worth it until cython is capable of parsing the full isinstance check into something faster (or maybe providing an isinstancefast/hastype check with that signature). Once Cython has such a new feature, these functions could probably be considered deprecated.

EDIT: To be clear, that is my current opinion as wel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cython does translate isinstance(value, (python_type, numpy_type)) into the corresponding C-API calls, except for bool, as @jbrockmendel pointed out. Calling isinstance(obj, (float, floating)) should be just fine.

@jbrockmendel
Copy link
Contributor Author

CI failure looks like a mingw installation problem, presumably unrelated?

@jbrockmendel jbrockmendel force-pushed the cyapi2 branch 2 times, most recently from d2bc708 to d83d59a Compare June 17, 2020 01:58
@jbrockmendel
Copy link
Contributor Author

@mattip thoughts on how to get travis to green?

@seberg
Copy link
Member

seberg commented Jun 18, 2020

Don't worry about that particular run, its notorious. I restarted it, but chances are it will just fail again.

@jbrockmendel
Copy link
Contributor Author

if we're not worrying about travis, anything else here thats actionable?

@jbrockmendel
Copy link
Contributor Author

@seberg gentle ping

@seberg seberg self-requested a review July 8, 2020 17:27
@seberg seberg added the triage review Issue/PR to be discussed at the next triage meeting label Jul 16, 2020
@seberg
Copy link
Member

seberg commented Jul 16, 2020

OK, so I think the main reason this has stalled, is that these functions are clearly useful and simple. But also feel a bit out of place in some sense (and I suppose even slightly annoying in case we ever feel like updating any of the definitions). Partially because they are so simple probably.

I am +0 to putting it in right now. A quick show of hands (or just note that nobody is against it), would be helpful!

@mattip
Copy link
Member

mattip commented Jul 17, 2020

I would be more positive on this if the docstrings referenced an issue in Cython. Or a one-sentence summary of the long discussion of the problem in the docstring and why Cython cannot solve this cleanly.

@jbrockmendel
Copy link
Contributor Author

I would be more positive on this if the docstrings referenced an issue in Cython. Or a one-sentence summary of the long discussion of the problem in the docstring

Good idea, will update.

and why Cython cannot solve this cleanly

I'm not clear on what you're looking for here. If/when this is merged, I'll make a PR on cython porting the same functions to their numpy/init.pxd file, but my understanding is that that file is going away in favor of just having that live in numpy.

@eric-wieser
Copy link
Member

eric-wieser commented Jul 17, 2020

and why Cython cannot solve this cleanly

I'm not clear on what you're looking for here.

Your docstrings say "Cython equivalent of isinstance(val, (float, np.float_))". The question is, "why isn't isinstance(val, (float, np.float_)) a good option in cython". The answer to this is "it does, but it doesnt optimize it into a C call".

So the follow up question is "why can't cython fix [optimizing it into a C call]?".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Cython equivalent of `isinstance(val, (float, np.float_))`
Cython equivalent of `isinstance(val, (float, np.floating))`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Cython equivalent of `isinstance(val, (complex, np.complex_))`
Cython equivalent of `isinstance(val, (complex, np.complexfloating))`

@jbrockmendel
Copy link
Contributor Author

Updated per requests

@eric-wieser
Copy link
Member

Cython cannot optimize this into a C call on its own because np.integer is not available as a C type.

Obvious follow-up question: how can we / why can't we make np.integer available as a C type?

@jbrockmendel
Copy link
Contributor Author

Obvious follow-up question: how can we / why can't we make np.integer available as a C type?

I don't know/understand the numpy internals well enough to answer that. np.integer.__module__ shows "numpy", so it isnt clear where it is defined. Is it exposed in a .h file somewhere?

@eric-wieser
Copy link
Member

When you say "exposed as a C type", do you mean

  • Cython is not told that np.integer and &PyIntegerArrType_Type are identical
  • Cython is not told the underlying storage format of np.integer (it has none)
  • Both of the above

@eric-wieser
Copy link
Member

eric-wieser commented Jul 27, 2020

It sounds like all you need is

ctypedef class numpy.integer [object PyObject, type PyIntegerArrType_Type]:
    pass

and similar for the other types

@eric-wieser
Copy link
Member

eric-wieser commented Aug 12, 2020

What happens if you also omit the type argument?

@jbrockmendel
Copy link
Contributor Author

What happens if you also omit the type argument?

sorry, im not clear on what you have in mind. weve gone through a lot of iterations

@scoder
Copy link
Contributor

scoder commented Aug 23, 2020

What happens if you also omit the type argument?

sorry, im not clear on what you have in mind. weve gone through a lot of iterations

cdef extern from "numpy/scalartypes.h":
    ctypedef class numpy.floating [object PyObject]:
        pass

public exports things from your module, whereas extern imports/includes from other places, so both cannot go together. For an extern class, Cython executes a Python import at initialisation time to get at the 'real' type object (thus the need for the qualified module name). That conflicts with a C type name declaration (it thus rejects the type …). All it needs at compile time is the struct layout of the object, and that is just PyObject in this specific case.

@jbrockmendel
Copy link
Contributor Author

ctypedef class numpy.floating [object PyObject]:

Shouldn't PyFloatingArrType_Type be in here somewhere?

@eric-wieser
Copy link
Member

Shouldn't PyFloatingArrType_Type be in here somewhere?

There's no reason for it to be. At worse cython pays a one-time cost of calling getattr to find the PyTypeObject* pointer, but after that there's no difference.

@jbrockmendel
Copy link
Contributor Author

ctypedef class numpy.floating [object PyObject]:

no dice, see CI logs from most recent run.

At this point my feeling is that we have a working implementation and should go with that. If someone wants to try to figure out another solution more power to them, but it shouldn't be me.

/* NumPy API declarations from "numpy/__init__.pxd" */
"""

cdef extern from "numpy/scalartypes.h":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI failure is that this header doesn't exist.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cdef extern from "numpy/scalartypes.h":
cdef extern from *:

@eric-wieser
Copy link
Member

eric-wieser commented Aug 23, 2020

If someone wants to try to figure out another solution more power to them, but it shouldn't be me.

That's fair. @scoder, would you have any interest in doing this? Otherwise, I can have a go in two weeks tomorrow.

@scoder
Copy link
Contributor

scoder commented Aug 24, 2020

The last change looks good to me, with the tweak by @eric-wieser. Could you apply that and let CI try again?

@eric-wieser
Copy link
Member

eric-wieser commented Aug 24, 2020

Also worth remembering: before merging, whatever change we decide on needs to be duplicated to numpy/__init__.cython-30.pxd

@eric-wieser
Copy link
Member

I went ahead and added definitions for all the abstract types in #17150. Assuming that gets merged, would you be interested in adding the concrete ones @jbrockmendel?

@jbrockmendel
Copy link
Contributor Author

I went ahead and added definitions for all the abstract types in #17150. Assuming that gets merged, would you be interested in adding the concrete ones

Neat!

Maybe? I'm not sure what the "concrete ones" are in this context.

Since the CI says that ctypedef class numpy.floating [object PyObject]: works, I'm convinced but still confused. How does cython find that this corresponds to PyFloatingArrType_Type?

@eric-wieser
Copy link
Member

eric-wieser commented Aug 25, 2020

How does cython find that this corresponds to PyFloatingArrType_Type?

Simple: it doesn't. cython emits code in the module startup like:

PyTypeObject *the_type_for_numpy_floating = PyObject_GetAttr(numpy_module, "floating");

As it turns out, the_type_for_numpy_floating == &PyFloatingArrType_Type; but cython doesn't need to know that, all it needs to know is that it can use the_type_for_numpy_floating wherever it sees cnp.floating.

@eric-wieser
Copy link
Member

Maybe? I'm not sure what the "concrete ones" are in this context.

The ones you can actually create instances of, rather than the abstract base classes.

@eric-wieser eric-wieser added the 57 - Close? Issues which may be closable unless discussion continued label Sep 13, 2020
@eric-wieser
Copy link
Member

Since #17150 is merged, are you happy to close this @jbrockmendel?

@jbrockmendel
Copy link
Contributor Author

I think the is_integer_object check is worthwhile, as it is distinct from just the isinstance check.

I'll run a performance check on the others and see if it makes a difference.

-----
This does not counts np.timedelta64 objects as integers.
"""
return (not PyBool_Check(obj) and PyArray_IsIntegerScalar(obj)
Copy link
Member

@eric-wieser eric-wieser Sep 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the definition of this function is just:

#define PyArray_IsIntegerScalar(obj) (PyLong_Check(obj) \
|| PyArray_IsScalar((obj), Integer))

aka isinstance(obj, (int, np.integer))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah, its excluding bool and td64 that makes this useful

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but after that change it no longer uses any numpy C API functions, so it probably doesn't belong in numpy/__init__.pxd any more.

@jbrockmendel
Copy link
Contributor Author

OK, ill close this and can revisit if performance discrepancies popup.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

01 - Enhancement 57 - Close? Issues which may be closable unless discussion continued component: build component: numpy._core triaged Issue/PR that was discussed in a triage meeting

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants