ENH: include dt64/td64 isinstance checks in __init__.pxd#16266
ENH: include dt64/td64 isinstance checks in __init__.pxd#16266mattip merged 9 commits intonumpy:masterfrom
__init__.pxd#16266Conversation
numpy/__init__.pxd
Outdated
There was a problem hiding this comment.
I am not sure this can be nogil. It is defined as
#define PyObject_TypeCheck(ob, tp) \
(Py_IS_TYPE(ob, tp) || PyType_IsSubtype(Py_TYPE(ob), (tp)))
Where PyType_IsSubtype is a function call
PyAPI_FUNC(int) PyType_IsSubtype(PyTypeObject *, PyTypeObject *);
and I think that function needs to hold the GIL.since it traverses the type object's mro. Thoughts?
There was a problem hiding this comment.
and I think that function needs to hold the GIL.since it traverses the type object's mro. Thoughts?
I can't speak to most of this, but pandas has used this verbatim for a couple years now without any trouble.
There was a problem hiding this comment.
Unlike IsSubclass, the IsSubtype code looks safe. But its not documented as such, so unless pandas uses it without the GIL, maybe its better to just remove it?
There was a problem hiding this comment.
so unless pandas uses it without the GIL, maybe its better to just remove it?
I just tried removing the "nogil" from these declarations in pandas (and for the 4 other isinstance-like functions not included in this PR) and didn't get any cythonize-time build errors, so it looks like we don't call it without the GIL. That said, ill do some %timeits to check if this change affects perf.
There was a problem hiding this comment.
I don't think it can affect performance, since Cython does not release the GIL unless you ask it to AFAIK. In fact, if it did release the GIL here, that would be a performance bug, since releasing the GIL is unnecessary overhead.
There was a problem hiding this comment.
Well, I looked at the CPython code, and at this time it looks absolutely safe to me. Maybe we can open a bug at python asking to document it as being safe (or the opposite). I am not sure how smart cython is, does it re-grab the GIL when calling a non-gil function from within a GIL context?
There was a problem hiding this comment.
To be honest, I wonder if thats just a fluke in timing, but not sure. In any case, I don't have a huge problem with keeping it around, although nogil should as far as I know make no difference at all, except causing errors when the function is used in a with nogil: block.
There was a problem hiding this comment.
ill double-check with some more timing results.
Suggestions on how/where to implement tests?
There was a problem hiding this comment.
ill double-check with some more timing results.
Looking at these again, the performance looks indistinguishable
There was a problem hiding this comment.
Thanks for checking. I would prefer without the nogil since this is a function call and not a macro, who knows what mischief the implementation could do when it scans the type's mro.
As for tests: that is a good question. We should be testing all of the content of this file. Perhaps a new test file numpy/core/tests/test_cython.py which would call cython similar to what numpy/random/tests/test_extending.test_cython does.
|
Thanks. I agree this code should not be maintained by pandas. I am not a big fan of exposing numpy internals to downstream libraries. I understand this code already is in use in Pandas (likewise get_timedelta64). It seems these uses are
Additionally, there is code in pandas to read/write a I wonder if we already have appropriate and fast APIs for handling these cases that we could add to our |
The main use case for this is to implement an analogue to
pandas used to use a copy/pasted version that called it pandas_datetimestruct but I changed it to use the upstream version to de-duplicate. Reverting that would not be that troublesome (though I'd rather share more rather than less). |
|
One other related issue that might be worth addressing in this PR: a while back I determined that cython's |
Indeed. There is confusion there. It should be mapped with aliases (like for itemsize and "elsize")
|
huh, looking at the pandas version, the main difference seems to be that we have "object fields" instead of "dict fields" |
|
updated to 1) remove |
mattip
left a comment
There was a problem hiding this comment.
Thanks for getting the testing started. I took the liberty of fixing a few things, there are still some failing tests
|
|
||
| config.add_subpackage('tests') | ||
| config.add_data_dir('tests/data') | ||
| config.add_data_dir('tests/examples') |
There was a problem hiding this comment.
This was the missing piece to get tests going. I also checked that the new files are included in the sdist, we do not need to change MANIFEST.in
|
@mattip thanks for helping out with the tests; this is a really non-standard test setup. Looks like the remaining CI failure is an unrelated HTTP error |
|
LGTM. I think we should put this in:
CI test failure is not related to this PR. Thanks @jbrockmendel |
|
If the tests slow down CI we might want to mark them as |
|
awesome, thanks for walking me through the process |
| cdef inline bint is_timedelta64_object(object obj): | ||
| """ | ||
| Cython equivalent of `isinstance(obj, np.timedelta64)` | ||
|
|
||
| Parameters | ||
| ---------- | ||
| obj : object | ||
|
|
||
| Returns | ||
| ------- | ||
| bool | ||
| """ | ||
| return PyObject_TypeCheck(obj, &PyTimedeltaArrType_Type) | ||
|
|
||
|
|
||
| cdef inline bint is_datetime64_object(object obj): | ||
| """ | ||
| Cython equivalent of `isinstance(obj, np.datetime64)` | ||
|
|
||
| Parameters | ||
| ---------- | ||
| obj : object | ||
|
|
||
| Returns | ||
| ------- | ||
| bool | ||
| """ | ||
| return PyObject_TypeCheck(obj, &PyDatetimeArrType_Type) |
| cdef inline NPY_DATETIMEUNIT get_datetime64_unit(object obj) nogil: | ||
| """ | ||
| returns the unit part of the dtype for a numpy datetime64 object. | ||
| """ | ||
| return <NPY_DATETIMEUNIT>(<PyDatetimeScalarObject*>obj).obmeta.base |
There was a problem hiding this comment.
This is risky - I can't think of any way of using this function that doesn't produce dangerously incorrect results for np.datetime64("now", "20s") as input.
__init__.pxd
There are a handful of checks related to datetime64/timedelta64 objects that pandas implements but would make more sense to have directly in numpy. This PR ports those checks over and puts them in the
__init__.pxdfile. If this PR is accepted, I'll make the same PR on the cython version of this file.Suggestions on where tests for these belong?