BUG: introduce PyArray_SafeCast to fix issues around stringdtype views#26147
BUG: introduce PyArray_SafeCast to fix issues around stringdtype views#26147ngoldbaum merged 7 commits intonumpy:mainfrom
Conversation
| if ((type1 != type2) && (type1->kind == 'T')) { | ||
| return 0; | ||
| } | ||
| return 1; |
There was a problem hiding this comment.
Need to dig slightly deeper and use this pattern:
numpy/numpy/_core/src/umath/ufunc_object.c
Lines 790 to 812 in 51fd714
Making this not be string dtype specific!
There was a problem hiding this comment.
ah nice and then I can control how view_offset gets set in string_to_string_resolve_descriptors, that seems much more idiomatic
There was a problem hiding this comment.
So this idea doesn't quite work as cleanly as I'd hoped. If I set view_offset != 0 in string_to_string_resolve_descriptors, there's an assert that triggers if casting == NPY_NO_CASTING:
numpy/numpy/_core/src/multiarray/convert_datatype.c
Lines 465 to 482 in e1bf1d6
In the latest version I'm about to push I guard that assert with an if statement that checks for stringdtype, but that seems unsatisfying. Maybe the correct thing is for stringdtype to set e.g. NPY_EQUIV_CASTING for casts with distinct allocators, but when I tried doing that it broke reductions and I couldn't see a clear path forward on fixing that, because the check for valid reductions relies on PyArray_EquivTypes, which enfoces NPY_NO_CASTING as the minimum casting level:
numpy/numpy/_core/src/umath/ufunc_object.c
Lines 2394 to 2411 in e1bf1d6
Maybe we need a new casting level between NO_CASTING and EQUIV_CASTING for this case where data can live outside the array buffer on the descriptor?
There was a problem hiding this comment.
I think we can probably just remove the assert, just need to make sure to not document anywhere that no-casting implies view (I doubt we use it anywhere, but not sure). It used to, for these strings it can't since they use a whole new mechanism for storage... that seems fine.
(you could keep the assert together with checking that the dtype implements the hook for fixing the dtype in the array creation call, given that reason.)
| * We ignore that possibility for simplicity; it really is not our bug. | ||
| */ | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
Is this still necessary here and in EquivTypes?
There was a problem hiding this comment.
I think we can remove it, as it I am very certain it was fixed in boost python a while ago.
| return 0; | ||
| } | ||
| /* If casting is "no casting" this dtypes are considered equivalent. */ | ||
| return PyArray_MinCastSafety(safety, NPY_NO_CASTING) == NPY_NO_CASTING; |
There was a problem hiding this comment.
Note that most of this implementation is copied from PyArray_EquivTypes, to avoid calling GetCastInfo twice. If I just called EquivTypes, there would be an unnecessary second call to GetCastInfo to get the view_offset.
There was a problem hiding this comment.
Fair, could add that as a code comment if you like.
|
What is the status of this? |
|
I think it just needs another round of review. It’s not critical for the RC but it would be good to have it in. |
seberg
left a comment
There was a problem hiding this comment.
This is good, but the view offset needs fixing. I cannot review if other places using eqivtypes shojld be using this meaning, right now.
Its maybe subtle but only really affects strings, so with that fix happy to see it in. (Other comments could wait).
| * We ignore that possibility for simplicity; it really is not our bug. | ||
| */ | ||
| return 0; | ||
| } |
There was a problem hiding this comment.
I think we can remove it, as it I am very certain it was fixed in boost python a while ago.
| return 0; | ||
| } | ||
| /* If casting is "no casting" this dtypes are considered equivalent. */ | ||
| return PyArray_MinCastSafety(safety, NPY_NO_CASTING) == NPY_NO_CASTING; |
There was a problem hiding this comment.
Fair, could add that as a code comment if you like.
|
|
||
| *view_offset = 0; | ||
| // views are only legal between descriptors that share allocators (e.g. the same object) | ||
| *view_offset = descr0->allocator != descr1->allocator; |
There was a problem hiding this comment.
This is wrong. You must set it to 0 or leave it unchanged! (Intp min, I think)
| * is true as well. | ||
| */ | ||
| NPY_NO_EXPORT unsigned char | ||
| PyArray_ViewableTypes(PyArray_Descr *type1, PyArray_Descr *type2) |
There was a problem hiding this comment.
Maybe we can encode the equiv in the name? Or pass in the cast safety? Passing it in would make it work in the ufunc code I think.
But if not its OK.
| /* One more chance for faster exit if user specified the dtype. */ | ||
| oldtype = PyArray_DESCR(oparr); | ||
| if (PyArray_EquivTypes(oldtype, dtype)) { | ||
| unsigned char viewable = PyArray_ViewableTypes(oldtype, dtype); |
There was a problem hiding this comment.
The equiv cast may not matter, but then a single field struct to non-struct could return views... not today, I guess
|
Thanks for the prompt to refactor so I could use it in Now instead of |
|
Even though sebastian approved this, probably worth one more round of review before merging after the last commit. |
mhvk
left a comment
There was a problem hiding this comment.
Looks good. Mostly just queries/requests for clarification
numpy/_core/src/multiarray/ctors.c
Outdated
| ((flags & NPY_ARRAY_WRITEABLE) && | ||
| (!(arrflags & NPY_ARRAY_WRITEABLE))) || | ||
| !PyArray_EquivTypes(oldtype, newtype); | ||
| !view_safe; |
There was a problem hiding this comment.
Since the actual work has now been done before defining copy, I think it should be the first item for copy, i.e.,
copy = !view_safe ||
...
Alternatively, maybe better, since this now makes every case where a copy was already required slower, one can remove the last bit, and do the calculation like,
if (!copy) { /* Check a view is in fact possible */
npy_intp view_offset
npy_intp is_safe = ...
copy = copy && is_safe && (view_offset == 0);
}
| * *minimum_safety* casting level. Sets the view_offset if that is set | ||
| * for the cast. If ignore_error is 1, errors in cast setup are ignored. | ||
| */ | ||
| NPY_NO_EXPORT npy_intp |
There was a problem hiding this comment.
The comment should perhaps add
... are ignored;
otherwise the error is kept and -1 is returned
There was a problem hiding this comment.
Thanks, remnant of an older version that unconditionally ignored errors.
numpy/_core/src/multiarray/ctors.c
Outdated
| arrflags = PyArray_FLAGS(arr); | ||
| npy_intp view_offset; | ||
| npy_intp is_safe = PyArray_SafeCast(oldtype, newtype, &view_offset, NPY_NO_CASTING, 1); | ||
| npy_intp view_safe = (is_safe && (view_offset == 0)); |
There was a problem hiding this comment.
Probably fine as is, but in the code proper, impossible views are marked with view_offset = NPY_MIN_INTP
| if (PyArray_EquivTypes(oldtype, dtype)) { | ||
| npy_intp view_offset; | ||
| npy_intp is_safe = PyArray_SafeCast(oldtype, dtype, &view_offset, NPY_NO_CASTING, 1); | ||
| npy_intp view_safe = (is_safe && (view_offset == 0)); |
There was a problem hiding this comment.
As above, should one check view_offset != NPY_MIN_INTP instead?
| NPY_VISIBILITY_HIDDEN extern PyObject * npy_ma_str_cpu; | ||
| NPY_VISIBILITY_HIDDEN extern PyObject * npy_ma_str_array_err_msg_substr; | ||
|
|
||
| NPY_NO_EXPORT npy_intp |
There was a problem hiding this comment.
Is this the right module to define it in? I know PyArray_EquivTypes is here, but maybe casts.c makes more sense?
|
All review comments should be resolved now |
mhvk
left a comment
There was a problem hiding this comment.
Looks good! A suggestion for a tiny further improvement in-line, though probably barely worth the CI since the compiler will likely do it already.
numpy/_core/src/multiarray/ctors.c
Outdated
| if (!copy) { | ||
| npy_intp view_offset; | ||
| npy_intp is_safe = PyArray_SafeCast(oldtype, newtype, &view_offset, NPY_NO_CASTING, 1); | ||
| copy = copy || !(is_safe && (view_offset != NPY_MIN_INTP)); |
There was a problem hiding this comment.
My suggestion was imperfect: this can be copy = !(is_safe... since we know copy is false.
|
Pulling this one in, thanks for the review! |
Fixes #26140.
This is a minimal implementation of a new function that generalizes
PyArray_EquivTypes.