ENH: add a casting option 'same_value' and use it in np.astype #29129
Conversation
seberg
left a comment
It is pretty cool that we can do it! That said, I would prefer some more tinkering about the approach.
Basically, I think making it work everywhere is not nearly as hard as it may seem and so I think we have to at least try.
I think the only good way to do this is that the new cast level(s) are reported by the loops (via `resolve_descriptors`) and integrated into `PyArray_MinCastSafety`. By doing so, we will achieve two things:
- Things should just work everywhere as long as you also set the `context.flag` correctly everywhere (should also not be too much).
- I am very sure that currently things are broken for datatypes that don't implement this flag yet. And unfortunately, that is not great. If a user does `casting="same_value"`, a `"safe"` cast is fine but we need to reject `"unsafe"` and `"same_kind"` casts.
So, I think we really need something for point 2, and honestly think the easiest way to solve it is this same path that gives us full support everywhere.
(And yes, I don't love having two "same_value" levels, but I doubt it can be helped.)
(All other things are just nitpicks at this time.)
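The rule sketched above (a `"safe"` cast satisfies `casting="same_value"`, while `"same_kind"` and `"unsafe"` must be rejected) amounts to an ordering of cast-safety levels. A small Python sketch of that idea follows; the enum values and helper names are illustrative only, not NumPy's internal C representation:

```python
from enum import IntEnum

class Casting(IntEnum):
    # Illustrative ordering from most to least restrictive; NumPy's
    # internal representation differs, but the idea is the same.
    NO = 0
    EQUIV = 1
    SAFE = 2
    SAME_VALUE = 3   # hypothetical slot: stricter than same_kind, looser than safe
    SAME_KIND = 4
    UNSAFE = 5

def min_cast_safety(*levels: Casting) -> Casting:
    # A composite cast is only as safe as its least-safe component,
    # i.e. the maximum value under this ordering.
    return max(levels)

def can_cast(actual: Casting, requested: Casting) -> bool:
    # A cast is allowed when its actual safety is at least as strict
    # as what the caller requested.
    return actual <= requested
```

Under this model, `can_cast(Casting.SAFE, Casting.SAME_VALUE)` is true while `can_cast(Casting.SAME_KIND, Casting.SAME_VALUE)` is false, matching the requirement stated above.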
```c
#define _NPY_METH_static_data 10

/*
 * Constants for same_value casting
```
This would be a constant for special array-method return values, I think (i.e. not just applicable to casts).
Is it really worth it to keep this inside the inner loop? Yes, you need to grab the GIL there, but on the plus side, you could actually consider reporting the error.
I thought the point of having a return value from the inner loop function is so we can use it to report errors. In general I prefer a code style that tries as much as possible to separate programming metaphors: keep the Python C-API calls separate from the "pure" C functions.
I can be convinced to add this. But it is public API and we call these loops from a lot of places so we need machinery to make sure that all of these places give the same errors.
That may be a great refactor! E.g. we could have a single function that does `HandleArrayMethodError("name", method_result, method_flags)`.
But I guess without such a refactor it feels very bolted-on to me, because whether or not we check for this return depends on where we call/expect it.
EDIT: I.e. basically, wherever we have `PyUFunc_GiveFloatingpointErrors()` we would also pass the actual return value and do the needed logic there.
I will try to do this as a separate PR.
jorenham
left a comment
Could you add this option to `_CastingKind`? Line 969 in c6eed9a: `_CastingKind: TypeAlias = L["no", "equiv", "safe", "same_kind", "unsafe"]`
It seems I touched something in the float -> complex cast; … gives me …

Maybe the compiler just didn't decide to lift the branch for some reason? I have to say that complex -> real casts wouldn't be my biggest concern; there is this weird …
Somehow this caused … Edit: it was due to the debug build exposing a missing initialization of …
CI is passing.
Hmm. I wonder what we should do with casting float-to-int or float-to-smaller-float? I think the second example should raise, but what about the third?
I had a list of 5-6 different possible definitions, but for now am skipping it. I am tempted to go with your … Unless we go all out and say that a …
It may actually be worth bringing it up on the mailing list and in the meeting, and sketching at least the main alternatives. (I don't usually expect much input, but sketching the alternatives tends to clarify things a bit either way.) I'll try to look over the recent changes here soon, but ping me here or on slack if I forget!
I think it makes sense to do the most conservative thing and require "ability to accurately round-trip" for `same_value`.
I suppose we have at least a brief window where we could still broaden the definition slightly. That is currently same-kind, I think. In that context I think it is best to forgive all float inaccuracies (including over- and underflows). But I don't want to get lost in this too much. Unless we are sure that we are only interested in one of these two use-cases for floats (or float to int), it seems fine to just focus on one now and add the other later if we want to. (I suppose I wouldn't mind asking around whether there is a preference about the insufficient-mantissa case for …)
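The round-trip criterion can be emulated in pure Python today, which helps make the discussion concrete. This is a sketch of the semantics only, not the PR's C implementation, and the helper name is made up:

```python
import numpy as np

def same_value_astype(arr, dtype):
    """Hypothetical helper emulating the 'accurate round-trip'
    criterion: cast, cast back, and compare. NaNs would need special
    handling (np.array_equal treats NaN != NaN); this sketch ignores
    that."""
    out = arr.astype(dtype)
    if not np.array_equal(out.astype(arr.dtype), arr):
        raise ValueError(
            f"cannot cast array from {arr.dtype} to {np.dtype(dtype)} "
            "without changing its values")
    return out
```

For example, casting `[1, 2, 3]` to `int8` round-trips and succeeds, while `[1000]` to `int8` (overflow) or `[0.1]` to `float32` (insufficient mantissa) does not round-trip and raises.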
I ran a benchmark where I disabled the internal dispatching in each call to the function, and the results showed that this is the cause of the slowdown:

```diff
 static GCC_CAST_OPT_LEVEL int
 @prefix@_cast_@name1@_to_@name2@(
         PyArrayMethod_Context *context, char *const *args,
         const npy_intp *dimensions, const npy_intp *strides,
         NpyAuxData *data)
 {
-#if !@is_bool2@
+#if 0 && !@is_bool2@
     int same_value_casting = ((context->flags & NPY_SAME_VALUE_CONTEXT_FLAG) == NPY_SAME_VALUE_CONTEXT_FLAG);
     if (same_value_casting) {
         return @prefix@_cast_@name1@_to_@name2@_same_value(context, args, dimensions, strides, data);
     } else {
 #else
     {
 #endif
         return @prefix@_cast_@name1@_to_@name2@_no_same_value(context, args, dimensions, strides, data);
     }
 }
```

So that would indicate that a path to recover the lost performance would be to avoid the dispatch inside the inner loop.
There is still a slowdown, even when disabling the dispatch. I also noticed the benchmarks are using … Edit: we cannot inline things like the float16 conversion routines, since they come from …

Even if I replace lowlevel_strided_loops.c.src with the one from main, I still see a slowdown. But if I also remove the …
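For local spot checks of such regressions, a minimal `timeit` harness can compare a baseline build against a patched one. This is a sketch, not the project's asv benchmarks; the function name and sizes are arbitrary:

```python
import timeit
import numpy as np

# One representative source array; the PR's benchmarks sweep all
# int/float/complex dtype pairings.
src = np.arange(1_000_000, dtype=np.float64)

def bench_astype(dtype, number=10, repeat=5):
    # Best-of-N wall time for a plain astype call; run the same harness
    # on a baseline and a patched build and compare the results.
    return min(timeit.repeat(lambda: src.astype(dtype),
                             number=number, repeat=repeat))
```

Running `bench_astype(np.int64)` on both builds and comparing the minima gives a quick signal on whether the dispatch costs anything when `same_value` is not requested.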
Running benchmarks locally on an AMD Ryzen machine (the AsType benchmarks now run all possible pairings of int/float/complex dtypes), there are still some slowdowns and some speedups. I think this is acceptable?
Diff from mypy_primer, showing the effect of this PR on type check results on a corpus of open source code: spark (https://github.com/apache/spark)

```diff
- python/pyspark/ml/functions.py:244: note: def [_ScalarT: generic[Any]] vstack(tup: Sequence[_SupportsArray[dtype[_ScalarT]] | _NestedSequence[_SupportsArray[dtype[_ScalarT]]]], *, dtype: None = ..., casting: Literal['no', 'equiv', 'safe', 'same_kind', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
+ python/pyspark/ml/functions.py:244: note: def [_ScalarT: generic[Any]] vstack(tup: Sequence[_SupportsArray[dtype[_ScalarT]] | _NestedSequence[_SupportsArray[dtype[_ScalarT]]]], *, dtype: None = ..., casting: Literal['no', 'equiv', 'safe', 'same_kind', 'same_value', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
- python/pyspark/ml/functions.py:244: note: def [_ScalarT: generic[Any]] vstack(tup: Sequence[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]], *, dtype: type[_ScalarT] | dtype[_ScalarT] | _SupportsDType[dtype[_ScalarT]], casting: Literal['no', 'equiv', 'safe', 'same_kind', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
+ python/pyspark/ml/functions.py:244: note: def [_ScalarT: generic[Any]] vstack(tup: Sequence[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]], *, dtype: type[_ScalarT] | dtype[_ScalarT] | _SupportsDType[dtype[_ScalarT]], casting: Literal['no', 'equiv', 'safe', 'same_kind', 'same_value', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]
- python/pyspark/ml/functions.py:244: note: def vstack(tup: Sequence[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]], *, dtype: type[Any] | dtype[Any] | _SupportsDType[dtype[Any]] | tuple[Any, Any] | list[Any] | _DTypeDict | str | None = ..., casting: Literal['no', 'equiv', 'safe', 'same_kind', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[Any]]
+ python/pyspark/ml/functions.py:244: note: def vstack(tup: Sequence[Buffer | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | complex | bytes | str | _NestedSequence[complex | bytes | str]], *, dtype: type[Any] | dtype[Any] | _SupportsDType[dtype[Any]] | tuple[Any, Any] | list[Any] | _DTypeDict | str | None = ..., casting: Literal['no', 'equiv', 'safe', 'same_kind', 'same_value', 'unsafe'] = ...) -> ndarray[tuple[Any, ...], dtype[Any]]
```
Extending the benchmarks for all combinations of dtypes in …
Any more thoughts here?
seberg
left a comment
I think I am happy with it, and don't think there were many new things. I just scrolled through and may not have looked at the most recent changes very closely, but I think they were all re-organizations.
Unless @mhvk/someone speaks up very soon about concerns, please go ahead and merge when you are happy @mattip (and then open a follow up issue).
```c
#else
    return (double)(ToDoubleBits(h));
#endif
}
```
I wonder if that can't be in the header then, but maybe it doesn't matter (and I guess that header may be public); not sure.
Don't really have the time for another review, but I was already happy, and you addressed all my more immediate comments. Please do raise follow-up issues about the things not done, such as the various casting flags and how they are used and passed on. (@seberg: might this also be relevant for how we are trying to implement sorting flags? A general mechanism to pass on flags may be useful... but maybe I'm missing the boat completely here; I don't really have the time to look properly now.)
I think it is almost the same! But here, it seems right (or at least OK) to apply this flag to (almost) all ArrayMethods in the future (even ufuncs in principle, which right now are allowed to absorb casts even if there is no such ufunc inside or outside of NumPy, I am sure). While for sorts it could look basically the same, the flag and flag space must be specific to the sort gufunc, I think.
ENH: add a casting option 'same_value' and use it in np.astype (#29129)

* start implementing same_value casting
* work through more places that check 'cast', add a TODO
* add a test, percolate casting closer to inner loops
* use SAME_VALUE_CAST flag for one inner loop variant
* aligned test of same_value passes. Need more tests
* handle unaligned casting with 'same_value'
* extend tests to use source-is-complex
* fix more interfaces to pass casting around, disallow using 'same_value' in raw_array_assign_scalar and raw_array_wheremasked_assign_scalar
* raise in places that have a kwarg casting, besides np.astype
* refactor based on review comments
* CHAR_MAX,MIN -> SCHAR_MAX,MIN
* copy context flags
* add 'same_value' to typing stubs
* document new feature
* test, check exact float->int casting: refactor same_value check into a function
* enable astype same_value casting for scalars
* typo
* fix ptr-to-src_value -> value casting errors
* fix linting and docs, ignore warning better
* gcc warning is different
* fixes from review, typos
* fix compile warning ignore and make filter in tests more specific, disallow non-numeric 'same_value'
* fix warning filters
* emit PyErr inside the loop
* macOS can emit FPEs when touching NAN
* Fix can-cast logic everywhere for same-value casts (only allow numeric)
* reorder and simplify, from review
* revert last commit and remove redundant checks
* gate and document SAME_VALUE_CASTING for v2.4
* make SAME_VALUE a flag, not an enum
* fixes from review
* fixes from review
* inline inner casting loop and float16 -> float64 calls
* typo
* fixes for AVX512 and debug builds, inline npy_halfbits_to_doublebits only on specific platforms
* use optimize instead of inline

Co-authored-by: Matti Picus <matti.picus#@gmail.com>
Co-authored-by: Sebastian Berg <sebastianb@nvidia.com>
This commit implements follow-up work for same-value casting (PR numpy#29129):

1. **Enhanced Documentation**:
   - Added comprehensive documentation for NPY_SAME_VALUE_CONTEXT_FLAG
   - Documented the flag's purpose, usage, and relationship to the casting system
   - Added examples and cross-references to related functionality
2. **Extended np.can_cast Support**:
   - Modified can_cast to use PyArray_CastingConverterSameValue instead of PyArray_CastingConverter
   - Updated the can_cast docstring to include 'same_value' as a valid casting option
   - Added examples demonstrating same_value casting usage
   - Added a version note for same_value support
3. **Updated Tests**:
   - Modified test_conversion_utils.py to expect same_value support
   - Added a comprehensive test suite for same_value casting in can_cast
   - Created test cases for identity casts, widening casts, and edge cases
   - Added parameter validation tests
4. **Implementation Details**:
   - Changed multiarraymodule.c to use PyArray_CastingConverterSameValue for can_cast
   - This enables same_value casting support without breaking existing functionality
   - Type hints already supported same_value through _CastingKind

This addresses the main items from issue numpy#29765 regarding extending same_value casting beyond ndarray.astype() to other NumPy operations.

Fixes: numpy#29765
Users have asked for an option to raise an error when casting overflows. This is hard to do with the NEP-50 based casting, which only offers blanket rules for casting one dtype to another, without taking the actual values into account. This PR adds a new `same_value` argument to the `casting` kwarg (implemented in `PyArray_CastingConverter`), and extends the casting loop functions to raise a `ValueError` if `same_value` casting is requested and the value would be changed by the cast. So far this is only implemented for `ndarray.astype`, i.e. `np.array([1000]).astype(np.int8, casting='same_value')` will now raise an error.

Performance: since the PR touches the assignment deep in the inner loop, I checked early on that it does not impact performance when not using `same_value` casting. The loop pseudo code now looks like this, and when compiled with `-O3` (via a gcc pragma specific to the loop), it seems there is a small performance impact for some casting functions.

Protection: This PR only changes `ndarray.astype`. Future PRs may implement more. I tried to guard the places that use `PyArray_CastingConverter` to raise an error if `'same_value'` is passed in:

- `array_datetime_as_string` (by disallowing `'same_value'` in `can_cast_datetime64_units`)
- `array_copyto`
- `array_concatenate` (by disallowing `'same_value'` in `PyArray_ConcatenateInto`)
- `array_einsum` (by disallowing `'same_value'` in `PyArray_EinsteinSum`)
- `ufunc_generic_fastcall`
- `'same_value'` is allowed in `array_astype` (the whole goal of this PR)
- `array_can_cast_safely`
- `npyiter_init` (I am pretty sure that is OK?)
- `NpyIter_NestedIters` (I am pretty sure that is OK?)

Testing: I added tests for `ndarray.astype()` covering all combinations of built-in types, and also some tests for properly blocking `casting='same_value'`.

TODO:

- What to do about `astype(..., casting='same_value')` with `datetime64` or user-defined dtypes?
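A usage sketch of the behavior described above. On NumPy builds that predate this PR the same call also raises `ValueError`, because `PyArray_CastingConverter` rejects the unrecognized casting value, so the example is safe to run either way:

```python
import numpy as np

arr = np.array([1000])
try:
    arr.astype(np.int8, casting='same_value')
    outcome = 'cast succeeded'
except ValueError as exc:
    # With this PR: 1000 cannot be represented in int8, so the cast
    # raises. Without it: the 'same_value' option itself is rejected,
    # also with a ValueError.
    outcome = 'rejected'
```

In both cases `outcome` ends up as `'rejected'`; only the error message differs between the two builds.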