-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-11950: [C++][Compute] Add unary negative kernel #10016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following are pending details to be resolved with this PR:
|
bkietz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is looking good. Please add tests for negate_checked
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This alias doesn't add anything, please revert it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| T result = 0; | |
| // NOTE [EPM]: Check this edge case of overflow. What are we trying to check here? | |
| if (ARROW_PREDICT_FALSE(SubtractWithOverflow(0, arg, &result))) { | |
| ctx->SetStatus(Status::Invalid("overflow")); | |
| } | |
| return result; | |
| if (arg == std::numeric_limits<T>::min()) { | |
| // two's complement can represent a negative number which has no corresponding positive, | |
| // for example int8_t(-128) cannot be negated since 128 is not respresentable in int8_t | |
| ctx->SetStatus(Status::Invalid("overflow")); | |
| return 0; | |
| } | |
| return -arg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| static enable_if_integer<T> Call(KernelContext* ctx, Arg0 arg) { | |
| static enable_if_signed_integer<T> Call(KernelContext* ctx, Arg0 arg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that it's not the negate kernel's responsibility to coerce -0 to 0.
For follow up work: it might be useful to have another kernel which normalizes floating point values by replacing NaNs with nulls, ensuring only positive 0s, etc
| // NOTE [EPM]: Discuss on 0 vs. -0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing to coerce indeed, the FPU should do its job correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think promotion to signed is the correct way to handle this. Only kernels for signed integer types will be included, and when negating an unsigned integer an implicit cast to the next largest signed integer must be performed first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, numpy preserves the dtype for unsigned integers:
In [14]: arr = np.array([0, 255], dtype="uint8")
In [15]: -arr
Out[15]: array([0, 1], dtype=uint8)
In [16]: np.negative(arr)
Out[16]: array([0, 1], dtype=uint8)
(not sure that's very useful, though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After careful deliberation on this topic, I think negate should preserve data type. Also, in a mathematical context, negation is not supported for unsigned integrals, so I do not think kernels should be available for the "checked" kernels. For default kernels behavior is to wrap around (apply two's complement in a safe manner).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-0 is not distinct from 0 for integral types
| CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, -0]"), ArrayFromJSON(ty, "[0, -0, 0]")); | |
| CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, 0]"), ArrayFromJSON(ty, "[0, 0, 0]")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MakeScalar decides the DataType of the scalar based on its argument type, and decltype(-int8_max) is 32 bit signed integer. Adding an explicit cast to 8 bit should fix it
| // NOTE [EPM]: Why do these fail? The expected result is promoted to int32. | |
| // auto int8_max = std::numeric_limits<int8_t>::max(); | |
| // CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(-int8_max)); | |
| // auto int16_max = std::numeric_limits<int16_t>::max(); | |
| // CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(-int16_max)); | |
| auto int8_max = std::numeric_limits<int8_t>::max(); | |
| CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(static_cast<int8_t>(-int8_max))); | |
| auto int16_max = std::numeric_limits<int16_t>::max(); | |
| CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(static_cast<int16_t>(-int16_max))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Insertion of implicit casts is accomplished by overriding Function::DispatchBest. For example, to ensure that unsigned types are supported by casting to a compatible unsigned type, use:
| struct UnaryArithmeticFunction : ScalarFunction { | |
| using ScalarFunction::ScalarFunction; | |
| Result<const Kernel*> DispatchBest(std::vector<ValueDescr>* values) const override { | |
| RETURN_NOT_OK(CheckArity(*values)); | |
| using arrow::compute::detail::DispatchExactImpl; | |
| if (auto kernel = DispatchExactImpl(this, *values)) return kernel; | |
| EnsureDictionaryDecoded(values); | |
| if (auto type = CommonNumeric({values->at(0), int8()})) { | |
| ReplaceTypes(type, values); | |
| } | |
| if (auto kernel = DispatchExactImpl(this, *values)) return kernel; | |
| return arrow::compute::detail::NoMatchingKernel(this, *values); | |
| } | |
| }; |
(UnaryScalarFunction will replace ScalarFunction below in auto func = std::make_shared<ScalarFunction>(name, Arity::Unary(), doc);)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why we need UnaryScalarFunction and can't use ScalarFunction as is. Why the CommonNumeric is using int8()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ScalarFunction does not provide implicit casts, such as from unsigned to signed integers. UnaryScalarFunction is provided to add implicit casts including:
uint8 -> int16
uint16 -> int32
uint32 -> int64
uint64 -> int64
dictionary<int32, float> -> float
//...
The call to CommonNumeric with int8 ensures that the output type is signed, with no more widening than necessary. Insertion of implicit casts is tested for the other arithmetic functions using CheckDispatchBest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, nice trick!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please flesh these out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| static constexpr enable_if_integer<T> Call(KernelContext*, Arg0 arg) { | |
| static constexpr enable_if_signed_integer<T> Call(KernelContext*, Arg0 arg) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also please check inf and NaN (they should work implicitly, but who knows).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good corner cases, thanks!
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't add questions to the documentation. The documentation is meant to inform users, not to collect TODOs for development.
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tables are alphabetically-ordered, it would be nice to keep them like that.
This draft PR adds unary scalar arithmetic kernels for the negation operation on integral and floating-point types. The kernels are described in the compute package as Negate and NegateChecked structs, and registered with respective names of "negate" and "negate_checked".
@bkietz please review