ARROW-11950: [C++][Compute] Add unary negative kernel #10016

edponce · 2021-04-13T19:16:16Z

This draft PR adds unary scalar arithmetic kernels for negation on integral and floating-point types. The kernels are implemented in the compute module as the Negate and NegateChecked structs and are registered under the names "negate" and "negate_checked", respectively.
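The two op structs can be sketched in plain C++ for readers new to the compute kernels. This is a minimal, hypothetical stand-in, not the PR's code. Arrow's real ops take a KernelContext* and report overflow through ctx->SetStatus. Here a `bool* overflow` out-parameter plays that role, and only signed integer and floating-point types are handled.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in for the unchecked "negate" op.
struct Negate {
  template <typename T>
  static constexpr T Call(T arg) {
    if constexpr (std::is_integral_v<T>) {
      // Negate in the unsigned domain so the minimum value wraps around
      // instead of hitting signed-overflow undefined behavior.
      using U = std::make_unsigned_t<T>;
      return static_cast<T>(static_cast<U>(0) - static_cast<U>(arg));
    } else {
      return -arg;  // IEEE 754 negation flips the sign bit
    }
  }
};

// Hypothetical stand-in for the "negate_checked" op; overflow is reported via an out-parameter.
struct NegateChecked {
  template <typename T>
  static constexpr T Call(T arg, bool* overflow) {
    static_assert(std::is_signed_v<T>, "sketch covers signed types only");
    if constexpr (std::is_integral_v<T>) {
      if (arg == std::numeric_limits<T>::min()) {
        *overflow = true;  // e.g. int8_t(-128) has no positive int8_t counterpart
        return 0;
      }
    }
    return -arg;
  }
};
```

In this sketch the unchecked variant silently wraps `int8_t(-128)` back to `-128`. The checked variant flags that value instead.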

@bkietz please review

edponce · 2021-04-13T19:36:20Z

The following details are still pending for this PR:

How to handle 0, +0, and -0?
- IEEE 754 defines signed floating-point zeros (+0 and -0). They compare equal, but they can produce different results in certain operations. For example, 1/-0 = -inf and 1/+0 = +inf.
- Is there a signed zero for integral types?
How to handle unsigned integers?
- Wrap around, as described in https://en.cppreference.com/w/cpp/language/implicit_conversion
Test cases for int8 and int16 fail because the expected result is implicitly promoted to int32. I am not sure whether this promotion comes from the testing framework or from the C++ rules.
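The zero-related questions can be checked directly in standard C++. The small sketch below is plain C++, not Arrow code, and assumes IEEE 754 doubles. It shows that +0.0 and -0.0 compare equal yet differ in their sign bit and in division results. For integers, -0 is simply 0.

```cpp
#include <cmath>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559, "sketch assumes IEEE 754 doubles");

// -0.0 compares equal to +0.0; only the sign bit tells them apart.
bool IsNegativeZero(double x) { return x == 0.0 && std::signbit(x); }

// Under IEEE 754, the sign of a zero divisor picks the sign of the resulting infinity.
double Reciprocal(double x) { return 1.0 / x; }

// Integral types have a single zero: -0 is the same value (and bit pattern) as 0.
static_assert(-0 == 0, "no negative zero for integers");
```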

@bkietz @pitrou

github-actions · 2021-04-14T01:42:55Z

https://issues.apache.org/jira/browse/ARROW-11950

bkietz

Overall this is looking good. Please add tests for negate_checked

bkietz · 2021-04-14T15:34:03Z

cpp/src/arrow/compute/kernels/codegen_internal.h

This alias doesn't add anything, please revert it.

bkietz · 2021-04-14T15:43:56Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

Suggested change, replacing

    T result = 0;
    // NOTE [EPM]: Check this edge case of overflow. What are we trying to check here?
    if (ARROW_PREDICT_FALSE(SubtractWithOverflow(0, arg, &result))) {
      ctx->SetStatus(Status::Invalid("overflow"));
    }
    return result;

with

    if (arg == std::numeric_limits<T>::min()) {
      // two's complement can represent a negative number which has no corresponding positive,
      // for example int8_t(-128) cannot be negated since 128 is not representable in int8_t
      ctx->SetStatus(Status::Invalid("overflow"));
      return 0;
    }
    return -arg;
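The min check flags exactly the same values as the original `SubtractWithOverflow(0, arg, &result)` approach, because `0 - arg` overflows only when `arg` is the minimum value. The sketch below checks this exhaustively for int8_t. It uses the GCC/Clang builtin `__builtin_sub_overflow` as a stand-in for Arrow's SubtractWithOverflow helper, so it assumes one of those compilers.

```cpp
#include <cstdint>
#include <limits>

// Overflow check via the compiler builtin (GCC/Clang), standing in for
// Arrow's SubtractWithOverflow(0, arg, &result).
bool NegationOverflowsBuiltin(int8_t arg) {
  int8_t result;
  return __builtin_sub_overflow(int8_t{0}, arg, &result);
}

// Overflow check as in the suggested change: only the minimum has no positive counterpart.
bool NegationOverflowsMinCheck(int8_t arg) {
  return arg == std::numeric_limits<int8_t>::min();
}

// Returns true if both checks agree on every int8_t value.
bool ChecksAgreeForAllInt8() {
  for (int v = std::numeric_limits<int8_t>::min(); v <= std::numeric_limits<int8_t>::max(); ++v) {
    const auto arg = static_cast<int8_t>(v);
    if (NegationOverflowsBuiltin(arg) != NegationOverflowsMinCheck(arg)) return false;
  }
  return true;
}
```

The direct comparison is simpler to read and avoids computing a throwaway result.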

bkietz · 2021-04-14T15:45:12Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

Suggested change, replacing

    static enable_if_integer<T> Call(KernelContext* ctx, Arg0 arg) {

with

    static enable_if_signed_integer<T> Call(KernelContext* ctx, Arg0 arg) {
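The sketch below shows how such an alias restricts an overload to signed integer C types, using only the standard library. The alias and struct names are hypothetical, and Arrow's own definitions may differ. The detector confirms that no `Call` exists for unsigned types, and constexpr lets the op be evaluated at compile time.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical aliases, similar in spirit to the ones used in the kernel code.
template <typename T>
using enable_if_signed_integer_t =
    std::enable_if_t<std::is_integral_v<T> && std::is_signed_v<T>, T>;

template <typename T>
using enable_if_floating_point_t = std::enable_if_t<std::is_floating_point_v<T>, T>;

struct NegateSketch {
  // Viable only for signed integers, so no unsigned instantiation exists.
  template <typename T>
  static constexpr enable_if_signed_integer_t<T> Call(T arg) { return static_cast<T>(-arg); }

  template <typename T>
  static constexpr enable_if_floating_point_t<T> Call(T arg) { return -arg; }
};

// Detects whether NegateSketch::Call<T> is viable for a given T.
template <typename T, typename = void>
struct HasNegate : std::false_type {};
template <typename T>
struct HasNegate<T, std::void_t<decltype(NegateSketch::Call<T>(T{}))>> : std::true_type {};

static_assert(HasNegate<int8_t>::value, "signed integers are supported");
static_assert(HasNegate<double>::value, "floating point is supported");
static_assert(!HasNegate<uint8_t>::value, "unsigned integers are rejected");
static_assert(NegateSketch::Call<int16_t>(3) == -3, "constexpr evaluation works");
```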

bkietz · 2021-04-14T15:50:12Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

I would say that it's not the negate kernel's responsibility to coerce -0 to 0.
For follow-up work: it might be useful to have another kernel that normalizes floating-point values, for example by replacing NaNs with nulls and ensuring only positive zeros.

Suggested change (remove this line):

    // NOTE [EPM]: Discuss on 0 vs. -0.

Indeed, there is nothing to coerce; the FPU should do its job correctly.
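The follow-up normalization idea could look roughly like the element-wise function below. It is purely hypothetical, since no such kernel is part of this PR, and `std::optional` stands in for Arrow's null handling.

```cpp
#include <cmath>
#include <optional>
#include <vector>

// Hypothetical normalization: NaN becomes null (std::nullopt) and -0.0 becomes +0.0.
std::vector<std::optional<double>> NormalizeFloats(const std::vector<std::optional<double>>& in) {
  std::vector<std::optional<double>> out;
  out.reserve(in.size());
  for (const auto& v : in) {
    if (!v || std::isnan(*v)) {
      out.push_back(std::nullopt);  // existing nulls and NaNs both become null
    } else if (*v == 0.0) {
      out.push_back(0.0);  // collapses -0.0 (and +0.0) to +0.0
    } else {
      out.push_back(*v);
    }
  }
  return out;
}
```

Keeping this in a separate kernel leaves negate as a plain sign flip.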

bkietz · 2021-04-14T15:57:46Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

I think promotion to signed is the correct way to handle this. Only kernels for signed integer types will be included. When negating an unsigned integer, an implicit cast to the next larger signed integer type must be performed first.

For reference, numpy preserves the dtype for unsigned integers:

    In [14]: arr = np.array([0, 255], dtype="uint8")
    In [15]: -arr
    Out[15]: array([0, 1], dtype=uint8)
    In [16]: np.negative(arr)
    Out[16]: array([0, 1], dtype=uint8)

(not sure that's very useful, though)

After careful deliberation on this topic, I think negate should preserve the data type. Negation is not defined for unsigned integers in a mathematical sense, so I do not think unsigned kernels should be provided for "negate_checked". For the default (unchecked) kernels, the behavior is to wrap around, applying two's complement negation in a well-defined manner.
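Type-preserving wrap-around for unsigned integers can be written in C++ by computing the result modulo 2^N. The sketch below is illustrative only and is not the PR's kernel. The function name is hypothetical. For uint8 it reproduces the NumPy result shown earlier in this thread.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical type-preserving negation for unsigned integers: the result is
// reduced modulo 2^N (two's complement wrap-around) with no undefined behavior.
template <typename T>
constexpr T NegateWrapping(T arg) {
  static_assert(std::is_unsigned_v<T>, "sketch covers unsigned types only");
  return static_cast<T>(T{0} - arg);
}

// Matches the NumPy output: negating uint8 [0, 255] gives [0, 1].
static_assert(NegateWrapping<uint8_t>(0) == 0);
static_assert(NegateWrapping<uint8_t>(255) == 1);
```

For narrow types, `T{0} - arg` is computed in int after promotion. The final `static_cast` back to the unsigned type is always well-defined modular reduction.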

bkietz · 2021-04-14T16:23:01Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

-0 is not distinct from 0 for integral types.

Suggested change, replacing

    CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, -0]"), ArrayFromJSON(ty, "[0, -0, 0]"));

with

    CheckScalarUnary("negate", ArrayFromJSON(ty, "[0, 0, 0]"), ArrayFromJSON(ty, "[0, 0, 0]"));

bkietz · 2021-04-14T16:27:05Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

MakeScalar decides the DataType of the scalar based on its argument type. Because of integral promotion, decltype(-int8_max) is int, a 32-bit signed integer. Adding an explicit cast back to int8_t should fix it.

Suggested change, replacing

    // NOTE [EPM]: Why do these fail? The expected result is promoted to int32.
    // auto int8_max = std::numeric_limits<int8_t>::max();
    // CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(-int8_max));
    // auto int16_max = std::numeric_limits<int16_t>::max();
    // CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(-int16_max));

with

    auto int8_max = std::numeric_limits<int8_t>::max();
    CheckScalarUnary("negate", MakeScalar(int8_max), MakeScalar(static_cast<int8_t>(-int8_max)));
    auto int16_max = std::numeric_limits<int16_t>::max();
    CheckScalarUnary("negate", MakeScalar(int16_max), MakeScalar(static_cast<int16_t>(-int16_max)));
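The promotion can be confirmed at compile time with standard type traits. This is plain C++ and does not involve MakeScalar itself. Unary minus promotes int8_t and int16_t operands to int. The explicit cast restores the narrow type that determines the scalar's DataType.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

constexpr auto int8_max = std::numeric_limits<int8_t>::max();
constexpr auto int16_max = std::numeric_limits<int16_t>::max();

// Unary minus applies integral promotion: the narrow operand becomes int first.
static_assert(std::is_same_v<decltype(-int8_max), int>, "-int8_t yields int");
static_assert(std::is_same_v<decltype(-int16_max), int>, "-int16_t yields int");

// Casting back restores the narrow type, so a type-driven factory sees int8_t again.
static_assert(std::is_same_v<decltype(static_cast<int8_t>(-int8_max)), int8_t>,
              "cast restores int8_t");
```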

bkietz · 2021-04-14T16:33:23Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

Insertion of implicit casts is accomplished by overriding Function::DispatchBest. For example, to ensure that unsigned types are supported by casting them to a compatible signed type, use:

Suggested change:

    struct UnaryArithmeticFunction : ScalarFunction {
      using ScalarFunction::ScalarFunction;
      Result<const Kernel*> DispatchBest(std::vector<ValueDescr>* values) const override {
        RETURN_NOT_OK(CheckArity(*values));
        using arrow::compute::detail::DispatchExactImpl;
        if (auto kernel = DispatchExactImpl(this, *values)) return kernel;
        EnsureDictionaryDecoded(values);
        if (auto type = CommonNumeric({values->at(0), int8()})) {
          ReplaceTypes(type, values);
        }
        if (auto kernel = DispatchExactImpl(this, *values)) return kernel;
        return arrow::compute::detail::NoMatchingKernel(this, *values);
      }
    };

(UnaryArithmeticFunction will replace ScalarFunction below in auto func = std::make_shared<ScalarFunction>(name, Arity::Unary(), doc);)

I'm not sure why we need UnaryArithmeticFunction and can't use ScalarFunction as is. Why is CommonNumeric called with int8()?

ScalarFunction does not provide implicit casts, such as from unsigned to signed integers. UnaryArithmeticFunction is provided to add implicit casts, including:

    uint8 -> int16
    uint16 -> int32
    uint32 -> int64
    uint64 -> int64
    dictionary<int32, float> -> float
    // ...

Calling CommonNumeric with int8 ensures that the output type is signed, with no more widening than necessary. Insertion of implicit casts is tested for the other arithmetic functions using CheckDispatchBest.

Got it, nice trick!
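The cast table can be modeled with a small hypothetical function. The enum and function names are illustrative only. The sketch does not reproduce Arrow's CommonNumeric or dictionary decoding. It encodes the idea of pairing the input type with int8: the result is the narrowest signed type that can hold both. Note that uint64 has no wider signed type, so uint64 values above the int64 maximum cannot be represented after the cast.

```cpp
// Hypothetical numeric type tags, not Arrow's DataType hierarchy.
enum class NumericType {
  kInt8, kInt16, kInt32, kInt64,
  kUInt8, kUInt16, kUInt32, kUInt64,
  kFloat, kDouble
};

// Sketch of the implicit cast applied before dispatching "negate":
// unsigned types move to the next wider signed type (uint64 is capped at int64),
// while signed and floating-point types are left unchanged.
constexpr NumericType PromoteForNegate(NumericType t) {
  switch (t) {
    case NumericType::kUInt8:  return NumericType::kInt16;
    case NumericType::kUInt16: return NumericType::kInt32;
    case NumericType::kUInt32: return NumericType::kInt64;
    case NumericType::kUInt64: return NumericType::kInt64;  // no wider signed type exists
    default:                   return t;
  }
}
```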

bkietz · 2021-04-14T16:35:05Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

Please flesh these out

bkietz · 2021-04-14T16:37:54Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

Suggested change, replacing

    static constexpr enable_if_integer<T> Call(KernelContext*, Arg0 arg) {

with

    static constexpr enable_if_signed_integer<T> Call(KernelContext*, Arg0 arg) {

pitrou · 2021-04-14T17:03:03Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

Also please check inf and NaN (they should work implicitly, but who knows).

Good corner cases, thanks!
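The sketch below outlines in plain C++ what inf and NaN test cases would verify. It does not use Arrow's CheckScalarUnary, and it assumes IEEE 754 doubles. Negation flips only the sign bit, so inf maps to -inf and NaN stays NaN with its sign bit flipped. Because NaN never compares equal to itself, a test has to check it with std::isnan rather than with ==.

```cpp
#include <cmath>
#include <limits>

// IEEE 754 negation flips only the sign bit, including for infinities and NaNs.
double NegateDouble(double x) { return -x; }

const double kInf = std::numeric_limits<double>::infinity();
const double kNaN = std::numeric_limits<double>::quiet_NaN();
```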

pitrou · 2021-04-14T17:03:48Z

docs/source/cpp/compute.rst

Please don't add questions to the documentation. The documentation is meant to inform users, not to collect TODOs for development.

pitrou · 2021-04-14T17:04:22Z

docs/source/cpp/compute.rst

These tables are alphabetically-ordered, it would be nice to keep them like that.

github-actions bot added the Component: C++ label Apr 14, 2021

bkietz self-requested a review April 14, 2021 15:32

bkietz requested changes Apr 14, 2021


pitrou reviewed Apr 14, 2021


bkietz marked this pull request as ready for review April 16, 2021 15:00

edponce closed this Apr 20, 2021

edponce force-pushed the master branch from b4806a0 to 930c381 April 20, 2021 14:32

cyb70289 mentioned this pull request Apr 21, 2021

ARROW-11950: [C++][Compute] Add unary negative kernel #10113

Closed

asfimport mentioned this pull request May 12, 2021

[C++][Compute] Add unary negative kernel #27785

Closed

