ARROW-12861: [C++][Compute] Add sign function kernels #10395

edponce · 2021-05-25T06:29:35Z

This PR adds the sign function to the compute layer as a unary scalar function.

Numeric inputs result in any of (-1,0,1)
+/-0 input returns 0
Infinity is treated as a signed number
NaN input returns NaN
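
As a rough illustration of the semantics listed above, here is a minimal standalone C++ sketch. It is not the Arrow kernel: `SignOf` is a hypothetical helper, and returning the input's floating-point type (so that NaN can propagate) while returning int8 for integers is an assumption made for this sketch.

```cpp
#include <cmath>
#include <cstdint>
#include <type_traits>

// Hypothetical helper for illustration only; not part of Arrow.
// Floating-point inputs: result has the same type as the input.
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, T>::type SignOf(T x) {
  if (std::isnan(x)) return x;      // NaN input returns NaN
  if (x == T(0)) return T(0);       // +0.0 and -0.0 both compare equal to 0
  return x > T(0) ? T(1) : T(-1);   // infinities are treated as signed numbers
}

// Integral inputs: result is one of -1, 0, 1 stored as int8 (assumed here).
template <typename T>
typename std::enable_if<std::is_integral<T>::value, std::int8_t>::type SignOf(T x) {
  return static_cast<std::int8_t>((x > 0) - (x < 0));
}
```

Under these assumptions, `SignOf(-0.0)` yields positive zero and `SignOf` of negative infinity yields -1.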

github-actions · 2021-05-25T06:42:18Z

https://issues.apache.org/jira/browse/ARROW-12861

edponce · 2021-05-27T09:23:31Z

The sign function is an arithmetic function, so I added its option allow_signed_zero to the ArithmeticOptions class.
Is this valid, given that sign does not overflow (so it will not use check_overflow) and that no other compute function currently uses the allow_signed_zero option?

edponce · 2021-05-27T10:43:35Z

The test for double type failed in AMD64-MinGW32 with a RapidJSON error message "Number too big to be stored in double". I am not the first to encounter this error from RapidJSON. I disabled the tests that use std::numeric_limits<double>::max() to check if this is the triggering case.

Indeed this was the case. Not sure of a current solution for this.

bkietz · 2021-05-27T10:57:09Z

I'd say that even if this could be called an arithmetic function, it's not ideal to require ArithmeticOptions to carry fields which only apply to sign. Please write SignOptions instead.

pitrou · 2021-05-27T11:05:51Z

Can you explain in which context this function is useful? Furthermore, can you explain what the purpose is of having two different variants?

edponce · 2021-05-27T11:22:30Z

While working on the rounding functions, I used the sign of the input for specific rounding modes (e.g., round-half-away-from-zero). After writing sign utility functions, I found that many database systems provide such a function (although I do not have a particular use case). The option for treating zero as a signed value was suggested in the issue discussion.
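
To make the rounding connection concrete, a round-half-away-from-zero step can be sketched in standalone C++ as below. `RoundHalfAwayFromZero` is a hypothetical name, and this is only one common formulation, not necessarily what the rounding kernels do.

```cpp
#include <cmath>

// Hypothetical helper for illustration only; not part of Arrow.
// Rounds the magnitude with ties going up, then reapplies the input's sign,
// so ties move away from zero in both directions.
inline double RoundHalfAwayFromZero(double x) {
  return std::copysign(std::floor(std::fabs(x) + 0.5), x);
}
```

The sign of the input is the only thing that distinguishes this mode from rounding half up, which is why a sign utility is convenient here.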

pitrou · 2021-05-27T11:26:45Z

@nealrichardson What do you think? Is this useful?

nealrichardson · 2021-05-28T15:37:23Z

I wouldn't say it's the most useful kernel to be adding, if I were prioritizing kernels to write, but it is something that people do. In R for example there is a sign() function: https://stat.ethz.ch/R-manual/R-devel/library/base/html/sign.html, and it's a thing in SQL too.

ianmcook · 2021-06-01T14:05:57Z

It's a thing in NumPy too https://numpy.org/doc/stable/reference/generated/numpy.sign.html

pitrou · 2021-06-07T16:40:46Z

Ok, all three versions of the sign function (Python, R, SQL) return (0,-1,+1), so this function should probably do the same.

pitrou · 2021-06-07T16:46:32Z

docs/source/cpp/compute.rst

Should add a note explaining how the output is computed. You can look at other sections for examples.

I added a "Notes" column to the table, and included notes describing the output values and special case of 0.

pitrou · 2021-06-07T16:47:27Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

These tests which tweak the validity bits by hand are only meaningful if the kernel may return an error, which is not the case here.

Thanks for the explanation. These tests were removed.

pitrou · 2021-06-07T16:47:54Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

Please remove commented out code.

edponce · 2021-06-12T08:30:24Z

The Sign kernels were created with a fixed output type (Int8) using a new kernel generator dispatcher but the tests for DispatchBest fail. The output value from DispatchBest is the same as its input and the input/output values for DispatchExact are always int8. Could someone help with this issue?

pitrou · 2021-06-15T08:27:17Z

@bkietz may have the answer to your question, @edponce .

bkietz · 2021-06-15T19:02:34Z

docs/source/cpp/compute.rst

Suggested change

- * \(2) Similar to ``sign`` but zero input is considered as a signed number.
+ * \(2) Similar to ``sign`` but zero is considered signed for floating point inputs.

The sign_with_signed_zero variant treats 0 as a positive value for integral types, but either -0 or +0 for floating-point types.

Hmm...I agree with this change. The sign_with_signed_zero variant only applies to floating-point numbers resulting in either (-1,1). For integral values, the result will be either of (-1,0,1).

bkietz · 2021-06-15T19:08:39Z

cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc

This seems unnecessarily general, especially given we have only "sign". Additionally, note that CheckDispatchBest does not assert the output type: it asserts the type to which arguments must be implicitly cast:

Suggested change

-  // Functions with fixed output type
-  struct FuncAndOutType {
-    std::string name;
-    std::shared_ptr<DataType> out_type;
-  };
-  FuncAndOutType funcs[] = {{"sign", int8()}};
-  for (const auto& func : funcs) {
-    for (const auto& ty : {int8(), int16(), int32(), int64(), uint8(), uint16(), uint32(),
-                           uint64(), float32(), float64()}) {
-      CheckDispatchBest(func.name, {ty}, {func.out_type});
-      CheckDispatchBest(func.name, {dictionary(int8(), ty)}, {func.out_type});
-    }
-  }
+  // Sign always outputs to int8
+  for (std::string name : {"sign", "sign_with_negative_zero"}) {
+    for (const auto& ty : {int8(), int16(), int32(), int64(), uint8(), uint16(), uint32(),
+                           uint64(), float32(), float64()}) {
+      CheckDispatchBest(name, {ty}, {ty});
+      CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty});
+    }
+  }

In the example above, CheckDispatchBest(name, {ty}, {int8()}); would imply that we must first cast ty -> int8() inputs before invoking the kernel, which is incorrect. Compare this with CheckDispatchBest(name, {dictionary(int8(), ty)}, {ty}); which asserts that we can't directly extract the sign bit from dictionary encoded data; encoded arrays must first be decoded for the kernel to operate on them.

pitrou · 2021-06-22T09:29:12Z

cpp/src/arrow/compute/api_scalar.cc

I'm still not sure why we need both variants. Is this a user request? @ianmcook ?

Note that we could expose a copysign instead of sign_with_signed_zero, which would be more generic.
(sign_with_signed_zero(x) would simply be copysign(x, 1), AFAICT)

@pitrou The sign kernel is to be used by the rounding function. I agree that sign_with_signed_zero is equivalent to copysign. Actually, copysign is the variant needed for the rounding function.

Nevertheless, sign is a common function exposed in frontend tools, so I consider it worthwhile to provide.

Also, I think copysign should not be exposed as a compute function but rather used internally; instead we should expose sign and sign_with_signed_zero (implementations may use copysign(x, 1)). The rationale is that copysign(x, 1) is not common in SQL or dataframe libraries, but sign is.
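
For reference, in C and C++ `std::copysign(mag, sgn)` takes the magnitude first and the sign source second, so the behavior discussed here corresponds to `std::copysign(1.0, x)` in that API; the `copysign(x, 1)` spelling in this thread presumably assumes the opposite argument order. A minimal sketch follows, with `SignWithSignedZero` as a hypothetical standalone helper rather than the kernel from this PR; passing NaN through unchanged is an assumption.

```cpp
#include <cmath>

// Hypothetical helper for illustration only; not part of Arrow.
// Returns -1 or +1 according to the sign bit, so -0.0 maps to -1 and +0.0 to +1.
inline double SignWithSignedZero(double x) {
  if (std::isnan(x)) return x;    // assumption: NaN is passed through
  return std::copysign(1.0, x);   // magnitude 1, sign bit taken from x
}
```

Unlike plain sign, this never returns 0 for floating-point inputs, which matches the (-1,1) result set described earlier in the thread.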

A use case of sign is to build a custom ranking based on the signedness of values.
For example, consider ranking values relative to a constant value (e.g., 3):

SELECT CASE WHEN @a < 3 THEN 0 WHEN @a = 3 THEN 1 WHEN @a > 3 THEN 2 END

is not supported because expressions are not allowed in CASE-WHEN constructs, so a workaround is

SELECT CASE SIGN(@a - 3) WHEN -1 THEN 0 WHEN 0 THEN 1 WHEN 1 THEN 2 END
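
The same ranking idea can be sketched outside SQL. The C++ below is illustrative only: `RankRelativeTo` is a hypothetical helper, and it derives the sign from comparisons rather than from the subtraction `a - 3` to avoid integer overflow.

```cpp
#include <vector>

// Hypothetical helper for illustration only; not part of Arrow.
// Maps each value to 0 (below pivot), 1 (equal to pivot), or 2 (above pivot),
// i.e. sign(value - pivot) + 1.
inline std::vector<int> RankRelativeTo(const std::vector<int>& values, int pivot) {
  std::vector<int> ranks;
  ranks.reserve(values.size());
  for (int v : values) {
    const int sign = (v > pivot) - (v < pivot);  // one of -1, 0, 1
    ranks.push_back(sign + 1);
  }
  return ranks;
}
```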

Please, can we answer the original question? Why are two variants needed?

After careful thought, there is no general need for two variants. The only case where I see sign_with_signed_zero being relevant is when the stored data is used for calculus-related operations (e.g., integral limits).
I will remove the sign_with_signed_zero variant.

bkietz

LGTM, thanks!

github-actions bot added the Component: C++ label May 25, 2021

edponce force-pushed the ARROW-12861-Compute-Add-sign-function-kernels branch from f9798bf to 7e3bb1b Compare May 27, 2021 09:09

edponce marked this pull request as ready for review May 27, 2021 09:11

edponce changed the title ~~ARROW-12861: [C++][Compute] Add sign function and tests~~ ARROW-12861: [C++][Compute] Add sign function kernels May 27, 2021

pitrou reviewed Jun 7, 2021

edponce force-pushed the ARROW-12861-Compute-Add-sign-function-kernels branch from 447c88b to 236741c Compare June 11, 2021 08:17

bkietz reviewed Jun 15, 2021

edponce force-pushed the ARROW-12861-Compute-Add-sign-function-kernels branch from dcfcf61 to 14f3eee Compare June 16, 2021 06:14

edponce requested review from bkietz and pitrou June 16, 2021 18:12

bkietz approved these changes Jun 16, 2021

edponce force-pushed the ARROW-12861-Compute-Add-sign-function-kernels branch 2 times, most recently from 39b1b0e to fa4334f Compare June 21, 2021 20:36

edponce mentioned this pull request Jun 22, 2021

ARROW-13096: [C++] Implement logarithm compute functions #10567

Closed

pitrou reviewed Jun 22, 2021

edponce force-pushed the ARROW-12861-Compute-Add-sign-function-kernels branch from 8a1acf7 to 1e29103 Compare July 14, 2021 14:35

edponce requested review from bkietz and pitrou July 15, 2021 15:59

edponce added 23 commits July 15, 2021 12:28

add sign function and tests

a50942e

update C++/Python compute docs

41e72dd

add GD for arithmetic kernels with fixed output type

de79512

improve variants of sign function and tests

1052f04

disabled min/max FP tests to check error in AMD64-WinGW32

584a4ee

enable max() test again

df87f08

add SignOptions and update kernel def

0fadccc

update documentation

f7229ea

debugging tests (unstable

50c2c1f

fix kernel and tests

770d6ef

fix lint errors

c1366cf

update based on reviews

6e7bcab

remove JSON format for min/max due to AMD64 Win MinGW32 error

19243a0

change FP test for min/max

089a693

change GD template to use Op instead of ...Args

d158391

clang format

0025fe0

rename generator dispatcher and kernel exec generator

053f468

remove sign_with_signed_zero variant

1c92d27

fix lint error

2f971e0

add support for NaN

9d6a1be

update docs

292a060

change MakeArray to MakeScalar to bypass RapidJSON 'Number too big' error

f099034

change XXX comment tag to N.B.

ea19dc6

edponce force-pushed the ARROW-12861-Compute-Add-sign-function-kernels branch from 2d8bab5 to ea19dc6 Compare July 15, 2021 16:28

bkietz approved these changes Jul 15, 2021

bkietz closed this in f345287 Jul 15, 2021

asfimport mentioned this pull request Sep 13, 2021

[C++][Compute] Add sign function kernels #28594

Closed


5 participants
