-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-13072: [C++] Add bit-wise arithmetic kernels #10530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9bfb1e9 to
be4fb7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though it's not a checked kernel, I think it's still necessary to make sure no UB can happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For unsigned char 128, arithmetic right shift 1 bit results to 192, looks a bit surprising.
Looks arithmetic right shift is more meaningful for signed integers, and logical right shift for unsigned integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I cannot understand the usefulness of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. I wanted to have it explicit which type of shift was being done but I can let it instead depend on the signedness of the input and condense it into a single kernel.
cpp/src/arrow/compute/api_scalar.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since logical and arithmetic left shift are the same, is it useful to precise which one this is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand why lhs cannot be negative? Left-shifting a negative number works fine (unless the value overflows, of course).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A left shift is undefined on a negative number.
if E1 has a signed type and non-negative value, and E1×2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined
C++11 standard 5.8.2 (page 118) or see the MSVC page
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yawn. The C++ spec is completely stupid on this (just like the C spec).
Left-shifting of negative numbers is well-defined in Python:
>>> -5 << 1
-10
>>> -5 << 3
-40Also in Java, where the definition is based on two's complement representation:
https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.19
The Rust reference doesn't say anything about left-shifting of negative numbers (either here or there), but a quick test seems to work. (edit: I tried to check for UB using Miri, but failed to install it)
C# does not say anything specific about left-shifting of negative numbers, so I presume it's well defined and falls under the general definition.
Same for EcmaScript.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could cast to and from unsigned to do the shift; that would sidestep the undefinedness and give the equivalent result (assuming a two's complement machine which I think we are ok assuming).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In any case, I think we should support a well-defined, obvious, definition of left-shifting negative numbers just like most modern programming languages. The implementation should just be a matter of using the corresponding unsigned operation, AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also explicitly mask the shift amount (as Java specifies, and as x86 does - ARM does not)? Then the only check would be effectively that the shift amount is not negative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems be no consensus among SQL databases. Let's err on the side of caution and return an error if the shift value is out bounds (including negative).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just sizeof(CType)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a code organization POV, I think it would be more helpful to gather all shift tests close to each other.
cyb70289
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the error message more precise? A priori, it's not obvious which argument overflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any point in mentioning that the function never fails? This is true of several other functions as well.
c4b19ae to
8f47f9f
Compare
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, will merge if green
|
I feel like I've been seeing that datasets test flake. I filed ARROW-13198. |
No description provided.