Skip to content

Conversation

@lidavidm
Copy link
Member

No description provided.

@github-actions
Copy link

@lidavidm lidavidm force-pushed the arrow-13072 branch 2 times, most recently from 9bfb1e9 to be4fb7f Compare June 14, 2021 21:02
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it's not a checked kernel, I think it's still necessary to make sure no UB can happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For unsigned char 128, arithmetic right shift 1 bit results to 192, looks a bit surprising.
Looks arithmetic right shift is more meaningful for signed integers, and logical right shift for unsigned integers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I cannot understand the usefulness of this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. I wanted to have it explicit which type of shift was being done but I can let it instead depend on the signedness of the input and condense it into a single kernel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since logical and arithmetic left shift are the same, is it useful to precise which one this is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why lhs cannot be negative? Left-shifting a negative number works fine (unless the value overflows, of course).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A left shift is undefined on a negative number.

if E1 has a signed type and non-negative value, and E1×2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined

C++11 standard 5.8.2 (page 118) or see the MSVC page

Copy link
Member

@pitrou pitrou Jun 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yawn. The C++ spec is completely stupid on this (just like the C spec).

Left-shifting of negative numbers is well-defined in Python:

>>> -5 << 1
-10
>>> -5 << 3
-40

Also in Java, where the definition is based on two's complement representation:
https://docs.oracle.com/javase/specs/jls/se7/html/jls-15.html#jls-15.19

The Rust reference doesn't say anything about left-shifting of negative numbers (either here or there), but a quick test seems to work. (edit: I tried to check for UB using Miri, but failed to install it)

C# does not say anything specific about left-shifting of negative numbers, so I presume it's well defined and falls under the general definition.

Same for EcmaScript.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could cast to and from unsigned to do the shift; that would sidestep the undefinedness and give the equivalent result (assuming a two's complement machine which I think we are ok assuming).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, I think we should support a well-defined, obvious, definition of left-shifting negative numbers just like most modern programming languages. The implementation should just be a matter of using the corresponding unsigned operation, AFAICT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also explicitly mask the shift amount (as Java specifies, and as x86 does - ARM does not)? Then the only check would be effectively that the shift amount is not negative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems be no consensus among SQL databases. Let's err on the side of caution and return an error if the shift value is out bounds (including negative).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just sizeof(CType)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a code organization POV, I think it would be more helpful to gather all shift tests close to each other.

Copy link
Contributor

@cyb70289 cyb70289 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the error message more precise? A priori, it's not obvious which argument overflows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any point in mentioning that the function never fails? This is true of several other functions as well.

@lidavidm lidavidm force-pushed the arrow-13072 branch 3 times, most recently from c4b19ae to 8f47f9f Compare June 24, 2021 12:38
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, will merge if green

@lidavidm
Copy link
Member Author

I feel like I've been seeing that datasets test flake. I filed ARROW-13198.

@pitrou pitrou closed this in e9fa304 Jun 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants