Skip to content

Conversation

@wmalpica
Copy link
Contributor

This PR adds the ability for the StringConverter to parse hex strings of the form:
0x123abc to an integer.
The strings must start with "0x" case insensitive.
The values ABCDEF are case insensitive.

Note that there was already a Hex Parsing function here:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/string.cc#L76
Which was not really flexible enough for what was needed.
Not sure it it would be best to replace it with the new functionality added by this PR. I am open to suggestions

This work was originally done here #11064 but had to create a new PR due to a rebase issue

@github-actions
Copy link

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a test case for hex with negative sign (eg., -0x34) and another for multiple prefixed zeros (eg., 00000x7f).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those cases are explicitly not allowed. Do you mean that I should add them to CheckCastFails?
A negative before a hex does not make sense. If the value can be negative (signed integer) then you should use the binary representation of that negative number, for example for int8 0xFF is -1 and 0xF0 is -128.
For the case of multiple prefixed zeros, I opted for not allowing that to be allowed, since allowing it would make the logic more complex (probably slower) and it seems like it would be a very odd way for a hex string to be formatted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, sorry...I meant as failing test cases. It is just a way to ensure more code coverage in tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify to return ARROW_PREDICT_TRUE(ParseHex(s, length, out));.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@edponce
Copy link
Contributor

edponce commented Sep 15, 2021

LGTM. I added some final minor suggestions.

Comment on lines +284 to +294
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lookup table approach may have better performance.
We can leave it as a followup task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth investigating, but I would request we leave it as followup task.

Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wmalpica. This looks good to me, will merge if CI passes.

@pitrou
Copy link
Member

pitrou commented Sep 16, 2021

The LZ4 build failures on MinGW are unrelated.

@pitrou pitrou closed this in 012248a Sep 16, 2021
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
This PR adds the ability for the StringConverter to parse hex strings of the form:
0x123abc to an integer.
The strings must start with "0x" case insensitive.
The values ABCDEF are case insensitive.

Note that there was already a Hex Parsing function here:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/string.cc#L76
Which was not really flexible enough for what was needed.
Not sure it it would be best to replace it with the new functionality added by this PR. I am open to suggestions

This work was originally done here apache#11064 but had to create a new PR due to a rebase issue

Closes apache#11161 from wmalpica/ARROW-12657_3

Lead-authored-by: William Malpica <16705032+wmalpica@users.noreply.github.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants