-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-12657: [C++] Adding String hex to numeric conversion #11161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a test case for hex with negative sign (eg., -0x34) and another for multiple prefixed zeros (eg., 00000x7f).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those cases are explicitly not allowed. Do you mean that I should add them to CheckCastFails?
A negative before a hex does not make sense. If the value can be negative (signed integer) then you should use the binary representation of that negative number, for example for int8 0xFF is -1 and 0xF0 is -128.
For the case of multiple prefixed zeros, I opted for not allowing that to be allowed, since allowing it would make the logic more complex (probably slower) and it seems like it would be a very odd way for a hex string to be formatted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, sorry...I meant as failing test cases. It is just a way to ensure more code coverage in tests.
cpp/src/arrow/util/value_parsing.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplify to return ARROW_PREDICT_TRUE(ParseHex(s, length, out));.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
|
LGTM. I added some final minor suggestions. |
cpp/src/arrow/util/value_parsing.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lookup table approach may have better performance.
We can leave it as a followup task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth investigating, but I would request we leave it as followup task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wmalpica. This looks good to me, will merge if CI passes.
e0e3f9f to
18e8e96
Compare
|
The LZ4 build failures on MinGW are unrelated. |
This PR adds the ability for the StringConverter to parse hex strings of the form: 0x123abc to an integer. The strings must start with "0x" case insensitive. The values ABCDEF are case insensitive. Note that there was already a Hex Parsing function here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/string.cc#L76 Which was not really flexible enough for what was needed. Not sure it it would be best to replace it with the new functionality added by this PR. I am open to suggestions This work was originally done here apache#11064 but had to create a new PR due to a rebase issue Closes apache#11161 from wmalpica/ARROW-12657_3 Lead-authored-by: William Malpica <16705032+wmalpica@users.noreply.github.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: Antoine Pitrou <antoine@python.org>
This PR adds the ability for the StringConverter to parse hex strings of the form:
0x123abc to an integer.
The strings must start with "0x" case insensitive.
The values ABCDEF are case insensitive.
Note that there was already a Hex Parsing function here:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/string.cc#L76
Which was not really flexible enough for what was needed.
Not sure it it would be best to replace it with the new functionality added by this PR. I am open to suggestions
This work was originally done here #11064 but had to create a new PR due to a rebase issue