-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-6000: [Python] Add support for LargeString and LargeBinary types #4927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ursabot build |
3bdd5c9 to
63b4dfa
Compare
Codecov Report
@@ Coverage Diff @@
## master #4927 +/- ##
==========================================
+ Coverage 87.5% 87.93% +0.43%
==========================================
Files 998 908 -90
Lines 141869 133088 -8781
Branches 1418 1418
==========================================
- Hits 124139 117036 -7103
+ Misses 17368 16042 -1326
+ Partials 362 10 -352
Continue to review full report at Codecov.
|
63b4dfa to
022173e
Compare
022173e to
aaf009e
Compare
aaf009e to
9672ca4
Compare
|
@jorisvandenbossche Want to take a look? |
|
Yes, was planning to do that, thanks for the ping! I played a bit with the branch (mainly with large_string), some things I noticed (to be clear: I suppose most of them are expected to not yet working, so not wanting to say they should be fixed in this PR, but some items might be worth JIRA issues):
Will take a look at the actual code now. |
|
Ok, some answers:
|
|
Ah, I see you already opened https://issues.apache.org/jira/browse/ARROW-6071 for the casting yesterday
Yes, that's something I could look into. |
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already went through the python part (need to run now), which looks good.
| Create large variable-length binary type | ||
| This data type may not be supported by all Arrow implementations. Unless | ||
| you need to represent data larger than 2GB, you should prefer binary(). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we clarify this is about the total size of a single array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, forget, a single value bigger than 2GB of course implies an array that is bigger than 2GB .. ;), so the distinction does not matter much.
|
Ok, I've pushed a fix for the |
|
Will merge soon if no more comments. |
| public: | ||
| static constexpr Type::type type_id = Type::STRING; | ||
| static constexpr bool is_utf8 = true; | ||
| using EquivalentBinaryType = BinaryType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fsaintjacques Any issue with the naming?
Have to see a bit with my priorities, but at least created an issue for it: https://issues.apache.org/jira/browse/ARROW-6115 |
Also fix a bug in Take / Filter for large binary types.