-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-6084: [Python] Support LargeList #4979
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
python/pyarrow/tests/strategies.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kszucs Does this look right?
8ccd61f to
b5ae30c
Compare
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
python/pyarrow/scalar.pxi
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not possible to have this subclass ListValue to reduce code duplication? (getitem, iter, as_py look the same)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the C++ types are different (e.g. ListArray vs. LargeListArray), and those need to be compile-time constants for Cython.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code like this makes me wish for some kind of macro system in Cython.
|
Again, while trying it out this branch, I noticed some oddities. But now they are certainly not related to the code in this PR, as they also hold for non-large ListType. When creating an array from offsets and values in python, there is no validation of the offsets that it starts with 0 and ends with the length of the array (but is that required? the docs seem to indicate that: https://github.com/apache/arrow/blob/master/docs/source/format/Layout.rst#list-type ("The first value in the offsets array is 0, and the last element is the length of the values array."). I understand that in C++ the main constructors are not safe, and as the caller you need to ensure that the data is correct or call a safe (slower) constructor. But do we want to use the unsafe / fast constructors without validation in Python as default as well? |
|
Ah, I see that in Python a validation function is exposed, and this actually raises: Do we want to call that by default in the creation method? |
|
On the C++ side indeed, calling validation for each constructor would be expensive. We can adopt a different strategy for Python. Can you open a JIRA about that? |
e87eef5 to
3c38574
Compare
|
@wesm Do you want to take a look here or can I merge as-is? (assuming the R AppVeyor failure isn't related) |
3c38574 to
4266ea2
Compare
|
Rebased. |
Codecov Report
@@ Coverage Diff @@
## master #4979 +/- ##
==========================================
+ Coverage 87.57% 89.17% +1.6%
==========================================
Files 1005 727 -278
Lines 143560 103008 -40552
Branches 1418 0 -1418
==========================================
- Hits 125720 91862 -33858
+ Misses 17478 11146 -6332
+ Partials 362 0 -362
Continue to review full report at Codecov.
|
|
Taking a look |
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
python/pyarrow/scalar.pxi
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code like this makes me wish for some kind of macro system in Cython.
No description provided.