GH-21761: [Python] accept pyarrow scalars in array constructor #36162
I am not sure why the linter is failing on this PR. I will try to rebase and see if it helps.
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
08297d0 to c84c3a4
The failing tests already have an open issue, #35728, so I think this PR is ready for review. One thing still missing is the test for sets, where the elements in the set get reordered when they are passed in (see arrow/python/pyarrow/tests/test_convert_builtin.py, lines 2448 to 2449 in 3da9ba6).
The failing tests are all due to a known issue with
Will merge this today as the CI is green 👍
Thanks Alenka!
Conbench analyzed the 6 benchmark runs on this commit. There were 4 benchmark results indicating a performance regression. The full Conbench report has more details.
It looks like soon after I started investigating scalar conversions for #14121 (but well before I made the PR) a major underlying hole was plugged in pyarrow via apache/arrow#36162. Most of #14121 was created to give us a way to handle scalars from pyarrow generically in libcudf. Now that pyarrow scalars can be easily tossed into arrays, we no longer really need separate scalar functions in libcudf; we can simply create an array from the scalar, put it into a table, and then call the table function.

Additionally, arrow also has a function for creating an array from a scalar. This function is not new but [was previously undocumented](apache/arrow#40373). The builder code added to libcudf in #14121 can be removed and replaced with that factory. The scalar conversion is as simple as calling that arrow function and then using our preexisting `from_arrow` function on the resulting array.

For now this PR is just a simplification of internals. Future PRs will remove the scalar API once we have a more standard path for the conversion of arrays via the C Data Interface.

Authors:
- Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
- Yunsong Wang (https://github.com/PointKernel)
- David Wendt (https://github.com/davidwendt)
- Bradley Dice (https://github.com/bdice)

URL: #15213
Rationale for this change
Currently, `pyarrow.array` doesn't accept a list of pyarrow Scalars, and this PR adds a check to allow that.