Open
Description
I am trying to join two Arrow tables in which some columns have the list<float> data type. Note that my join keys are all primitive data types; only some of my non-key columns are list<float>. However, PyArrow's join() cannot join such tables, although pandas can. It raises
ArrowInvalid: Data type list<item: float> is not supported in join non-key field
when I execute this piece of code
joined_table = table_1.join(table_2, ['k1', 'k2', 'k3'])
A Stack Overflow response pointed out that Arrow currently cannot handle non-fixed types for joins. Can this be fixed? Or is this intentional?
Reporter: Jayjeet Chakraborty / @JayjeetAtGithub
Related issues:
- [C++][Compute] Add scalar_hash function (relates to)
Note: This issue was originally created as ARROW-17216. Please see the migration documentation for further details.