
[C++] Enable joins when data contains a list column #31180

@asfimport

Currently, an Arrow join errors when the data contains a list column, even when the list column is not a join key. Here's an example using the R bindings:

library(arrow)
library(dplyr)

jedi <- data.frame(name = c("C-3PO", "Luke Skywalker"),
                   jedi = c(FALSE, TRUE))

arrow_table(starwars) %>%
  left_join(jedi) %>%
  collect()
#> Error in `handle_csv_read_error()`:
#> ! Invalid: Data type list<item: string> is not supported in join non-key field

Supporting joins on tables that contain list columns would be a useful enhancement for tabular workflows, where list columns can be common, and for geospatial workflows, where geometry columns are stored as list or fixed_size_list (thanks @paleolimbot for mentioning that use case).

Related discussion here: ARROW-14519

Reporter: Stephanie Hazlitt / @stephhazlitt

Note: This issue was originally created as ARROW-15731. Please see the migration documentation for further details.
