Closed
Description
Hi
It would be useful to be able to obtain a data frame of the unique combinations of columns directly from a dataset query, for example:
open_dataset("sitc-rev2/parquet/",
             partitioning = c("Year", "Trade Flow", "Reporter ISO")) %>%
  select(Year, `Reporter ISO`) %>%
  filter(Year >= 1988 & Year <= 1994) %>%
  distinct() %>%
  collect()

However, in the current development version of the Arrow package (installed from GitHub), the last expression fails with this error:
Error in UseMethod("distinct") :
  no applicable method for 'distinct' applied to an object of class "arrow_dplyr_query"

Collecting first and then calling distinct() works:
reporters_1 <- open_dataset("sitc-rev2/parquet/",
                            partitioning = c("Year", "Trade Flow", "Reporter ISO")) %>%
  select(Year, `Reporter ISO`) %>%
  filter(Year >= 1988 & Year <= 1994) %>%
  collect() %>%
  distinct()

Reporter: Mauricio 'Pachá' Vargas Sepúlveda / @pachadotdev
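For anyone who wants to try the collect-then-distinct workaround without the original data, here is a minimal self-contained sketch. It assumes the arrow and dplyr packages are installed; the toy data frame, the temporary directory, and the column names `reporter_iso` and `value` are hypothetical stand-ins for the SITC dataset, not part of the original report.

```r
library(arrow)
library(dplyr)

# Hypothetical toy data standing in for the SITC trade dataset.
toy <- data.frame(
  Year         = c(1987L, 1988L, 1988L, 1990L, 1990L, 1995L),
  reporter_iso = c("chl", "chl", "chl", "per", "per", "chl"),
  value        = 1:6
)

# Write a small partitioned Parquet dataset to a temporary directory.
path <- file.path(tempdir(), "toy-parquet")
write_dataset(toy, path, partitioning = c("Year", "reporter_iso"))

# Workaround from the report: filter lazily, collect into R,
# then let dplyr compute the unique combinations in memory.
reporters <- open_dataset(path) %>%
  select(Year, reporter_iso) %>%
  filter(Year >= 1988 & Year <= 1994) %>%
  collect() %>%
  distinct()
```

Because select() and filter() run before collect(), only the two requested columns for the filtered years are pulled into memory; the deduplication itself still happens in R rather than in Arrow.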
Related issues:
- [R] Support for dplyr::distinct() (duplicates)
Note: This issue was originally created as ARROW-13107. Please see the migration documentation for further details.