Is your feature request related to a problem or challenge?
@Omega359 asked on Discord: https://discord.com/channels/885562378132000778/1166447479609376850/1207458257874984970
Q: Is there a way to write out a dataframe to parquet with hive-style partitioning without having to create a table provider? I am pretty sure that a ListingTableProvider or a custom table provider will work but that seems like a ton of config for this
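For context, the table-provider route alluded to in the question looks roughly like this today. This is a minimal sketch, not a verified recipe: the input file input.csv, the table name my_table, and the single value column in the file schema are all hypothetical, and it assumes ListingTable accepts parquet inserts for the registered path.

use std::sync::Arc;
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::dataframe::DataFrameWriteOptions;
use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::ListingOptions;
use datafusion::prelude::*;

let ctx = SessionContext::new();
// the input must contain both the data column(s) and the partition column
let df = ctx.read_csv("input.csv", CsvReadOptions::new()).await?;

// declare the hive partition column on the listing table itself
let listing_options = ListingOptions::new(Arc::new(ParquetFormat::default()))
    .with_table_partition_cols(vec![("col_a".to_string(), DataType::Utf8)]);

// provide the file schema explicitly (the partition column is appended
// by the listing table); inference would fail on an empty directory
let file_schema = Arc::new(Schema::new(vec![
    Field::new("value", DataType::Int64, false),
]));

ctx.register_listing_table(
    "my_table",
    "/tmp/my_table",
    listing_options,
    Some(file_schema),
    None, // no SQL definition
)
.await?;

// write through the registered table rather than a plain path
df.write_table("my_table", DataFrameWriteOptions::new()).await?;

This is exactly the "ton of config" the question refers to: a format object, listing options, a schema, and a registered table name, all to express a single partition column.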
Describe the solution you'd like
I would like to be able to use DataFrame::write_parquet and the other write APIs to write partitioned files.
I suggest adding the table_partition_cols from ListingOptions as one of the options on https://docs.rs/datafusion/latest/datafusion/dataframe/struct.DataFrameWriteOptions.html
The partition information would then be specified the same way as described on ListingOptions::with_table_partition_cols
That would look something like:
let options = DataFrameWriteOptions::new()
    .with_table_partition_cols(vec![
        ("col_a".to_string(), DataType::Utf8),
    ]);
// write the data frame to parquet
// producing files like
// /tmp/my_table/col_a=foo/12345.parquet (data with 'foo' in column a)
// ..
// /tmp/my_table/col_a=zoo/12345.parquet (data with 'zoo' in column a)
df.write_parquet("/tmp/my_table", options, None).await?;
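For completeness, output laid out this way can already be read back with the existing listing-table support by declaring the same partition column. This is a sketch continuing from the ctx and imports in the workaround above; the table name partitioned is hypothetical, and schema inference works here because the files exist after the write:

let read_options = ListingOptions::new(Arc::new(ParquetFormat::default()))
    .with_table_partition_cols(vec![("col_a".to_string(), DataType::Utf8)]);
ctx.register_listing_table("partitioned", "/tmp/my_table", read_options, None, None)
    .await?;
// col_a is reconstructed from the col_a=... directory names
ctx.sql("SELECT col_a, count(*) FROM partitioned GROUP BY col_a")
    .await?
    .show()
    .await?;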
Describe alternatives you've considered
No response
Additional context
Possibly related to #8493