Re-integrate pyarrow write_to_dataset instead of arrow._write_partitioned in dask #9968

@j-bennet

Description

Describe the issue:

Because of apache/arrow#24440 (https://issues.apache.org/jira/browse/ARROW-8244), dask is using its own code to write a table to a partitioned pyarrow dataset. The code lives here:

def _write_partitioned(
    table,
    df,
    root_path,
    filename,
    partition_cols,
    fs,
    pandas_to_arrow_table,
    preserve_index,
    index_cols=(),
    return_metadata=True,
    **kwargs,
):

Since this code was added in March 2020, pyarrow has come a long way; the original issue was fixed in pyarrow in April 2020. There is a TODO in the Dask code to re-integrate write_to_dataset, but it still needs to be done. The custom code has already required multiple bugfixes, and we can't be sure it is up to date with pyarrow.

  • Dask version: 2023.2.0+11.g0890b96b

cc @rjzamora .

Metadata

    Labels

  • hygiene: Improve code quality and reduce maintenance overhead
  • needs attention: It's been a while since this was pushed on. Needs attention from the owner or a maintainer.
  • parquet
