
Python: Write Parquet file using PyArrow #7256

@Fokko


Feature Request / Improvement

When writing data from Python, we want to rely on external dependencies to write the actual data files. There are many libraries out there that already do a great job, and we don't want to reinvent the wheel.

I think we can mirror the read path: where we use `pq.read_table` to read, we can use `pq.write_table` to write, and leverage the `metadata_collector` to collect the statistics that need to be stored in the `ManifestEntry`.

To keep the PR small, I would start with unpartitioned tables only, so that we don't have to construct partition-specific file paths yet.
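For an unpartitioned table, the path of a new data file depends only on the table location. A sketch, assuming the conventional `<table location>/data/` directory (Iceberg's default for `write.data.path`); the helper `new_data_file_path` and the UUID-based file name are hypothetical choices for illustration, not an existing PyIceberg API.

```python
import uuid


def new_data_file_path(table_location: str) -> str:
    """Hypothetical helper: path for a new data file in an unpartitioned table.

    Assumes the conventional `<table location>/data/` directory. A partitioned
    table would also need a partition directory derived from the partition spec
    and each row's values, which this sketch deliberately leaves out.
    """
    # A random UUID in the file name avoids collisions between concurrent writers.
    return f"{table_location.rstrip('/')}/data/{uuid.uuid4()}.parquet"
```

For example, `new_data_file_path("s3://bucket/warehouse/db/tbl")` yields a path of the form `s3://bucket/warehouse/db/tbl/data/<uuid>.parquet`.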

Query engine

None
