
[Python][Dataset] Python / Cython interface to C++ arrow::dataset::Partitioning::Format #43684

@Feiyang472

Description


Describe the enhancement requested

Hi Arrow team,
We use pyarrow for dataset partitioning. We want to find the relative filesystem path that corresponds to a given partitioning scheme and segment encoding.

For example, with hive partitioning and no segment encoding, given the filter ("key", "=", "value value"), we would like to get /key=value value/.
With hive partitioning and URI segment encoding, the same filter should give /key=value%20value/.
With directory partitioning, the same filter should give /value value/. We are currently composing these paths by hand, but we would like to be resilient to changes in the Arrow implementation and to partitioning subclasses.
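For illustration, the hand-composed approach might look roughly like the minimal sketch below. `format_partition_path` is a hypothetical helper, not a pyarrow API. It assumes equality filters only, percent-encodes only the values (not the field names), and relies on Python's `urllib.parse.quote`, whose escaping is not guaranteed to match Arrow's for every character. That kind of silent divergence is why delegating to `Partitioning::Format` would be preferable.

```python
from urllib.parse import quote


def format_partition_path(filters, flavor="hive", segment_encoding="none"):
    """Hypothetical helper: build a relative directory such as '/key=value/'.

    filters: list of (field, op, value) tuples; only equality is supported.
    flavor: "hive" emits 'field=value' segments, "directory" emits bare values.
    segment_encoding: "uri" percent-encodes each value, "none" leaves it as-is.
    """
    segments = []
    for field, op, value in filters:
        if op not in ("=", "=="):
            raise ValueError(f"only equality filters map to a path segment, got {op!r}")
        text = str(value)
        if segment_encoding == "uri":
            # safe="" also escapes '/', which would otherwise split the segment.
            # Assumption: this approximates, but may not exactly match, Arrow's escaping.
            text = quote(text, safe="")
        elif segment_encoding != "none":
            raise ValueError(f"unknown segment_encoding {segment_encoding!r}")
        if flavor == "hive":
            segments.append(f"{field}={text}")
        elif flavor == "directory":
            segments.append(text)
        else:
            raise ValueError(f"unknown flavor {flavor!r}")
    # Leading and trailing slashes follow the examples in this issue.
    return "/" + "".join(s + "/" for s in segments)


if __name__ == "__main__":
    f = [("key", "=", "value value")]
    print(format_partition_path(f))                          # /key=value value/
    print(format_partition_path(f, segment_encoding="uri"))  # /key=value%20value/
    print(format_partition_path(f, flavor="directory"))      # /value value/
```

Every rule in this sketch (which flavor emits which segment, how values are escaped) duplicates logic that already lives in Arrow's partitioning classes, so it could drift out of sync with them.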

To achieve this, we would really appreciate it if the C++ API

arrow::dataset::Partitioning::Format

could be exposed via Cython, in the same way as the existing arrow::dataset::Partitioning::Parse method, which is declared in Cython as

CResult[CExpression] Parse(const c_string & path) const

and wrapped in Python as

def parse(self, path):

Thanks in advance for any help or discussion!

Component(s)

Integration, Parquet, Python
