[Python][Dataset] Python / Cython interface to C++ arrow::dataset::Partitioning::Format #43684
Description
Describe the enhancement requested
Hi Arrow team
We use pyarrow for dataset partitioning. We want to find the relative paths on the filesystem for respective partitioning schemes and segment encodings.
For example, with hive partitioning and the filter ("key", "=", "value value"), we would like /key=value value/
As another example, with hive partitioning and URI segment encoding, the same filter should give /key=value%20value/
As a third example, with DirectoryPartitioning, the same filter should give /value value/
We are currently composing these paths by hand, but we would like to be resilient to changes in (or subclasses of) the Arrow implementation.
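For context, the hand-composition we do today looks roughly like the sketch below. It is not Arrow code: the function name partition_path and its parameters are made up for illustration, and it only covers the three cases listed above. Arrow's own formatting may differ in details such as null handling or exactly which characters are escaped, which is why we would prefer to call the library instead.

```python
from urllib.parse import quote


def partition_path(pairs, flavor="hive", segment_encoding="none"):
    """Compose a relative partition path by hand (hypothetical helper).

    pairs            -- iterable of (key, value) equality filters, in field order
    flavor           -- "hive" (key=value segments) or "directory" (value only)
    segment_encoding -- "none" or "uri" (percent-encode each value)
    """
    segments = []
    for key, value in pairs:
        text = str(value)
        if segment_encoding == "uri":
            # Percent-encode everything that is not an unreserved character.
            text = quote(text, safe="")
        elif segment_encoding != "none":
            raise ValueError(f"unknown segment encoding: {segment_encoding!r}")

        if flavor == "hive":
            segments.append(f"{key}={text}")
        elif flavor == "directory":
            segments.append(text)
        else:
            raise ValueError(f"unknown partitioning flavor: {flavor!r}")
    return "/" + "/".join(segments) + "/"


filters = [("key", "value value")]
print(partition_path(filters))                          # hive, no encoding
print(partition_path(filters, segment_encoding="uri"))  # hive, URI encoding
print(partition_path(filters, flavor="directory"))      # directory partitioning
```

Every rule in this helper is an assumption about Arrow's behaviour that we have to keep in sync ourselves.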
To achieve this, we would really appreciate it if the C++ API arrow::dataset::Partitioning::Format could be exposed via Cython, like the arrow::dataset::Partitioning::Parse method, which is already declared for Cython as
| CResult[CExpression] Parse(const c_string & path) const |
and wrapped in arrow/python/pyarrow/_dataset.pyx (line 2492 in 712cfe6) as
| def parse(self, path): |
Thanks in advance for any help or discussion!
Component(s)
Integration, Parquet, Python