Open
Labels
dataframe, io, needs attention, parquet
Description
dask.dataframe.read_parquet accepts a list of paths as input and uses natural_sort_key under the hood to sort those paths in several places. In my case the parquet files are not in natural order: I'd like to sort them myself and have read_parquet keep that order, so I monkey-patch natural_sort_key like this to make it work:
with (mock.patch.object(dask.dataframe.io.parquet.core, 'natural_sort_key', new=lambda p: p),
      mock.patch.object(dask.dataframe.io.parquet.utils, 'natural_sort_key', new=lambda p: p)):
    return dd.read_parquet(paths, index='index', calculate_divisions=True)
It would be great if there were a parameter to control the sort key (to disable natural_sort_key or to provide my own).
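To illustrate why the choice of key matters, here is a minimal standalone sketch that does not call dask at all. The natural_key function is a hypothetical stand-in for a natural sort key, not dask's own implementation, and the file names are made up. It also assumes the key is applied through Python's built-in sorted(), which I have not verified for every place read_parquet sorts paths.

```python
import re


def natural_key(path):
    """Hypothetical natural sort key: runs of digits compare as integers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", path)]


# Made-up paths in the caller's intended order, which is not the natural order.
paths = ["part.10.parquet", "part.2.parquet", "part.1.parquet"]

# Natural key: numeric parts are ordered by value.
natural = sorted(paths, key=natural_key)
# -> ['part.1.parquet', 'part.2.parquet', 'part.10.parquet']

# Identity key (lambda p: p): plain lexicographic string order.
lexicographic = sorted(paths, key=lambda p: p)
# -> ['part.1.parquet', 'part.10.parquet', 'part.2.parquet']

# Constant key: Python's sort is stable, so the input order is kept.
preserved = sorted(paths, key=lambda p: 0)
# -> ['part.10.parquet', 'part.2.parquet', 'part.1.parquet']

print(natural, lexicographic, preserved, sep="\n")
```

Under that assumption, an identity key keeps the caller's order only when the paths are already in lexicographic order, while a constant key keeps any input order. A user-supplied key parameter would cover both cases.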