Use parquet read speed-ups from fastparquet.api.paths_to_cats. #5821

Merged
TomAugspurger merged 3 commits into dask:master from ig248:faster-fastparquet
Feb 5, 2020

@ig248 ig248 commented Jan 22, 2020

This PR propagates the parquet partition read speed-ups merged in dask/fastparquet#471.

Before this change, dask carried code duplicated from fastparquet.
After this change, dask imports the optimized function from fastparquet instead.

Since dask does not specify an explicit range of supported fastparquet versions,
the import is optional for now: dask falls back to the existing implementation
when an older fastparquet is installed.

@rjzamora I am not entirely happy with try/except imports and would rather have fastparquet as, e.g., an optional dependency with a well-specified version range, but for now this could serve as a workaround.

  • Tests added / passed
  • Passes black dask / flake8 dask

Potentially related issues

#5272
#4701

@ig248 requested a review from rjzamora January 22, 2020 16:41
@ig248 changed the title from "Import optimized fastparquet.api.paths_to_cats." to "Use parquet read speed-ups from fastparquet.api.paths_to_cats." Jan 22, 2020

@rjzamora rjzamora left a comment

LGTM - Thanks for the nice contribution here @ig248

@TomAugspurger

OK, I've released fastparquet 0.3.3 to PyPI. CF will start building in a bit.

For now, let's merge this. I opened #5865 as a followup for removing this compat code.

@TomAugspurger TomAugspurger merged commit 2986f75 into dask:master Feb 5, 2020
@TomAugspurger

Thanks @ig248!
