Skip to content

Conversation

@xhochy
Copy link
Contributor

@xhochy xhochy commented Oct 26, 2017

I'm not happy with the lines added to _read_arrow_parquet_piece. We may need to add tests for MultiIndex in the future to ensure that they also work.

  • Tests added / passed
  • Passes flake8 dask
  • Fully documented, including docs/source/changelog.rst for all changes
    and one of the docs/source/*-api.rst files for new API

cc @wesm @fjetter

pandas_metadata = json.loads(schema.metadata[b'pandas'].decode('utf8'))
index_col = pandas_metadata.get('index_columns', None)
else:
index_col = index
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can any of this metadata wrangling be pushed down into pyarrow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The retrieval of the index column should be definitely pushed down to pyarrow. Currently there is no such functionality on the ParquetDataset. I can add that for future releases. This PR mainly covers the parts where we can satisfy the tests without developing new features in pyarrow.

@mrocklin
Copy link
Member

From my perspective this looks good. I suspect that @martindurant would have a more discerning eye here.

@martindurant
Copy link
Member

Not much to say, looks fair enough.
Do we test multi-column indexes at all? I notice you added a specific provision for this.

@xhochy
Copy link
Contributor Author

xhochy commented Oct 30, 2017

@martindurant No, I have not seen any multi-column index tests yet in the Parquet engine tests. We should have a look at them in a follow-up PR.

@mrocklin
Copy link
Member

I plan to merge this tomorrow if there are no further comments.

@mrocklin mrocklin merged commit f0f0c86 into dask:master Nov 1, 2017
@mrocklin
Copy link
Member

mrocklin commented Nov 1, 2017

Merged. Thanks @xhochy !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants