Handle unnamed pandas RangeIndex in fastparquet engine #6350

Merged
TomAugspurger merged 3 commits into dask:master from rjzamora:fix-6348
Jun 29, 2020

@rjzamora
Member

@rjzamora rjzamora commented Jun 26, 2020

Closes #6348

Modifies FastParquetEngine to use the "pandas metadata" to handle round-tripping unnamed RangeIndex columns from pandas.

  • Tests added / passed
  • Passes black dask / flake8 dask
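
For context, the "pandas metadata" mentioned above is a JSON document that pandas-aware writers store under the `pandas` key of the Parquet file's key-value metadata. The sketch below is illustrative only and is not code from this PR: `EXAMPLE_PANDAS_METADATA` and `describe_index` are hypothetical names, and the metadata is a trimmed-down example. It shows how an unnamed RangeIndex is described by `start`/`stop`/`step` rather than by a stored column.

```python
import json

# Hypothetical, trimmed-down example of the JSON stored under the "pandas"
# key of a Parquet file's key-value metadata for a frame with a default index.
EXAMPLE_PANDAS_METADATA = json.dumps({
    "index_columns": [
        {"kind": "range", "name": None, "start": 0, "stop": 4, "step": 1}
    ],
    "columns": [
        {"name": "a", "field_name": "a", "pandas_type": "int64",
         "numpy_type": "int64", "metadata": None}
    ],
})


def describe_index(pandas_metadata_json):
    """Return a (kind, name, values) tuple for each index entry."""
    meta = json.loads(pandas_metadata_json)
    described = []
    for entry in meta.get("index_columns", []):
        if isinstance(entry, dict) and entry.get("kind") == "range":
            # RangeIndex: no data column is written, only start/stop/step.
            values = range(entry["start"], entry["stop"], entry["step"])
            described.append(("range", entry.get("name"), values))
        else:
            # Any other index is stored as a regular column, by field name.
            described.append(("column", entry, None))
    return described
```

Because a RangeIndex has no backing column in the file, a reader that discards this metadata has nothing left from which to rebuild the index.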

Member

@TomAugspurger TomAugspurger left a comment

cc @martindurant. Merging later today if there are no objections.

rg_piece = pf.row_groups[piece]
pf.fmd.key_value_metadata = None
if null_index_name:
    if "__index_level_0__" in pf.columns:
Member

You may wish to add comments here to explain this.
It may be nice to upstream this to fastparquet, though that is not urgent.
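
As a rough illustration of the check under review, the sketch below models the decision for an index labeled `None`. It is a simplified, hypothetical helper (`resolve_unnamed_index` is not a dask or fastparquet function) and only assumes what the snippet and the PR description state: if the file contains an `__index_level_0__` column, that column is the index; otherwise the index is presumed to be a RangeIndex that must be rebuilt from the pandas metadata.

```python
def resolve_unnamed_index(file_columns):
    """Hypothetical helper: decide how to rebuild an index labeled None.

    Returns a (source, column_name) tuple.
    """
    if "__index_level_0__" in file_columns:
        # The unnamed index was written as a real column; read it back
        # and use it as the index.
        return ("column", "__index_level_0__")
    # No such column exists, so the index is assumed to be a RangeIndex
    # that has to be reconstructed from the pandas metadata.
    return ("range_from_metadata", None)
```

For example, `resolve_unnamed_index(["a", "b"])` selects the metadata path, while `resolve_unnamed_index(["a", "__index_level_0__"])` selects the stored column.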

@martindurant
Member

+1

@rjzamora
Member Author

Hmm - Seems that CI broke after a comment-only commit here.

@TomAugspurger
Member

Strange. I restarted that job.

@TomAugspurger TomAugspurger merged commit ad5006f into dask:master Jun 29, 2020
@TomAugspurger
Member

Thanks @rjzamora!

kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020
* handle null-named rangeindex in fastparquet
@rjzamora rjzamora deleted the fix-6348 branch May 21, 2024 00:04
Development

Successfully merging this pull request may close these issues.

Regression in index when using fastparquet
