Skip to content

ARROW-8244: [Python] Fix parquet.write_to_dataset to set file path in metadata_collector#6797

Closed
jorisvandenbossche wants to merge 2 commits intoapache:masterfrom
jorisvandenbossche:ARROW-8244
Closed

ARROW-8244: [Python] Fix parquet.write_to_dataset to set file path in metadata_collector#6797
jorisvandenbossche wants to merge 2 commits intoapache:masterfrom
jorisvandenbossche:ARROW-8244

Conversation

@jorisvandenbossche
Copy link
Member

This explores a potential fix for ARROW-8244, it seems rather straightforward to set the file path in write_to_dataset (write_table does not do this, because there the user passes a full path, so no relative path is known).

cc @rjzamora does this look the correct logic?

@github-actions
Copy link

github-actions bot commented Apr 1, 2020

@rjzamora
Copy link
Contributor

rjzamora commented Apr 1, 2020

cc @rjzamora does this look the correct logic?

Thank you for working on this @jorisvandenbossche!

Yes, these changes are exactly what I had in mind. I don't think there is any danger in setting the file_path here, because the returned metadata will never be included in the file where the data is stored (the only case where the value of None is technically correct). Leaving the field empty in write_table makes sense, because the user is already specifying the full path and may want to manually construct a partitioned dataset. However, for write_to_dataset, I would argue that the only correct behavior is to set the file_path by default, and allow the user to modify it if desired/necessary (which is what you have in this PR).

@jorisvandenbossche
Copy link
Member Author

@rjzamora Thanks for the feedback! I agree that just setting the file path is probably the only sensible behaviour, so we can simply change that.

I added a test for the non-partitioned case as well.

Copy link
Contributor

@fsaintjacques fsaintjacques left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants