-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Revert "fix: create file for empty stream" #16682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @brunal
Can you please also add a test showing the problem you encountered (aka a test for writing an empty record batch stream to a parquet file) so we don't accidentally reintroduce the same problem?
cc @chenkovsky
|
Looks like there may be a test in |
a4dc97c to
f5a77b1
Compare
|
Indeed! I have just updated the revert commit with this test. |
|
The CI tests seem to have some problems unfortunately |
f5a77b1 to
f93a15c
Compare
This reverts commit 4084894. It adds a test showcasing the functionality that the commit above broke: writing a parquet file from an empty RecordBatch.
f93a15c to
ab5e7a3
Compare
|
Apologies for that; it's now fixed. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @brunal
I pushed a commit to also verify the schema of the batch that is read from the empty parquet file
@chenkovsky can you help me review this PR and maybe help find a fix that works with csv and other formats?
Thank you @brunal and @chenkovsky |
Reverts #16342
After that change, one cannot write an empty RecordBatch with a schema to parquet anymore.
Indeed, the logic added tries to write an empty recordbatch when num_rows == 0.
So if you tried to write an empty recordbatch, you end up writing it twice => it fails.