-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Optionally coerce names of maps and lists to match Parquet specification #6828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Note this would supersede #6814. |
parquet/src/file/properties.rs
Outdated
| /// Writers have the option to coerce these into native Parquet types. Type | ||
| /// coercion allows for meaningful representations that do not require | ||
| /// downstream readers to consider the embedded Arrow schema. However, type | ||
| /// Also, for [`List`] and [`Map`] types, Parquet expects certain schema elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should most of this verbiage move to set_coerce_types on the builder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or set_coerce_types could link to here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or set_coerce_types could link to here
I already made that change. My concern is that most of the deep documentation is currently on the setters rather than the getters...this one is an outlier.
Edit: I went ahead and moved the bulk of the documentation to the builder. I think more eyes will find it there.
tustvold
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a test using ArrowWriter to check everything is hooked up correctly might be valuable
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| /// This type coercion allows for meaningful representations that do not require | ||
| /// downstream readers to consider the embedded Arrow schema, and can allow for greater | ||
| /// compatibility with other Parquet implementations. However, type | ||
| /// coercion also prevents the data from being losslessly round-tripped. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this wording too
|
I just restarted the failed jobs and will try and merge this PR in when complete |
|
Thanks @etseidl |
|
Thanks @etseidl for your work!!! This is gonna be published in the next minor release or it's considered a major change and will be out in v54? Given the fact it's an optional behaviour (we need to call |
The next release is scheduled to be a major one (54) so I expect this to be out in 54 and thus I haven't thought carefully if it would be classified as major/minor |
Which issue does this PR close?
Closes #6213.
Closes #6733.
Rationale for this change
Allow for forcing use of compliant names for Parquet
LISTandMAPfields.What changes are included in this PR?
Uses the
coerce_typesboolean property added in #6313 to control naming of elements ofLISTandMAPstructures. Also adds acoerce_typesflag toparquet-rewrite.Are there any user-facing changes?
No API changes, but change to behavior when
coerce_typesistrue.