PARQUET-2414: Extend BYTE_STREAM_SPLIT to support INT32, INT64 and FIXED_LEN_BYTE_ARRAY data #229
Merged
pitrou merged 2 commits into apache:master on Mar 18, 2024
wgtmac
reviewed
Jan 28, 2024
…XED_LEN_BYTE_ARRAY data
3fb6ab9 to
88d3cbd
wgtmac
approved these changes
Jan 29, 2024
#### New Feature

* [PARQUET-2414](https://issues.apache.org/jira/browse/PARQUET-2414) - Extend BYTE_STREAM_SPLIT to support INT32, INT64 and FIXED_LEN_BYTE_ARRAY data
mapleFU
approved these changes
Feb 8, 2024
Contributor
+1 I think this is great. Are PoCs needed for this? I'm interested in seeing how well this works as a DELTA_BINARY_PACKED replacement for my data.
Member
Author
@etseidl I've written the implementation for Parquet C++ here: apache/arrow#40094. I was planning to implement it for Parquet Java, but you may want to do it as well.
Contributor
Sounds good. I'll put it in my queue. I'll check out your arrow implementation to see if there are any pitfalls to avoid. Thanks!
Thank you @pitrou for investigating this! Extending …
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request on Jun 24, 2024
…ders (#15832)

BYTE_STREAM_SPLIT encoding was recently added to cuDF (#15311). The Parquet specification was recently changed (apache/parquet-format#229) to extend the datatypes that can be encoded as BYTE_STREAM_SPLIT, and this was only recently implemented in arrow (apache/arrow#40094). This PR adds a check that cuDF and arrow can produce compatible files using BYTE_STREAM_SPLIT encoding.

Authors:
- Ed Seidl (https://github.com/etseidl)

Approvers:
- Lawrence Mitchell (https://github.com/wence-)

URL: #15832
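For readers unfamiliar with the encoding discussed in this thread, here is a minimal Python sketch of the byte-stream-split layout as I understand it from the Parquet format specification: for values of width K bytes, byte k of every value goes to stream k, and the K streams are concatenated. The function names (`byte_stream_split`, `byte_stream_join`, `encode_int32`, `decode_int32`) are hypothetical and are not taken from parquet-format, Arrow, or cuDF. The sketch assumes INT32 values are laid out little-endian and that FIXED_LEN_BYTE_ARRAY values are passed as already-concatenated raw bytes.

```python
import struct


def byte_stream_split(raw: bytes, width: int) -> bytes:
    """Scatter byte k of each width-byte value into stream k, then concatenate the streams."""
    if width <= 0 or len(raw) % width != 0:
        raise ValueError("raw length must be a multiple of a positive width")
    # raw[k::width] picks byte k of every value, i.e. the k-th stream.
    return b"".join(raw[k::width] for k in range(width))


def byte_stream_join(encoded: bytes, width: int) -> bytes:
    """Inverse of byte_stream_split: interleave the streams back into whole values."""
    if width <= 0 or len(encoded) % width != 0:
        raise ValueError("encoded length must be a multiple of a positive width")
    n = len(encoded) // width  # number of values, which is also the length of each stream
    out = bytearray(len(encoded))
    for k in range(width):
        out[k::width] = encoded[k * n:(k + 1) * n]
    return bytes(out)


def encode_int32(values: list[int]) -> bytes:
    """Hypothetical helper: pack as little-endian INT32 (an assumption), then split."""
    raw = struct.pack(f"<{len(values)}i", *values)
    return byte_stream_split(raw, 4)


def decode_int32(encoded: bytes) -> list[int]:
    """Hypothetical helper: join the 4 streams and unpack little-endian INT32 values."""
    raw = byte_stream_join(encoded, 4)
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))
```

For small integers the high-order bytes are mostly zero, so after splitting they end up in long runs that a general-purpose compressor can handle well. For example, `encode_int32([1, 2, 3])` yields the three low bytes followed by nine zero bytes. The same `byte_stream_split` call covers FIXED_LEN_BYTE_ARRAY by passing the type length as `width`.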