
Add test of interoperability of cuDF and arrow BYTE_STREAM_SPLIT encoders #15832

Merged
rapids-bot[bot] merged 10 commits into rapidsai:branch-24.08 from etseidl:arrow_cudf_byte_stream_split on Jun 24, 2024

Conversation

@etseidl
Contributor

@etseidl etseidl commented May 22, 2024

Description

BYTE_STREAM_SPLIT encoding was recently added to cuDF (#15311). The Parquet specification has since been changed (apache/parquet-format#229) to extend the data types that can be encoded as BYTE_STREAM_SPLIT, and arrow only recently implemented that change (apache/arrow#40094). This PR adds a check that cuDF and arrow produce mutually compatible files when using BYTE_STREAM_SPLIT encoding.
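For reference, BYTE_STREAM_SPLIT does not compress anything by itself. It rearranges fixed-width values so that byte k of every value lands in stream k, and the K streams are then concatenated. This groups similar bytes (sign/exponent bytes, or the zero high bytes of small integers) so the page compressor can do better. The spec change above extends the encoding beyond FLOAT and DOUBLE to INT32, INT64 and FIXED_LEN_BYTE_ARRAY. Below is a minimal pure-Python sketch of that byte layout. The function names are hypothetical. This is neither cuDF's nor arrow's implementation, and it does not read or write Parquet files.

```python
import struct


def byte_stream_split_encode(data: bytes, width: int) -> bytes:
    """Hypothetical helper: scatter byte k of every `width`-byte value into
    stream k, then concatenate streams 0..width-1."""
    if width <= 0 or len(data) % width:
        raise ValueError("data length must be a multiple of the value width")
    # data[k::width] picks byte k of value 0, value 1, ..., value n-1.
    return b"".join(data[k::width] for k in range(width))


def byte_stream_split_decode(encoded: bytes, width: int) -> bytes:
    """Hypothetical helper: inverse of byte_stream_split_encode."""
    if width <= 0 or len(encoded) % width:
        raise ValueError("encoded length must be a multiple of the value width")
    n = len(encoded) // width  # number of values (= length of each stream)
    out = bytearray(len(encoded))
    for k in range(width):
        # Stream k occupies encoded[k*n:(k+1)*n]; put it back at byte k of each value.
        out[k::width] = encoded[k * n:(k + 1) * n]
    return bytes(out)


if __name__ == "__main__":
    # Two little-endian float32 values: 1.0 -> 00 00 80 3F, 2.0 -> 00 00 00 40.
    floats = struct.pack("<2f", 1.0, 2.0)
    print(byte_stream_split_encode(floats, 4).hex())  # 000000008000 3f40 without the space

    # INT32 (allowed after the spec change): the zero high bytes end up contiguous.
    ints = struct.pack("<3i", 1, 2, 3)
    enc = byte_stream_split_encode(ints, 4)
    print(enc.hex())  # 010203000000000000000000
    assert byte_stream_split_decode(enc, 4) == ints
```

A round trip through the two helpers mirrors, in miniature, what the interoperability test checks between cuDF and arrow: data encoded by one side must decode to the same values on the other.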

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@etseidl etseidl requested a review from a team as a code owner May 22, 2024 23:37
@etseidl etseidl requested review from isVoid and wence- May 22, 2024 23:37
@copy-pr-bot

copy-pr-bot bot commented May 22, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the Python (Affects Python cuDF API) label May 22, 2024
@vyasr
Contributor

vyasr commented May 23, 2024

/ok to test

@wence-
Contributor

wence- commented Jun 13, 2024

/ok to test

@wence- wence- added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Jun 13, 2024
Contributor

@wence- wence- left a comment


Looks good. One non-blocking suggestion: extend the test to also try more than one row group.

@wence-
Contributor

wence- commented Jun 24, 2024

/ok to test

@wence-
Contributor

wence- commented Jun 24, 2024

/merge

@wence-
Contributor

wence- commented Jun 24, 2024

Thanks @etseidl

@rapids-bot rapids-bot bot merged commit ed41668 into rapidsai:branch-24.08 Jun 24, 2024
@etseidl etseidl deleted the arrow_cudf_byte_stream_split branch June 24, 2024 15:31

Labels

improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), Python (Affects Python cuDF API)


3 participants