
[FEA] Add DELTA_BINARY_PACKED decoding support to Parquet reader#13637

Merged
rapids-bot[bot] merged 85 commits into rapidsai:branch-23.10 from etseidl:feature/delta_binary on Aug 23, 2023

@etseidl (Contributor) commented Jun 29, 2023

Description

Part of #13501. This adds support for decoding Parquet pages encoded with DELTA_BINARY_PACKED.
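For readers unfamiliar with the encoding, here is a minimal sequential C++ sketch of DELTA_BINARY_PACKED decoding, following the Parquet format specification. The stream has a header of ULEB128 varints: block size, miniblocks per block, and total value count, followed by a zigzag-encoded first value. That header is followed by blocks. Each block carries a zigzag-encoded minimum delta, one bit-width byte per miniblock, and the miniblocks themselves, bit-packed least-significant bit first. This is a host-side illustration only, not the GPU kernel added in this PR. `byte_reader` and `decode_delta_binary_packed` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Simple cursor over an encoded page body (illustrative helper, not a cudf type).
struct byte_reader {
  const uint8_t* p;
  const uint8_t* end;

  uint8_t byte() {
    if (p >= end) throw std::runtime_error("unexpected end of page data");
    return *p++;
  }

  // Unsigned LEB128 varint, as used in the DELTA_BINARY_PACKED header.
  uint64_t uleb128() {
    uint64_t v = 0;
    for (int shift = 0; shift < 64; shift += 7) {
      uint8_t b = byte();
      v |= static_cast<uint64_t>(b & 0x7f) << shift;
      if ((b & 0x80) == 0) return v;
    }
    throw std::runtime_error("varint too long");
  }

  // Zigzag-encoded signed varint (first value and per-block min delta).
  int64_t zigzag() {
    uint64_t u = uleb128();
    return static_cast<int64_t>(u >> 1) ^ -static_cast<int64_t>(u & 1);
  }
};

// Decodes one DELTA_BINARY_PACKED stream into 64-bit integers.
std::vector<int64_t> decode_delta_binary_packed(const uint8_t* data, std::size_t size) {
  byte_reader r{data, data + size};
  uint64_t const block_size = r.uleb128();
  uint64_t const miniblocks = r.uleb128();
  uint64_t const total = r.uleb128();
  int64_t value = r.zigzag();  // the first value is stored verbatim

  if (block_size == 0 || miniblocks == 0 || block_size % miniblocks != 0)
    throw std::runtime_error("invalid header");
  uint64_t const per_miniblock = block_size / miniblocks;

  std::vector<int64_t> out;
  if (total == 0) return out;
  out.push_back(value);

  while (out.size() < total) {
    int64_t const min_delta = r.zigzag();
    // One bit-width byte per miniblock, present even for unused miniblocks.
    std::vector<uint8_t> widths(miniblocks);
    for (auto& w : widths) w = r.byte();

    // Unused miniblocks in the last block have no body, so stop once done.
    for (uint64_t m = 0; m < miniblocks && out.size() < total; ++m) {
      if (widths[m] > 64) throw std::runtime_error("invalid bit width");
      uint64_t const nbytes = (per_miniblock * widths[m] + 7) / 8;
      if (static_cast<uint64_t>(r.end - r.p) < nbytes)
        throw std::runtime_error("truncated miniblock");

      uint64_t bit = 0;
      for (uint64_t i = 0; i < per_miniblock; ++i) {
        // Unpack widths[m] bits, least-significant bit first.
        uint64_t packed = 0;
        for (int b = 0; b < widths[m]; ++b, ++bit)
          packed |= static_cast<uint64_t>((r.p[bit >> 3] >> (bit & 7)) & 1) << b;
        if (out.size() < total) {
          // value += min_delta + packed, wrapping on overflow as the spec allows.
          value = static_cast<int64_t>(static_cast<uint64_t>(value) +
                                       static_cast<uint64_t>(min_delta) + packed);
          out.push_back(value);
        }
      }
      r.p += nbytes;  // a partially used miniblock is still padded to full length
    }
  }
  return out;
}
```

For example, the sequence 7, 5, 3, 1, 2, 3, 4, 5 has deltas -2, -2, -2, 1, 1, 1, 1. It is stored with a minimum delta of -2 and a single 2-bit miniblock holding 0, 0, 0, 3, 3, 3, 3. The decoder adds each unpacked value back onto the running total.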

In addition to adding delta support, this PR incorporates changes introduced in #13622, such as using a mask to determine which decoding kernels to run and adding parameters to the `page_state_buffers_s` struct to reduce shared memory usage.
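The kernel-selection mask idea can be sketched roughly as follows. Each page maps to exactly one decode kernel, the per-page bits are OR-reduced, and the host launches only the kernels whose bit is set. Everything here is hypothetical and written for illustration: the bit names, `page_desc`, `kernel_for_page`, and `aggregate_kernel_mask`. The actual enum and the way libcudf computes the mask may differ. This is a sequential host-side stand-in.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical kernel bits; the actual mask values in libcudf may differ.
enum kernel_bit : uint32_t {
  KERNEL_GENERAL = 1u << 0,  // fixed-width PLAIN / dictionary pages
  KERNEL_STRING  = 1u << 1,  // string pages
  KERNEL_DELTA   = 1u << 2,  // DELTA_BINARY_PACKED pages
};

enum class page_encoding { plain, dictionary, delta_binary_packed };

// Hypothetical per-page metadata; real page headers carry much more.
struct page_desc {
  page_encoding encoding;
  bool is_string;
};

// Selects the single kernel responsible for decoding a page.
uint32_t kernel_for_page(page_desc const& p) {
  if (p.encoding == page_encoding::delta_binary_packed) return KERNEL_DELTA;
  if (p.is_string) return KERNEL_STRING;
  return KERNEL_GENERAL;
}

// OR-reduces the per-page bits so the caller knows which kernels have work.
uint32_t aggregate_kernel_mask(std::vector<page_desc> const& pages) {
  uint32_t mask = 0;
  for (auto const& p : pages) mask |= kernel_for_page(p);
  return mask;
}
```

A caller would then guard each launch with a test such as `if (mask & KERNEL_DELTA)`. Kernels with no matching pages are skipped entirely rather than launched only to exit early.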

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rapids-bot (Contributor) commented Jun 29, 2023

Pull requests from external contributors require approval from a rapidsai organization member with write permissions or greater before CI can begin.

@github-actions bot added the libcudf and CMake labels Jun 29, 2023
@github-actions bot added the Python label Jun 29, 2023
@vuule added the feature request, cuIO, and non-breaking labels Jun 30, 2023
@vuule (Contributor) commented Jun 30, 2023

/ok to test

@nvdbaranec (Contributor) left a comment

Quick first pass. More to come.

@etseidl mentioned this pull request Aug 18, 2023
3 tasks
etseidl and others added 3 commits August 22, 2023 13:07
zhuoxunyi referenced this pull request Aug 23, 2023
Fixes: #13864 

This PR fixes an issue with `loc` indexer where some special handling needs to be done when `columns` is of type `MultiIndex`.

Authors:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #13929
@bdice (Contributor) left a comment

CMake approval.

@galipremsagar (Contributor) left a comment

Approving with a suggestion.

etseidl and others added 2 commits August 23, 2023 11:36
@vuule (Contributor) commented Aug 23, 2023

/ok to test

@vuule added the 5 - Ready to Merge label Aug 23, 2023
@vuule (Contributor) commented Aug 23, 2023

/merge

rapids-bot bot merged commit c39c04d into rapidsai:branch-23.10 Aug 23, 2023
@etseidl deleted the feature/delta_binary branch August 26, 2023 18:22
rapids-bot bot pushed a commit that referenced this pull request Sep 13, 2023
#13637 added a static stream pool object for use by the Parquet reader. This PR expands upon that by:

- Moving the stream pool to the `cudf::detail` namespace.
- Adding a debugging implementation that always returns the default stream.
- Hiding implementation details behind a more streamlined interface.
- Using CUDA events for synchronization.
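The interface changes listed above can be pictured with a small C++ model. Callers see only an abstract `get_stream()`. A pool implementation hands out streams round-robin, and a debugging implementation always returns the default stream. Streams are modeled as integer ids (`stream_id`, with 0 standing in for the default stream). All names here are hypothetical; this is not the `cudf::detail` API.

```cpp
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a real CUDA stream handle.
using stream_id = int;
constexpr stream_id default_stream = 0;

// Callers depend only on this interface, not on how streams are managed.
class stream_pool {
 public:
  virtual ~stream_pool() = default;
  virtual stream_id get_stream() = 0;
};

// Hands out a fixed set of non-default streams in round-robin order.
class round_robin_pool final : public stream_pool {
 public:
  explicit round_robin_pool(std::size_t n) {
    if (n == 0) throw std::invalid_argument("pool must hold at least one stream");
    for (std::size_t i = 0; i < n; ++i) streams_.push_back(static_cast<stream_id>(i + 1));
  }

  stream_id get_stream() override {
    std::lock_guard<std::mutex> lock(mtx_);  // safe to call from multiple threads
    stream_id s = streams_[next_];
    next_ = (next_ + 1) % streams_.size();
    return s;
  }

 private:
  std::vector<stream_id> streams_;
  std::size_t next_ = 0;
  std::mutex mtx_;
};

// Debugging variant: serializes all work by always using the default stream.
class debug_pool final : public stream_pool {
 public:
  stream_id get_stream() override { return default_stream; }
};
```

Event-based synchronization is not modeled here. With real CUDA streams, a common fork/join pattern has two steps. First, record an event on the calling stream with `cudaEventRecord` and have each pooled stream wait on it with `cudaStreamWaitEvent` before launching work. Then do the reverse so the calling stream waits for every pooled stream to finish.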

Authors:
  - Ed Seidl (https://github.com/etseidl)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Mark Harris (https://github.com/harrism)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - Vukasin Milovanovic (https://github.com/vuule)
  - Mark Harris (https://github.com/harrism)

URL: #13922

Labels

- 5 - Ready to Merge: Testing and reviews complete, ready to merge
- CMake: CMake build issue
- cuIO: cuIO issue
- feature request: New feature or request
- libcudf: Affects libcudf (C++/CUDA) code.
- non-breaking: Non-breaking change
- Python: Affects Python cuDF API.

5 participants