Ensure columns have valid null counts in CUDF JNI. #13355
Merged
rapids-bot[bot] merged 13 commits into rapidsai:branch-23.06 on May 18, 2023
Conversation
…t using the lazy computation
Co-authored-by: Nghia Truong <7416935+ttnghia@users.noreply.github.com>
Fixes rapidsai#13353. In preparation for rapidsai#11968, this change ensures that columns constructed from CUDF JNI do not have their null counts set to `UNKNOWN_NULL_COUNT` (i.e. `-1`). In cases where the caller invokes JNI functions with `UNKNOWN_NULL_COUNT`, the JNI layer computes the concrete null count from the validity mask, and sets this value in the column. The option to specify an optional null count through the Java API will likely be removed at a later date. Signed-off-by: MithunR <mythrocks@gmail.com>
Contributor
Author
This diff includes the changes in #13345, but it applies only to the Java API.
jlowe
approved these changes
May 16, 2023
ttnghia
reviewed
May 16, 2023
ttnghia
reviewed
May 16, 2023
Contributor
#13345 is merged, so @mythrocks you should be able to rebase on the latest 23.06 now.
@jlowe Mithun mentioned that for some partitioning applications Spark really doesn't need to know the null count of partitioned columns, because the columns are ultimately reassembled and therefore only the total count/mask matters. As a future optimization, I noted that in those cases it may make sense to eventually switch to passing around the column's device buffers instead of the columns themselves, if avoiding the extra intermediate null counts has a significant performance benefit.
1. Use column::null_count() instead of computing explicitly. 2. Remove reference to UNKNOWN_NULL_COUNT.
mythrocks
commented
May 17, 2023
mythrocks
commented
May 17, 2023
Artifacts from merge.
ttnghia
approved these changes
May 17, 2023
mythrocks
added a commit
to mythrocks/spark-rapids-jni
that referenced
this pull request
May 17, 2023
This is in prep for rapidsai/cudf#11968 and rapidsai/cudf#13372. `libcudf` will soon require that all CUDF columns are created with a known null-count. `UNKNOWN_NULL_COUNT` will no longer be supported, or even available as a code constant. This change replicates part of rapidsai/cudf#13355, as it applies to `row_conversion.cu`. The (single) reference to the unknown-null-count is replaced with a pre-calculated value. Signed-off-by: MithunR <mythrocks@gmail.com>
Contributor
Author
/merge
Contributor
Author
Thanks for the reviews, all. I've merged this change.
mythrocks
added a commit
to NVIDIA/spark-rapids-jni
that referenced
this pull request
May 18, 2023
This is in prep for rapidsai/cudf#11968 and rapidsai/cudf#13372. `libcudf` will soon require that all CUDF columns are created with a known null-count. `UNKNOWN_NULL_COUNT` will no longer be supported, or even available as a code constant. This change replicates part of rapidsai/cudf#13355, as it applies to `row_conversion.cu`. The (single) reference to the unknown-null-count is replaced with a pre-calculated value. Signed-off-by: MithunR <mythrocks@gmail.com>
rapids-bot bot
pushed a commit
that referenced
this pull request
May 24, 2023
This is the final PR for removing `UNKNOWN_NULL_COUNT` and the implicit kernel launch in the `null_count` methods of `column` and `column_view`.
Depends on #13355 and #13341.
Closes #11968
Authors:
- Vyas Ramasubramani (https://github.com/vyasr)
Approvers:
- MithunR (https://github.com/mythrocks)
- Nghia Truong (https://github.com/ttnghia)
- Karthikeyan (https://github.com/karthikeyann)
URL: #13372
Description
Fixes #13353.
Depends on #13345.
In preparation for #11968, this change ensures that columns constructed from CUDF JNI do not have their null counts set to `UNKNOWN_NULL_COUNT` (i.e. `-1`). In cases where the caller invokes JNI functions with `UNKNOWN_NULL_COUNT`, the JNI layer computes the concrete null count from the validity mask, and sets this value in the column.
The current Java API remains unchanged; there should be no impact to user code.
The option to specify an optional null count through the Java API will likely be removed at a later date.
Signed-off-by: MithunR <mythrocks@gmail.com>
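To illustrate the fallback described above, here is a minimal host-side sketch of resolving an unknown null count from a validity mask. It is not the code from this PR: the names `kUnknownNullCount`, `count_nulls`, and `resolve_null_count` are hypothetical, and the sketch assumes a mask of 32-bit words in which a set bit marks a valid row and bit `i` is stored in word `i / 32` at position `i % 32`. The real JNI layer works on device memory and, per the review commits, relies on `column::null_count()` rather than a hand-written loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel standing in for UNKNOWN_NULL_COUNT (-1).
constexpr int32_t kUnknownNullCount = -1;

// Counts the unset (null) bits among the first `size` bits of `mask`.
// Assumption: a set bit means "row is valid"; bit i lives in word i / 32.
inline int32_t count_nulls(const std::vector<uint32_t>& mask, int32_t size) {
  assert(size >= 0);
  assert(mask.size() * 32 >= static_cast<std::size_t>(size));
  int32_t valid = 0;
  for (int32_t i = 0; i < size; ++i) {
    valid += static_cast<int32_t>((mask[i / 32] >> (i % 32)) & 1u);
  }
  return size - valid;
}

// Returns a concrete null count to store on a newly built column.
// A caller-supplied count is trusted; the sentinel triggers a computation.
inline int32_t resolve_null_count(int32_t requested,
                                  const std::vector<uint32_t>* mask,
                                  int32_t size) {
  if (requested != kUnknownNullCount) return requested;
  if (mask == nullptr) return 0;  // no validity mask: every row is valid
  return count_nulls(*mask, size);
}
```

With a mask word of `0x0F` and eight rows, rows 0 to 3 are valid and rows 4 to 7 are null, so the sentinel resolves to a null count of 4, while an explicit count passed by the caller is returned unchanged.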