Skip to content

Remove UNKNOWN_NULL_COUNT#13372

Merged
rapids-bot[bot] merged 12 commits intorapidsai:branch-23.06from
vyasr:feat/remove_null_count_kernel
May 24, 2023
Merged

Remove UNKNOWN_NULL_COUNT#13372
rapids-bot[bot] merged 12 commits intorapidsai:branch-23.06from
vyasr:feat/remove_null_count_kernel

Conversation

@vyasr
Copy link
Copy Markdown
Contributor

@vyasr vyasr commented May 17, 2023

Description

This is the final PR for removing UNKNOWN_NULL_COUNT and the implicit kernel launch in the null_count methods of column and column_view.

Depends on #13355 and #13341.

Closes #11968

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@vyasr vyasr added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function breaking Breaking change labels May 17, 2023
@vyasr vyasr self-assigned this May 17, 2023
@github-actions github-actions bot added Java Affects Java cuDF API. Python Affects Python cuDF API. labels May 17, 2023
mythrocks added a commit to mythrocks/spark-rapids-jni that referenced this pull request May 17, 2023
This is in prep for rapidsai/cudf#11968 and
rapidsai/cudf#13372.

`libcudf` will soon require that all CUDF columns are created with a
known null-count. `UNKNOWN_NULL_COUNT` will no longer be supported,
or even available as a code constant.

This change replicates part of rapidsai/cudf#13355,
as it applies to `row_conversion.cu`. The (single) reference to
the unknown-null-count is replaced with a pre-calculated value.

Signed-off-by: MithunR <mythrocks@gmail.com>
mythrocks added a commit to NVIDIA/spark-rapids-jni that referenced this pull request May 18, 2023
This is in prep for rapidsai/cudf#11968 and
rapidsai/cudf#13372.

`libcudf` will soon require that all CUDF columns are created with a
known null-count. `UNKNOWN_NULL_COUNT` will no longer be supported,
or even available as a code constant.

This change replicates part of rapidsai/cudf#13355,
as it applies to `row_conversion.cu`. The (single) reference to
the unknown-null-count is replaced with a pre-calculated value.

Signed-off-by: MithunR <mythrocks@gmail.com>
@vyasr vyasr force-pushed the feat/remove_null_count_kernel branch from a2d444e to 85c146a Compare May 18, 2023 21:32
@vyasr vyasr force-pushed the feat/remove_null_count_kernel branch from 85c146a to 4fde571 Compare May 19, 2023 00:18
@vyasr vyasr marked this pull request as ready for review May 19, 2023 00:19
@vyasr vyasr requested a review from a team as a code owner May 19, 2023 00:19
@vyasr vyasr requested review from karthikeyann and mythrocks May 19, 2023 00:19
@vyasr
Copy link
Copy Markdown
Contributor Author

vyasr commented May 19, 2023

There are a couple of Java failures left that I can reproduce locally. Will attempt to debug them myself, but may need to pull in some advice.

@vyasr vyasr requested a review from a team as a code owner May 19, 2023 18:46
@vyasr vyasr requested a review from davidwendt May 19, 2023 18:53
@vyasr vyasr added DO NOT MERGE Hold off on merging; see PR for details 3 - Ready for Review Ready for review by team labels May 19, 2023
@vyasr
Copy link
Copy Markdown
Contributor Author

vyasr commented May 19, 2023

I'm blocking merging here until we've had a chance to check on the behavior of the Spark plugin with this change, but it is otherwise ready for review.

Copy link
Copy Markdown
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +1 on the JNI changes.

Note that spark-rapids does not depend on this code. row_conversion.cu will be removed as part of #13374. The splash damage here should be limited.

mythrocks added a commit to mythrocks/spark-rapids-jni that referenced this pull request May 19, 2023
This is a followup to NVIDIA#1148.

`row_conversion.cu` was modified in rapidsai/cudf#13372
to explicitly calculate null-counts for output columns.

This commit replicates the changes in cudf/pull/13372, and adds explicit
null-count calculation for the string offsets column.

Signed-off-by: MithunR <mythrocks@gmail.com>
mythrocks added a commit to mythrocks/spark-rapids-jni that referenced this pull request May 22, 2023
This is a followup to NVIDIA#1148.

`row_conversion.cu` was modified in rapidsai/cudf#13372
to explicitly calculate null-counts for output columns.

This commit replicates the changes in cudf/pull/13372, and adds explicit
null-count calculation for the string offsets column.

Signed-off-by: MithunR <mythrocks@gmail.com>
mythrocks added a commit to NVIDIA/spark-rapids-jni that referenced this pull request May 23, 2023
* Followup for null count fixup in row_conversion.cu.

This is a followup to #1148.

`row_conversion.cu` was modified in rapidsai/cudf#13372
to explicitly calculate null-counts for output columns.

This commit replicates the changes in cudf/pull/13372, and adds explicit
null-count calculation for the string offsets column.

Signed-off-by: MithunR <mythrocks@gmail.com>
@vyasr vyasr added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team DO NOT MERGE Hold off on merging; see PR for details labels May 23, 2023
@vyasr
Copy link
Copy Markdown
Contributor Author

vyasr commented May 23, 2023

/merge

@github-actions github-actions bot removed the Python Affects Python cuDF API. label May 23, 2023
@rapids-bot rapids-bot bot merged commit 56150d9 into rapidsai:branch-23.06 May 24, 2023
@vyasr vyasr deleted the feat/remove_null_count_kernel branch May 24, 2023 00:50
bdice added a commit to bdice/cudf that referenced this pull request Jun 1, 2023
rapids-bot bot pushed a commit that referenced this pull request Feb 27, 2026
Adds the `null_count()` member function back to the `cudf::column_device_view` class.
This member function was removed in #4799 because the null count was unreliable since it could be set to `UNKNOWN_NULL_COUNT`. The `UNKNOWN_NULL_COUNT`  was removed in #13372 so this should no longer be an issue.

Although the value is now more reliable it is just a copy at the time the `cudf::column_device_view` was created so it is not without caveats though these are easily explained and hopefully understood. The value is much less reliable on `cudf::mutable_column_device_view` where the null-mask is more likely modified rendering the count invalid. Therefore, the `null_count()` is added only to the immutable `cudf::column_device_view`.

The growing number of changes to move template parameters like `has_nulls` to runtime parameters reduces unneeded kernel compilations. Here is one example where having this member function would've helped to reduce the function parameters and more clearly identify the `has_nulls` parameter directly correlates to the `input` parameter state:
https://github.com/rapidsai/cudf/blob/609b08e2c8075cad12347cede69d195fa5b186f1/cpp/src/rolling/detail/rolling_operators.cuh#L399-L416

Note the `has_nulls` parameter would not be needed since the `input` could contain this information and more clearly show what the parameter is based on.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Tianyu Liu (https://github.com/kingcrimsontianyu)
  - Nghia Truong (https://github.com/ttnghia)
  - Muhammad Haseeb (https://github.com/mhaseeb123)

URL: #21430
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

5 - Ready to Merge Testing and reviews complete, ready to merge breaking Breaking change improvement Improvement / enhancement to an existing function Java Affects Java cuDF API. libcudf Affects libcudf (C++/CUDA) code.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] Require null counts on column construction

5 participants