Skip to content

Fix bug in new shuffle-based groupby implementation#11836

Merged
raydouglass merged 5 commits intorapidsai:branch-22.10from
rjzamora:fix-shuffle-groupby
Sep 30, 2022
Merged

Fix bug in new shuffle-based groupby implementation#11836
raydouglass merged 5 commits intorapidsai:branch-22.10from
rjzamora:fix-shuffle-groupby

Conversation

@rjzamora
Copy link
Member

@rjzamora rjzamora commented Sep 30, 2022

Description

This PR fixes a subtle bug introduced in #11800. While working on the corresponding dask-cuda benchmark for that PR rapidsai/dask-cuda#979, we discovered that non-deterministic column ordering in _groupby_partition_agg and _tree_node_agg can trigger metadata-enforcement errors in follow-up operations. This PR simply sorts the output column ordering in those functions (so that the column ordering is always deterministic).

Note that this bug is difficult to reproduce in a pytest, because it rarely occurs with a smaller number of devices (I need to use a full dgx machine to consistently trigger the error).

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rjzamora rjzamora added bug Something isn't working 3 - Ready for Review Ready for review by team dask Dask issue non-breaking Non-breaking change labels Sep 30, 2022
@rjzamora rjzamora requested review from a team as code owners September 30, 2022 16:51
@github-actions github-actions bot added CMake CMake build issue conda Java Affects Java cuDF API. Python Affects Python cuDF API. libcudf Affects libcudf (C++/CUDA) code. labels Sep 30, 2022
@github-actions github-actions bot removed CMake CMake build issue gpuCI libcudf Affects libcudf (C++/CUDA) code. Java Affects Java cuDF API. labels Sep 30, 2022
@rjzamora rjzamora removed request for a team September 30, 2022 17:00
@rjzamora rjzamora removed request for a team, bdice and ttnghia September 30, 2022 17:00
@codecov
Copy link

codecov bot commented Sep 30, 2022

Codecov Report

Base: 87.46% // Head: 87.49% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (a705bda) compared to base (920b58f).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.10   #11836      +/-   ##
================================================
+ Coverage         87.46%   87.49%   +0.03%     
================================================
  Files               133      133              
  Lines             21792    21792              
================================================
+ Hits              19060    19067       +7     
+ Misses             2732     2725       -7     
Impacted Files Coverage Δ
python/cudf/cudf/core/dataframe.py 93.77% <0.00%> (+0.04%) ⬆️
python/cudf/cudf/core/column/string.py 88.55% <0.00%> (+0.12%) ⬆️
python/cudf/cudf/core/groupby/groupby.py 91.22% <0.00%> (+0.21%) ⬆️
python/cudf/cudf/core/column/numerical.py 95.49% <0.00%> (+0.28%) ⬆️
python/cudf/cudf/core/tools/datetimes.py 84.49% <0.00%> (+0.30%) ⬆️
python/cudf/cudf/core/column/lists.py 93.75% <0.00%> (+0.96%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@galipremsagar galipremsagar added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Sep 30, 2022
@raydouglass raydouglass merged commit 3f9b3fe into rapidsai:branch-22.10 Sep 30, 2022
@rjzamora rjzamora deleted the fix-shuffle-groupby branch September 30, 2022 21:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working dask Dask issue non-breaking Non-breaking change Python Affects Python cuDF API.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants