[Data][Cherry-pick] Prevent Limit from getting pushed past map_groups (#60881) #60893

Merged
aslonnie merged 2 commits into releases/2.54.0 from cp-60881 on Feb 10, 2026

@bveeramani
Member

Cherry-pick of #60881

This PR updates `map_groups` to assume that the UDF might change the row
count. This change is necessary to fix a bug where `Limit` is
incorrectly pushed past `map_groups` (fixes #60872).

For more context, see:
* #60448
* #60756
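
To illustrate why the pushdown is unsafe, here is a minimal pure-Python sketch that does not use Ray. The helpers `map_groups`, `limit`, and `summarize` are hypothetical stand-ins written for this example, not Ray Data's implementation. They only show that applying a limit before a grouping UDF can give a different result than applying it after.

```python
from collections import defaultdict
from itertools import islice


def map_groups(rows, key, fn):
    """Group rows by `key`, then apply `fn` to each group.

    `fn` may return any number of rows per group (hypothetical stand-in).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    out = []
    for k in sorted(groups):
        out.extend(fn(groups[k]))
    return out


def limit(rows, n):
    """Keep only the first `n` rows."""
    return list(islice(rows, n))


def summarize(group):
    # Collapses each group into a single row, so the row count changes.
    return [{"k": group[0]["k"], "total": sum(r["v"] for r in group)}]


rows = [{"k": i % 2, "v": i} for i in range(6)]

# Intended semantics: aggregate every group, then keep one output row.
correct = limit(map_groups(rows, "k", summarize), 1)

# Limit pushed past map_groups: the UDF only sees a truncated input.
pushed = map_groups(limit(rows, 1), "k", summarize)

print(correct)  # [{'k': 0, 'total': 6}]
print(pushed)   # [{'k': 0, 'total': 0}]
```

The two plans disagree because `summarize` reads all rows of a group, so truncating the input first changes what the UDF computes.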

---------

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani requested a review from a team as a code owner February 9, 2026 23:23
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request fixes a bug in Ray Data's optimizer where a `Limit` operation could be pushed down past a `map_groups` operation. `map_groups` was marked as not modifying row counts; this PR sets `udf_modifying_row_count=True` so that limit pushdown is not applied where it could produce incorrect results. A new test case verifies that `Limit` is no longer pushed past `map_groups` by default. The changes are correct and well-tested.
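
As a rough sketch of the kind of check described above, the toy rule below moves a `Limit` down only past operators flagged as row-count preserving. The `Op` class, the `may_change_row_count` field, and `push_limit_down` are hypothetical names invented for illustration. They are not Ray Data's optimizer classes, and the real rule may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    name: str
    input: Optional["Op"] = None
    # Hypothetical flag: True when the operator may emit a different
    # number of rows than it receives. Defaults to the conservative value.
    may_change_row_count: bool = True


def push_limit_down(limit_op: Op) -> Op:
    """Move a Limit below upstream ops known to preserve row count."""
    assert limit_op.name == "Limit"
    passed: List[Op] = []
    node = limit_op.input
    # Walk upstream while operators are safe to push past.
    while node is not None and not node.may_change_row_count:
        passed.append(node)
        node = node.input
    # Rebuild the chain: the Limit sits directly above the first blocking op.
    plan = Op("Limit", input=node)
    for op in reversed(passed):
        plan = Op(op.name, input=plan, may_change_row_count=op.may_change_row_count)
    return plan


def names(op: Optional[Op]) -> List[str]:
    """List operator names from the plan's output back to its source."""
    out = []
    while op is not None:
        out.append(op.name)
        op = op.input
    return out


read = Op("Read")
# Flagging map_groups as row-preserving lets the Limit slip underneath it.
buggy = Op("Limit", Op("MapGroups", read, may_change_row_count=False))
# Flagging it as possibly row-changing keeps the Limit on top.
fixed = Op("Limit", Op("MapGroups", read, may_change_row_count=True))

print(names(push_limit_down(buggy)))  # ['MapGroups', 'Limit', 'Read']
print(names(push_limit_down(fixed)))  # ['Limit', 'MapGroups', 'Read']
```

Defaulting the flag to the conservative value means an operator blocks pushdown unless it is explicitly marked as safe, which matches the spirit of assuming the UDF might change the row count.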

@bveeramani bveeramani added the `go` label (add ONLY when ready to merge, run all tests) Feb 9, 2026
@aslonnie aslonnie self-requested a review February 9, 2026 23:49
@aslonnie aslonnie merged commit e94871d into releases/2.54.0 Feb 10, 2026
5 of 6 checks passed
@aslonnie aslonnie deleted the cp-60881 branch February 10, 2026 01:11

4 participants