[Data] [2/n] - Add predicate expressions to filter api #56365

Closed
goutamvenkat-anyscale wants to merge 6 commits into ray-project:master from goutamvenkat-anyscale:09-08-_data_2_n_add_predicate_expressions_to_filter_api

Conversation

@goutamvenkat-anyscale
Contributor

@goutamvenkat-anyscale goutamvenkat-anyscale commented Sep 9, 2025

Why are these changes needed?

Add support for predicate expressions to dataset.filter(). Predicate expressions are part of Ray Data's expression system and will soon replace both the fn argument and the string-based expr argument, the latter of which is evaluated as a PyArrow expression.
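
To illustrate the idea of a predicate expression, here is a minimal, self-contained sketch in plain Python. It is not Ray Data's implementation or API: the names Col, Lit, BinaryExpr, UnaryExpr, and filter_rows are hypothetical and chosen only for this illustration. The point is that comparison and logical operators build an expression tree, which a filter can then evaluate per row instead of calling an opaque function.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


class Expr:
    """Base class: operator overloads build a tree instead of computing a value."""

    def _binary(self, op: Callable[[Any, Any], Any], other: Any) -> "BinaryExpr":
        # Wrap plain Python values so both children are always expressions.
        right = other if isinstance(other, Expr) else Lit(other)
        return BinaryExpr(op, self, right)

    def __gt__(self, other: Any) -> "BinaryExpr":
        return self._binary(operator.gt, other)

    def __lt__(self, other: Any) -> "BinaryExpr":
        return self._binary(operator.lt, other)

    def __eq__(self, other: Any) -> "BinaryExpr":  # type: ignore[override]
        return self._binary(operator.eq, other)

    def __and__(self, other: Any) -> "BinaryExpr":
        return self._binary(operator.and_, other)

    def __or__(self, other: Any) -> "BinaryExpr":
        return self._binary(operator.or_, other)

    def __invert__(self) -> "UnaryExpr":
        return UnaryExpr(operator.not_, self)


# eq=False keeps the overloaded __eq__ above instead of a generated one.
@dataclass(eq=False)
class Col(Expr):
    name: str

    def eval(self, row: Dict[str, Any]) -> Any:
        return row[self.name]


@dataclass(eq=False)
class Lit(Expr):
    value: Any

    def eval(self, row: Dict[str, Any]) -> Any:
        return self.value


@dataclass(eq=False)
class BinaryExpr(Expr):
    op: Callable[[Any, Any], Any]
    left: Expr
    right: Expr

    def eval(self, row: Dict[str, Any]) -> Any:
        return self.op(self.left.eval(row), self.right.eval(row))


@dataclass(eq=False)
class UnaryExpr(Expr):
    op: Callable[[Any], Any]
    operand: Expr

    def eval(self, row: Dict[str, Any]) -> Any:
        return self.op(self.operand.eval(row))


def filter_rows(rows: List[Dict[str, Any]], predicate: Expr) -> List[Dict[str, Any]]:
    """Keep the rows for which the predicate tree evaluates to a truthy value."""
    return [row for row in rows if predicate.eval(row)]


rows = [{"id": 1, "score": 0.2}, {"id": 2, "score": 0.9}, {"id": 3, "score": 0.6}]
predicate = (Col("score") > 0.5) & ~(Col("id") == 3)
kept = filter_rows(rows, predicate)
```

Because the predicate is a data structure rather than a callable, a planner could in principle inspect it (for example, to see which columns it reads), which is one motivation for preferring expressions over arbitrary functions.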

Building on top of: #56313

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Goutam V. <goutam@anyscale.com>
@goutamvenkat-anyscale goutamvenkat-anyscale requested a review from a team as a code owner September 9, 2025 01:42
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for predicate expressions in Dataset.filter(), a significant enhancement to the Ray Data expression system. The changes are well-implemented, adding new expression types like UnaryExpr and PredicateExpr, expanding the set of supported operations, and integrating them into the evaluation and planning logic. The refactoring of Dataset.filter improves code clarity, and the comprehensive test suite is a great addition. I have one suggestion regarding a test condition that appears to be a typo.

Comment on lines +805 to +808

    @pytest.mark.skipif(
        get_pyarrow_version() < parse_version("20.0.0"),
        reason="predicate expressions require PyArrow >= 20.0.0",
    )
Contributor

medium

The skipif condition get_pyarrow_version() < parse_version("20.0.0") will cause these tests to be skipped, as the current PyArrow version is much lower than 20.0.0. This appears to be a typo. Could you please verify the required PyArrow version and update this condition? If there's no specific version dependency, this condition should be removed to ensure these important tests are executed.
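
As a rough illustration of how such a version gate behaves, the sketch below reproduces the comparison in plain Python. The helpers parse_simple_version and should_skip are hypothetical stand-ins for the parse_version and get_pyarrow_version calls in the snippet above, and they assume plain "X.Y.Z" version strings rather than full PEP 440 versions.

```python
from typing import Tuple


def parse_simple_version(text: str) -> Tuple[int, ...]:
    """Hypothetical helper: turn 'X.Y.Z' into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def should_skip(installed: str, required: str = "20.0.0") -> bool:
    """Mirror the skipif condition: skip when installed < required."""
    return parse_simple_version(installed) < parse_simple_version(required)


# Comparing integer tuples avoids the string-ordering trap where "9.0.0" > "20.0.0".
skip_on_17 = should_skip("17.0.0")
skip_on_20 = should_skip("20.0.0")
```

With this condition, the tests run only on environments whose installed version is at or above the threshold, so whether they execute in CI depends on which PyArrow versions the CI matrix installs.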

@ray-gardener ray-gardener bot added the docs (An issue or change related to documentation) and data (Ray Data-related issues) labels Sep 9, 2025
Signed-off-by: Goutam V. <goutam@anyscale.com>
Labels

data (Ray Data-related issues), docs (An issue or change related to documentation)
