[Data] - [1/n] - Move expression_evaluator to internal directory#57778
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the codebase by moving the expression evaluator to an internal directory, which improves code organization. It also introduces StarExpr to represent all columns in projections. The changes are mostly about moving code and updating imports, with the addition of the new expression type. My review focuses on the newly located expression_evaluator.py file, where I've identified a few areas for improvement, including fixing a problematic raise statement, correcting a type hint, and clarifying a docstring.
python/ray/data/_internal/planner/plan_expression/expression_evaluator.py
Show resolved
Hide resolved
python/ray/data/_internal/planner/plan_expression/expression_evaluator.py
Show resolved
Hide resolved
python/ray/data/_internal/planner/plan_expression/expression_evaluator.py
Outdated
Show resolved
Hide resolved
14578fe to
724ef4a
Compare
724ef4a to
37c2bd4
Compare
python/ray/data/_internal/planner/plan_expression/expression_evaluator.py
Outdated
Show resolved
Hide resolved
| """ | ||
| return self.block[expr.name] | ||
|
|
||
| def visit_literal(self, expr: LiteralExpr) -> Any: |
There was a problem hiding this comment.
Let's make
- Visitor a generic parameterized by T, where T is gonna be its return type
- All methods of the visitor have to return the same value (ie no surprising polymorphism)
There was a problem hiding this comment.
Union[BlockColumn, Any] is exactly polymorphism we need to get rid of, right?
There was a problem hiding this comment.
Discussed offline, and we decided to create a generic scalartype.
For reference if I do cast it as a pyarrow scalar type, operations on pandas dfs fail since there's no guarantee on the dtype_backend. The backends could be numpy, pyarrow etc.
25345aa to
5de3eed
Compare
5de3eed to
224c5e8
Compare
Signed-off-by: Goutam <goutam@anyscale.com>
224c5e8 to
a3f529d
Compare
| # 3. Converting resolved expressions to PA ones | ||
| # Need to break up the abstraction provided by ExpressionEvaluator. | ||
|
|
||
| ScalarType = TypeVar("ScalarType") |
There was a problem hiding this comment.
Can we make this a Union? Or it will need to include Py object?
Anyways add a comment to explain what this could and could not be
…-project#57778) <!-- Thank you for contributing to Ray! 🚀 --> <!-- Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete --> ## Description 1. Moves Ray Data's Expression Evaluator into internal directory 2. Adds `StarExpr` 3. Updates the imports ## Related issues <!-- Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234" --> ## Types of change - [ ] Bug fix 🐛 - [ ] New feature ✨ - [ ] Enhancement 🚀 - [x] Code refactoring 🔧 - [ ] Documentation update 📖 - [ ] Chore 🧹 - [ ] Style 🎨 ## Checklist **Does this PR introduce breaking changes?** - [ ] Yes⚠️ - [x] No <!-- If yes, describe what breaks and how users should migrate --> **Testing:** - [x] Added/updated tests for my changes - [ ] Tested the changes manually - [ ] This PR is not tested ❌ _(please explain why)_ **Code Quality:** - [x] Signed off every commit (`git commit -s`) - [x] Ran pre-commit hooks ([setup guide](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) **Documentation:** - [ ] Updated documentation (if applicable) ([contribution guide](https://docs.ray.io/en/latest/ray-contribute/docs.html)) - [ ] Added new APIs to `doc/source/` (if applicable) ## Additional context <!-- Optional: Add screenshots, examples, performance impact, breaking change details --> Signed-off-by: Goutam <goutam@anyscale.com>
…-project#57778) <!-- Thank you for contributing to Ray! 🚀 --> <!-- Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete --> ## Description 1. Moves Ray Data's Expression Evaluator into internal directory 2. Adds `StarExpr` 3. Updates the imports ## Related issues <!-- Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234" --> ## Types of change - [ ] Bug fix 🐛 - [ ] New feature ✨ - [ ] Enhancement 🚀 - [x] Code refactoring 🔧 - [ ] Documentation update 📖 - [ ] Chore 🧹 - [ ] Style 🎨 ## Checklist **Does this PR introduce breaking changes?** - [ ] Yes⚠️ - [x] No <!-- If yes, describe what breaks and how users should migrate --> **Testing:** - [x] Added/updated tests for my changes - [ ] Tested the changes manually - [ ] This PR is not tested ❌ _(please explain why)_ **Code Quality:** - [x] Signed off every commit (`git commit -s`) - [x] Ran pre-commit hooks ([setup guide](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) **Documentation:** - [ ] Updated documentation (if applicable) ([contribution guide](https://docs.ray.io/en/latest/ray-contribute/docs.html)) - [ ] Added new APIs to `doc/source/` (if applicable) ## Additional context <!-- Optional: Add screenshots, examples, performance impact, breaking change details --> Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: xgui <xgui@anyscale.com>
) <!-- Thank you for contributing to Ray! 🚀 --> <!-- Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete --> ## Description 1. Moves Ray Data's Expression Evaluator into internal directory 2. Adds `StarExpr` 3. Updates the imports ## Related issues <!-- Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234" --> ## Types of change - [ ] Bug fix 🐛 - [ ] New feature ✨ - [ ] Enhancement 🚀 - [x] Code refactoring 🔧 - [ ] Documentation update 📖 - [ ] Chore 🧹 - [ ] Style 🎨 ## Checklist **Does this PR introduce breaking changes?** - [ ] Yes⚠️ - [x] No <!-- If yes, describe what breaks and how users should migrate --> **Testing:** - [x] Added/updated tests for my changes - [ ] Tested the changes manually - [ ] This PR is not tested ❌ _(please explain why)_ **Code Quality:** - [x] Signed off every commit (`git commit -s`) - [x] Ran pre-commit hooks ([setup guide](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) **Documentation:** - [ ] Updated documentation (if applicable) ([contribution guide](https://docs.ray.io/en/latest/ray-contribute/docs.html)) - [ ] Added new APIs to `doc/source/` (if applicable) ## Additional context <!-- Optional: Add screenshots, examples, performance impact, breaking change details --> Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
…-project#57778) <!-- Thank you for contributing to Ray! 🚀 --> <!-- Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete --> ## Description 1. Moves Ray Data's Expression Evaluator into internal directory 2. Adds `StarExpr` 3. Updates the imports ## Related issues <!-- Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234" --> ## Types of change - [ ] Bug fix 🐛 - [ ] New feature ✨ - [ ] Enhancement 🚀 - [x] Code refactoring 🔧 - [ ] Documentation update 📖 - [ ] Chore 🧹 - [ ] Style 🎨 ## Checklist **Does this PR introduce breaking changes?** - [ ] Yes⚠️ - [x] No <!-- If yes, describe what breaks and how users should migrate --> **Testing:** - [x] Added/updated tests for my changes - [ ] Tested the changes manually - [ ] This PR is not tested ❌ _(please explain why)_ **Code Quality:** - [x] Signed off every commit (`git commit -s`) - [x] Ran pre-commit hooks ([setup guide](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) **Documentation:** - [ ] Updated documentation (if applicable) ([contribution guide](https://docs.ray.io/en/latest/ray-contribute/docs.html)) - [ ] Added new APIs to `doc/source/` (if applicable) ## Additional context <!-- Optional: Add screenshots, examples, performance impact, breaking change details --> Signed-off-by: Goutam <goutam@anyscale.com>
…-project#57778) <!-- Thank you for contributing to Ray! 🚀 --> <!-- Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete --> ## Description 1. Moves Ray Data's Expression Evaluator into internal directory 2. Adds `StarExpr` 3. Updates the imports ## Related issues <!-- Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234" --> ## Types of change - [ ] Bug fix 🐛 - [ ] New feature ✨ - [ ] Enhancement 🚀 - [x] Code refactoring 🔧 - [ ] Documentation update 📖 - [ ] Chore 🧹 - [ ] Style 🎨 ## Checklist **Does this PR introduce breaking changes?** - [ ] Yes⚠️ - [x] No <!-- If yes, describe what breaks and how users should migrate --> **Testing:** - [x] Added/updated tests for my changes - [ ] Tested the changes manually - [ ] This PR is not tested ❌ _(please explain why)_ **Code Quality:** - [x] Signed off every commit (`git commit -s`) - [x] Ran pre-commit hooks ([setup guide](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) **Documentation:** - [ ] Updated documentation (if applicable) ([contribution guide](https://docs.ray.io/en/latest/ray-contribute/docs.html)) - [ ] Added new APIs to `doc/source/` (if applicable) ## Additional context <!-- Optional: Add screenshots, examples, performance impact, breaking change details --> Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
…-project#57778) <!-- Thank you for contributing to Ray! 🚀 --> <!-- Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- 💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete --> ## Description 1. Moves Ray Data's Expression Evaluator into internal directory 2. Adds `StarExpr` 3. Updates the imports ## Related issues <!-- Link related issues: "Fixes ray-project#1234", "Closes ray-project#1234", or "Related to ray-project#1234" --> ## Types of change - [ ] Bug fix 🐛 - [ ] New feature ✨ - [ ] Enhancement 🚀 - [x] Code refactoring 🔧 - [ ] Documentation update 📖 - [ ] Chore 🧹 - [ ] Style 🎨 ## Checklist **Does this PR introduce breaking changes?** - [ ] Yes⚠️ - [x] No <!-- If yes, describe what breaks and how users should migrate --> **Testing:** - [x] Added/updated tests for my changes - [ ] Tested the changes manually - [ ] This PR is not tested ❌ _(please explain why)_ **Code Quality:** - [x] Signed off every commit (`git commit -s`) - [x] Ran pre-commit hooks ([setup guide](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting)) **Documentation:** - [ ] Updated documentation (if applicable) ([contribution guide](https://docs.ray.io/en/latest/ray-contribute/docs.html)) - [ ] Added new APIs to `doc/source/` (if applicable) ## Additional context <!-- Optional: Add screenshots, examples, performance impact, breaking change details --> Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
Description
StarExprRelated issues
Types of change
Checklist
Does this PR introduce breaking changes?
Testing:
Code Quality:
git commit -s)Documentation:
doc/source/(if applicable)Additional context