[Data] - Add support for callable classes for UDFExpr#56725
[Data] - Add support for callable classes for UDFExpr#56725bveeramani merged 33 commits intoray-project:masterfrom
Conversation
Signed-off-by: Goutam V. <goutam@anyscale.com>
There was a problem hiding this comment.
Code Review
This pull request adds support for using callable classes as User-Defined Functions (UDFs) with the @udf decorator in Ray Data expressions. This is a great enhancement as it allows stateful UDFs to be used with expressions, for example in conjunction with actors. The changes primarily involve modifying the udf decorator to wrap callable classes with logic to handle expression-based calls. The implementation is solid and includes good documentation updates and a comprehensive test case. I have a couple of suggestions to improve the robustness and maintainability of the implementation.
Signed-off-by: Goutam V. <goutam@anyscale.com>
omatthew98
left a comment
There was a problem hiding this comment.
Seems super reasonable, thanks for getting this up so quickly. OOC have you tested this in a multinode cluster? It seems like this should work, but we should verify that the instances are getting instantiated in the correct places (I think that is part of why we have this code here).
I haven't. I'll test it out. If bugs arise I'll tackle that as a separate PR, if that's alright |
Yeah I think that makes sense to just keep iterating on this. Might be good to add a warning or TODO that we need to add support / do more testing for multinode cases? Will defer to you what to do. |
Signed-off-by: Goutam V. <goutam@anyscale.com>
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
|
This pull request has been automatically closed because there has been no more activity in the 14 days Please feel free to reopen or open a new pull request if you'd still like this to be addressed. Again, you can always ask for help on our discussion forum or Ray's public slack channel. Thanks again for your contribution! |
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
| result = expr.fn(*args, **kwargs) | ||
|
|
||
| # Try to call from actor context (handles both sync and async UDFs) | ||
| result = call_udf_from_actor_context(expr.fn_constructor_class, args, kwargs) |
There was a problem hiding this comment.
is expr.fn_constructor_class here because of the scheduling on worker nodes?
There was a problem hiding this comment.
Yes we need to evaluate the UDF expression here (which happens on the worker)
python/ray/data/expressions.py
Outdated
| def __call__(self, *call_args, **call_kwargs): | ||
| # Build a lazy callable that instantiates on the worker | ||
| # the first time it's invoked during expression evaluation. | ||
| def _lazy_impl(*args, **kwargs): |
There was a problem hiding this comment.
why is the lazy instantiation needed here? I feel like it should be unified with the how we normally create actors, right? Will this work with autoscaling actor pools?
There was a problem hiding this comment.
So we don't want to do the instantiation on the driver (it could be an expensive model that's being loaded for example). The lazy_impl is for executing with task pool strategy.
There was a problem hiding this comment.
can you document this? I had the same question.
There was a problem hiding this comment.
Actually I can get rid of this. Apparently in get_compute_strategy, task pool strategy in combination with actors is forbidden
| return inspect.iscoroutinefunction(fn) or inspect.isasyncgenfunction(fn) | ||
|
|
||
|
|
||
| def _call_udf_instance_with_async_bridge( |
There was a problem hiding this comment.
what happened to _generate_transform_fn_for_async_map, and why is this needed?
There was a problem hiding this comment.
It's used by the new expression evaluation system via call_udf_from_actor_context. When expressions contain callable class UDFs (like col("x").apply(MyAsyncUDF)), the expression evaluator needs to:
- Look up the UDF instance from actor context
- Call it with the evaluated arguments
- Handle async/sync bridging
There was a problem hiding this comment.
visit_udf() → call_udf_from_actor_context() → _call_udf_instance_with_async_bridge()
Signed-off-by: Goutam <goutam@anyscale.com>
b536956 to
fef026c
Compare
python/ray/data/_internal/planner/plan_expression/expression_evaluator.py
Outdated
Show resolved
Hide resolved
|
/gemini summary |
Summary of ChangesThis pull request significantly enhances Ray Data's expression capabilities by integrating callable classes as UDFs. This change allows users to define stateful UDFs that can maintain context across data batches, which is crucial for many machine learning and data processing workflows. The system intelligently manages the execution strategy, automatically opting for actor-based computation when callable class UDFs are detected, while also providing options for manual override. The underlying UDF execution framework has been streamlined to support these new capabilities, ensuring robust and consistent behavior for both synchronous and asynchronous UDFs. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
|
python/ray/data/_internal/planner/plan_expression/expression_evaluator.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
Signed-off-by: Goutam <goutam@anyscale.com>
) ## Why are these changes needed? With this change the `@udf` decorator can be added to callable classes. Allows for expressions to be used in conjunction with callable classes. ## Related issue number Closes ray-project#56529 ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: jasonwrwang <jasonwrwang@tencent.com>
) ## Why are these changes needed? With this change the `@udf` decorator can be added to callable classes. Allows for expressions to be used in conjunction with callable classes. ## Related issue number Closes ray-project#56529 ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com>
) ## Why are these changes needed? With this change the `@udf` decorator can be added to callable classes. Allows for expressions to be used in conjunction with callable classes. ## Related issue number Closes ray-project#56529 ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com>
) ## Why are these changes needed? With this change the `@udf` decorator can be added to callable classes. Allows for expressions to be used in conjunction with callable classes. ## Related issue number Closes ray-project#56529 ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com>
) ## Why are these changes needed? With this change the `@udf` decorator can be added to callable classes. Allows for expressions to be used in conjunction with callable classes. ## Related issue number Closes ray-project#56529 ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com>
) ## Why are these changes needed? With this change the `@udf` decorator can be added to callable classes. Allows for expressions to be used in conjunction with callable classes. ## Related issue number Closes ray-project#56529 ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Goutam V. <goutam@anyscale.com> Signed-off-by: Goutam <goutam@anyscale.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
Why are these changes needed?
With this change the
@udfdecorator can be added to callable classes. Allows for expressions to be used in conjunction with callable classes.Related issue number
Closes #56529
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.