[Data] Remove old instance-based execution callback API#62055
Conversation
Remove the deprecated instance-based callback registration functions (get_execution_callbacks, add_execution_callback, remove_execution_callback) in favor of the class-based execution_callback_classes API on DataContext. Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
There was a problem hiding this comment.
Code Review
This pull request refactors the execution callback mechanism in Ray Data. It removes the old function-based API for managing callbacks and replaces it with a class-based approach, where callback classes are directly stored and managed via DataContext.custom_execution_callback_classes. This change simplifies callback registration and retrieval, removing the need for explicit initialization logic. The associated tests have been updated to reflect this new API and are also refactored for clarity. A review comment suggests refactoring duplicated CustomExecutionCallback class definitions in the tests into a helper function or pytest fixture for improved maintainability.
| # we use get_execution_callbacks() here so that callbacks registered via | ||
| # add_execution_callback() are not silently dropped during execution. | ||
| callbacks = list(get_execution_callbacks(logical_plan.context)) | ||
| callbacks = [cls() for cls in logical_plan.context.execution_callback_classes] |
There was a problem hiding this comment.
im a lil bit confused -- comparing the old code to this line, it appears we are using the same logical_plan.context: DataContext. How exactly are separating out each DataContext per dataset?
There was a problem hiding this comment.
Discussed over Slack:
get_execution_callbacksreturns a list of already-constructed callbacks. These stateful callbacks can be shared across executions
execution_callback_classesjust returns the stateless callback types and we re-construct them each time
…62055) ## Description In this PR, I've removed old callback registration APIs (`get_execution_callbacks`, `add_execution_callback`, `remove_execution_callback`) in favor of the new `execution_callback_classes` property on `DataContext`. This is part of a larger effort to avoid storing callback instances in `DataContext` since they might be shared across dataset executions. The new implementation re-constructs the callbacks for each execution and directly passes them into the executor as a constructor argument. This makes the data flow much more explicit and easier to reason about. ## Related issues Follow-up to ray-project#61405. Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Description
In this PR, I've removed old callback registration APIs (
get_execution_callbacks,add_execution_callback,remove_execution_callback) in favor of the newexecution_callback_classesproperty onDataContext.This is part of a larger effort to avoid storing callback instances in
DataContextsince they might be shared across dataset executions. The new implementation re-constructs the callbacks for each execution and directly passes them into the executor as a constructor argument. This makes the data flow much more explicit and easier to reason about.Related issues
Follow-up to #61405.