Skip to content

[Data] Remove old instance-based execution callback API#62055

Merged
bveeramani merged 1 commit intoray-project:masterfrom
bveeramani:bveeramani/remove-old-execution-callback-api
Mar 25, 2026
Merged

[Data] Remove old instance-based execution callback API#62055
bveeramani merged 1 commit intoray-project:masterfrom
bveeramani:bveeramani/remove-old-execution-callback-api

Conversation

@bveeramani
Copy link
Copy Markdown
Member

@bveeramani bveeramani commented Mar 25, 2026

Description

In this PR, I've removed old callback registration APIs (get_execution_callbacks, add_execution_callback, remove_execution_callback) in favor of the new execution_callback_classes property on DataContext.

This is part of a larger effort to avoid storing callback instances in DataContext since they might be shared across dataset executions. The new implementation re-constructs the callbacks for each execution and directly passes them into the executor as a constructor argument. This makes the data flow much more explicit and easier to reason about.

Related issues

Follow-up to #61405.

Remove the deprecated instance-based callback registration functions
(get_execution_callbacks, add_execution_callback, remove_execution_callback)
in favor of the class-based execution_callback_classes API on DataContext.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
@bveeramani bveeramani requested a review from a team as a code owner March 25, 2026 18:29
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the execution callback mechanism in Ray Data. It removes the old function-based API for managing callbacks and replaces it with a class-based approach, where callback classes are directly stored and managed via DataContext.custom_execution_callback_classes. This change simplifies callback registration and retrieval, removing the need for explicit initialization logic. The associated tests have been updated to reflect this new API and are also refactored for clarity. A review comment suggests refactoring duplicated CustomExecutionCallback class definitions in the tests into a helper function or pytest fixture for improved maintainability.

@bveeramani bveeramani requested a review from iamjustinhsu March 25, 2026 18:36
# we use get_execution_callbacks() here so that callbacks registered via
# add_execution_callback() are not silently dropped during execution.
callbacks = list(get_execution_callbacks(logical_plan.context))
callbacks = [cls() for cls in logical_plan.context.execution_callback_classes]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

im a lil bit confused -- comparing the old code to this line, it appears we are using the same logical_plan.context: DataContext. How exactly are separating out each DataContext per dataset?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed over Slack:

get_execution_callbacks returns a list of already-constructed callbacks. These stateful callbacks can be shared across executions
execution_callback_classes just returns the stateless callback types and we re-construct them each time

@ray-gardener ray-gardener bot added the data Ray Data-related issues label Mar 25, 2026
@bveeramani bveeramani enabled auto-merge (squash) March 25, 2026 19:27
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Mar 25, 2026
@bveeramani bveeramani merged commit 7dd6606 into ray-project:master Mar 25, 2026
6 of 7 checks passed
JasonLi1909 pushed a commit to JasonLi1909/ray that referenced this pull request Mar 26, 2026
…62055)

## Description

In this PR, I've removed old callback registration APIs
(`get_execution_callbacks`, `add_execution_callback`,
`remove_execution_callback`) in favor of the new
`execution_callback_classes` property on `DataContext`.

This is part of a larger effort to avoid storing callback instances in
`DataContext` since they might be shared across dataset executions. The
new implementation re-constructs the callbacks for each execution and
directly passes them into the executor as a constructor argument. This
makes the data flow much more explicit and easier to reason about.

## Related issues

Follow-up to ray-project#61405.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants