Skip to content

[Data] Make execution callback dataflow explicit to prevent state leakage#61405

Merged
bveeramani merged 25 commits intoray-project:masterfrom
limarkdcunha:task/planner-refactor-execution-callback-refactor
Mar 25, 2026
Merged

[Data] Make execution callback dataflow explicit to prevent state leakage#61405
bveeramani merged 25 commits intoray-project:masterfrom
limarkdcunha:task/planner-refactor-execution-callback-refactor

Conversation

@limarkdcunha
Copy link
Copy Markdown
Contributor

Larger Issue
Refactored the execution callback system from dynamic, per-job injection to static, eager initialization. DataContext now serves as a stateless registry of callback factories, enabling the StreamingExecutor to load all callbacks at startup. Each callback then self-configures or no-ops based on the runtime environment.

Context
This is Part 3 of a four-part change set intended to resolve the issue described above.

Part 1 diff: #60480
Part 2 diff: #61293

Closes #60279.

Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request successfully refactors the execution callback system to make the dataflow more explicit, moving from dynamic injection to eager initialization. The changes are consistent across the modified files, with the planner now instantiating and returning the list of callbacks, which is then propagated up to the executor. The overall implementation is clean and achieves the stated goal. I have one minor suggestion to improve code clarity in plan.py.

@limarkdcunha limarkdcunha marked this pull request as ready for review March 1, 2026 18:23
@limarkdcunha limarkdcunha requested a review from a team as a code owner March 1, 2026 18:23
@ray-gardener ray-gardener bot added data Ray Data-related issues community-contribution Contributed by the community labels Mar 1, 2026
Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
limarkdcunha and others added 2 commits March 1, 2026 16:10
@limarkdcunha
Copy link
Copy Markdown
Contributor Author

@bveeramani will need your review on this

@bveeramani bveeramani self-assigned this Mar 4, 2026
Copy link
Copy Markdown
Member

@bveeramani bveeramani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@bveeramani bveeramani changed the title Make execution callback dataflow explicit to prevent state leakage [Data] Make execution callback dataflow explicit to prevent state leakage Mar 18, 2026
@bveeramani bveeramani removed their assignment Mar 18, 2026
limarkdcunha and others added 2 commits March 18, 2026 22:43
Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
…efactor

Signed-off-by: Limark Dcunha <83493294+limarkdcunha@users.noreply.github.com>
limarkdcunha and others added 3 commits March 18, 2026 23:05
Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
limarkdcunha and others added 6 commits March 19, 2026 10:25
Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
Signed-off-by: Limark Dcunha <limarkdcunha@gmail.com>
@bveeramani bveeramani enabled auto-merge (squash) March 23, 2026 20:06
@github-actions github-actions bot added the go add ONLY when ready to merge, run all tests label Mar 23, 2026
@bveeramani bveeramani disabled auto-merge March 23, 2026 20:06
@bveeramani bveeramani merged commit 82c1051 into ray-project:master Mar 25, 2026
8 checks passed
bveeramani added a commit that referenced this pull request Mar 25, 2026
…62053)

## Description

`planner.plan()` now returns a tuple `(physical_plan, callbacks)` after
#61405. In this PR, I've updates the planner unit tests to unpack the
tuple instead of indexing into the result.

I'm doing this to make the return type a bit more explicit and readable.

## Related issues

Related to #61405

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
bveeramani added a commit that referenced this pull request Mar 25, 2026
## Description

In this PR, I've removed old callback registration APIs
(`get_execution_callbacks`, `add_execution_callback`,
`remove_execution_callback`) in favor of the new
`execution_callback_classes` property on `DataContext`.

This is part of a larger effort to avoid storing callback instances in
`DataContext` since they might be shared across dataset executions. The
new implementation re-constructs the callbacks for each execution and
directly passes them into the executor as a constructor argument. This
makes the data flow much more explicit and easier to reason about.

## Related issues

Follow-up to #61405.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
JasonLi1909 pushed a commit to JasonLi1909/ray that referenced this pull request Mar 26, 2026
…62055)

## Description

In this PR, I've removed old callback registration APIs
(`get_execution_callbacks`, `add_execution_callback`,
`remove_execution_callback`) in favor of the new
`execution_callback_classes` property on `DataContext`.

This is part of a larger effort to avoid storing callback instances in
`DataContext` since they might be shared across dataset executions. The
new implementation re-constructs the callbacks for each execution and
directly passes them into the executor as a constructor argument. This
makes the data flow much more explicit and easier to reason about.

## Related issues

Follow-up to ray-project#61405.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-contribution Contributed by the community data Ray Data-related issues go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Data] Simplify execution callback lifecycle

2 participants