Description
Remove all top-level imports of ray.data from the ray.train module. Imports needed only for type annotations should be guarded behind if TYPE_CHECKING:. Imports needed at runtime should be moved inline (lazy imports within functions/methods).
Background
Ray Train currently has direct top-level imports of ray.data scattered throughout the codebase. These imports create a hard coupling between the two modules—importing ray.train transitively imports ray.data.
The TYPE_CHECKING pattern allows type annotations to reference external types without runtime imports:

    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        from ray.data import Dataset

    def my_function(ds: "Dataset") -> None:  # Note: string annotation
        ...

For runtime usage, lazy/inline imports defer the import until the code path is actually executed:

    def my_function():
        from ray.data import Dataset  # Only imported when function is called
        ...

Motivation
We want to stop running all Ray Train tests on Ray Data premerge. To ensure Ray Train tests (excluding Data integration tests) don't actually depend on Ray Data, we need to be able to run them without the ray.data source code present. Removing top-level imports makes this dependency explicit and enforceable by the system. Ray Core did something similar to decouple its modules.
Implementation Boundaries & Constraints
Files requiring changes (source files only, excludes tests):
python/ray/train/**/*.py
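A rough sketch of how the files above could be scanned for offending imports, using only the standard-library ast module. This is not existing Ray tooling; the helper names (find_top_level_imports and the private functions) are hypothetical. It flags imports of ray.data that execute at import time, and skips imports inside function bodies and inside if TYPE_CHECKING: blocks. Relative imports are ignored because resolving them needs package context.

```python
import ast

TARGET = "ray.data"  # package whose import-time imports we want to flag


def _is_type_checking_guard(node: ast.stmt) -> bool:
    """True for `if TYPE_CHECKING:` or `if typing.TYPE_CHECKING:`."""
    if not isinstance(node, ast.If):
        return False
    test = node.test
    if isinstance(test, ast.Name):
        return test.id == "TYPE_CHECKING"
    if isinstance(test, ast.Attribute):
        return test.attr == "TYPE_CHECKING"
    return False


def _imports_target(node: ast.stmt) -> bool:
    """True if the statement imports TARGET or one of its submodules."""
    if isinstance(node, ast.Import):
        return any(
            a.name == TARGET or a.name.startswith(TARGET + ".") for a in node.names
        )
    if isinstance(node, ast.ImportFrom):
        if node.level:  # relative import: skipped in this sketch
            return False
        mod = node.module or ""
        if mod == TARGET or mod.startswith(TARGET + "."):
            return True
        # Covers `from ray import data`.
        return mod == "ray" and any(a.name == "data" for a in node.names)
    return False


def _walk_import_time(stmts):
    """Yield statements that execute when the module is imported."""
    for node in stmts:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue  # function bodies run only when called
        if _is_type_checking_guard(node):
            # The guarded body never runs; the else branch does.
            yield from _walk_import_time(node.orelse)
            continue
        yield node
        for field in ("body", "orelse", "finalbody"):
            yield from _walk_import_time(getattr(node, field, []))
        for handler in getattr(node, "handlers", []):
            yield from _walk_import_time(handler.body)


def find_top_level_imports(source: str) -> list[int]:
    """Return line numbers of import-time imports of TARGET."""
    tree = ast.parse(source)
    return [n.lineno for n in _walk_import_time(tree.body) if _imports_target(n)]


SAMPLE = "\n".join([
    "from typing import TYPE_CHECKING",
    "import ray.data",                   # line 2: flagged
    "if TYPE_CHECKING:",
    "    from ray.data import Dataset",  # guarded: ignored
    "def f():",
    "    from ray.data import Dataset",  # lazy: ignored
    "    return Dataset",
])
print(find_top_level_imports(SAMPLE))  # [2]
```

To apply it to a checkout, one could call find_top_level_imports on the text of each file yielded by pathlib.Path("python/ray/train").rglob("*.py") and fail if any list is non-empty.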
Key considerations:
- When moving annotation-only imports under TYPE_CHECKING, use string quotes for forward references: def foo(ds: "Dataset") instead of def foo(ds: Dataset)
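
The two patterns can be combined in one module. The sketch below is runnable without Ray installed: fractions.Fraction is only a stand-in for ray.data.Dataset, and halve and make_half are hypothetical names. It shows that a quoted annotation is stored as a plain string, so the guarded name never has to exist at runtime.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by type checkers only; this branch never executes at runtime.
    from fractions import Fraction  # stand-in for `from ray.data import Dataset`


def halve(x: "Fraction") -> "Fraction":
    # Quoted annotations are kept as strings and are not evaluated here,
    # so `Fraction` does not need to be defined in this module at runtime.
    return x / 2


def make_half() -> "Fraction":
    # Lazy import: executed when the function is called, not at module import.
    from fractions import Fraction

    return Fraction(1, 2)


print(halve.__annotations__["x"])  # Fraction
print(halve(make_half()))          # 1/4
```

Without the quotes, def halve(x: Fraction) would raise NameError at definition time on Python versions that evaluate annotations eagerly, unless the module also uses from __future__ import annotations.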