[Data] Encapsulate optimization rules, logical operators, and datasources within packages #60204

@bveeramani

Description

Ray Data has subpackages for optimization rules (ray.data._internal.logical.rules), logical operators (ray.data._internal.logical.operators), and datasources (ray.data._internal.datasource).

Despite defining these subpackages, we import from individual modules. This isn't ideal because the imports become long and verbose (e.g., from ray.data._internal.logical.rules.inherit_batch_format import InheritBatchFormatRule rather than from ray.data._internal.logical.rules import InheritBatchFormatRule).

To simplify the imports, we can encapsulate all of the objects within a package. What this means is:

  1. Import all of the objects in the appropriate __init__.py
  2. Export the objects with __all__
  3. Update all references to import from the subpackage rather than individual modules

# ray/data/_internal/logical/rules/__init__.py
from ray.data._internal.logical.rules.inherit_batch_format import InheritBatchFormatRule
...

__all__ = ["InheritBatchFormatRule", ...]

# Some place that imports InheritBatchFormatRule
from ray.data._internal.logical.rules import InheritBatchFormatRule
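
For step 3, a small script can help locate imports that still point at individual modules. The following is a minimal sketch, not part of Ray: `find_deep_imports` is a hypothetical helper name, and it only inspects absolute `from ... import ...` statements using the standard `ast` module (plain `import a.b.c` statements and relative imports are not flagged).

```python
import ast

# Subpackage named in this issue; swap in the operators or datasource package as needed.
PACKAGE = "ray.data._internal.logical.rules"


def find_deep_imports(source: str, package: str = PACKAGE) -> list:
    """Return (line number, module) pairs for `from <package>.<module> import ...`.

    Imports from the package itself (`from <package> import ...`) are not reported.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        # level == 0 means an absolute import; node.module is None for `from . import x`.
        if isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module.startswith(package + "."):
                hits.append((node.lineno, node.module))
    return sorted(hits)


example = (
    "from ray.data._internal.logical.rules.inherit_batch_format import InheritBatchFormatRule\n"
    "from ray.data._internal.logical.rules import InheritBatchFormatRule\n"
)
for lineno, module in find_deep_imports(example):
    print(f"line {lineno}: imports from {module}")
```

The package's own __init__.py is expected to import from its modules, so it should be skipped when scanning a source tree with a helper like this.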

Constraints

  • Be careful not to introduce circular dependencies. If you run into circular dependencies, try to resolve them reasonably, and if you're unsure what to do, ask @bveeramani.
  • Try to keep __all__ sorted alphabetically
  • Do this in three separate PRs -- one for each subpackage
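
The alphabetical-ordering constraint can be checked mechanically. The sketch below is an assumption about how one might do it, not an existing Ray tool: `read_all` and `is_sorted_alphabetically` are hypothetical names, `AnotherRule` is a placeholder rather than a real rule, and "alphabetically" is interpreted as Python's default case-sensitive string ordering.

```python
import ast


def read_all(source: str) -> list:
    """Extract the literal list assigned to __all__ at module level."""
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == "__all__":
                    # literal_eval accepts an expression node and only evaluates literals.
                    return list(ast.literal_eval(node.value))
    raise ValueError("no module-level __all__ assignment found")


def is_sorted_alphabetically(names: list) -> bool:
    """True if names are in default (case-sensitive) sorted order."""
    return names == sorted(names)


# Hypothetical __init__.py contents; AnotherRule is a placeholder name.
init_source = '__all__ = ["AnotherRule", "InheritBatchFormatRule"]\n'
names = read_all(init_source)
print(names, is_sorted_alphabetically(names))
```

If a case-insensitive ordering is preferred, `sorted(names, key=str.lower)` could be used in the comparison instead.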

Metadata

Labels

P3 (Issue moderate in impact or severity), data (Ray Data-related issues)

Projects

Status

Done
