
[Data] Remove meta_provider parameter and tests#60379

Merged
bveeramani merged 8 commits into ray-project:master from Hyunoh-Yeo:in-progress-60310
Jan 22, 2026


@Hyunoh-Yeo
Contributor

Description

Remove the user-facing meta_provider parameter from all read APIs, its docstrings, and related tests while keeping the metadata provider implementations and logic.

Related issues

Closes #60310

Additional information

Deleted the meta_provider parameter from all read APIs along with its deprecation warnings, and deleted the tests that explicitly test the parameter. I kept all metadata provider implementations (DefaultFileMetadataProvider, BaseFileMetadataProvider, FileMetadataProvider) and the internal uses of meta_provider, such as in subclasses of ray.data.datasource.Datasource. Ran the remaining read API tests.
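
For readers unfamiliar with the pattern, here is a minimal sketch of what "remove the user-facing parameter but keep the provider internally" can look like. It does not import Ray: `read_files`, `FileDatasource`, and the `expand_paths` behavior shown are hypothetical stand-ins, and only the class name DefaultFileMetadataProvider is taken from the description above.

```python
from typing import List, Optional


class DefaultFileMetadataProvider:
    """Stand-in class; only the name comes from the PR description."""

    def expand_paths(self, paths: List[str]) -> List[str]:
        # Hypothetical behavior for illustration: deduplicate and sort.
        return sorted(set(paths))


class FileDatasource:
    """Hypothetical datasource that still accepts a provider internally."""

    def __init__(
        self,
        paths: List[str],
        meta_provider: Optional[DefaultFileMetadataProvider] = None,
    ) -> None:
        # Internal callers may still pass a provider; otherwise use the default.
        self.meta_provider = meta_provider or DefaultFileMetadataProvider()
        self.paths = self.meta_provider.expand_paths(paths)


def read_files(paths: List[str]) -> List[str]:
    """Hypothetical read API with no meta_provider keyword in its signature."""
    return FileDatasource(paths).paths
```

In this stand-in, calling `read_files(paths, meta_provider=...)` raises a `TypeError` because the keyword no longer exists in the public signature, while `FileDatasource` keeps working with its internal default. Real read APIs that forward extra keyword arguments may behave differently.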

…aved metadata provider implementations and logic

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
@Hyunoh-Yeo Hyunoh-Yeo requested a review from a team as a code owner January 21, 2026 20:20
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively removes the user-facing meta_provider parameter from the read APIs, which simplifies the interface. The changes are consistent across the codebase, including function signatures, docstrings, and tests.

I've found one potential issue in read_images where the removal of the default ImageFileMetadataProvider might lead to incorrect in-memory size estimations for image datasets. I've left a specific comment with a suggestion on how to address this by moving the logic into the ImageDatasource constructor. Other than that, the changes look good.

I am having trouble creating individual review comments. Click here to see my feedback.

python/ray/data/read_api.py (1218-1219)

high

By removing the logic that sets ImageFileMetadataProvider as the default for read_images, the behavior of in-memory size estimation for image datasets will change. read_images will now fall back to DefaultFileMetadataProvider, which doesn't account for image decoding and resizing, potentially leading to inaccurate memory usage estimates and inefficient scheduling.

To fix this, the responsibility of setting the default meta_provider should be moved into the ImageDatasource constructor. This will ensure that ImageFileMetadataProvider is used by default for image datasets, preserving the specialized size estimation logic, while still removing the user-facing meta_provider parameter.
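
The suggestion above could be sketched roughly as follows. This is an illustration only, not the code in python/ray/data: the classes are simplified stand-ins that reuse the names mentioned in the review, and the `estimate_in_memory_size` method and the `encoding_ratio` value of 5.0 are assumptions made up for the example.

```python
from typing import List, Optional


class DefaultFileMetadataProvider:
    """Stand-in: assumes in-memory size equals on-disk size."""

    def estimate_in_memory_size(self, file_size: int) -> int:
        return file_size


class ImageFileMetadataProvider(DefaultFileMetadataProvider):
    """Stand-in: scales file size to account for image decoding."""

    def __init__(self, encoding_ratio: float = 5.0) -> None:
        # Hypothetical ratio of decoded size to encoded size.
        self.encoding_ratio = encoding_ratio

    def estimate_in_memory_size(self, file_size: int) -> int:
        return int(file_size * self.encoding_ratio)


class ImageDatasource:
    """Stand-in datasource that picks the image-aware provider by default."""

    def __init__(
        self,
        paths: List[str],
        meta_provider: Optional[DefaultFileMetadataProvider] = None,
    ) -> None:
        # The default now lives in the constructor, so the public read
        # function does not need a meta_provider argument at all.
        if meta_provider is None:
            meta_provider = ImageFileMetadataProvider()
        self.paths = paths
        self.meta_provider = meta_provider
```

With the default chosen in the constructor, a public function such as read_images can build the datasource from paths alone and still get the image-specific size estimate.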

cursor[bot]

This comment was marked as outdated.

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
@Hyunoh-Yeo Hyunoh-Yeo marked this pull request as draft January 21, 2026 23:20
.
Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
@Hyunoh-Yeo Hyunoh-Yeo marked this pull request as ready for review January 22, 2026 02:11
@ray-gardener ray-gardener bot added the docs, data, deprecation, and community-contribution labels Jan 22, 2026
Member

@bveeramani bveeramani left a comment


LGTM! ty for the contribution!

@bveeramani bveeramani enabled auto-merge (squash) January 22, 2026 19:31
@github-actions github-actions bot added the go label (add ONLY when ready to merge, run all tests) Jan 22, 2026
@bveeramani bveeramani merged commit c67ec5a into ray-project:master Jan 22, 2026
8 checks passed
@Hyunoh-Yeo Hyunoh-Yeo deleted the in-progress-60310 branch January 22, 2026 20:30
jinbum-kim pushed a commit to jinbum-kim/ray that referenced this pull request Jan 29, 2026
Signed-off-by: jinbum-kim <jinbum9958@gmail.com>
400Ping pushed a commit to 400Ping/ray that referenced this pull request Feb 1, 2026
Signed-off-by: 400Ping <jiekaichang@apache.org>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
Signed-off-by: peterxcli <peterxcli@gmail.com>