[Data] Remove meta_provider parameter and tests #60379
bveeramani merged 8 commits into ray-project:master
…aved metadata provider implementations and logic Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
Code Review
This pull request effectively removes the user-facing meta_provider parameter from the read APIs, which simplifies the interface. The changes are consistent across the codebase, including function signatures, docstrings, and tests.
I've found one potential issue in read_images where the removal of the default ImageFileMetadataProvider might lead to incorrect in-memory size estimations for image datasets. I've left a specific comment with a suggestion on how to address this by moving the logic into the ImageDatasource constructor. Other than that, the changes look good.
python/ray/data/read_api.py (1218-1219)
By removing the logic that sets ImageFileMetadataProvider as the default for read_images, the behavior of in-memory size estimation for image datasets will change. read_images will now fall back to DefaultFileMetadataProvider, which doesn't account for image decoding and resizing, potentially leading to inaccurate memory usage estimates and inefficient scheduling.
To fix this, the responsibility of setting the default meta_provider should be moved into the ImageDatasource constructor. This will ensure that ImageFileMetadataProvider is used by default for image datasets, preserving the specialized size estimation logic, while still removing the user-facing meta_provider parameter.
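A minimal sketch of the suggestion above, using toy stand-in classes rather than Ray's actual implementations. The class names mirror those mentioned in the review, but the constructor signatures, the `estimate_size` method, the `FileDatasource` base, and the `encoding_ratio` value are assumptions made for illustration only.

```python
from typing import List, Optional


class DefaultFileMetadataProvider:
    """Toy stand-in: estimates in-memory size as the on-disk size."""

    def estimate_size(self, file_sizes: List[int]) -> int:
        return sum(file_sizes)


class ImageFileMetadataProvider(DefaultFileMetadataProvider):
    """Toy stand-in: scales on-disk size by an assumed decode ratio."""

    def __init__(self, encoding_ratio: float = 5.0):
        # Hypothetical ratio of decoded size to encoded size.
        self.encoding_ratio = encoding_ratio

    def estimate_size(self, file_sizes: List[int]) -> int:
        return int(sum(file_sizes) * self.encoding_ratio)


class FileDatasource:
    """Toy base datasource that falls back to the generic provider."""

    def __init__(
        self,
        paths: List[str],
        meta_provider: Optional[DefaultFileMetadataProvider] = None,
    ):
        self.paths = paths
        self.meta_provider = meta_provider or DefaultFileMetadataProvider()


class ImageDatasource(FileDatasource):
    """The image-specific default is chosen here, not in the read API."""

    def __init__(
        self,
        paths: List[str],
        meta_provider: Optional[DefaultFileMetadataProvider] = None,
    ):
        if meta_provider is None:
            meta_provider = ImageFileMetadataProvider()
        super().__init__(paths, meta_provider=meta_provider)


def read_images(paths: List[str]) -> ImageDatasource:
    # The user-facing function no longer accepts `meta_provider`.
    return ImageDatasource(paths)
```

With this arrangement, the public function stays free of the parameter while the image datasource still gets its specialized size estimate by default.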
bveeramani left a comment
LGTM! ty for the contribution!
## Description
Remove the user-facing `meta_provider` parameter from all read APIs, along with its docstrings and related tests, while keeping the metadata provider implementations and logic.

## Related issues
Closes ray-project#60310

## Additional information
Deleted the `meta_provider` parameter from all read APIs, removed its deprecation warnings, and deleted the tests that explicitly test the parameter. All metadata provider implementations (`DefaultFileMetadataProvider`, `BaseFileMetadataProvider`, `FileMetadataProvider`) are kept, as are internal uses of `meta_provider`, such as in subclasses of `ray.data.datasource.Datasource`. The remaining read API tests were run.

Signed-off-by: Hyunoh-Yeo <hyunoh.yeo@gmail.com>
Signed-off-by: jinbum-kim <jinbum9958@gmail.com>
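For callers, the practical effect of the change described above can be sketched as follows. The two reader functions are hypothetical stand-ins, not Ray's read APIs: the first models a function that still accepts a deprecated `meta_provider` keyword and warns, the second models the same function once the keyword is removed.

```python
import warnings
from typing import Any, Callable, List, Optional


def read_files_before(paths: List[str], meta_provider: Optional[Any] = None) -> List[str]:
    """Hypothetical reader that still accepts the deprecated keyword."""
    if meta_provider is not None:
        warnings.warn(
            "`meta_provider` is deprecated and will be removed.",
            DeprecationWarning,
            stacklevel=2,
        )
    return list(paths)


def read_files_after(paths: List[str]) -> List[str]:
    """Hypothetical reader after the keyword has been removed."""
    return list(paths)


def call_with_meta_provider(reader: Callable[..., List[str]]) -> str:
    """Report what happens when a caller still passes `meta_provider`."""
    try:
        reader(["a.csv"], meta_provider=object())
    except TypeError:
        # Python rejects unknown keyword arguments with TypeError.
        return "TypeError"
    return "ok"
```

Under this sketch, code that still passes `meta_provider` moves from a deprecation warning to a `TypeError`, so such call sites need to drop the argument.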