[Data] Set default file_extensions for read_parquet #56481
bveeramani merged 10 commits into master
Conversation
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Code Review
This pull request correctly updates the default file_extensions for read_parquet from None to ["parquet"], making the previously warned-about change effective. The implementation is clean, removing the now-obsolete FutureWarning logic, which simplifies the codebase. The changes are straightforward and align perfectly with the stated goal. I have no further comments.
This pull request has been automatically marked as stale because it has not had … You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed.
…56481)

http://github.com/ray-project/ray/pull/50092 warned that we'd be changing the default `file_extensions` for Parquet from `None` to `["parquet"]`. This was the motivation:

> People often have non-Parquet files in their datasets (e.g., _SUCCESS or stale files). However, the default for file_extensions is None, so read_parquet tries reading the non-Parquet files. To avoid this issue, we'll change the default file extensions to something like ["parquet"]. This PR adds a warning for that change.

This PR follows up and actually changes the default.

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
The same message recurs with these additional sign-offs:
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
Why are these changes needed?
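http://github.com/ray-project/ray/pull/50092 warned that we'd be changing the default `file_extensions` for Parquet from `None` to `["parquet"]`. This was the motivation:

> People often have non-Parquet files in their datasets (e.g., _SUCCESS or stale files). However, the default for file_extensions is None, so read_parquet tries reading the non-Parquet files. To avoid this issue, we'll change the default file extensions to something like ["parquet"]. This PR adds a warning for that change.

This PR follows up and actually changes the default.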
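To make the effect of the new default concrete, here is a minimal sketch of suffix-based path filtering. The helper `select_paths` is hypothetical and only mimics the behavior described above. It is not Ray's implementation, and the case-insensitive suffix match is an assumption.

```python
from typing import List, Optional


def select_paths(paths: List[str], file_extensions: Optional[List[str]]) -> List[str]:
    """Hypothetical helper: keep paths that end with an allowed extension.

    `None` means "no filtering", mirroring the old default described above.
    """
    if file_extensions is None:
        return list(paths)
    # Assumption: extensions are given without a leading dot and matched
    # case-insensitively against the end of the path.
    suffixes = tuple(f".{ext.lower()}" for ext in file_extensions)
    return [p for p in paths if p.lower().endswith(suffixes)]


paths = ["data/part-0.parquet", "data/_SUCCESS", "data/part-1.parquet.crc"]

# Old default (None): every file is handed to the Parquet reader.
old_default = select_paths(paths, None)

# New default (["parquet"]): only files ending in ".parquet" are kept.
new_default = select_paths(paths, ["parquet"])
```

Callers who relied on the old behavior can presumably still pass `file_extensions=None` to `read_parquet` explicitly.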
Related issue number
Checks
- … `git commit -s`) in this PR.
- … `scripts/format.sh` to lint the changes in this PR.
- … method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.

Note
Sets `read_parquet` default `file_extensions` to `['parquet']`, updates Parquet datasource accordingly, and adjusts tests/file names to use `.parquet`.

- `read_parquet`: default `file_extensions` now `['parquet']` (was `None`).
- `ParquetDatasource`: `_FILE_EXTENSIONS = ['parquet']` and uses it as default.
- Tests use `.parquet` extensions (e.g., `test_include_paths`, null-first-file case).
- File names change from `*.parquet.snappy` to `*.snappy.parquet`, including smoke-test paths.

Written by Cursor Bugbot for commit a774c4e.
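The change from `*.parquet.snappy` to `*.snappy.parquet` matters because a name ending in `.snappy` would not match a `['parquet']` suffix filter. Below is a minimal sketch of that rename applied to path strings. The helper `to_parquet_suffix` is hypothetical and not part of Ray or this PR; it only rewrites strings and does not touch any files.

```python
OLD_SUFFIX = ".parquet.snappy"
NEW_SUFFIX = ".snappy.parquet"


def to_parquet_suffix(path: str) -> str:
    """Hypothetical helper: rewrite 'x.parquet.snappy' as 'x.snappy.parquet'.

    Paths that do not end in '.parquet.snappy' are returned unchanged.
    """
    if path.endswith(OLD_SUFFIX):
        return path[: -len(OLD_SUFFIX)] + NEW_SUFFIX
    return path


renamed = to_parquet_suffix("bucket/data/part-0.parquet.snappy")
unchanged = to_parquet_suffix("bucket/data/part-1.parquet")
```

After the rename, the path ends in `.parquet`, so a filter on that extension would keep it.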