[Data] Allow file extensions starting with '.' #58339
bveeramani merged 1 commit into ray-project:master from
Conversation
2e8cac5 to 653177a
@bveeramani PTAL
bveeramani left a comment
Hey @CowKeyMan, would you mind elaborating on the motivation for this change in the PR description?
653177a to 53df1b5
I added an example in the commit description.
53df1b5 to 20e009f
    if file_extensions is not None:
        file_extensions = [x[1:] if x.startswith(".") else x for x in file_extensions]
Maybe move to _has_file_extension so it's colocated with the relevant code?
Also, add a comment explaining why we have this logic?
You are right, the code makes more sense in _has_file_extension. This has now been done.
I also added another example to this method. Are these tested automatically with doctest? I am not sure which file the test should go in.
@CowKeyMan, I think the test for _has_file_extension would go here - python/ray/data/tests/test_path_util.py
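As a rough illustration of what such a test could check, here is a minimal sketch. `normalize_extensions` is a hypothetical stand-in written for this example; it is not Ray's `_has_file_extension` and only mirrors the list comprehension from the diff above.

```python
from typing import List, Optional


def normalize_extensions(file_extensions: Optional[List[str]]) -> Optional[List[str]]:
    """Hypothetical helper: strip one leading "." from each extension."""
    if file_extensions is None:
        return None
    return [x[1:] if x.startswith(".") else x for x in file_extensions]


def test_normalize_extensions() -> None:
    # None means "no filtering" and is passed through untouched.
    assert normalize_extensions(None) is None
    # Extensions without a leading dot are left unchanged.
    assert normalize_extensions(["parquet"]) == ["parquet"]
    # A leading dot is removed, so both spellings end up identical.
    assert normalize_extensions([".parquet", "csv"]) == ["parquet", "csv"]
    # Only a single leading dot is stripped.
    assert normalize_extensions(["..gz"]) == [".gz"]


test_normalize_extensions()
```

Under pytest the final call would be unnecessary, since functions named `test_*` are collected automatically.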
030fefa to ddeeb87
Signed-off-by: Daniel Cauchi <dancauchi1@gmail.com>
It is sometimes intuitive for users to provide their extensions with '.' at the start. This PR handles that case by removing the leading '.' when it is provided.
aebd6b0 to 3746cea
Test added, code moved, and I adjusted the comment as well.
It is sometimes intuitive for users to provide their extensions with '.'
at the start. This PR handles that case by removing the leading '.' when
it is provided.
For example, when using `ray.data.read_parquet`, the parameter
`file_extensions` needs to be something like `['parquet']`. However,
some users may intuitively expect `['.parquet']` to work as well.
This commit allows users to switch from:
```python
import ray

train_data = ray.data.read_parquet(
    'example_parquet_folder/',
    file_extensions=['parquet'],
)
```
to
```python
train_data = ray.data.read_parquet(
    'example_parquet_folder/',
    file_extensions=['.parquet'],  # Now reads the files instead of silently reading nothing
)
```
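To make the "silently not reading anything" behavior concrete, here is a minimal sketch of extension-based path filtering. `filter_paths` and its `strip_dot` flag are hypothetical names invented for this illustration, and the suffix-matching rule is an assumption rather than Ray's actual implementation.

```python
from typing import List, Optional


def filter_paths(
    paths: List[str],
    file_extensions: Optional[List[str]],
    strip_dot: bool,
) -> List[str]:
    """Hypothetical filter: keep paths ending in "." + one of the extensions."""
    if file_extensions is None:
        return list(paths)
    if strip_dot:
        # Behavior described in this PR: drop one leading "." if present.
        file_extensions = [e[1:] if e.startswith(".") else e for e in file_extensions]
    # Assumed matching rule: the path must end with "." followed by the extension.
    suffixes = tuple("." + e for e in file_extensions)
    return [p for p in paths if p.endswith(suffixes)]


paths = ["data/a.parquet", "data/b.parquet", "data/_SUCCESS"]

# Without stripping, ".parquet" becomes the suffix "..parquet" and matches nothing.
without_fix = filter_paths(paths, [".parquet"], strip_dot=False)

# With stripping, ".parquet" and "parquet" select the same files.
with_fix = filter_paths(paths, [".parquet"], strip_dot=True)
```

In this sketch `without_fix` is empty while `with_fix` contains the two Parquet paths, which matches the before and after behavior described above.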
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>