[Data] read_parquet trigger serialization error with filesystem=HfFileSystem #59029
Closed
Labels: bug (Something that is supposed to be working; but isn't), data (Ray Data-related issues)
Description
What happened + What you expected to happen
Calling read_parquet with filesystem=HfFileSystem raises a serialization error. It is caused by this huggingface_hub issue: huggingface/huggingface_hub#3576
> python test_batch.py
...
"_repo_and_revision_exists_cache": deepcopy(self._repo_and_revision_exists_cache),
File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 231, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 146, in deepcopy
y = copier(x, memo)
File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 211, in _deepcopy_tuple
y = [deepcopy(a, memo) for a in x]
File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 211, in <listcomp>
y = [deepcopy(a, memo) for a in x]
File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 172, in deepcopy
y = _reconstruct(x, memo, *rv)
File "/Users/youchenglin/miniconda3/envs/myenv/lib/python3.10/copy.py", line 265, in _reconstruct
y = func(*args)
TypeError: HfHubHTTPError.__init__() missing 1 required keyword-only argument: 'response'
...
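The failure mode in the traceback can be reproduced without Ray or huggingface_hub. The sketch below is an illustration only: `FakeHubError`, `CopyableHubError`, `_rebuild`, and `try_deepcopy` are hypothetical names, and the cache layout (a dict whose values are tuples holding an exception) is inferred from the `_deepcopy_dict` and `_deepcopy_tuple` frames above. It shows that `copy.deepcopy` rebuilds an exception from its positional `args` only, so a required keyword-only argument such as `response` is lost, and that defining `__reduce__` is one possible way to avoid this. It is not necessarily how the upstream issue was resolved.

```python
import copy


class FakeHubError(Exception):
    """Hypothetical stand-in for an exception with a required keyword-only argument."""

    def __init__(self, message, *, response):
        super().__init__(message)  # only `message` ends up in self.args
        self.response = response


def _rebuild(cls, args, response):
    # Helper used by __reduce__ below: passes `response` back by keyword.
    return cls(*args, response=response)


class CopyableHubError(FakeHubError):
    """Same exception, but it tells copy/pickle how to rebuild itself."""

    def __reduce__(self):
        return (_rebuild, (type(self), self.args, self.response))


def try_deepcopy(cache):
    """Return the copied cache, or the TypeError raised while copying."""
    try:
        return copy.deepcopy(cache)
    except TypeError as exc:
        return exc


# Assumed cache shape: dict -> tuple -> exception, as the traceback suggests.
bad = try_deepcopy({("repo", "main"): (False, FakeHubError("404", response=None))})
good = try_deepcopy({("repo", "main"): (False, CopyableHubError("404", response=None))})

print(type(bad).__name__, "|", type(good).__name__)  # TypeError | dict
```

The first copy fails because the default exception reduction calls the class with `self.args` alone, which is the `y = func(*args)` frame in `_reconstruct`. The second succeeds because `__reduce__` supplies `response` explicitly.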
Versions / Dependencies
- ray: master branch
- Hugging Face (huggingface_hub): 1.1.5
Reproduction script
import ray
from huggingface_hub import HfFileSystem

hf_fs = HfFileSystem(token="YOUR_HF_TOKEN")
dataset_url = "hf://datasets/rotten_tomatoes"
ds = ray.data.read_parquet(
    dataset_url,
    file_extensions=["parquet"],
    filesystem=hf_fs,
)
ds.count()  # raises the TypeError shown above
Issue Severity
High: It blocks me from completing my task.