Improve instructions for reading Hugging Face datasets with Ray Data #58492
Merged
richardliaw merged 10 commits into ray-project:master on Nov 19, 2025
Conversation
Contributor
Code Review
This pull request updates the documentation for reading Hugging Face datasets, recommending the use of ray.data.read_parquet with HfFileSystem for better performance and scalability. The changes are a good improvement. I've provided a few suggestions to make the code examples more robust and clearer for users. Specifically, I've recommended using os.environ.get() to avoid KeyError when the Hugging Face token is not set, and suggested using a simpler dataset for the examples. For the second example, I've also proposed using a public API from the datasets library instead of internal ones to make the example more stable across library versions.
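The two concrete suggestions in the review (reading Parquet files through `HfFileSystem`, and looking up the token with `os.environ.get()`) can be sketched roughly as follows. This is a minimal sketch and not the code from the PR: the helper names `get_hf_token` and `read_hf_parquet`, the `HF_TOKEN` variable name, and the `hf://datasets/<org>/<name>` path shape are assumptions made for illustration.

```python
import os


def get_hf_token():
    # os.environ.get() returns None when the variable is unset,
    # whereas os.environ["HF_TOKEN"] would raise KeyError.
    # The variable name HF_TOKEN is an assumption for this sketch.
    return os.environ.get("HF_TOKEN")


def read_hf_parquet(path, token=None):
    # Hypothetical helper: imports are deferred so the token lookup
    # above can be used without Ray or huggingface_hub installed.
    import ray
    from huggingface_hub import HfFileSystem

    # A token of None means anonymous access, which is enough
    # for public datasets.
    fs = HfFileSystem(token=token)
    return ray.data.read_parquet(path, filesystem=fs)


# Example usage (requires ray, huggingface_hub, and network access;
# the path is a placeholder, not a real dataset):
# ds = read_hf_parquet("hf://datasets/<org>/<name>", token=get_hf_token())
```

Passing the token explicitly keeps the lookup separate from the read, so a missing token degrades to anonymous access instead of failing before any data is requested.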
goutamvenkat-anyscale
approved these changes
Nov 11, 2025
76d5212 to
4daaab4
Compare
Contributor
tests failing
55fbcd0 to
b991022
Compare
Signed-off-by: Robert Nishihara <rkn@anyscale.com>
Signed-off-by: Robert Nishihara <rkn@anyscale.com>
b991022 to
4ee1b52
Compare
Signed-off-by: Robert Nishihara <rkn@anyscale.com>
Signed-off-by: Robert Nishihara <rkn@anyscale.com>
648f138 to
d0cf6b5
Compare
400Ping
pushed a commit
to 400Ping/ray
that referenced
this pull request
Nov 21, 2025
…th Ray Data (ray-project#58492) This pull request updates the documentation for reading Hugging Face datasets, recommending the use of ray.data.read_parquet with HfFileSystem for better performance and scalability. --------- Signed-off-by: Robert Nishihara <rkn@anyscale.com>
ykdojo
pushed a commit
to ykdojo/ray
that referenced
this pull request
Nov 27, 2025
…th Ray Data (ray-project#58492) This pull request updates the documentation for reading Hugging Face datasets, recommending the use of ray.data.read_parquet with HfFileSystem for better performance and scalability. --------- Signed-off-by: Robert Nishihara <rkn@anyscale.com> Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
SheldonTsen
pushed a commit
to SheldonTsen/ray
that referenced
this pull request
Dec 1, 2025
…th Ray Data (ray-project#58492) This pull request updates the documentation for reading Hugging Face datasets, recommending the use of ray.data.read_parquet with HfFileSystem for better performance and scalability. --------- Signed-off-by: Robert Nishihara <rkn@anyscale.com>
peterxcli
pushed a commit
to peterxcli/ray
that referenced
this pull request
Feb 25, 2026
…th Ray Data (ray-project#58492) This pull request updates the documentation for reading Hugging Face datasets, recommending the use of ray.data.read_parquet with HfFileSystem for better performance and scalability. --------- Signed-off-by: Robert Nishihara <rkn@anyscale.com> Signed-off-by: peterxcli <peterxcli@gmail.com>
This pull request updates the documentation for reading Hugging Face datasets, recommending the use of ray.data.read_parquet with HfFileSystem for better performance and scalability.