Skip to content

[docs][data.llm] simplify / add ray data.llm quickstart example#58330

Merged
kouroshHakha merged 2 commits intoray-project:masterfrom
nrghosh:nrghosh/vllm-data-quickstart
Nov 7, 2025
Merged

[docs][data.llm] simplify / add ray data.llm quickstart example#58330
kouroshHakha merged 2 commits intoray-project:masterfrom
nrghosh:nrghosh/vllm-data-quickstart

Conversation

@nrghosh
Copy link
Copy Markdown
Contributor

@nrghosh nrghosh commented Oct 30, 2025

LLM Data documentation jumps quickly into detailed / complex examples with lots of configuration and steps.

This PR adds a simpler minimal quick-start to the top of the documentation.

Note: will / can update after the larger ray data.llm api refactor is done (context: #58298)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@nrghosh nrghosh requested a review from a team as a code owner October 30, 2025 23:21
@nrghosh nrghosh requested review from a team and richardliaw October 30, 2025 23:21
cursor[bot]

This comment was marked as outdated.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a valuable minimal quickstart example for ray.data.llm, which significantly simplifies the learning curve for new users. The example is clear and well-documented. My feedback includes a couple of suggestions to enhance the example's robustness: one to make the Ray initialization more resilient for interactive sessions, and another to adjust the batch size for broader compatibility with different GPU configurations. These changes will help ensure a smoother out-of-the-box experience for users.

from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

# Initialize Ray
ray.init()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's a good practice in documentation examples to use ray.init(ignore_reinit_error=True). This prevents errors if a user runs the script multiple times in an interactive environment like a Jupyter notebook, where Ray might have already been initialized.

Suggested change
ray.init()
ray.init(ignore_reinit_error=True)

config = vLLMEngineProcessorConfig(
model_source="unsloth/Llama-3.1-8B-Instruct",
concurrency=1, # 1 vLLM engine replica
batch_size=32, # 32 samples per batch
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

A batch_size of 32 might be too large for some GPUs when running an 8B model, potentially leading to out-of-memory errors. For a quickstart example, it's safer to start with a smaller batch size, for example 16, and let users increase it if their hardware allows.

Suggested change
batch_size=32, # 32 samples per batch
batch_size=16, # 16 samples per batch

Comment on lines +46 to 51

The processor expects input rows with a ``prompt`` field and outputs rows with both ``prompt`` and ``response`` fields. You can consume results using ``iter_rows()``, ``take()``, ``show()``, or save to files with ``write_parquet()``.

For more configuration options and advanced features, see the sections below.

.. _batch_inference_llm:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also deduplicate the content from the section below?

https://anyscale-ray--58330.com.readthedocs.build/en/58330/data/working-with-llms.html#perform-batch-inference-with-llms

Feel like there's some redundancy, like the installation and basic explanation of the configuration.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done + sglang engine pointer added

@ray-gardener ray-gardener bot added docs An issue or change related to documentation data Ray Data-related issues llm labels Oct 31, 2025
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
@kouroshHakha kouroshHakha added the go add ONLY when ready to merge, run all tests label Nov 3, 2025
@kouroshHakha kouroshHakha enabled auto-merge (squash) November 3, 2025 23:26
@nrghosh nrghosh requested review from a team and angelinalg November 4, 2025 00:30
@kouroshHakha kouroshHakha merged commit ad838a3 into ray-project:master Nov 7, 2025
8 checks passed
YoussefEssDS pushed a commit to YoussefEssDS/ray that referenced this pull request Nov 8, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
…-project#58330)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
…-project#58330)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
…-project#58330)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: Future-Outlier <eric901201@gmail.com>
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…-project#58330)

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Signed-off-by: peterxcli <peterxcli@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Ray Data-related issues docs An issue or change related to documentation go add ONLY when ready to merge, run all tests llm

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants