[docs][data.llm] simplify / add ray data.llm quickstart example#58330
kouroshHakha merged 2 commits into ray-project:master
Conversation
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
Code Review
This pull request introduces a valuable minimal quickstart example for ray.data.llm, which significantly simplifies the learning curve for new users. The example is clear and well-documented. My feedback includes a couple of suggestions to enhance the example's robustness: one to make the Ray initialization more resilient for interactive sessions, and another to adjust the batch size for broader compatibility with different GPU configurations. These changes will help ensure a smoother out-of-the-box experience for users.
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

# Initialize Ray
ray.init()
It's a good practice in documentation examples to use ray.init(ignore_reinit_error=True). This prevents errors if a user runs the script multiple times in an interactive environment like a Jupyter notebook, where Ray might have already been initialized.
Suggested change:
- ray.init()
+ ray.init(ignore_reinit_error=True)
config = vLLMEngineProcessorConfig(
    model_source="unsloth/Llama-3.1-8B-Instruct",
    concurrency=1,  # 1 vLLM engine replica
    batch_size=32,  # 32 samples per batch
A batch_size of 32 might be too large for some GPUs when running an 8B model, potentially leading to out-of-memory errors. For a quickstart example, it's safer to start with a smaller batch size, for example 16, and let users increase it if their hardware allows.
Suggested change:
- batch_size=32,  # 32 samples per batch
+ batch_size=16,  # 16 samples per batch
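For context on what the configuration above drives, here is a hedged sketch of the preprocess/postprocess hooks such a processor would typically use. The function bodies below are plain, illustrative Python; the ``generated_text`` output field name and the exact hook signatures are assumptions about the API shown in the diff, and running the full pipeline would require Ray, vLLM, and a GPU, which this sketch does not assume.

```python
def preprocess(row):
    """Turn a raw input row into the chat-style payload the engine expects.

    The sampling_params values here are illustrative defaults, not taken
    from the PR.
    """
    return dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=250),
    )


def postprocess(row):
    """Keep only the fields the quickstart documents: prompt and response.

    Assumes the engine writes its output to a ``generated_text`` field.
    """
    return {"prompt": row["prompt"], "response": row["generated_text"]}
```

With the real library, these would then be passed alongside the config, e.g. ``build_llm_processor(config, preprocess=preprocess, postprocess=postprocess)``.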
The processor expects input rows with a ``prompt`` field and outputs rows with both ``prompt`` and ``response`` fields. You can consume results using ``iter_rows()``, ``take()``, ``show()``, or save to files with ``write_parquet()``.
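A small sketch of these consumption patterns, using a plain Python list in place of a real ``Dataset`` (no Ray cluster assumed; the row shape and prompt text are illustrative, following the sentence above):

```python
# Rows in the shape the quickstart describes: each output row carries
# both the original prompt and the generated response.
rows = [
    {"prompt": "What is Ray?", "response": "Ray is a distributed compute framework."},
    {"prompt": "What is vLLM?", "response": "vLLM is a high-throughput LLM engine."},
]

# ds.iter_rows() yields one dict per row; a plain list iterates the same way.
for row in rows:
    print(f"{row['prompt']} -> {row['response']}")

# ds.take(n) returns the first n rows as a list of dicts.
first = rows[:1]
```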
For more configuration options and advanced features, see the sections below.
| .. _batch_inference_llm: |
Can you also deduplicate the content from the section below?
Feels like there's some redundancy, like the installation steps and the basic explanation of the configuration.
done + sglang engine pointer added
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
The LLM Data documentation jumps quickly into detailed, complex examples with many configuration options and steps.
This PR adds a simpler minimal quick-start to the top of the documentation.
Note: this can be updated after the larger ray.data.llm API refactor is done (context: #58298).